Akash's Blog

Tuesday, February 25, 2025

System Design Series: Resilience

Introduction

Resilience is the ability of a system to recover from failures, disruptions, or any event that affects its proper functioning.

In the real world, any system is likely to fail once in a while for known or unknown reasons. We need to see what we can do to help the system get through such situations, how we can avoid harm to the system, and how we can get back to business quickly.

Approach

Resilience can be achieved using the approaches listed below.

Fault Tolerance
The system continues to work even if software or hardware fails (fully or partially).

e.g. In load balancing, if one server fails, another server is created or the load is distributed among the remaining instances.
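
The load balancing example can be sketched roughly as below. This is a minimal illustration, not a production balancer: the class name RoundRobinBalancer, its methods, and the server names are all assumptions made up for this sketch. It shows the "distribute load among the remaining instances" part only.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips servers marked as failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # Called when a server is detected as failed.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a known server recovers.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation point.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
balancer.mark_down("app-2")  # simulate a failure
print([balancer.pick() for _ in range(4)])
```

After app-2 is marked down, requests keep flowing to app-1 and app-3, so a single failure does not stop the system.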

Redundancy
Redundancy or duplication provides a backup and also helps in recovering from failure quickly.

e.g. Create another instance using database replication and switch to it if the original instance fails.

Monitoring

Continuous tracking and monitoring help in the early detection of problems.

e.g. Continuously track the system's health and take automatic action or raise an alert if the given health criteria are not met.
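
A health check against given criteria could look roughly like this. The metric names (cpu_percent, error_rate, p99_latency_ms) and the thresholds are hypothetical values chosen for illustration. A real setup would read them from a monitoring tool.

```python
def check_health(metrics, criteria):
    """Return the names of criteria that are violated.

    `criteria` maps a metric name to its maximum allowed value.
    A missing metric is treated as a violation, since we cannot confirm health.
    """
    violations = []
    for name, limit in criteria.items():
        value = metrics.get(name)
        if value is None or value > limit:
            violations.append(name)
    return violations


# Hypothetical thresholds and a hypothetical metrics snapshot.
criteria = {"cpu_percent": 80, "error_rate": 0.01, "p99_latency_ms": 500}
snapshot = {"cpu_percent": 93, "error_rate": 0.002, "p99_latency_ms": 420}

violated = check_health(snapshot, criteria)
if violated:
    print("ALERT: unhealthy metrics:", violated)
```

The returned list can drive either an alert or an automatic action such as restarting the instance.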

Disaster Recovery

Restore the system after a disaster.

e.g. Take regular backups to avoid data loss or keep it to a minimum.

Self Healing

The system can automatically correct itself when issues occur.

e.g. With automatic scaling on AWS, if one instance fails, a new instance is created automatically and the traffic of the unhealthy instance is transferred to it.

Tuesday, February 18, 2025

System Design Series: Availability and Consistency

Introduction

We wish to build a system that is always working and always gives accurate responses. However, in the world of distributed systems this is not achievable, so we have to make a trade-off based on the needs of the user and the business.

A banking system must prioritise data consistency over availability: it's okay if it is unavailable for a few minutes, but it's not at all acceptable if it gives inaccurate results. In contrast, TikTok must be highly available, otherwise people may lose interest, but it's okay if a user doesn't see a newly uploaded video immediately.

Using the CAP theorem, we will understand why we have to make this trade-off and why we can't have both.

CAP Theorem

CAP stands for:

Consistency : Every read receives the latest write.
Availability : Every request receives a non-error response, though not necessarily the latest data.
Partition Tolerance : The system continues to work even when communication between nodes fails.

The CAP theorem states that a distributed system can guarantee only two of these three properties at the same time. Based on our requirements, we have to give up one of them to achieve the desired results.

According to the CAP theorem, a system can be one of the following:

CP : Consistent and Partition Tolerant.
AP : Available and Partition Tolerant.
CA : Consistent and Available (not possible in a distributed environment).

In a distributed environment, network partitions cannot be ruled out, so it is not possible to be available all the time and also return the latest data on every read. When a network partition (communication failure) happens, the system has two choices: it can fail (or return an error), which sacrifices availability, or it can return stale data, which breaks the consistency guarantee.
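
The two choices during a partition can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the Node class, its mode flag, and the partitioned switch are invented for illustration and do not represent any real database.

```python
class Node:
    """Toy replica node that behaves as CP or AP during a network partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical setting)
        self.value = None
        self.partitioned = False  # True when cut off from the other nodes

    def replicate(self, value):
        # Writes from other nodes only arrive while the network is healthy.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm the latest write")
        # AP: always answer, even if the value may be stale.
        return self.value


cp_node, ap_node = Node("CP"), Node("AP")
for node in (cp_node, ap_node):
    node.replicate("v1")
    node.partitioned = True   # the partition starts
    node.replicate("v2")      # this write never reaches the node

print(ap_node.read())  # still answers, but with the older value
```

The AP node keeps answering with the old value, while calling read() on the CP node raises an error until the partition heals.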

There are different patterns for consistency and availability, which are listed below.

Consistency Patterns

Weak Consistency: A write may or may not be seen by later reads. (e.g. Video Call)
Eventual Consistency: A write becomes visible to reads after a short delay. (e.g. Email)
Strong Consistency: A write is immediately visible to all reads. (e.g. File System)
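
Eventual consistency can be pictured with a primary that queues its writes and a replica that catches up later. This is a simplified sketch: the Primary and Replica classes and the replicate function are hypothetical names, and real systems replicate over a network rather than through an in-memory queue.

```python
from collections import deque


class Primary:
    """Accepts writes immediately and queues them for later replication."""

    def __init__(self):
        self.data = {}
        self.pending = deque()

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    # Apply all queued writes in order; in a real system this runs asynchronously.
    while primary.pending:
        key, value = primary.pending.popleft()
        replica.data[key] = value


primary, replica = Primary(), Replica()
primary.write("inbox", "new email")
before = replica.read("inbox")   # not visible on the replica yet
replicate(primary, replica)
after = replica.read("inbox")    # visible once replication has run
```

With strong consistency, the write would not be acknowledged until the replica had applied it, so the first read would already see the new value.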

Availability Patterns

Replication: Replicate data to an additional component using a Master-Master or Master-Slave setup.
Fail-over: Keep a standby instance ready to take over if the original instance fails, using an Active-Active or Active-Passive setup.
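
An Active-Passive fail-over can be sketched as below. The Instance and ActivePassivePair classes are assumptions for illustration. In practice the switch is usually triggered by health checks or a cluster manager, not by a failed request.

```python
class Instance:
    """Hypothetical service instance that can be healthy or failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ActivePassivePair:
    """Routes requests to the active instance and promotes the standby on failure."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Promote the standby; the failed instance becomes the new standby.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = ActivePassivePair(Instance("primary"), Instance("standby"))
pair.active.healthy = False          # simulate a failure of the active instance
print(pair.handle("GET /orders"))    # served by the promoted standby
```

In an Active-Active setup both instances would serve traffic all the time, so a failure only reduces capacity and no promotion step is needed.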

The availability of a system is expressed as a percentage. For example, a system that is 99.9% available is said to have an availability of three 9's. If a system is 90% available, it roughly means that it won't be available for ~36.5 days in a year, ~3 days in a month, and ~2.4 hours in a day.
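
The downtime figures follow from simple arithmetic, which can be checked with a small helper. The function name downtime_hours is made up for this sketch, and a month is assumed to have 30 days.

```python
HOURS_PER_DAY = 24


def downtime_hours(availability_percent, period_days):
    """Expected downtime in hours over a period, for a given availability percentage."""
    unavailable_fraction = (100 - availability_percent) / 100
    return period_days * HOURS_PER_DAY * unavailable_fraction


for label, days in (("year", 365), ("month", 30), ("day", 1)):
    hours = downtime_hours(90, days)
    print(f"90% available, per {label}: {hours:.1f} hours ({hours / HOURS_PER_DAY:.2f} days)")
```

The same helper shows why each extra 9 matters: at 99.9% availability the allowed downtime drops to roughly 8.76 hours per year.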

Tuesday, February 11, 2025

System Design Series: Scalability

Introduction

Scalability is a system's capability to handle more work.

Consider a website where one server is handling 1,000 users. For some reason more and more users start opening the website, and the number of users suddenly rises to 50,000. However, the system cannot handle more than 50,000 users. If the load increases further, the system will fail to serve its users and may even crash.

We can solve this problem in two main ways: we can scale the system either vertically or horizontally.

Vertical Scaling

Increasing the capacity of a server so that it can handle more work is called vertical scaling.

Pros
Easy to maintain as there will be fewer components in the system.
Cons
Increasing capacity comes at an additional cost, which rises rapidly for large-scale systems.
There is an upper limit to how far a single server can be scaled.
Single Point of Failure.

Horizontal Scaling

Increasing the number of servers to handle growing work is called horizontal scaling. With more servers, a load balancer can distribute the load (evenly) among them so that the system as a whole handles more work.
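
A rough capacity estimate for horizontal scaling is just a ceiling division. The figure of 1,000 users per server is an assumption for this sketch, and the helper name servers_needed is hypothetical.

```python
def servers_needed(expected_users, users_per_server):
    """Smallest number of servers that can carry the expected users."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Ceiling division using integers only, to avoid floating point rounding.
    return -(-expected_users // users_per_server)


# Assuming each server comfortably handles 1,000 users.
print(servers_needed(50_000, 1_000))
print(servers_needed(50_001, 1_000))
```

Real capacity planning would also leave headroom for spikes and for servers that fail, so the practical number is usually higher than this minimum.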

Pros

Can solve the single point of failure problem.
Highly scalable and available.
Comparatively cheaper, as a few small servers usually cost less than one high-end server.

Cons

Added complexity in deployment and maintenance.

Monday, February 10, 2025

Devoxx Belgium 2024 : Java Performance Update

Introduction

A high-level summary of and commentary on the Java 24 performance update video. Java is not just improving its performance and usability in general, it is also moving faster year on year. Java 25 is likely to be released sometime in 2025.

This video is from last year's Devoxx but was only recently posted on the Java YouTube channel. As a curious Java follower I found it very interesting, so I decided to write about it on my blog. The video first gives a summary of the Java projects, performance metrics, and the challenges faced in performance work, and then covers the recent and upcoming performance improvements. If you don't want to go through the initial summary and prefer to focus on the recent performance improvements, you can start from here.

Ongoing Java Projects

The speaker summarized the ongoing Java projects, which are listed below.

Amber : Small Java Features.
Babylon: Extend reach of Java to SQL, ML Models and GPUs.
Leyden: Improve startup time and reduce memory footprint.
Lilliput: Reduce the size of the Java object header on 64-bit architectures.
Loom: Lightweight concurrency.
Panama: Integration between Java and system-level programming languages.
Valhalla: Augmenting the Java object model with value objects.

Metrics / Challenges / Tools

In this section the speaker covered the different metrics one needs to consider when evaluating performance. Performance has to be looked at holistically, including the usage of memory, CPU, threads, cache, and power. Startup and warm-up time also need to be considered when measuring performance.

On challenges, the speaker emphasized that we cannot get accurate performance statistics on our laptops. It is better to measure on a dedicated server, after warm-up, when the application is ready to serve. Even then, one should not rely on the results of only a few runs: performance results need to be collected over thousands of executions. He also suggested using System::nanoTime instead of System::currentTimeMillis for better accuracy when measuring performance.
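
The measurement advice (warm up first, repeat many times, use a high-resolution clock) can be sketched as below. The talk is about Java and System::nanoTime. This sketch uses Python's time.perf_counter_ns as a rough stand-in to show the shape of the procedure. The helper name measure_ns and the iteration counts are assumptions for illustration.

```python
import statistics
import time


def measure_ns(func, warmup=1_000, runs=10_000):
    """Return the median duration of func() in nanoseconds, after a warm-up phase."""
    # Warm-up: run the code without recording, so caches and lazy setup settle first.
    for _ in range(warmup):
        func()

    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()   # monotonic, high-resolution clock
        func()
        samples.append(time.perf_counter_ns() - start)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


median_ns = measure_ns(lambda: sum(range(1_000)))
print(f"median: {median_ns} ns")
```

For real Java code, a dedicated harness is the better choice, since it also deals with JIT compilation effects that a hand-written loop like this cannot control.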

The speaker mentioned JMH (Java Microbenchmark Harness), which can be used for micro-level benchmarking with the @Benchmark annotation. He then talked about the internal tools the Java team uses to run benchmarks on different platforms, which give them an idea of how much performance has improved or degraded on each platform.

Performance Improvements

JDK-8318446: C2 - "MergeStores"

This is an internal change to the Java HotSpot compiler (C2) that improves array store operations.

An array store operation is the process of writing a value to an array index. Earlier, the compiler merged primitive stores in a way that could cause assertion failures and even incorrect optimizations.

Using Unsafe or BALE (internal to the JDK, jdk.internal.util.ByteArrayLittleEndian) were the alternatives, but neither is a good option: Unsafe is something we should move away from, and BALE may slow things down.

JDK-8340821: FFM API Bulk Operations

This is a performance enhancement in the Foreign Function & Memory (FFM) API. The basic idea is to handle certain memory segment operations directly in Java instead of calling native code, which reduces the overhead of switching between Java and native code.

JDK-8180450: Secondary Super Cache Scaling

This was a bug in the secondary super cache. Specifically, when multiple threads frequently performed instanceof type checks in quick succession, the cache became unstable and overall performance suffered badly in rare workloads.

JDK-8336856: String Concatenation

Before this enhancement, Java would do extra calculations, especially with different data types such as int, long, and double, which slowed down the overall concatenation. After this change, Java does less unnecessary work, which improves overall performance.

JEP-474: ZGC

This change in ZGC (Z Garbage Collector) aims to improve performance by focusing on generational garbage collection, which manages short-lived and long-lived objects more efficiently. It is a step toward making the generational mode the default.

JEP-450: Compact Header

Reduces the object header size to save ~20% memory, considering that headers are approximately 12-16 bytes and objects are around 32-64 bytes in size.

Summary

This video had many things that were new to me. It helped me better understand the internal workings and the progress happening around performance. I tried to simplify these updates as much as possible, and I may dig into each of them later for more in-depth learning and better understanding.

Apparently, other improvements were also mentioned that will be the focus in the upcoming months on top of these. These updates may look abstract and, at first, may not sound like they make a huge impact, but on a closer look Java is moving in a direction of high optimization and efficiency.

Sunday, February 9, 2025

Three at Thoughtworks

Introduction

It's been three years since I became a part of Thoughtworks. I wanted to write about my experience after my first work anniversary, but I felt one year was too little and five years too much to reflect on my journey in an organization. So, I decided to write about it at the sweet spot between one and five years.

Sometimes, I regret not joining this company earlier in my career. Perhaps I wouldn't have made it back then, but anyway, I am here now and truly enjoying my journey of growth. This post is not about claiming whether Thoughtworks is a good or bad place to work—rather, it's about my journey in an organization that has been in the tech industry for more than three decades. These are my personal views and experiences, which may differ from others working at Thoughtworks, but I hope many will find them relatable.

Life Before Thoughtworks

Before joining Thoughtworks, I worked in two organizations—one service-based and the other product-based. In both, I learned a lot and kept improving day by day.

I feel my first organization, SPEC India, played a key role in shaping my interest in programming. I am always thankful to them for their support and guidance—without those early experiences, I wouldn't have found my direction. Due to certain personal reasons, I had to switch to Infostretch (QMetry), which is now acquired by SmartBear. Here, I developed a better understanding of software products and gained exposure to client communication and leadership.

By the time I considered joining Thoughtworks, I believed I knew a lot. My confidence, or rather ego, kept thriving. I was doing things well, but I was doing them the way my previous organizations expected me to. You may ask, what's wrong with that? The answer is nothing, but at the same time, not everything was right either.

The Interview Process

When I decided to switch jobs in 2021, I applied to multiple companies, including Thoughtworks. The interview process was tiring, to be honest, though as far as I know things are a bit different now. It took me five rounds to get through:

Code assignment
Code pairing
Two technical rounds to assess depth and breadth
Leadership and cultural alignment

The interviews challenged my core skills and tested my abilities to the limit, making me realize how little I knew and how much more I should know. The expectation was not to answer everything correctly, but I struggled when I couldn't answer multiple questions—especially when my depth and breadth of knowledge were tested. I wouldn't say I completely failed, but I definitely didn't succeed with flying colors.

After each round, I received detailed feedback from the recruiter, helping me understand my standing. I remained somewhat positive about making it through, and eventually, I did. I received the offer and accepted it without any negotiations.

I joined Thoughtworks on December 31, 2021, keeping my promise to myself that I wouldn't be in the same company by the end of 2021.

Life at Thoughtworks

Journey So Far

My journey at Thoughtworks has been filled with learnings and challenges, yet I feel like it's just the beginning. I'm on a path to becoming a better professional, and Thoughtworks is playing a key role in shaping that.

Immersion

Immersion is a four-day program that every new Thoughtworker attends. It gave me a glimpse of the Thoughtworks way of working, and I got to learn from the journeys and experiences of other Thoughtworkers. The program prepared me for the role of a consultant and helped me understand Thoughtworks' culture.

It may sound like a formal, instruction-heavy program, but it was fun, interactive, and filled with games, insights, and refreshing conversations, making it very engaging.

Dev Bootcamp

The Dev Bootcamp was organized for all new developers to introduce them to TDD, Pair Programming, XP Practices, and Trunk-Based Development. The trainers guided us closely through the process, ensuring we understood the whys, whats, and hows really well.

Culture

As I started interacting more, I realized that multiple things are deeply rooted in the Thoughtworks culture. People willingly follow these principles, such as:

Being the sailor of your own ship of ambitions
Being open to giving and receiving feedback
Being ready to take on different roles

However, coming from a different work environment, I initially felt a slowdown in productivity. I was doing less work compared to my previous organizations, but my quality of work and engagement gradually improved. The culture here expects you to balance work and continuous learning.

Community

I'm not sure if I can go into too much detail in this post, but I can summarize my experience.

Thoughtworks has several thriving communities covering technical and non-technical skill sets. Anyone is free to join and contribute. It’s not just about learning new things—it’s also about collaborating with like-minded people.

Apart from learning, these communities help you solve real-world problems. If you ever face challenges on a project, you can reach out to the global Thoughtworks community for help.

People

Since joining, I’ve had the opportunity to work with and learn from many talented people. I've gained insights from their perspectives and also helped others to whatever extent I could.

The people at Thoughtworks are not fundamentally different, but the environment they are placed in brings out their best potential. The feedback culture here helps maintain team health and improve collaboration.

Overall, I have found the people to be cooperative, inclusive, and supportive so far.

Work

The work at Thoughtworks can be challenging, depending on the assignment. I struggled to cope with the diverse tech stacks and the variety of domains I had to study.

Each project was unique and left a different impression on me. The pace is fast, and you are continuously challenged to be adaptable and flexible.

Summary

Thoughtworks has changed the way I perceive technology and consulting. Although I faced challenges adapting, the company provides access to resources and people that help you find solutions faster through connections and community.

That said, Thoughtworks is not a perfect company—no company is. There are ups and downs, just like anywhere else. However, one commendable aspect is its strong stance on diversity, inclusion, and equality, which has been a core value from the very beginning.
