Akash's Blog


Tuesday, February 18, 2025

System Design Series: Availability and Consistency

Introduction

We would like to build a system that is always working and always gives accurate responses. In the world of distributed systems, however, this is not achievable, so we have to make trade-offs based on the needs of the users and the business.

A banking system, for example, must prioritise data consistency over availability: it is acceptable for it to be unavailable for a few minutes, but it is not at all acceptable for it to return inaccurate results. By contrast, TikTok must be highly available, otherwise people may lose interest, but it is fine if users do not see a newly uploaded video immediately.

Using the CAP theorem, we will see why we have to make this trade-off and why we cannot have both.

CAP Theorem

CAP stands for:
  • Consistency: Every read receives the most recent write.
  • Availability: Every request receives a non-error response, though not necessarily the latest data.
  • Partition Tolerance: The system continues to work even if communication between nodes fails.
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time. Based on our requirements, we have to give up one of them to achieve the desired behaviour.

According to the CAP theorem, we can therefore build one of the following kinds of system:
  • CP: Consistent and Partition Tolerant.
  • AP: Available and Partition Tolerant.
  • AC: Available and Consistent (not possible in a distributed environment).
In a distributed environment, it is impossible to be available all the time and also return the latest data on every read. If a network partition (communication failure) happens, the system has two choices: it can fail (return an error), which gives up availability, or it can return possibly stale data, which breaks consistency.
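To make the partition argument concrete, here is a toy sketch in Python. It assumes a hypothetical two-replica key-value store (`TwoReplicaStore` and `Unavailable` are illustrative names, not from any library) where writes go to a primary and reads are served by a secondary. During a partition, CP mode refuses to answer, while AP mode answers with whatever the secondary has, which may be stale.

```python
class Unavailable(Exception):
    """Raised when a CP-mode store refuses to answer during a partition."""


class TwoReplicaStore:
    """Hypothetical toy store: one primary, one secondary, one network link."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = {}        # receives every write
        self.secondary = {}      # serves reads; copies the primary while the link is up
        self.partitioned = False  # True simulates a broken link between replicas

    def write(self, key, value):
        self.primary[key] = value
        if not self.partitioned:
            self.secondary[key] = value  # link is up: replicate immediately

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: the secondary may have missed writes, so return an error instead.
            raise Unavailable("partition: cannot guarantee the latest value")
        # AP (or no partition): answer from the secondary, possibly stale.
        return self.secondary.get(key)

    def heal(self):
        self.partitioned = False
        self.secondary.update(self.primary)  # catch up on writes missed during the partition


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("balance", 100)
        store.partitioned = True    # simulate a network partition
        store.write("balance", 50)  # reaches only the primary
        try:
            print(mode, store.read("balance"))
        except Unavailable as err:
            print(mode, "error:", err)
```

In CP mode the read after the partition raises `Unavailable`, while in AP mode it returns the stale value `100` until `heal()` copies the missed write across. Real systems detect partitions with timeouts and heartbeats rather than a flag; the sketch only shows the choice the system is forced to make.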
Common patterns for consistency and availability are listed below.

Consistency Patterns

  • Weak Consistency: A write may or may not be seen by subsequent reads. (e.g. video calls)
  • Eventual Consistency: A write is not visible immediately, but becomes visible to reads after some delay. (e.g. email)
  • Strong Consistency: A write is immediately visible to all subsequent reads. (e.g. file systems)
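The difference between strong and eventual consistency often comes down to when replication happens. The sketch below assumes a hypothetical leader with a single follower (`ReplicatedRegister` is an illustrative name): with synchronous replication, the follower is updated before the write returns; with asynchronous replication, the update is queued and applied later, so a read from the follower can be stale in the meantime. Weak consistency, where a write may never reach some readers, is not modelled here.

```python
from collections import deque


class ReplicatedRegister:
    """Hypothetical leader + one follower holding a single value.

    sync=True  -> copy each write to the follower before returning (strong).
    sync=False -> queue the copy and apply it later (eventual).
    """

    def __init__(self, sync: bool):
        self.sync = sync
        self.leader = None
        self.follower = None
        self.pending = deque()  # writes not yet applied on the follower

    def write(self, value):
        self.leader = value
        if self.sync:
            self.follower = value       # replicate before acknowledging
        else:
            self.pending.append(value)  # replicate later

    def replicate(self):
        """Apply queued writes on the follower, in order (async mode)."""
        while self.pending:
            self.follower = self.pending.popleft()

    def read_from_follower(self):
        return self.follower


if __name__ == "__main__":
    strong = ReplicatedRegister(sync=True)
    strong.write("v1")
    print("strong:", strong.read_from_follower())

    eventual = ReplicatedRegister(sync=False)
    eventual.write("v1")
    print("eventual, before replication:", eventual.read_from_follower())
    eventual.replicate()
    print("eventual, after replication:", eventual.read_from_follower())
```

With `sync=False`, a follower read right after `write("v1")` still returns `None` until `replicate()` runs. The price of synchronous replication is that every write waits for the follower, and if the follower is unreachable the write cannot complete, which is the CP side of the trade-off.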

Availability Patterns

  • Replication: Replicate data to additional components using a Master-Master or Master-Slave setup.
  • Fail-over: A standby instance takes over if the original instance fails, using an Active-Active or Active-Passive setup.
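A minimal sketch of Active-Passive fail-over, assuming hypothetical `Server` and `ActivePassivePair` classes: all traffic goes to the active instance, and when a call to it fails, the roles are swapped so the standby becomes active. Here failure is detected by a failed request; real setups usually rely on heartbeats or health checks. An Active-Active setup would instead spread traffic across both instances all the time.

```python
class Server:
    """Hypothetical server that fails every request while unhealthy."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ActivePassivePair:
    """Send every request to the active server; on failure, promote the standby."""

    def __init__(self, active: Server, passive: Server):
        self.active = active
        self.passive = passive

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Fail-over: swap roles so the standby becomes the new active, then retry once.
            self.active, self.passive = self.passive, self.active
            return self.active.handle(request)


if __name__ == "__main__":
    a, b = Server("a"), Server("b")
    pair = ActivePassivePair(active=a, passive=b)
    print(pair.handle("r1"))
    a.healthy = False  # simulate the active instance crashing
    print(pair.handle("r2"))
```

After `a` is marked unhealthy, the next request is served by `b`, and `b` stays active for later requests. If both instances are down, the retry raises `ConnectionError`, since fail-over only helps while a healthy standby exists.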

The availability of a system is expressed as the percentage of time it is up. For example, a system that is 99.9% available is said to have "three nines" of availability. A system that is 90% available is down roughly 10% of the time: about 36.5 days in a year, about 3 days in a month, or about 2.4 hours in a day.
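The downtime figures above are simple arithmetic on the unavailable fraction (100% minus the availability). A small Python helper (the name `downtime_hours` is hypothetical) reproduces them, assuming a 365-day year and a 30-day month:

```python
def downtime_hours(availability_pct: float) -> dict:
    """Maximum downtime, in hours, implied by an availability percentage.

    Assumes a 365-day year and a 30-day month.
    """
    unavailable_fraction = 1 - availability_pct / 100
    return {
        "per_year": unavailable_fraction * 365 * 24,
        "per_month": unavailable_fraction * 30 * 24,
        "per_day": unavailable_fraction * 24,
    }


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99):
        d = downtime_hours(pct)
        print(f"{pct}%: {d['per_year']:.2f} h/year, "
              f"{d['per_month']:.2f} h/month, {d['per_day']:.3f} h/day")
```

For 90% this gives 876 hours a year (36.5 days), 72 hours a month (3 days) and 2.4 hours a day, matching the figures above; three nines (99.9%) allows only about 8.76 hours of downtime a year.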

