C for Consistency: Inconsistencies about “consistency” in data systems

There are countless resources explaining CAP, yet there is still a lot of confusion around the subject. CAP has everything to do with distributed systems. NoSQL data stores happen to be one type of distributed system, but surprisingly CAP has become widely misunderstood since NoSQL data stores became popular. IMHO no one explains this better than these gentlemen: Peter Bailis (Stanford), Martin Kleppmann (Cambridge), and Kyle Kingsbury (in that order). Most of the concepts I have captured here are lessons from them; some are even quoted directly from their blogs, with proper attribution.

OK, some context…

Consistency is one of the most overloaded terms in the database world. It can mean different things to different people, depending on their background and experience.

  1. Is it the “C” in CAP, along with the terms used in the distributed systems camp (linearizability) vs. the database camp (serializability, strict serializability, strong consistency, and weaker models such as eventual consistency and strong eventual consistency)?
  2. Is it the “C” in ACID?

Has nobody attempted to demystify this? Of course there have been some attempts. Here are a couple

I am only documenting what I learned from Peter and Martin about the C in CAP and the C in ACID.

C in CAP

I had to learn some new terminology. Linearizability: it is a guarantee about single operations on single objects. Let’s take an example and see how linearizability plays out: a distributed counter DC, on which writers can simultaneously increment or decrement and readers can get the latest value. Linearizability guarantees a real-time (i.e., wall-clock) order for the operations performed on DC.

Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write. Linearizability for read and write operations is synonymous with the term “atomic consistency” and is the “C,” or “consistency,” in CAP. It is not the “C” in ACID.
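To make those two rules concrete, here is a minimal sketch of a brute-force linearizability check for a single register. Everything in it (the `Op` record, the function names, the timestamps) is my own illustration and not taken from Peter's or Martin's posts, and it assumes every operation has completed and that the history is tiny, since it tries every possible ordering.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Op:
    """One completed operation on a single register (hypothetical record)."""
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # wall-clock time the operation was invoked
    end: float    # wall-clock time the operation completed


def respects_real_time(order):
    # If one operation completed before another started,
    # it must also come first in the candidate order.
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.end < earlier.start:
                return False
    return True


def is_legal_register(order, initial=0):
    # Replay the candidate order: every read must see the latest write.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=0):
    # Linearizable if SOME total order is both real-time respecting and legal.
    return any(
        respects_real_time(order) and is_legal_register(order, initial)
        for order in permutations(history)
    )


# The write completed at t=1, so a read starting at t=2 must not see the old value.
stale_read = [Op("write", 1, 0, 1), Op("read", 0, 2, 3)]

# The read overlaps the write, so seeing the old value is still allowed.
concurrent_read = [Op("write", 1, 0, 3), Op("read", 0, 1, 2)]

# Once a read has returned 1, a later read must not go back to 0.
read_goes_backwards = [
    Op("write", 1, 0, 5),
    Op("read", 1, 1, 2),
    Op("read", 0, 3, 4),
]

print(is_linearizable(stale_read))           # False
print(is_linearizable(concurrent_read))      # True
print(is_linearizable(read_goes_backwards))  # False
```

Trying every permutation is exponential, so this is only useful for reasoning about toy histories like the three above.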

Serializability: it is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that executing a set of transactions is equivalent to some serial ordering of those transactions. Serializability is the traditional “I,” or isolation, in ACID.

“Combining serializability and linearizability yields strict serializability: transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. For example, say I begin and commit transaction T1, which writes to item x, and you later begin and commit transaction T2, which reads from x. A database providing strict serializability for these transactions will place T1 before T2 in the serial ordering, and T2 will read T1’s write. A database providing serializability (but not strict serializability) could order T2 before T1.”
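The T1/T2 example in that quote can be played out in a few lines. This is only a sketch under my own assumptions: the `Txn` record, the `begin`/`commit` timestamps, and the initial value `x = 0` are all made up for illustration, and the search simply tries every serial order.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    """A committed transaction (hypothetical record for this sketch)."""
    name: str
    ops: tuple     # ("w", key, value) or ("r", key, observed_value)
    begin: float   # wall-clock begin time
    commit: float  # wall-clock commit time


def explains(order, initial):
    # Run the transactions one after another and check every observed read.
    db = dict(initial)
    for txn in order:
        for kind, key, value in txn.ops:
            if kind == "w":
                db[key] = value
            elif db.get(key) != value:
                return False
    return True


def follows_real_time(order):
    # A transaction that committed before another began must be ordered first.
    return all(
        not (later.commit < earlier.begin)
        for i, later in enumerate(order)
        for earlier in order[:i]
    )


def serial_orders(txns, initial, strict=False):
    # All serial orders that explain the observed reads; with strict=True,
    # only those that also match real time (strict serializability).
    return [
        tuple(t.name for t in order)
        for order in permutations(txns)
        if explains(order, initial) and (not strict or follows_real_time(order))
    ]


initial = {"x": 0}
t1 = Txn("T1", (("w", "x", 1),), begin=0, commit=1)
t2_stale = Txn("T2", (("r", "x", 0),), begin=2, commit=3)  # T2 misses T1's write
t2_fresh = Txn("T2", (("r", "x", 1),), begin=2, commit=3)  # T2 reads T1's write

print(serial_orders([t1, t2_stale], initial))               # [('T2', 'T1')]
print(serial_orders([t1, t2_stale], initial, strict=True))  # []
print(serial_orders([t1, t2_fresh], initial, strict=True))  # [('T1', 'T2')]
```

When T2 misses T1's write, the only serial order that explains the history puts T2 before T1. That is fine for plain serializability, but it contradicts real time, so the strict check finds no valid order.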