Wide row data modelling with Apache Cassandra

This is a 4 min read

I have always been intrigued by the performance claims of Apache cassandra. So, I wanted to put the whole “wide rows” and the performance edge claims that wide-row data model said to offer to the test. Rumour has it, Facebook hired ex-Amazon engineers who wrote Dynamo  to build cassandra. Anyways, a sound starting point is to have a nice use-case to test with.  So I took the real world scenario of messenger platforms and thought what if cassandra is tasked to be the source of truth for the entire platform right from user profiles to actual messages. In fact there are several claims by Datastax that chat apps can use cassandra as source of truth¹, after reading these I felt all the more validated about my choice of the use case. Now, more concretely speaking,  I wanted to imagine what if Whatsapp used cassandra. Also, the focus of this post is to only look at the chat messages.

But, the usecase I chose was totally legitimate up until I chose whatsapp. Because whatsapp doesn’t store any of user messages, the messages just ‘pass-thru’ the server to the devices. The local store of the device is the canonical data store for messages. One small variant is, for the chatters offline the messages are encrypted and stored in the server for a period of 30 days before it is discarded².

So the usecase here is contrived just for evaluating the power of cassandra. For avoiding any confusions and since we are building a variant of whatsapp let me call it Whatsapp– (Whatsapp minus minus) for the rest of this post.

Architecture and data model for whatsapp– with cassandra

Since it is a contrived usecase for now lets forget the semantics of how actually whatsapp worked in terms storing the messages and make some assumptions and assertions.

1. Whatsapp– stores all the new chats (individual and group), the messages in cassandra.

2. Cassandra tables (column families) CDC (change data capture), i.e. you can subscribe to the changes i.e inserts, updates, deletes of a particular table. CDC mechanism provides this out of the box, so you don’t need additional writes into distributed pub/sub infrastructure like Kafka. But for cassandra this feature is under development and NOT available as of cassandra 3.7. Check out this user story in JIRA for more details. So essentially you can query these CDC tables, for the filter conditions you have mentioned in WHERE clause if records are exist, it will be returned if not, the query blocks until new records arrive. So there can be server components blocking with queries, if the record arrives you can write it back to the message delivery back bone like GCM or APNs or 3rd party services like AWS SNS. (I understand companies like netflix have solved this problem of low-latency CDC systems from cassandra by tapping into the snapshots of the SST data files of the respective tables, but for sake of simplicity I am suggesting to leverage the yet-to-be released CDC feature of cassandra.BTW, Netflix calls it “Aegisthus” there CDC system for cassandra, I thought this would be an interesting side-bar.)

3. If you say, delete and re-install the app you can still recover chat messages from the server, unlike whatsapp. The same chat message data model should yield to this use case as well in a performant way. I will get to that shortly.

Whatsapp– architecture 

screen-shot-2016-10-01-at-2-41-11-pm

But without further ado here is the heart of the matter, the data model for the live chat and how data is stored. Just to be clear about the chatid , for every new chat i.e when the user hits the new chat icon (highlighted in the below figure) an unique uuid is generated. It gets sent over to the server and the recipient(s) for facilitating the chat. Rest of the columns on the table should be pretty straightforward (assuming you are a whatsapp user). Also, I am sure you would have noticed the flexible data types like lists and maps. Couple of things about the physical storage, this is important because when you are attempting to get the older messages from the server i.e not through CDC mechanism we need a similar user experience. 

screen-shot-2016-10-01-at-8-13-57-pm

img_2059

 

This is where the real power of wide-rows come in. As you see the rowkey of the table is the composite key made out of chatid  and chatdate . There is also a third column mentioned as a part of the “primary key clause” and also in the “clustering order by clause”. This basically means rows are uniquely identifiable by (or partitioned by in cassandra terms) chatid + chatdate and ordered by msgsenttoserver timestamp in descending order³. Also, the choice of the rowkey serves a purpose, any chat (individual or group) may not hit the 2 billion max messages per day. With all these information lets look at how the data is physically stored and be sure it is not stored the way you think.

Below figure shows it. As you can see all records are stored sequentially in disk for a rowkey. In our case a days worth data of a chat can be read with a single disk seek and fetch. You don’t even have to do a order by in memory as the clustering key has helped to store the data in desc order in the disk. In case you are wondering the “timeuuid” data type is a unqiue timestamp to the micro-second precision.

screen-shot-2016-10-01-at-8-02-53-pm

 

..Thats a wrap, I think

Here is the code that I used to load a million dummy messages to a 3-node cluster using node.js. (I understand bulk write will yield a better write performance but the idea of the post is to test how wide row data modelling makes reading easy). I used Instacluster, a managed cassandra provider. I put together a web page with one scrollable text area, queried the chatlive table using CQL,  paginated 10 messages at time on scroll and loaded the same into the text area. I don’t have exact performance metrics with me but it was It was impressive without too much visible lag.

References

  1. https://academy.datastax.com/resources/getting-started-killrchat-example-data-model-messaging
  2. https://www.quora.com/How-long-does-WhatsApp-keep-the-delivered-data-like-images-or-etc-in-the-server
  3. http://www.planetcassandra.org/blog/we-shall-have-order/