Modernising data architecture for enterprises

Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one related issue. The first step in Big Data solutions consulting, like any…

Wide row data modelling with Apache Cassandra

I have always been intrigued by the performance claims of Apache Cassandra, so I wanted to put the whole "wide rows" idea, and the performance edge the wide-row data model is said to offer, to the test. Rumour has it that Facebook hired the ex-Amazon engineers who wrote Dynamo to build Cassandra. Anyway, a sound starting point is to…

Terminology confusion: Column Stores and Column-oriented Databases

This is my attempt to clear the air around column stores and column-oriented databases, at both the terminology level and the understanding level. I will also talk a bit about how terrible an idea it is to group column-oriented databases as a flavour of NoSQL data store. What is a column store, really? There is no scope…

How does the Log-Structured-Merge-Tree work?

If you are wondering why you should care about LSM-Trees, I briefly touched upon them in one of my previous posts, Art of choosing a datastore. This write-up, however, is the best out there if you want to learn the inner workings of an LSM-Tree. "How does the Log-Structured-Merge-Tree work?" This was a Quora answer by David Jeske…

Art of choosing a datastore

Update 3 Nov 2016: When I wrote this post, there were a lot of opinions/comments (on my older blog) saying I was wrong to think that choosing a NoSQL datastore is very much like choosing a data structure when writing a program. Here is an excerpt from Nathan Marz's book "Big Data: Principles and best practices of scalable real-time data systems" that…

C for Consistency: Inconsistencies about “consistency” in data systems

There are countless resources explaining CAP, yet there is still a lot of confusion around the subject. CAP has everything to do with distributed systems; NoSQL data stores happen to be one type of distributed system, but surprisingly, CAP has become widely misunderstood since NoSQL data stores became popular. IMHO no one explains these better than these…