Art of choosing a datastore

This is an 8 min read. Update 3, Nov 2016: When I wrote this post, there were a lot of opinions and comments (on my older blog) about how I am wrong in thinking that choosing a NoSQL datastore is very much like choosing a data structure when writing a program. Here is an excerpt from Nathan Marz’s book “Big Data: Principles and best practices of scalable real-time data systems” that… Continue reading

C for Consistency: Inconsistencies about “consistency” in data systems

This is a 2 min read. There are countless resources explaining CAP, yet there is still a lot of confusion around this subject. CAP has everything to do with distributed systems. NoSQL data stores happen to be one type of distributed system, but surprisingly CAP has been widely misunderstood since NoSQL data stores became popular. IMHO no one explains these better than these… Continue reading

Please don’t call Kafka as a messaging system

This is a 3 min read. Update, 11/Nov/2016: Originally this post was titled “Please don’t call Kafka as a messaging system”. I had to change it because some people went “what else would you call it?”. The Kafka tagline used to be something like “Message processing rethought”, but it looks like they later changed it to “A Distributed streaming platform”. So, I am changing the… Continue reading

Internals of Spark Streaming

This is a 3 min read. Some context… As the title of the post suggests, this is not a Spark Streaming primer. Frankly, this post is written for an audience that wants to build on an existing foundation of knowledge of Spark and Spark Streaming. I also find a surprising number of developers programming in Spark Streaming without knowing the inner… Continue reading

What you need to know before writing Streaming APIs

This is a 2 min read. What are Streaming APIs? Streaming APIs are not to be confused with multimedia streaming services like Netflix or YouTube. The industry is starting to use a newer breed of REST APIs, called Streaming APIs, to offer a “high-throughput” pipeline for receiving curated data. With these APIs you can capture information in real time. It’s… Continue reading

What Wikipedia can’t tell you about Apache Storm and Apache Spark Streaming

This is a 1 min read. I am seeing a lot of questions about Spark Streaming and Storm on Quora: when to choose which, and what their performance, reliability and support are like. As usual, there are plenty of comparisons available on the web if you google around. But instead of comparing them side by side, I thought of talking… Continue reading