“Exactly-once” with a Kafka-Storm Integration

This is a 4 min read. Update 4, Nov 2016: When I first wrote this post, it was met with outright mockery and contempt. But the Google Dataflow paper (the unified Google framework for batch (FlumeJava) and stream processing (MillWheel)) and the Google MillWheel paper clearly explain that this is exactly the approach the Google team took to solve the duplicate-events problem…

Art of choosing a datastore

This is a 7 min read. Update 3, Nov 2016: When I wrote this post, there were a lot of opinions and comments (on my older blog) saying I was wrong to think that choosing a NoSQL datastore is very much like choosing a data structure when writing a program. Here is an excerpt from Nathan Marz’s book “Big Data: Principles and best practices of scalable real-time data systems” that…

C for Consistency: Inconsistencies about “consistency” in data systems

This is a 2 min read. There are countless resources explaining CAP, yet there is still a lot of confusion around the subject. CAP has everything to do with distributed systems; NoSQL data stores happen to be one type of distributed system, but, surprisingly, CAP has become more misunderstood since NoSQL data stores grew popular. IMHO no one explains these better than these…

Please don’t call Kafka as a messaging system

This is a 4 min read. Update, 11/Nov/2016: Originally this post was titled “Please don’t call Kafka as a messaging system”, but I had to change it because some people asked, “What else would you call it?” The Kafka tagline used to be something like “Message processing rethought”, but it looks like they later changed it to “A Distributed streaming platform”. So, I am changing the…

Internals of Spark Streaming

This is a 3 min read. Some context… As the title suggests, this is not a Spark Streaming primer. Frankly, this post is written for readers who want to build on an existing foundation of knowledge in Spark and Spark Streaming. I also find a surprising number of developers programming in Spark Streaming without knowing the inner…

What you need to know before writing Streaming APIs

This is a 2 min read. What are Streaming APIs? Streaming APIs are not to be confused with multimedia streaming services like Netflix or YouTube. The industry is starting to use a newer breed of REST APIs, called Streaming APIs, to offer a “high-throughput” pipeline for receiving curated data. With these APIs you can capture information in real time. It’s…