“Exactly-once” with a Kafka-Storm Integration

This is a 4 min read. Update 4, Nov 2016: When I first wrote this post, it was met with outright mockery and contempt. But the Google Dataflow paper (the unified Google framework for batch (FlumeJava) and stream processing (MillWheel)) and the Google MillWheel paper clearly explain that this is exactly the approach the Google team has taken to solve the duplicate events problem…

Art of choosing a datastore

This is a 7 min read. Update 3, Nov 2016: When I first wrote this post, there were a lot of opinions and comments (on my older blog) about how I was wrong to think that choosing a datastore is almost like choosing a data structure when writing a program. Here is an excerpt from Nathan Marz’s book “Big Data: principles and best practices of…

Please don’t call Kafka as a messaging system

This is a 4 min read. Update, 11/Nov/2016: Originally this post was titled “Please don’t call Kafka as a messaging system”. I had to change it because some people went “what else would you call it?”. The Kafka tagline used to be something like “Message processing rethought”, but it looks like they later changed it to “A Distributed streaming platform”. So, I am changing the…

Internals of Spark Streaming

This is a 3 min read. Some context… As the title of the post suggests, this is not a Spark Streaming primer. Frankly, this post is written for readers who already have a foundation in Spark and Spark Streaming and want to build on it. I also find a surprising number of developers programming in Spark Streaming without knowing the inner…

What you need to know before writing Streaming APIs

This is a 2 min read. What are Streaming APIs? Streaming APIs are not to be confused with multimedia streaming services like Netflix or YouTube. The industry is starting to use a newer breed of REST APIs, called Streaming APIs, to offer a “high-throughput” pipeline for receiving curated data. With these APIs you can capture information in real time. It’s…

What Wikipedia can’t tell you about Apache Storm and Apache Spark Streaming

This is a 1 min read. I am seeing a lot of questions about Spark Streaming and Storm on Quora: when to choose which, and what their performance, reliability and support are like. As usual, there are plenty of comparisons available on the web if you google around. But instead of comparing them side by side, I thought of talking…

What you didn’t know about Real-time notification systems

This is a 2 min read. I have been intrigued by event notification systems for a long time now; in fact, this started in my programming days in legacy environments like iSeries. So I started working on a toy project, which evolved into a solid project. I thought I would muse about that recent project, RealTimeNotification. But before going into the details of the…