Continuous streaming integration with streaming data platforms

Someone asked me on Quora, "Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?" Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is…

Experiences with Kafka and exactly-once processing in IoT apps

Some context on message brokers and delivery guarantees (If you have a fair amount of experience with message processing and delivery guarantees, please skip to the next part of this post.) Message delivery guarantees are among the canonical requirements for message brokers, and they are relevant to all types of brokers: the ones based on queue semantics and the ones based on…

Why is ZooKeeper always configured with an odd number of nodes?

Someone asked me, "Why Zookeeper is always configured with odd number of nodes?" Well, that's a great question, but the sad part is that not many practitioners, even those who use ZooKeeper in production, can explain it simply. I will try to keep this really simple, I promise. ZooKeeper (ZK) is a highly available, highly reliable and…

“Exactly-once” with a Kafka-Storm Integration

Update 4, Nov 2016: When I first wrote this post, it was met with outright mockery and contempt. But the Google Dataflow paper (the unified Google framework for batch (FlumeJava) and stream processing (MillWheel)) and the Google MillWheel paper clearly explain that this is exactly the approach the Google team has taken to solve the duplicate events problem…

Please don’t call Kafka as a messaging system

Update, 11/Nov/2016: Originally this post was titled "Please don't call Kafka as a messaging system"; I had to change it because some people went "what else would you call it?". The Kafka tagline used to be something like "Message processing rethought", but it looks like they later changed it to "A distributed streaming platform". So, I am changing the…

What you should know about the design of IoT Platforms

IoT PaaS has started reaching the masses. I have already captured a list of both freely and commercially available IoT platforms in my previous post here. But I wanted to concentrate on the elements of a successful IoT platform in a post. Then again, it would be too much for a single post to cover. Let me break it down…