Continuous streaming integration with streaming data platforms

4 min read. Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is…

Experiences with Kafka and exactly-once processing in IoT apps

5 min read. Some context on message brokers and delivery guarantees. (If you have a fair amount of experience with message processing and delivery guarantees, please skip to the next part of this post.) Message delivery guarantees are one of the canonical requirements for message brokers, and they are relevant to all types of brokers: the ones based on queue semantics and the ones…

Why is ZooKeeper always configured with an odd number of nodes?

2 min read. Someone asked me, “Why is ZooKeeper always configured with an odd number of nodes?” Well, that’s a great question, but the sad part is that not many practitioners, even those who use ZooKeeper in production, can explain it simply. I will try to keep this really simple, I promise. ZooKeeper (ZK) is a highly-available, highly-reliable and…

“Exactly-once” with a Kafka-Storm Integration

4 min read. Update 4, Nov 2016: When I first wrote this post it was met with outright mockery and contempt. But the Google Dataflow paper (the unified Google framework for batch (FlumeJava) and stream processing (MillWheel)) and the Google MillWheel paper clearly explain that this is exactly the same approach the Google team has taken to solve the duplicate events problem…

Please don’t call Kafka as a messaging system

3 min read. Update, 11/Nov/2016: Originally this post was titled “Please don’t call Kafka as a messaging system”. I had to change it because some people went, “What else would you call it?” The Kafka tagline used to be something like “Message processing rethought”, but it looks like they later changed it to “A Distributed streaming platform”. So, I am changing the…

What you should know about the design of IoT Platforms

4 min read. IoT-PaaS has started reaching the masses. I have already captured the list of both freely and commercially available IoT platforms in my previous post. But I wanted to concentrate on the elements of a successful IoT platform in a post. Then again, that would be too much for a single post to cover. Let me break it down…