Someone on Quora asked me, “Why is ZooKeeper always configured with an odd number of nodes?” That's a great question, but the sad part is that few practitioners, even those who run ZooKeeper in production, can explain it simply. I will try to keep this really simple, I promise. ZooKeeper (ZK) is a highly available, highly reliable and… Continue reading
Posts Tagged → Storm
Introducing FunnelCloud – A lightweight abstraction atop Apache Storm
The idea behind building a lightweight abstraction on top of Storm is to combine the benefits of micro-batching with the processing flexibility of Storm. FunnelCloud also has a few added practical features. Gwen Shapira of Confluent explains the value of micro-batching and how it improves throughput in distributed architectures where network round trips are inevitable. Here is the full post. Let’s say due… Continue reading
“Exactly-once” with a Kafka-Storm Integration
Update 4, Nov 2016: When I first wrote this post, it was met with outright mockery and contempt. But the Google Dataflow paper (the unified Google framework for batch (FlumeJava) and stream processing (MillWheel)) and the Google MillWheel paper clearly explain that this is exactly the approach the Google team has taken to solve the duplicate events problem… Continue reading
Beginner’s guide to Fast Data
What is fast data? Fast data is becoming a catchphrase as we speak. If you are hearing about it for the first time, please don’t worry. We are going to talk about it in detail in this post. (But I am going to assume some big data background from you.) Let me start by graphically telling the… Continue reading
What Wikipedia can’t tell you about Apache Storm and Apache Spark Streaming
I am seeing a lot of questions about Spark Streaming and Storm on Quora: when to choose which, and what their performance, reliability and support are like. As usual, there are plenty of comparisons available on the web if you google around. But instead of comparing them side by side, I thought of talking… Continue reading