What the heck is Apache ZooKeeper anyway?

This is a 2 min read

Here is an attempt to intuitively explain how ZooKeeper works and how it can be used.

1. At a high level

ZooKeeper is a service that gives clients access to a tree-like structure, or a “hierarchical namespace” as the ZooKeeper documentation calls it. So why do we need this tree? For storing data, of course, which is why it is called a “data tree”. Each node of the tree is called a zNode. Paths use the standard UNIX file system notation. For example, /A/B/C denotes the path to zNode C, where C has B as its parent and B has A as its parent. Here is a sample.

[Screenshot: sample data tree]

That looks very much like a UNIX file system, right? But ZooKeeper has its own terminology and rules:

  1. Each node in the tree is called a zNode.
  2. Every zNode in the tree is identified by a path.
  3. zNodes come in two types: persistent and ephemeral. An ephemeral zNode disappears when the client session that created it ends.
  4. Each zNode stores a value (its data) and may have child nodes.
  5. zNodes cannot be renamed.
  6. Watchers can be added to and removed from zNodes.
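
To make the rules above concrete, here is a small in-memory toy model of a data tree. It is only a sketch: the class ToyDataTree, its method names and the session name "session-1" are made up for illustration, nothing here talks to a real ZooKeeper server, and the one-shot watcher behaviour is a simplified imitation of how ZooKeeper watches fire.

```python
class ToyDataTree:
    """In-memory stand-in for a ZooKeeper data tree (illustration only)."""

    def __init__(self):
        # path -> [data, owner_session]; owner is None for persistent zNodes
        self._nodes = {"/": [b"", None]}
        self._watchers = {}  # path -> list of one-shot callbacks

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b"", ephemeral_owner=None):
        parent = self._parent(path)
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        if parent not in self._nodes:
            raise KeyError(f"parent of {path} does not exist")
        if self._nodes[parent][1] is not None:
            raise ValueError("ephemeral zNodes cannot have children")
        self._nodes[path] = [data, ephemeral_owner]

    def get(self, path, watcher=None):
        # Optionally register a watcher while reading the data
        if watcher is not None:
            self._watchers.setdefault(path, []).append(watcher)
        return self._nodes[path][0]

    def set(self, path, data):
        self._nodes[path][0] = data
        self._fire(path, "changed")

    def delete(self, path):
        if self.children(path):
            raise ValueError(f"{path} has children")
        del self._nodes[path]
        self._fire(path, "deleted")

    def children(self, path):
        return sorted(p for p in self._nodes
                      if p != "/" and self._parent(p) == path)

    def close_session(self, session):
        # Ephemeral zNodes go away together with the session that owns them
        owned = [p for p, (_, owner) in self._nodes.items() if owner == session]
        for path in owned:
            self.delete(path)

    def _fire(self, path, event):
        # Watchers are removed as they fire, so each one triggers only once
        for callback in self._watchers.pop(path, []):
            callback(event, path)


tree = ToyDataTree()
tree.create("/A", b"a")
tree.create("/A/B", b"b")
tree.create("/A/B/C", b"c")
tree.create("/A/worker-1", ephemeral_owner="session-1")

events = []
tree.get("/A/B/C", watcher=lambda event, path: events.append((event, path)))
tree.set("/A/B/C", b"c2")        # fires the watcher
tree.set("/A/B/C", b"c3")        # watcher already used up, nothing fires
tree.close_session("session-1")  # /A/worker-1 is removed
```

Notice there is no rename method at all: to "move" a zNode in this model you would create the new path and delete the old one.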

That’s it. Someone rightly said that ZooKeeper is feature-light and that you build recipes on top of it. Take a look at Netflix’s Curator framework (now an Apache project); some say it makes ZooKeeper much easier to use from a client’s perspective.

2. One level deeper

So let’s consider another example, where the goal is to store some configuration in a <K,V> format and make it available across a cluster of machines. The <K,V> pairs should be persistent (that is, disk based), highly available (replicated) and fault tolerant. ZooKeeper is a natural fit for this use case.

[Screenshot: configuration stored as zNodes]

Want to try this? Install ZooKeeper locally, go to the bin directory and run the following command

Once the zNodes are created with the desired paths, you can use GET and SET to treat ZooKeeper as a distributed <K,V> store or hashmap. You can also install Netflix’s Exhibitor, a UI-based supervisor for ZooKeeper, to visualize the stored data.
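
If it helps to see the <K,V> idea as code, here is a rough sketch of how keys could map onto zNode paths. Everything in it is an assumption made for illustration: the class ToyConfigStore, the root path /config and the sample keys are invented, and a plain Python dict stands in for the ZooKeeper ensemble. The only ZooKeeper detail it borrows is that zNode data is stored as bytes.

```python
class ToyConfigStore:
    """Maps config keys to zNode paths under a root (dict-backed stand-in)."""

    def __init__(self, root="/config"):
        self.root = root
        # path -> bytes; this dict plays the role of the ZooKeeper ensemble
        self._znodes = {root: b""}

    def _path(self, key):
        if not key or "/" in key:
            raise ValueError("key must be a non-empty name without '/'")
        return f"{self.root}/{key}"

    def create(self, key, value):
        path = self._path(key)
        if path in self._znodes:
            raise KeyError(f"{path} already exists")
        self._znodes[path] = value.encode("utf-8")
        return path

    def get(self, key):
        return self._znodes[self._path(key)].decode("utf-8")

    def set(self, key, value):
        path = self._path(key)
        if path not in self._znodes:
            raise KeyError(f"{path} does not exist; create it first")
        self._znodes[path] = value.encode("utf-8")

    def keys(self):
        prefix = self.root + "/"
        return sorted(p[len(prefix):] for p in self._znodes
                      if p.startswith(prefix))


store = ToyConfigStore()
db_path = store.create("db.url", "jdbc:mysql://db1/app")  # hypothetical key/value
store.create("cache.ttl", "60")
store.set("cache.ttl", "120")
ttl = store.get("cache.ttl")
```

With a real ensemble behind it, every machine in the cluster would read the same value for a given path, which is the whole point of using ZooKeeper for shared configuration.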

3. Now how does Kafka use ZooKeeper?

As of v0.8, Kafka uses ZooKeeper to store a variety of configurations and share them across the cluster in a distributed fashion. Let’s take 2 simple use cases for which Kafka maintains values in ZooKeeper:

  1. Topics under a broker – /brokers/topics/[topic]
  2. Next Offset for a Consumer/Topic/Partition combination – /consumers/[groupId]/offsets/[topic]/[partitionId]

[Screenshot: Kafka zNodes in ZooKeeper]
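
The two path templates listed above are easy to play with in code. The helpers below are a sketch only: the function names, the group "billing", the topic "orders" and the stored offset are all invented, and treating the zNode value as a plain number written as text is an assumption made for this example.

```python
def topic_path(topic):
    """Path of the zNode for a topic: /brokers/topics/[topic]."""
    return f"/brokers/topics/{topic}"


def consumer_offset_path(group_id, topic, partition_id):
    """Path: /consumers/[groupId]/offsets/[topic]/[partitionId]."""
    return f"/consumers/{group_id}/offsets/{topic}/{partition_id}"


def parse_consumer_offset_path(path):
    """Split an offset path back into (group_id, topic, partition_id)."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "consumers" or parts[2] != "offsets":
        raise ValueError(f"not a consumer offset path: {path}")
    return parts[1], parts[3], int(parts[4])


# A dict standing in for ZooKeeper; the value encoding is assumed
offset_path = consumer_offset_path("billing", "orders", 0)
znodes = {offset_path: b"42"}
next_offset = int(znodes[offset_path].decode("utf-8"))
group, topic, partition = parse_consumer_offset_path(offset_path)
```

Because the group, topic and partition are all encoded in the path, any consumer that knows those three values can find its next offset without any other lookup.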

Now think about “distributed-ness”. Configurations like these are, of course, replicated and distributed throughout the ZooKeeper ensemble, which consists of a Leader node and Follower nodes.