Understanding Zookeeper by Doing

Wednesday, Jan 29, 2020 | Tags: k8s, kubernetes, zookeeper, kafka, distributed systems

The software ecosystem of distributed systems is vast, and each piece does a specific task. You will often find several systems that provide similar functionality.

Today, let’s take a look at the problem of distributed coordination. One of the most popular systems for solving it is ZooKeeper, which is part of the Hadoop ecosystem. Let’s understand what it is and how it works.

At its core, ZooKeeper is a kind of file system: it has nodes, and the nodes store data. You can watch nodes for changes and take appropriate action. Generally you would use ZooKeeper when you are building a distributed system whose multiple nodes need to coordinate with each other.

The first thing we need to do is install ZooKeeper. You can follow the official guide at https://zookeeper.apache.org/doc/r3.5.5/zookeeperStarted.html or you can follow the steps below to install ZooKeeper on your Kubernetes cluster.

When trying out new stuff I like to use Kubernetes, because it’s easy to install prepackaged software using Helm charts or operators. I have an EKS cluster running on AWS. If you would like to use Kubernetes and want to spin up an EKS cluster, you can create one using the command

eksctl create cluster --managed

Head over to https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html to learn more about eksctl.

I will install ZooKeeper using Helm 3.

Search for ZooKeeper in the Helm repositories using

helm search repo zookeeper


incubator/zookeeper     2.1.3           3.5.5       Centralized service for maintaining configurati…
incubator/kafka         0.20.8          5.0.1       Apache Kafka is publish-subscribe messaging ret…
stable/kafka-manager    2.2.0     A tool for managing Apache Kafka.

Helm 3 installations are stored in namespaces. Let’s create a namespace for ZooKeeper and switch to it.

kubectl create namespace zookeeper
kubectl config get-contexts
CURRENT   NAME                                                  CLUSTER                                               AUTHINFO                                              NAMESPACE
*         prabhat@basic3.us-west-2.eksctl.io                    basic3.us-west-2.eksctl.io                            prabhat@basic3.us-west-2.eksctl.io                    default
          prabhat@f1.us-east-2.eksctl.io                        f1.us-east-2.eksctl.io                                prabhat@f1.us-east-2.eksctl.io
          prabhat@ireland14.eu-west-1.eksctl.io                 ireland14.eu-west-1.eksctl.io                         prabhat@ireland14.eu-west-1.eksctl.io
kubectl config set-context prabhat@basic3.us-west-2.eksctl.io  --namespace=zookeeper
helm install zookeeper incubator/zookeeper

It will take a couple of minutes, after which you will have a stateful application running with 3 pods.

kubectl get pods
zookeeper-0   1/1     Running   2          3m
zookeeper-1   1/1     Running   1          2m
zookeeper-2   1/1     Running   0          1m

Let’s look at the services created for ZooKeeper

kubectl get services
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
zookeeper            ClusterIP                  <none>        2181/TCP                     3m
zookeeper-headless   ClusterIP   None           <none>        2181/TCP,3888/TCP,2888/TCP   3m

You can connect your laptop to the cluster using OpenVPN for easy development. Find details here

Now you can run the code below, which shows the structure of the information stored in ZooKeeper.
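A minimal sketch of what such a read script could look like, assuming the kazoo Python client (pip install kazoo) and that the zookeeper service on port 2181 is reachable from where you run it. The function names walk, join_path and main are illustrative and are not taken from the original script.

```python
def join_path(parent, child):
    """Join a node path and a child name without doubling the root slash."""
    return "/" + child if parent == "/" else parent + "/" + child


def walk(client, path="/", depth=0):
    """Return indented lines describing the node tree rooted at path.

    client is anything that offers get(path) -> (data, stat) and
    get_children(path) -> list of names, in the style of kazoo's KazooClient.
    """
    data, _stat = client.get(path)
    name = path if path == "/" else path.rsplit("/", 1)[1]
    lines = ["  " * depth + name + " (" + str(len(data or b"")) + " bytes)"]
    for child in sorted(client.get_children(path)):
        lines.extend(walk(client, join_path(path, child), depth + 1))
    return lines


def main(hosts="zookeeper:2181"):
    # hosts is an assumption: the service name and client port from the
    # "kubectl get services" output. Adjust it to whatever you can reach.
    # kazoo is imported here so the helpers above load without it installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        print("\n".join(walk(zk)))
    finally:
        zk.stop()
```

Calling main() connects, prints one line per node with the size of its data, and disconnects. On a fresh install you would expect to see little more than the built-in /zookeeper subtree.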

sample output of ready.py


You can store up to 1 MB of data in each node. You also have the option of watching particular nodes to be notified whenever they change or a child is added to them. Watching is, in large part, what enables distributed coordination.
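To make watching concrete, here is a small sketch that registers a data watch, again assuming the kazoo Python client. The path /demo/greeting, the host zookeeper:2181 and the helper name record_changes are made up for illustration.

```python
def record_changes(client, path, seen):
    """Register a data watch on path that appends each observed value to seen."""

    def on_change(data, stat):
        # kazoo calls this once when the watch is registered and again after
        # every change; data and stat are None while the node does not exist.
        seen.append(data)

    client.DataWatch(path, on_change)
    return on_change


def main(hosts="zookeeper:2181", path="/demo/greeting"):
    # hosts and path are assumptions for this sketch.
    # kazoo is imported here so record_changes loads without it installed.
    import time
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start()
    seen = []
    try:
        record_changes(zk, path, seen)
        time.sleep(30)  # change the node from another terminal in the meantime
        print(seen)
    finally:
        zk.stop()
```

The callback runs on a kazoo worker thread, so it is best kept short. In a real coordinator you would typically hand the new value off to your own code rather than collecting it in a list.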

To create a new node, you can run the code below:
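A minimal sketch of such a write script, assuming the kazoo Python client. The path /demo/greeting, the value and the helper name write_node are hypothetical and are not taken from the original script.

```python
def write_node(client, path, value):
    """Create path holding value, or overwrite its data if it already exists."""
    if client.exists(path):
        client.set(path, value)
        return "updated"
    # makepath=True also creates any missing parent nodes.
    client.create(path, value, makepath=True)
    return "created"


def main(hosts="zookeeper:2181"):
    # hosts is an assumption: the service name and client port from the
    # "kubectl get services" output.
    # kazoo is imported here so write_node loads without it installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        print(write_node(zk, "/demo/greeting", b"hello"))
    finally:
        zk.stop()
```

The exists-then-create check is not atomic: another client could create the node in between. That is fine for an experiment, but code that several clients run at once would more likely call create directly and handle the error raised when the node already exists.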

After running the write script, you can run the read script again to see the changes, or modify it to watch the nodes.

This ends our short experiment with ZooKeeper.

Bonus: etcd has become really popular recently for distributed coordination. If you are starting a new project, you may want to use etcd instead of ZooKeeper. It’s written in Go and is much easier to use.