Getting Started with Apache Pulsar on Kubernetes

16 July 2021

Apache Pulsar is a distributed pub-sub messaging system that I have recently enjoyed testing for some projects of mine. It has grown on me a lot, and I now prefer it over Apache Kafka. In this post we will go over installing and getting started with an Apache Pulsar instance running on Kubernetes.

Pulsar and Kafka have many similarities and differences, but the architectural difference that matters most to me is that Pulsar decouples the brokers from the storage layer. You can find many great articles online comparing the two to help you make the right choice for your needs.

Apache Pulsar has amazing documentation covering its concepts and architecture, which I highly recommend reading. In this post we will use Helm to install Pulsar, so the official docs for Pulsar on Kubernetes are the place to start. The steps are described very clearly there, so I will not repeat them here, but let’s go over an example of a helm install command to get us started:

helm upgrade --install pulsar apache/pulsar \
  --timeout 360m \
  --set initialize=true \
  --set namespace=data-infra-pulsar \
  --set volumes.persistence=true \
  --set bookkeeper.volumes.journal.size=32Gi \
  --set bookkeeper.volumes.ledgers.size=160Gi \
  --set affinity.anti_affinity=false
  • --timeout 360m – Gives enough time for the installation to complete in case some things are slow
  • --set initialize=true – Required the first time you install
  • --set namespace=<ns> – Specify the Kubernetes namespace to install into
  • --set affinity.anti_affinity=false – Needed if installing on a single-node Kubernetes cluster
  • --set volumes.persistence=true – Persist the data in a PVC (turn off if not needed in your case)
  • --set bookkeeper.volumes.journal.size=32Gi – Specify the size of the journal volume. When a message is written to a bookie, it is first written to a journal file (the write-ahead log, or WAL)
  • --set bookkeeper.volumes.ledgers.size=160Gi – Specify the size of the ledgers

When the installation completes, you should have a Kubernetes LoadBalancer service for the Pulsar proxy listening on port 6650. The IP of that service can be used as the service URL or admin URL in your clients to access the Pulsar instance. Example from an installation of mine:

String serviceUrl = "pulsar://10.8.56.51:6650";
String adminUrl = "http://10.8.56.51";
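
If you would rather not hard-code both strings, here is a minimal sketch that derives them from the proxy IP. The class and method names (PulsarEndpoints, serviceUrl, adminUrl) are my own for illustration and are not part of the Pulsar client. It assumes the binary protocol is on port 6650 and that the admin endpoint is plain HTTP on the default port, as in the example above.

```java
import java.net.URI;

// Hypothetical helper, not part of any Pulsar library.
final class PulsarEndpoints {
    // Port of the proxy's binary protocol endpoint, as described above.
    static final int BINARY_PORT = 6650;

    // Builds the pulsar:// URL; URI.create rejects syntactically invalid input.
    static String serviceUrl(String proxyHost) {
        return URI.create("pulsar://" + proxyHost + ":" + BINARY_PORT).toString();
    }

    // Builds the admin URL; no explicit port means the HTTP default (80).
    static String adminUrl(String proxyHost) {
        return URI.create("http://" + proxyHost).toString();
    }

    public static void main(String[] args) {
        // Falls back to the example IP from this post when no argument is given.
        String host = args.length > 0 ? args[0] : "10.8.56.51";
        System.out.println(serviceUrl(host));
        System.out.println(adminUrl(host));
    }
}
```

With the Pulsar Java client on the classpath, the first string is what you would pass to PulsarClient.builder().serviceUrl(...), and the second to PulsarAdmin.builder().serviceHttpUrl(...).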

Let the publishing and subscribing begin!

