The premise is very simple: in a world of disparate technologies that often do not work or integrate well with one another, Couchbase and Confluent Kafka are products that complement each other extremely well. Couchbase is a linearly scalable, distributed NoSQL JSON database. Its primary use case is any application or web service that requires single-digit-millisecond latency for reads, writes and updates. It can be used as a System of Record (SoR), as a caching layer for fast-mutating transient data, or to offload Db2, Oracle, SQL Server and similar databases so that downstream services can consume data from Couchbase.

Confluent Kafka is a full-fledged distributed streaming platform that is also linearly scalable and capable of handling trillions of events a day. Confluent Platform makes it easy to build real-time data pipelines and streaming applications by integrating data from multiple sources and locations into a single, central event streaming platform for your company.

In this blog post we will cover how to move data out of Couchbase and push it into a Confluent Kafka topic as replication events.

The Couchbase Kafka connector transfers documents from Couchbase efficiently and reliably using Couchbase's internal replication protocol, DCP (Database Change Protocol). Every change to or deletion of a document generates a replication event, which is then sent to the configured Kafka topic.
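To make the "one event per change" idea concrete, here is a minimal toy model in Python. It is not how DCP or the connector is implemented; `ChangeEvent`, `ToyBucket` and the sample document are hypothetical names of mine, and a plain list stands in for the Kafka topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangeEvent:
    """Hypothetical stand-in for one replication event."""
    kind: str                 # "mutation" or "deletion"
    key: str                  # document ID
    content: Optional[dict]   # document body; None for deletions


@dataclass
class ToyBucket:
    """In-memory stand-in for a bucket that records every change."""
    docs: Dict[str, dict] = field(default_factory=dict)
    topic: List[ChangeEvent] = field(default_factory=list)  # stands in for the Kafka topic

    def upsert(self, key: str, doc: dict) -> None:
        # Every create or update produces exactly one mutation event.
        self.docs[key] = doc
        self.topic.append(ChangeEvent("mutation", key, doc))

    def remove(self, key: str) -> None:
        # Only an existing document can produce a deletion event.
        if self.docs.pop(key, None) is not None:
            self.topic.append(ChangeEvent("deletion", key, None))


bucket = ToyBucket()
bucket.upsert("airline_10", {"type": "airline", "name": "40-Mile Air"})
bucket.upsert("airline_10", {"type": "airline", "name": "40-Mile Air", "country": "United States"})
bucket.remove("airline_10")
kinds = [event.kind for event in bucket.topic]
```

Two writes and one delete leave three events in the list, in order, even though the bucket itself ends up empty. That is the property that makes the topic useful to downstream consumers.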

The Kafka connector can be used to move data out of Couchbase (as a source connector) and to move data from Kafka into Couchbase (as a sink connector). For the purposes of this blog post we will configure and use the source connector.

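At a high level, the architecture looks like this: changes in a Couchbase bucket flow over DCP to the Couchbase Kafka source connector, which publishes them as events to a Kafka topic.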

Pre-requisites

Couchbase cluster running version 5.x or above. Download Couchbase here

Couchbase Kafka connector. Download the Couchbase Kafka connector here

Confluent Kafka. Download Confluent Kafka here

Configuring a Couchbase cluster is outside the scope of this blog post. However, we will discuss how to configure Confluent Kafka and the Couchbase Kafka connector to move data out of Couchbase.

Configuring Confluent Kafka

Untar the package downloaded above on a VM or pod. For this blog post, I deployed an Ubuntu pod in a Kubernetes cluster running on GKE.

Before installing Confluent Kafka, make sure the machine has Java 8 installed.
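A quick way to check the Java requirement is to parse the output of `java -version`. The sketch below is my own helper, not part of Confluent's tooling; it assumes the usual `version "..."` banner, where Java 8 and earlier report themselves as `1.<major>`.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and earlier use the "1.<major>" scheme, e.g. "1.8.0_212".
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> int:
    """Run `java -version` (which prints its banner to stderr) and parse it."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

Calling `installed_java_major()` on the pod should return 8 before you go any further.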

Install and start Kafka

Kafka has the following processes, all of which should be up.
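If you want to script the "everything is up" check, a sketch like the following could help. It assumes the status output has one line per service in the form `<service> is [UP]` or `<service> is [DOWN]`; that format, the service names and the sample text are my assumptions for illustration, not captured output, so adjust the pattern to whatever your Confluent CLI version prints.

```python
import re
from typing import Dict, List

# Assumed line format: "<service> is [UP]" or "<service> is [DOWN]".
STATUS_LINE = re.compile(r"^(?P<name>[\w-]+) is \[(?P<state>UP|DOWN)\]$")


def parse_status(output: str) -> Dict[str, bool]:
    """Map each service name to True (up) or False (down)."""
    services: Dict[str, bool] = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            services[match.group("name")] = match.group("state") == "UP"
    return services


def services_down(output: str) -> List[str]:
    """Return the names of services that are not up, sorted alphabetically."""
    return sorted(name for name, up in parse_status(output).items() if not up)


# Illustrative sample only; service names are assumptions.
SAMPLE = """\
control-center is [UP]
connect is [DOWN]
schema-registry is [UP]
kafka is [UP]
zookeeper is [UP]
"""

down = services_down(SAMPLE)
```

An empty list from `services_down` means every listed process is up and it is safe to continue.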

The pod running Confluent Kafka can be exposed to the local machine or laptop via a NodePort service. The app pod file is here. The service YAML file is here

Port-forward the service to local port 9021

Hit the URL: http://localhost:9021

Access Confluent Kafka UI

Configuring Couchbase Kafka Connector

Unzip the package downloaded above

Edit the file quickstart-couchbase-source.properties with (at least) the following information

Cluster connection string

Bucket name and bucket access credentials

Note: Enter the credentials for the bucket you want to move data from. In my example, I am using the travel-sample bucket, with bucket user credentials.
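As a rough illustration of those settings, the sketch below renders a minimal properties file from a Python dict. The property keys are what I recall from the 3.x connector documentation and should be treated as assumptions; check them against the source configuration options page for your connector version. The connector name, address and credentials are placeholders.

```python
from typing import Dict


def render_properties(settings: Dict[str, str]) -> str:
    """Render simple key=value lines (no escaping of special characters)."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# Keys are assumed from the 3.x connector docs; values are placeholders.
source_settings = {
    "name": "cb-source",                                  # hypothetical connector name
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "topic.name": "cb-topic",                             # topic used in this post
    "connection.cluster_address": "127.0.0.1",            # placeholder connection string
    "connection.bucket": "travel-sample",                 # bucket to read from
    "connection.username": "<bucket-user>",               # placeholder
    "connection.password": "<bucket-password>",           # placeholder
}

properties_text = render_properties(source_settings)
```

Writing `properties_text` over the relevant lines of quickstart-couchbase-source.properties (or simply editing the file by hand) gives the connector everything it needs to find the cluster and bucket.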

Export the variable CONFLUENT_HOME

Start the Kafka connector
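The last two steps can be sketched as a small helper that builds the `connect-standalone` command from CONFLUENT_HOME. `connect-standalone` takes a worker properties file followed by one or more connector properties files; the exact locations under CONFLUENT_HOME used here are my assumptions about the Confluent Platform layout, so verify them in your installation.

```python
import os
from pathlib import PurePosixPath
from typing import List, Mapping


def connect_standalone_command(env: Mapping[str, str], connector_props: str) -> List[str]:
    """Build the argument list for starting the connector in standalone mode."""
    home = env.get("CONFLUENT_HOME")
    if not home:
        raise RuntimeError("CONFLUENT_HOME is not set")
    home_path = PurePosixPath(home)
    return [
        # Assumed location of the launcher script.
        str(home_path / "bin" / "connect-standalone"),
        # Assumed location of the worker configuration.
        str(home_path / "etc" / "schema-registry" / "connect-avro-standalone.properties"),
        # Connector configuration edited in the previous step.
        connector_props,
    ]


def current_command() -> List[str]:
    """Build the command from the real environment."""
    return connect_standalone_command(os.environ, "etc/quickstart-couchbase-source.properties")
```

Passing the result of `current_command()` to `subprocess.run` from the connector's directory would start the connector; failing early when CONFLUENT_HOME is missing avoids a confusing "file not found" later.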

When the connector starts, it creates a Kafka topic named cb-topic, and we can see all documents from the Couchbase travel-sample bucket being transferred to the Kafka topic cb-topic as events.

Events in Kafka topic cb-topic

Conclusion

In a matter of minutes one can integrate Couchbase and Confluent Kafka. Ease of use, deployment and supportability are key factors in adopting a technology. In this blog post we saw that one can seamlessly move data out of Couchbase into a Kafka topic. Once the data is in a Kafka topic, one can use KSQL to create real-time stream processing applications that match business needs.

References:

  1. https://docs.confluent.io/current/quickstart/ce-quickstart.html#ce-quickstart
  2. https://docs.couchbase.com/kafka-connector/3.4/quickstart.html
  3. https://docs.couchbase.com/kafka-connector/3.4/source-configuration-options.html

 

 

Author

Posted by Ram Dhakne

Ram Dhakne is a Solutions Consultant - US West at Couchbase. He currently helps enterprise customers with their digital innovation journeys and helps them adopt NoSQL technologies. His current interests are running persistent applications like Couchbase NoSQL server on Kubernetes clusters running on AKS, GKE, ACS and OpenShift, and end-to-end security on Kubernetes. In his past life he has worked on IaaS platforms (AWS, GCP, Azure and private clouds), enterprise backup target products and backup applications.
