Kafka Connector 3 Developer Preview 1

I’m glad to announce the first developer preview of the next major iteration of our Kafka integration. This version is built on a new DCP client library and supports the Kafka Connect framework. In this post I will show how to use it to relay data from Couchbase to HDFS.

Here I'll show the steps for CentOS/Fedora Linux distributions; the steps on other operating systems should be similar. First, install Confluent Platform (http://docs.confluent.io/3.0.0/installation.html#rpm-packages-via-yum), then download the zip archive containing the Couchbase connector from http://packages.couchbase.com/clients/kafka/3.0.0-DP1/kafka-connect-couchbase-3.0.0-DP1.zip

To register the connector, extract the contents of the archive into the default class path, which on CentOS (Fedora) is /usr/share/java:

unzip kafka-connect-couchbase-3.0.0-DP1.zip
sudo cp -a kafka-connect-couchbase-3.0.0-DP1/share /usr/

Now start Confluent Control Center and all the services it depends on. Confluent's quickstart guide explains what each of these commands does. Each command stays in the foreground, so run each one in its own terminal:

sudo zookeeper-server-start /etc/kafka/zookeeper.properties
sudo kafka-server-start /etc/kafka/server.properties
sudo schema-registry-start /etc/schema-registry/schema-registry.properties
sudo connect-distributed /etc/kafka/connect-distributed.properties
sudo control-center-start /etc/confluent-control-center/control-center.properties

At this point everything is ready for setting up the link that transfers documents from Couchbase to HDFS using Kafka Connect. We assume you are running Couchbase Server on http://127.0.0.1:8091/ and Confluent Control Center on http://127.0.0.1:9021/. For this example, make sure the travel-sample bucket is loaded on Couchbase. If you didn't load it when setting up the cluster, you can add it through the Settings section of the web UI.

Once you have all of these prerequisites out of the way, navigate to the “Kafka Connect” section of your Confluent Control Center. Select “New source”, then select “CouchbaseSourceConnector” as the connector class and fill in the settings so that the final JSON looks similar to this:

{
  "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
  "name": "travel-source",
  "connection.bucket": "travel-sample",
  "connection.cluster_address": "127.0.0.1",
  "topic.name": "travel-topic"
}
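
If you prefer scripting to clicking through Control Center, the same settings can be submitted to the Kafka Connect REST API. The following is a minimal Python sketch, not part of the connector itself. It assumes the Connect worker exposes its REST interface on the default port 8083; the helper names build_connector_request and register_connector are made up for this illustration.

```python
import json
import urllib.request

# Assumption: the Connect worker listens on the default REST port 8083.
CONNECT_URL = "http://127.0.0.1:8083/connectors"

# The same settings as in the JSON shown in Control Center.
source_settings = {
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "name": "travel-source",
    "connection.bucket": "travel-sample",
    "connection.cluster_address": "127.0.0.1",
    "topic.name": "travel-topic",
}


def build_connector_request(settings):
    """Wrap flat settings into the {"name": ..., "config": ...} body that
    POST /connectors expects."""
    config = dict(settings)
    return {"name": config["name"], "config": config}


def register_connector(settings, url=CONNECT_URL):
    """POST the connector definition to a running Connect worker."""
    body = json.dumps(build_connector_request(settings)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the request body here; call register_connector(source_settings)
    # once the Connect worker is actually running.
    print(json.dumps(build_connector_request(source_settings), indent=2))
```

Calling register_connector(source_settings) against a running worker should have the same effect as saving the source in the UI.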

Once you save the Source connection, the Connect daemon will start receiving mutations and storing them in the specified Kafka topic. To demonstrate a full pipeline, let's set up a Sink connection to get the data out of Kafka. To do so, go to the “Sinks” tab and click the “New sink” button. It will ask for the topics where the data of interest is stored; enter “travel-topic”. Then select “HdfsSinkConnector” and fill in the settings so that the JSON config looks like this (assuming the HDFS name node is listening on hdfs://127.0.0.1:8020/):

{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "name": "hdfs-travel-sink",
  "flush.size": "10",
  "partitioner.class": "io.confluent.connect.hdfs.partitioner.FieldPartitioner",
  "partition.field.name": "partition",
  "hdfs.url": "hdfs://127.0.0.1:8020",
  "topics": "travel-topic"
}
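
To make the partitioner settings above concrete, here is a small Python sketch of how a field-based layout maps a record to an output file. It is an illustration only, not the HDFS connector's code: the function output_path is hypothetical, and the naming pattern (topic, Kafka partition, zero-padded start and end offsets) is an assumption inferred from the file path inspected later in this post.

```python
def output_path(topics_dir, topic, field_name, field_value,
                kafka_partition, start_offset, end_offset, extension=".avro"):
    """Build a path of the form
    <topics_dir>/<topic>/<field>=<value>/<topic>+<partition>+<start>+<end><ext>.

    Assumption: offsets are zero-padded to 10 digits, as in the sample path.
    """
    directory = f"{topics_dir}/{topic}/{field_name}={field_value}"
    file_name = (
        f"{topic}+{kafka_partition}"
        f"+{start_offset:010d}+{end_offset:010d}{extension}"
    )
    return f"{directory}/{file_name}"


if __name__ == "__main__":
    # "partition" is the record field named by partition.field.name,
    # while kafka_partition is the partition of the Kafka topic itself.
    print(output_path("/topics", "travel-topic", "partition", 89, 0, 101, 101))
```

Note that the directory name comes from the record field named in partition.field.name, which is distinct from the Kafka topic partition that appears in the file name.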

Once the Sink connection is configured, you will see data appearing on HDFS under /topics/travel-topic/ (assuming the default topics directory). Let's inspect one of the files:

$ hdfs dfs -fs hdfs://localhost:8020 -cat /topics/travel-topic/partition=89/travel-topic+0+0000000101+0000000101.avro | avropipe
/   []
/0  {}
/0/partition    89
/0/key  "route_28879"
/0/expiration   0
/0/flags    33554438
/0/cas  1471633063247347712
/0/lockTime 0
/0/bySeqno  1
/0/revSeqno 1
/0/content  "{"id":28879,"type":"route","airline":"G4","airlineid":"airline_35","sourceairport":"AZA","destinationairport":"FWA","stops":0,"equipment":"319","schedule":[{"day":0,"utc":"01:59:00","flight":"G4097"},{"day":1,"utc":"09:30:00","flight":"G4697"},{"day":1,"utc":"09:50:00","flight":"G4879"},{"day":1,"utc":"07:44:00","flight":"G4310"},{"day":1,"utc":"01:23:00","flight":"G4226"},{"day":2,"utc":"19:58:00","flight":"G4921"},{"day":2,"utc":"09:49:00","flight":"G4376"},{"day":2,"utc":"17:57:00","flight":"G4446"},{"day":2,"utc":"21:06:00","flight":"G4032"},{"day":3,"utc":"17:05:00","flight":"G4198"},{"day":3,"utc":"12:21:00","flight":"G4098"},{"day":3,"utc":"19:31:00","flight":"G4571"},{"day":4,"utc":"05:27:00","flight":"G4001"},{"day":4,"utc":"07:03:00","flight":"G4023"},{"day":4,"utc":"16:50:00","flight":"G4631"},{"day":5,"utc":"18:13:00","flight":"G4757"},{"day":6,"utc":"20:35:00","flight":"G4157"},{"day":6,"utc":"21:52:00","flight":"G4582"},{"day":6,"utc":"00:55:00","flight":"G4348"},{"day":6,"utc":"06:01:00","flight":"G4731"}],"distance":2483.859992489083}"
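
As the output shows, the document body arrives in the content field as a JSON string next to the DCP metadata, so a consumer has to decode it separately. Here is a minimal Python sketch of that step. It assumes the Avro record has already been read into a dict and that content is available as text; the sample record is an abbreviated copy of the one above, and decode_content is a hypothetical helper.

```python
import json

# Abbreviated copy of the record shown above (schedule and some fields omitted).
record = {
    "partition": 89,
    "key": "route_28879",
    "cas": 1471633063247347712,
    "bySeqno": 1,
    "content": '{"id":28879,"type":"route","airline":"G4",'
               '"sourceairport":"AZA","destinationairport":"FWA","stops":0}',
}


def decode_content(record):
    """Parse the JSON document stored as a string in the content field."""
    return json.loads(record["content"])


document = decode_content(record)
print(record["key"], document["sourceairport"], "->", document["destinationairport"])
# prints: route_28879 AZA -> FWA
```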

That’s my quick run-through! The DCP client is still under active development, and more features are being added to handle topology changes and failure scenarios. The next couple of updates to our Kafka connector will pick up those changes. I should also briefly note that Couchbase's DCP client interface should be considered volatile for the moment. We use it in various projects, but if you use it directly, you do so at your own risk.

The source code for the connector is at https://github.com/couchbaselabs/kafka-connect-couchbase. The issue tracker is at https://issues.couchbase.com/projects/KAFKAC, and feel free to ask any questions on https://www.couchbase.com/forums/.

Sergey Avseyev, SDK Engineer, Couchbase
