Many organizations looking to modernize their legacy databases are evaluating fully managed hosted DBaaS offerings that provide flexibility, performance and scale without compromising the trusted strengths of relational databases (RDBMS). With the introduction of Couchbase Capella DBaaS, customers can now benefit from exceptional price-performance to bring down their total cost of ownership (TCO).

However, ensuring that established data integration tools and processes can be used to move data into Couchbase Capella can be an important dependency in transitioning to cloud services. This is why we are excited to introduce the support of Couchbase 7.x in Talend Big Data Platform 8.0, covering both Couchbase Server and Capella.

Talend offers a range of data management capabilities, ranging from business process management to ETL and master data management. Talend’s new Big Data Platform 8.0 release contains an updated Couchbase connector, enabling the modernization and migration of business-critical applications.

Creating an ETL workflow for Couchbase Capella

This blog shows how to connect Talend with Couchbase Capella and perform a simple ETL workflow. Talend Big Data Platform will be leveraged to export relational data from a MySQL database into Couchbase using the JSON document format.

The workflow will read from a MySQL database table and create corresponding JSON documents in Couchbase Server. You can build upon this example and augment it to support more complex scenarios involving data transformations, complex mappings, etc.

The following steps have been tested on macOS but can be adapted for other systems.

Couchbase Capella offers a free 30-day trial and allows you to create your first cluster in a few clicks in just minutes. You can start your trial with a single click.

talend-couchbase-connector10

Just enter your name, email, and organization name. Once your email is verified, you can begin your trial. Simply select your preferred AWS region and click “Start Trial”.

Couchbase Capella signup

The trial will create a single node Couchbase cluster in AWS. Once your cluster is created, we need to configure a data bucket for this example.

Navigate to Clusters > Trial-Cluster > Buckets > Create Bucket.  We will name the bucket world.

Click Next to continue, accept the defaults and click Create Bucket. Also, create the travel-sample bucket by navigating to Tools > Import and scroll down to Travel Sample and select Import.

Import Couchbase sample data

We will now need to configure access to the cluster from a remote workstation.

Navigate to Clusters > Trial-Cluster > Connect and scroll down to Database Access > Manage Credentials, then +Create Database Credential.

Create database credentials couchbase cloud

Make sure to use the dropdown under Bucket Level Access and enter world, All Scopes, and Read/Write for the settings, then click Create. Repeat for the travel-sample bucket.

Allowing local IP access to Capella

Next, we need to configure access to your local workstation as we will be running Talend Studio locally. Navigate to Connect > Manage Allowed IP. From there, you can click + Add My IP.

Add allowed IP for Couchbase cloud capella

The Couchbase Capella configuration is complete for now.

Install MySQL sample database

Next, we need to install and/or configure the MySQL server which is also running on the local workstation. I am using macOS and Homebrew to install additional packages.

To install MySQL via Homebrew, open a terminal and type: brew install mysql . 

Download and import the world database:

  • Download world.sql
  • Start MySQL server, if not started already, and connect to server:
    sudo mysql –u root –p
  • Create world database:
    CREATE DATABASE world;
  • Load content from world database:
    USE world;
    SOURCE world.sql;
  • Verify the world database has been imported correctly:
    SHOW TABLES;
  • The output should show three tables: City, Country and CountryLanguage

Now it is time to install Talend Studio. You can download Talend Studio 8.0.1 from Talend Cloud. Navigate to your user and select Downloads:

Install Talend Studio

Select Talend Studio and your platform from the dropdown and version 8.0.1.

Download Talend Studio from Talend Cloud

Note: Talend recommends Java 11. For this exercise, I am using OpenJDK-11 installed via Homebrew with:

% brew install openjdk@11

Accept the license agreement once the Talend Studio is installed and provide a license file or connection to your Talend Cloud.

Install Couchbase connector for Talend Studio

After launching Talend Studio, you will need to install the Couchbase connector to proceed. Navigate through Help to the Feature Manager and search for Couchbase.

Talend feature manager with Couchbase

After launching Talend Studio, navigate to Job Designs > Standard and right-click on New Job, enter the name MySQLToCouchbaseIntegration and click Finish.

Create new job in Talend for Couchbase integration

This will create a panel where job artifacts can be dragged and dropped and should display the palette to the right. From the palette, drag and drop the tMyslInput widget to the left. Find the widget under Databases/MySQL.

couchbase input for talend operator

From the palette, drag and drop tCouchbaseOutput widget (can be found under NoSQL Databases/Couchbase) to the right.

Double click on tMysqlInput on the panel; this will display the Component panel at the bottom.

Enter the MySQL database connection info.

  • Click on the button, located to the right of Table Name, expand world and select the City entry and click on the OK button.
  • Remove the default query and replace with:
    SELECT * FROM City

Below is an example screenshot of what it should look like:

Couchbase input to Talend

Configure target schema

Click on the Guess schema button. This will bring up the Schema window; move and resize the window as appropriate.

Edit the schema with the following changes:

    • Check Nullable for ID
    • Check Nullable for Population
    • Change ID type to String

Set DB types to the following and click on OK:

    • INT for ID
    • VARCHAR for fields Name, CountryCode, and District
    • INT for Population

This screenshot shows all these schema settings:

Couchbase schema in Talend loader

Left-click on tMysqlInput and then right-click to select Row/Main and drag an arrow from the tMysqlInput component to tCouchbaseOutput. An arrow should be created between the two.

tcouchbaseoutput in Talend

We will now navigate back to the Couchbase Capella console to get our connection endpoint. Navigate to Clusters > Trial Cluster > Connections and copy the Wide Area Network endpoint address:

Couchbase cloud network endpoint setup

Left-click on tCouchbaseOutput on the panel; this will display the Component panel at the bottom.

Enter Couchbase database connection info in the panel. Note that all fields must be surrounded by double quotes (i.e., “…”):

    • The Bootstrap Nodes field is the value collected in the previous step preceded by couchbases://
    • Provide credentials previously created in the Capella console for access.
    • Set the Bucket name to world.
    • Leave the default Document Type as JSON.
    • Leave the field to use as ID as “ID”.

talend-couchbase-connector16

Click on the Run panel to build the job and execute it. You can see progress in the Job window.

talend-couchbase-connector3

Once complete, the job will have added 4079 documents to the world bucket:

tcouchbaseoutput in Talend

You can access your Capella console directly or through the web control plane to verify they have loaded into the world bucket.

Viewing documents in Couchbase

The Documents view shows what is now loaded:

JSON document list in Couchbase web console

Talend component setup for Couchbase input

Next, we will use Couchbase as an input and run a N1QL query. This will allow you to interact with the Couchbase Scopes and Collections.

Navigate to Job Designs > Standard and right-click on New Job – enter the name CouchbaseInput and click Finish.

Couchbase input job in Talend

From the palette, drag and drop the tCouchbaseInput widget to the left. Find the widget under Databases NoSQL/Couchbase.

talend-couchbase-connector9

From the palette, drag and drop the tLogRow widget (found under Logs and Errors) to the right.

talend-couchbase-connector18

Double-click on tCouchbaseInput in the panel; this will display the Component panel at the bottom. Enter the Couchbase database connection info. From the Query Type dropdown, select N1QL and enter the query as follows:

Below is an example screenshot of how it should look:

tcouchbaseinput component in Talend

Left-click on tLogRow on the panel; this will display the Component panel at the bottom.

Change the field separator from “|” to “” and select Print Header.

talend-couchbase-connector7

Connect the two components.

talend-couchbase-connector1

Run the job. If successful, it should return the value United Kingdom.

talend-couchbase-connector23

Continue Talend & Couchbase learning

Here are some further resources to help you dig deeper:

Author

Posted by David Schexnayder, Principal Partner Solution Architect‌

Leave a reply