Couchbase offers an unique NoSQL database combining integrated cache and storage technology. NoSQL is gaining strong popularity and a number of customers are looking to export or map some of the data they have in RDBMS to Couchbase NoSQL; this is where Talend’s offering can be leveraged.

Talend offers a number of capabilities around integration from business process management to enterprise service bus to master data management. Talend’s new Open Studio for Big Data version 5.3 release contains a Couchbase connector, which enables a large number of scenarios where Couchbase Server can be used alongside with traditional operational and analytical data sources.

To provide you with a basic idea on how to configure and use Talend with Couchbase Server, I will walk you through a simple scenario to create a ETL workflow using Talend Open Studio software to export data from MySQL database server into a Couchbase database server using JSON document format. The workflow will read from a MySQL database table and create corresponding JSON documents in Couchbase Server. You can build upon to example and augment it to support more complex scenarios involving data transformations, complex mappings, etc. The following steps have been tested on CentOS but can be adapted for other systems.

So let’s get started …

Prerequisites

First, let’s look at the pre-requisites you need to get installed:

Start

Now, let’s make sure that everything is started

  • Start MySQL, if not started already:
    • sudo service mysqld start
  • Start Couchbase, if not started already:
    • sudo /etc/init.d/couchbase-server start
  • Launch Talend Open Studio for Big Data:
    • Execute following command or similar (based on installation directory and version): /opt/TOS_BD-r101800-V5.3.0/TOS_BD-linux-gtk-x86_64

Create

Now, let’s use Talend Open Studio to create the ETL workflow

  • After launching Talend, click on Create button and enter a project name such as “MySQLToCouchbaseIntegration” and click on Finish.
  • Select the created project and click on Open button

  • Create account in Talend or click on Skip button, this will open the IDE
  • Wait for IDE initialization to complete and close or minimize Welcome window
  • In Repository panel, left click on Job Designs and then right click on Create job
  • Enter job name, such as “MySQLToCouchbaseIntegrationJob” and click on Finish.

  • This will create a panel where job artifacts can be dragged and dropped and should display the palette to the right.
  • From the palette, Drag and Drop tMyslInput widget (can be found under Databases/MySQL) to the left

  • From the palette, Drag and Drop tCouchbaseOutput widget (can be found under Databases/Couchbase) to the right

  • Both components require jar files which will are not shipped with Talend software. Talend has a wiki explaining how to add jar files. At times, jar files can be directly downloaded from IDE or might need be downloaded manually and loaded into Talend by clicking on jar symbol (the one with “Add External JARs tooltip”) on Modules panel.

  • Jars must be loaded / resolved before processing to next steps. See detailed info
  • Double click on tMysqlInput on panel; this will display Component panel at bottom
  • Enter MySQL database connection info in panel. Note that all fields must be surrounded by double quotes (ie “)
  • Click on … button which is located to the right of Table Name, expand world and select City entry and Click on OK button
  • Remove default Query and replace by “SELECT * FROM City”
  • Below is an example screenshot of what it should look like:

  • Click on Guess schema button, this will bring up the Schema window, move and resize window as appropriate
  • Edit schema with the following changes:
    • Check Nullable for ID
    • Check Nullable for Population
    • Change ID type to String
    • Input DB types with following and click on OK button
      • INT for ID
      • VARCHAR for Name
      • VARCHAR for CountryCode
      • VARCHAR for District
      • INT for Population
  • Below is an example screenshot of what it should look like:

  • Left click on tMysqlInput and then right click to select Row / Main and drag arrow from tMysqlInput component to tCouchbaseOutput component. An arrow should be created between the two.

  • Left click on tCouchbaseOutput on panel; this will display Component panel at bottom
  • Enter Couchbase database connection info in panel. Note that all fields must be surrounded by double quotes (ie “)
  • In this example, the bucket name is default and it does not have authentication set up so username and password are left with “”
  • URI should be input by clicking on + button and changing field to pool URI such as http://localhost:8091/pools
  • Key should be left to ID (default) and Action on data to Insert
  • Below is an example screenshot of what it should look like:

  • Click on Configure JSON Tree … button; this will bring up mapper allowing mapping from table schema to JSON.
  • Select All schema attributes (using multi selection) on the left schema list and drop onto rootTag on the Link target window to the right and select Create as sub-element of target node, then click on OK button.
  • Output should look like this:

  • Select ID attribute in the Link target window on the right and right click on Set As Loop Element, then click on OK button

  • If everything is fine, both icons on the Designer panel should show no sign of errors, in which case we can now run the job
  • Click on Run panel, this will build job and execute it
  • Access your Couchbase server bucket and verify that the documents have been loaded: http://localhost:8091/index.html#sec=buckets
  • This should have added 4079 documents to the default bucket:

  • You can drill down and look at individual documents:

  • Voila!

As indicated, much more complex scenarios can be built and we will follow up soon with more elaborate examples.

Please send comments and feedback at alexis@couchbase.com

Posted by Alexis Roos

18 Comments

  1. great post ! thanks ! I was wondering how to feed couchbase with some external data. that\’s a first part of a good answer. Maybe usefull for a one shot migration. But what about a continuous integration process ?

    1. Hi,

      It will really depend of the type of the data source and the features available in the source itself (or the tool at the top of it).
      For example in the Oracle \”world\” you can use stuff like DCN (Database Change Notification), Golden Gate to listen to the database change and push data back into Couchbase. (I need some time to write some code around this)

      I do not know if ETL tools can use this but from your code you can.

      Regards
      Tug

      1. Wilbert Klinton July 22, 2015 at 1:30 pm

        It’s really a great pleasure to provide an opinion about ETL
        tools. These are very important and useful to all the people from all over the
        world.

        ETL

    2. For continuous export, you could keep track of latest inserted ID and filter the import based on that.
      Indeed Talend offers a component for MySQL to read latest inserted ID: tMysqlLastInsertId
      You can find more info in Talend Open Studio Components Reference Guide.

      Regards,

      Alexis

    3. Talend provides a change data capture (CDC) component with its enterprise subscription. For most databases it is trigger rather than log driven. The CDC component itself can be scheduled to run on whatever polling interval is desired. It would then post the delta-records to the Couchbase data source. This works for many use cases, but it is IMO more \”business-time\” rather than real-time. While not a nightly batch run, it should not be considered a \”real-time\” solution for low latency designs.

      Ed Ost

  2. Cool Tool – Can\’t wait to use it for my next Mysql to couchdbase Migration.

  3. our company http://www.sqiar.com/ provide services like SQL support which can help to our customer…SQIAR provides full expertise in database support to our clients. All of our SQL Server DBA support staff are Microsoft Certified Database Administrators and have extensive SQL Server experience.

  4. online training, courses, talend online training, carbone, olivier, cv, talend, Engineer, Research, Developement, Training, elearning, soft, training , open source, ntic, collaboration, lms, elearning, team, talend Training, Talend course
    http://www.21cssindia.com/cour

  5. Talend Online Training
    CallUs-91-900-044-4287 21st Century Software Solutions Online Training
    Introduction
    Introduction to Talend
    Why Talend?
    Talend Editions and Features
    Talend Data Integration Overview

  6. Talend Online Training
    Call Us-91-900-044-4287 21st Century Software Solutions Online Training
    Introduction
    Introduction to Talend
    Why Talend?
    Talend Editions and Features
    Talend Data Integration Overview
    Talend Environment
    Talend Environment – Overview
    Repository and Pallate
    Talend Design and Views

  7. Thank you Alexis for the instructions.
    My problem is, that Couchbase ignores the schema definition of \’INT for Population\’.
    The document looks like this, after running the ETL-Job:
    {
    \”ID\”: \”city:1000\”,
    \”Type\”: \”city\”,
    \”Name\”: \”Taman\”,
    \”CountryCode\”: \”IDN\”,
    \”District\”: \”East Java\”,
    \”Population\”: \”107000\”
    }

    I guess it should look like this:
    Population without quotation marks. Reason: to use the _sum reduce function.

    {
    \”ID\”: \”city:1000\”,
    \”Type\”: \”city\”,
    \”Name\”: \”Taman\”,
    \”CountryCode\”: \”IDN\”,
    \”District\”: \”East Java\”,
    \”Population\”: 107000
    }

  8. Thanks for Sharing such a Wonderful Information….

    Learn ETL TESTING through Online for Details Please go through the Link…

    Online ETL TESTING Training in INDIA | USA | UK | AUSTRALIA | CANADA | SINGAPORE | EUROPE

  9. Thanks for Sharing such a Wonderful Information….

    Learn ETL TESTING through Online for Details Please go through the Link…

    Online ETL TESTING Training in INDIA | USA | UK | AUSTRALIA | CANADA | SINGAPORE | JAPAN

  10. Victoria Spencerr March 4, 2015 at 8:57 am

    To give a complete feeling of the functionality of the
    workflows benchmark for their most significant attribute, which constitute

    — Identification of information that is proper in the
    source side

    — Extraction of the information from storage that is big

    — Transportation of the information to the information
    Staging Area

    — Cleanse on the idea of company and record rules

    As a result of difficulty and the importance for factors
    that are proper, an ETL Testing Online Training tool creates a multi-million dollar
    marketplace. There\’s a superfluity of business ETL tools accessible. ETL
    alternatives are offered by the original database vendors as well as the
    DBMS\’s: Each ETL tool uses its method that is individual for execution and the
    layout assembling the outstanding job will be produced by the task of assessing
    ETL tools extremely hard but once accurately assessed.

    ETL Testing Online Training tools can be found in the market
    for over a decade, just in the past couple of years have professionals and
    researchers began to understand the importance the data integration procedure
    has in a data warehouse project\’s achievement. There have now been numerous
    attempts towards (a) modeling jobs as well as the automation of the design
    procedure, (b) operations that are distinct (c) some first results as a whole
    towards the optimization.

    The ETL Testing Online Training designer constructs a
    workflow of tasks, typically in the shape to identify the order of
    transformation and cleansing procedures before being overloaded to the data
    warehouse that need to be placed on the source data. ETL Tools scenarios on the
    basis of the above mentioned evaluation, which can serve as a probationary test
    bed for the evaluation of tools or ETL Testing Online Training layout
    alternatives.

    The ETL designer constructs a workflow of tasks, typically
    in the shape to identify the order of transformation and cleansing procedures
    before being overloaded to the data warehouse that need to be placed on the
    source data. ETL Tools scenarios on the basis of the above mentioned
    evaluation, which can serve as a probationary test bed for the evaluation of
    tools or ETL layout alternatives.

    ETL TESTING Online Training

    Contact us:

    India: +91-9949566322

    USA: +1-347-606-2716

    Email: miller@leadonlinetraining.com

  11. Victoria Spencerr March 6, 2015 at 7:37 am

    ETL TESTING Online Training is the one of the best online
    course exhaustive overall which is given by LEAD ONLINE TRAINING.LEAD ONLINE
    TRAINING are a standout amongst the most trusted ETL TESTING Online Training
    establishments among the world.

    ETL Testing Online Training devices have created to a period
    where they speak to the most time and expense productive technique for execute
    frameworks for the collection of data and moving it. They are getting to be
    unpredictable systems change to concentrate and transfer information into a
    basic framework.

    ETL TESTING Online Training is the one of the best online
    course careful overall which is given by LEAD ONLINE TRAINING.LEAD ONLINE
    TRAINING are a standout amongst the most trusted ETL TESTING Online Training
    organizations among the world.

    I have recently completed the first take off for an ETL
    Testing Online Training execution exchange and to total movement information
    consistently in the 7 different nation based systems crosswise over Asia. Every
    nation runs a focal MS SQL database supporting organization and enrollment
    administration more than 100 wellbeing clubs.
    ETL TESTING Online Training

    The justification and explanation behind the undertaking was
    to make deals and club usage data accessible significantly prior contrasted with
    the standard consummation of month bunch reporting. Having information open
    consistently empowers the constant, quick following. There exists a standard
    administration framework over all states together with the database using
    Microsoft SQL.

    1. Wilbert Klinton August 10, 2015 at 7:38 pm

      It’s really a great pleasure to provide an opinion about ETL
      tools. These are very important and useful to all the people from all over the
      world. ETL tools are useful to everyone which help to transform any data into
      any database fast and easy and comfortably.
      ETL

  12. Q1: How to set static value when add linker to target node?
    Q2: How to mapping Integer value to Couchbase Integer value(Not String)?

Leave a reply