Extend Couchbase Analytics with RapidMiner using CData
This article walks through the steps needed to set up a connection from RapidMiner to Couchbase Analytics using the CData JDBC driver for Couchbase. More details regarding this driver can be found here.
You will first need a Couchbase Server Enterprise Edition (EE) 6.x cluster with the Data and Analytics services enabled. I am using a single-node local install of Couchbase Server EE, but the information in this article applies to any Couchbase Server EE cluster.
If you do not have an existing Couchbase Server EE cluster, the following links will get you up and running quickly:
- Download Couchbase Server EE
- Install Couchbase Server EE
- Provision a single-node cluster (NOTE: use the default values for cluster configuration)
CData JDBC driver for Couchbase
Next you will need to download and install the CData JDBC driver for Couchbase.
Once it is downloaded and unpacked, you will want to set up the license:
Command Line Activation
The setup process should automatically install a license for your system. However, you may also install a license from the command line via cdata.jdbc.couchbase.jar. To do so execute the following command: java -jar cdata.jdbc.couchbase.jar -license. This process will create a cdata.jdbc.couchbase.lic that must reside next to the jar or in the .cdata directory under the user’s home directory.
Trial License Installation
The setup process should automatically install a trial license for your system. You may also use the method described in the “Command Line Activation” section above to install a trial license. Simply enter “TRIAL” as the product key when prompted.
Note** The cdata.jdbc.couchbase.lic file must reside next to the jar or in the .cdata directory under the user’s home directory, for example “/Users/justinsimpson/.CData/cdata.jdbc.couchbase.lic”
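If the driver later reports a licensing problem, it can help to check both locations by hand. The following is a minimal sketch, not part of the CData tooling: the class name LicenseLocator and its methods are hypothetical, and the directory name .cdata follows the text above (adjust it if your system uses .CData).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the two places the license file may live.
final class LicenseLocator {
    static final String LICENSE_FILE = "cdata.jdbc.couchbase.lic";

    // Candidate locations: next to the jar, or in .cdata under the home directory.
    static List<Path> candidates(Path jarPath, Path homeDir) {
        List<Path> paths = new ArrayList<>();
        Path jarDir = jarPath.toAbsolutePath().getParent();
        if (jarDir != null) {
            paths.add(jarDir.resolve(LICENSE_FILE));
        }
        // Assumption: directory is named ".cdata" as written in the article.
        paths.add(homeDir.resolve(".cdata").resolve(LICENSE_FILE));
        return paths;
    }

    public static void main(String[] args) {
        // Pass the path to the driver jar as the first argument.
        Path jar = Paths.get(args.length > 0 ? args[0] : "cdata.jdbc.couchbase.jar");
        Path home = Paths.get(System.getProperty("user.home"));
        for (Path p : candidates(jar, home)) {
            System.out.println((Files.exists(p) ? "found:   " : "missing: ") + p);
        }
    }
}
```

Run it with the path to the jar as the only argument; each candidate location is printed with a found or missing prefix.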
In Couchbase click on Settings
Then Sample Buckets
Then select the beer-sample checkbox and select Load Sample Data. You can then navigate back to Buckets and see beer-sample.
Once this is complete, we will need to set up Analytics.
Select Analytics, then create the shadow dataset of beers from the bucket of beer-sample.
CREATE DATASET beers ON `beer-sample` WHERE `type` = "beer";
Click Execute; this will create the shadow dataset definition.
Repeat this step to create a second shadow dataset with the following definition.
CREATE DATASET breweries ON `beer-sample` WHERE `type` = "brewery";
Next, activate both datasets by connecting the Local link, which starts ingesting data from the bucket.
CONNECT LINK Local;
You can now test this out within the Analytics dashboard by running something like the following.
SELECT COUNT(*) FROM beers;
SELECT COUNT(*) FROM breweries;
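The same statements can also be sent outside the dashboard through the Analytics service's HTTP endpoint. The sketch below assumes a local node with the Analytics service on its default port 8095 and uses placeholder credentials (Administrator / password); the class name AnalyticsSetup and its helpers are hypothetical. It requires Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that posts the article's statements to the Analytics service.
final class AnalyticsSetup {
    static final String[] STATEMENTS = {
        "CREATE DATASET beers ON `beer-sample` WHERE `type` = \"beer\";",
        "CREATE DATASET breweries ON `beer-sample` WHERE `type` = \"brewery\";",
        "CONNECT LINK Local;",
        "SELECT COUNT(*) FROM beers;",
        "SELECT COUNT(*) FROM breweries;"
    };

    // The service accepts a form-encoded "statement" parameter.
    static String formBody(String statement) {
        return "statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8);
    }

    // Builds an HTTP Basic Authorization header value.
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Assumptions: local node, default Analytics port, placeholder credentials.
        String endpoint = "http://localhost:8095/analytics/service";
        String auth = basicAuth("Administrator", "password");
        HttpClient client = HttpClient.newHttpClient();
        for (String statement : STATEMENTS) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Authorization", auth)
                .POST(HttpRequest.BodyPublishers.ofString(formBody(statement)))
                .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + statement);
            System.out.println(response.body());
        }
    }
}
```

If the datasets already exist, the two CREATE DATASET statements are expected to return an error rather than recreate them, so this is best run once against a fresh setup.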
More about Couchbase Analytics can be found here.
Your setup for Couchbase is complete!
To accomplish the simple task of using RapidMiner as an extension of Couchbase Analytics, there are two basic steps.
- Set up a connection
- Create a process that has two ‘Read Database’ operators. You might also want to store those results locally so you can combine them and use other operators and processes within RapidMiner.
Set Up a Connection
Within RapidMiner, I start from a Blank Process. Under Connections I select Create Connection and give it a connection name. In this example I use ‘CBLocal’.
On the Setup tab, I make sure the Database system is set to “Custom (configure in Driver tab)” and I select Configure URL Manually.
I populate the URL with the following:
All of the connection string options and details can be found under the CData JDBC connection string options.
Next, select the Driver tab to finish the setup.
To set up the JDBC driver jar file, click the folder icon and browse to the location of cdata.jdbc.couchbase.jar. Once this is selected, you can choose ‘cdata.jdbc.couchbase.CouchbaseDriver’ in the dropdown list.
You can now click Test connection to verify your setup is complete.
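The same check can be made outside RapidMiner with plain JDBC, which is useful when Test connection fails and you want to see the driver's error message. This is a sketch under stated assumptions, not the URL used in this article: the property names User, Password, Server and CouchbaseService are assumed from the CData connection string options and should be confirmed there, the credentials are placeholders, and the class name ConnectionCheck is hypothetical. The driver jar must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: opens a JDBC connection and reports whether it is valid.
final class ConnectionCheck {
    // Assumption: property names as listed in the CData connection string options.
    static String buildUrl(String server, String user, String password) {
        return "jdbc:couchbase:User=" + user
            + ";Password=" + password
            + ";Server=" + server
            + ";CouchbaseService=Analytics;";
    }

    public static void main(String[] args) {
        // Placeholder values for a local single-node install.
        String url = buildUrl("http://localhost", "Administrator", "password");
        try (Connection conn = DriverManager.getConnection(url)) {
            // isValid waits up to 5 seconds for the driver to confirm the connection.
            System.out.println("Connected: " + conn.isValid(5));
        } catch (SQLException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

Run it with the driver on the classpath, for example java -cp cdata.jdbc.couchbase.jar:. ConnectionCheck on Linux or macOS.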
Now that RapidMiner has a new connection configured, it's time to load some data!
Start from a blank process.
- Drag and drop the operator ‘Read Database’ (it's important to connect the output (out) to the results (res) in the Process window)
- Select the connection you just created
- Select Build SQL Query and enter the query you would like to pass to Couchbase Analytics:
  SELECT brewery_id, name, style, abv FROM beers;
- Click the ‘Play’ Button to get the results!
My result set looks like this…
If you want to store those results and create multiple datasets for use with other RapidMiner tools, simply add the ‘Store’ operator by dragging it into the process and setting the location where you would like to store the data.
Note** You need to make sure that the connection from the output (out) from the ‘Read Database’ operator to the input (inp) of Store operator is set properly.
I then repeated this process for the other shadow dataset we created, ‘breweries’, as you can see above under the data section.
More about RapidMiner Studio can be found here.
Download Couchbase, set up Analytics, and start using RapidMiner with your data to see what insights you can glean. Extend Analytics with other tools using the many Couchbase CData drivers that are at your fingertips.
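Once both result sets are stored, combining them comes down to matching each beer's brewery_id to a brewery. The sketch below illustrates that inner join in plain Java rather than with RapidMiner operators. The class name BeerBreweryJoin, the row layout, and the sample rows are made up for illustration, and it assumes the brewery rows carry an identifier that matches brewery_id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of joining beers to breweries by brewery_id.
final class BeerBreweryJoin {
    // Each beer row is {brewery_id, beer name}; breweryNames maps brewery id to brewery name.
    // Beers whose brewery_id has no match are dropped (inner join).
    static List<String> join(List<String[]> beers, Map<String, String> breweryNames) {
        List<String> joined = new ArrayList<>();
        for (String[] beer : beers) {
            String brewery = breweryNames.get(beer[0]);
            if (brewery != null) {
                joined.add(beer[1] + " (" + brewery + ")");
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up rows, not taken from beer-sample.
        Map<String, String> breweries = new HashMap<>();
        breweries.put("brewery_a", "Brewery A");
        List<String[]> beers = new ArrayList<>();
        beers.add(new String[] {"brewery_a", "Example Ale"});
        beers.add(new String[] {"brewery_x", "Orphan Lager"});
        join(beers, breweries).forEach(System.out::println);
    }
}
```

With the made-up rows above, only Example Ale has a matching brewery, so it is the only row printed.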