Migrating Data from MongoDB to Couchbase

This article will guide you through a one-time migration of data from MongoDB to Couchbase. You will learn how to import data exported from MongoDB and perform some basic transformations on those documents.

All code from this blog is available in the following Git repository: mongodb-to-couchbase

Prerequisites

This article uses the Sample Mflix Dataset that has been loaded to a MongoDB cluster. I am using MongoDB Atlas, but the information in this article applies to non-Atlas installs of MongoDB as well. If you need to load the sample dataset to MongoDB, refer to the instructions here.

MongoDB Compass is used to export the dataset and this article assumes it is already configured to connect to the MongoDB cluster where the Sample Mflix Dataset resides.

You will also need a Couchbase Server Enterprise Edition (EE) 6.x cluster with the Data, Index, Query, & Eventing services enabled (NOTE: Index & Query will be used in a future article). I am using a single node local install of Couchbase Server EE but the information in this article applies to any Couchbase Server EE cluster.

If you do not have an existing Couchbase Server EE cluster, the following links will get you up and running quickly:

  1. Download Couchbase Server EE
  2. Install Couchbase Server EE
  3. Provision a single-node cluster (NOTE: use the default values for cluster configuration)

JSON, BSON, and Extended JSON

MongoDB and Couchbase Server are both document databases and both store JSON documents. However, MongoDB represents JSON documents in a binary-encoded format called BSON. JSON can only represent a subset of the types supported by BSON. To preserve type information, MongoDB uses Extended JSON, which extends the JSON format with conventions for representing these additional types. Refer to the MongoDB Extended JSON specification for full details on the different Extended JSON types and conventions.

Here are some examples of how MongoDB represents different types of information:

  • ObjectId: "_id":{"$oid":"573a1390f29313caabcd4135"}
  • Integer: "runtime":{"$numberInt":"1"}
  • Date: "released":{"$date":{"$numberLong":"-2418768000000"}}
  • Double: "rating":{"$numberDouble":"6.2"}

While Couchbase can store this information, it is easier to work with documents that do not use the Extended JSON format. Using the above examples, the values would look like this:

  • ObjectId: "_id":"573a1390f29313caabcd4135"
  • Integer: "runtime":1
  • Date: "released":-2418768000000
  • Double: "rating":6.2

Export Data from MongoDB

MongoDB Compass will be used to export the movies and comments collections from the sample_mflix database. In Compass, expand the sample_mflix database item and then select comments.

Choose the Collection -> Export Collection menu item. Select Export Full Collection, JSON output file type, specify an output file, and click EXPORT.

Do the same for the movies collection.

Import Data to Couchbase

Next, import the MongoDB collection data into Couchbase Server. As mentioned above, the exported data is in Extended JSON Format. The Couchbase Eventing service will be used to do some minor transformations on the data in real time as the documents are imported into Couchbase.

At a high level, the flow is as follows:

  1. Use the cbimport utility to import the JSON documents into the incoming bucket.
  2. An Eventing function will transform the documents as they are written to the incoming bucket.
  3. If the transformation is successful, the transformed document will be written to the sample_mflix bucket.
  4. If there are any errors, the original document will be written to the error bucket. An error attribute in each document will contain the error message.

Create Buckets

Create the buckets mentioned above. Refer to the documentation on creating a bucket for full details on the different settings and considerations for setting the values.

The incoming bucket will temporarily store the documents as they are imported into Couchbase. This will be an ephemeral bucket since we don’t require any persistent storage for these documents. The Eventing function will transform them and write them to either the sample_mflix or error bucket.

The documents do not need to remain in the bucket after they are transformed so it will be configured with a Time To Live (TTL) of 900 seconds (15 minutes). The documents will be automatically deleted by Couchbase Server when the TTL expires.

To create the incoming bucket, click on Buckets and then ADD BUCKET.

Configure the incoming bucket as follows and click Add Bucket.

  1. Name: incoming
  2. Memory Quota: 256 MB. NOTE: Since ephemeral buckets do not persist to disk you must ensure there is enough memory allocated to the bucket to accommodate the entire data set being imported. The total size of the comments and movies collections used in this example is about 50 MB so 256 MB is more than enough to accommodate this data set.
  3. Bucket Type: Ephemeral
  4. Bucket Max Time-To-Live: 900 seconds. NOTE: The TTL value is the maximum time a document can exist following its creation. The documents are transformed in real time as they are written to Couchbase, so this value can be set relatively low. A value of 15 minutes (900 seconds) is used in this case. If the value is set too low, a document could expire before it is processed.

The sample_mflix bucket will be used to store the transformed MongoDB export. This will be a Couchbase bucket since we require persistent storage for these documents. Configure it as follows:

  1. Name: sample_mflix
  2. Memory Quota: 256 MB. NOTE: Since Couchbase buckets persist all documents to disk the memory quota will determine how many documents can be stored in the integrated caching layer at any time. The total size of the comments and movies collections used in this example is about 50 MB so 256 MB is more than enough to accommodate this data set.
  3. Bucket Type: Couchbase

The error bucket will be used to store any documents that could not be transformed. Configure it as follows:

  1. Name: error
  2. Memory Quota: 256 MB
  3. Bucket Type: Couchbase

Data Transformation with Eventing

The Couchbase Eventing service will be used to transform the data in real time as it is imported into Couchbase. There are a few things to configure to use this feature.

First, create a metadata bucket that Eventing will use to store system data. Configure it as follows:

  1. Name: metadata
  2. Memory Quota: 256 MB
  3. Bucket Type: Couchbase

In the Buckets section you will see the 4 buckets you created: error, incoming, metadata, & sample_mflix:

Click on Eventing and then click ADD FUNCTION to configure the function that will be used to transform the data in real time as it is being imported into Couchbase.

Configure the function as follows:

  1. Source Bucket: incoming (This is the bucket that temporarily stores the data as it is imported. The function will monitor this bucket for changes.)
  2. Metadata Bucket: metadata (This is the bucket used to store system data.)
  3. Function Name: transform
  4. Description: Transform MongoDB export
  5. Bindings (Click the + icon to add a second binding)
    • type: Alias
    • name: sample_mflix (actual name of the bucket in the cluster)
    • value: target (alias used in function to refer to bucket)
    • type: Alias
    • name: error (actual name of the bucket in the cluster)
    • value: error (alias used in function to refer to bucket)

Click Next: Add Code to add the JavaScript code for the transform function.

On the transform function screen, replace the boilerplate code with the code below.
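The full listing lives in the Git repository mentioned above; the following is a minimal sketch of such a transform function, assuming the target and error bucket aliases configured in the bindings step. The exact unwrapping and re-keying logic here is an assumption based on the behavior described later in this article.

```javascript
// Fires once for every document written to the incoming bucket.
function OnUpdate(doc, meta) {
    log("Original document:", doc);
    try {
        var transformed = transformValues(doc);

        // Add a type attribute based on the key prefix,
        // e.g. "comment:1" -> "comment".
        var prefix = meta.id.split(":")[0];
        transformed.type = prefix;

        // Re-key the document using the original ObjectId,
        // e.g. "comment:5a9427648b0beebeb69579cc".
        target[prefix + ":" + transformed._id] = transformed;
        log("Transformed document:", transformed);
    } catch (e) {
        // On failure, write the original document to the error
        // bucket with the error message attached.
        doc.error = String(e);
        error[meta.id] = doc;
        log("Transformation error:", e);
    }
}

// Recursively unwrap MongoDB Extended JSON wrappers such as
// {"$oid": "..."} and {"$numberInt": "..."} into plain JSON values.
function transformValues(value) {
    if (value === null || typeof value !== "object") {
        return value;
    }
    if (Array.isArray(value)) {
        return value.map(transformValues);
    }
    var keys = Object.keys(value);
    if (keys.length === 1) {
        var inner = value[keys[0]];
        switch (keys[0]) {
            case "$oid":
                return inner;                  // ObjectId becomes a plain string
            case "$numberInt":
            case "$numberLong":
                return parseInt(inner, 10);
            case "$numberDouble":
            case "$numberDecimal":
                return parseFloat(inner);
            case "$date":
                return transformValues(inner); // epoch milliseconds
        }
    }
    var out = {};
    for (var key in value) {
        out[key] = transformValues(value[key]);
    }
    return out;
}
```

The OnUpdate handler runs for every mutation in the source bucket, and transformValues() recursively unwraps the Extended JSON conventions shown earlier.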

The function includes log() statements to log the original document, transformed document, and any errors. These can be changed as necessary. The Eventing log file can be found in the @eventing application log. See this link for the location of this log file.

You can easily extend the capability of this function to perform other transformations by adding the necessary code in the transformValues() function. Note that if you make any changes to the function after it is deployed, you must undeploy it, edit the JavaScript, and then deploy it again.

Click Save to save the code and go back to the Eventing section of the console.

The new transform function is listed but it needs to be deployed. Click on the transform function item and then click Deploy.

Confirm function deployment with the default setting by clicking Deploy Function.

After the function deploys, its status will show as deployed.

Import Documents with cbimport

Use the cbimport utility to import the files exported from MongoDB. Before importing any data it is important to understand the command syntax and what it is doing.

Here is an example cbimport command:
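A sketch of the general form, using the json subcommand (the angle-bracket placeholders are mine):

```shell
cbimport json -c couchbase://<host> -u <username> -p <password> \
  -b <bucket-name> -d file://<path-to-file> -f lines -g <key-expression>
```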

To import the MongoDB comments data, execute the command below. Note that the location of the cbimport utility varies based on the OS and is documented here: CLI Reference.
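A reconstruction of the command from the flags explained below, assuming a local single-node cluster, Administrator credentials, and the exported file comments.json in the current directory:

```shell
cbimport json -c couchbase://127.0.0.1 -u Administrator -p password \
  -b incoming -d file://comments.json -f lines -g 'comment:#MONO_INCR#'
```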

The command will connect to the specified cluster (i.e. -c couchbase://127.0.0.1) using the supplied Administrator credentials (i.e. -u Administrator -p password).

JSON data will be imported from comments.json and each line in the file represents a separate document (-f lines).

The documents will be written to the incoming bucket (-b incoming) using a key generated using the specified format (-g comment:#MONO_INCR#). In this example the format specifies that each document key will start with "comment:". The MONO_INCR function increments by 1 each time it is called so the resulting keys will be comment:1, comment:2, etc.

Upon completion, cbimport prints a summary showing the number of documents imported.

Go to the Buckets section and confirm that the sample_mflix bucket contains 50,304 documents.


To import the MongoDB movies data, execute the command below. Note that the location of the cbimport utility varies based on the OS and is documented here: CLI Reference.
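This is the same command as before, with the movies file and a movie: key prefix (file name and credentials are again assumptions):

```shell
cbimport json -c couchbase://127.0.0.1 -u Administrator -p password \
  -b incoming -d file://movies.json -f lines -g 'movie:#MONO_INCR#'
```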

Upon completion, cbimport again prints a summary showing the number of documents imported.

Go to the Buckets section and confirm that the sample_mflix bucket contains 73,843 documents.

Review some of the transformed documents. Go to the Documents section, select the sample_mflix bucket, and click on ID comment:5a9427648b0beebeb69579cc (the first document in the list).

Note the contents of the document.

Comparing it to the exported data (the first line in comments.json) you will see some differences:

The transform function has changed the Extended JSON _id, movie_id, & date values. A type attribute was also added based on the document key prefix: comment.
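To illustrate the comparison, here is the shape of a comment document before and after transformation. The field values shown are placeholders, not the actual document contents:

```json
{
  "_id": {"$oid": "5a9427648b0beebeb69579cc"},
  "name": "...",
  "email": "...",
  "movie_id": {"$oid": "573a1390f29313caabcd4135"},
  "text": "...",
  "date": {"$date": {"$numberLong": "..."}}
}
```

After the transform, the Extended JSON wrappers are gone and the type attribute has been added:

```json
{
  "_id": "5a9427648b0beebeb69579cc",
  "name": "...",
  "email": "...",
  "movie_id": "573a1390f29313caabcd4135",
  "text": "...",
  "date": 0,
  "type": "comment"
}
```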

In the Document ID field enter movie:573a1390f29313caabcd4135, click Retrieve Docs, and click on ID movie:573a1390f29313caabcd4135.

Note the contents of the document.

Comparing it to the exported data (first line in movies.json) you will see these differences:

The transform function has changed the Extended JSON values.

What’s Next

After you have finished migrating the data from MongoDB you can undeploy the transform function and remove the extra buckets (incoming and error).

A future article will cover how to update your existing client code to use the Couchbase SDK.

Take advantage of our free, online training available at https://learn.couchbase.com to learn more about Couchbase.

For detailed information on the architectural advantages of the Couchbase Data Platform over MongoDB see this document: Couchbase vs. MongoDB for Scale-Out and High Availability.

Learn why other enterprises choose Couchbase over MongoDB.

Posted by Douglas Bonser, Principal Solution Engineer, Couchbase

Douglas Bonser is a Senior Solutions Engineer at Couchbase and has been working in IT and technology since 1991. He is based in the Dallas/Ft. Worth area.
