Migrating Buckets to Collections & Scopes via Eventing: Part 1

First, I want to point out an excellent blog post by Shivani Gupta, How to Migrate to Scopes & Collections in Couchbase 7.0, which covers in great detail other methods of migrating bucket-based documents to Scopes and Collections in Couchbase. I encourage you to also read about the multiple non-Eventing methods that Shivani touches upon.

Whether you’re new to Couchbase or a seasoned vet, you’ve likely heard about Scopes and Collections. If you’re ready to try them, this article helps you make it happen.

Scopes and Collections are a new feature introduced in Couchbase Server 7.0 that allows you to logically organize data within Couchbase. To learn more, read this introduction to Scopes and Collections.

You should take advantage of Scopes and Collections if you want to map your legacy RDBMS to a document database or if you’re trying to consolidate hundreds of microservices and/or tenants into a single Couchbase cluster (resulting in much lower TCO).


Using Eventing for Scopes & Collections Migration

In this article, I’ll discuss the mechanics of another high-performance method to migrate from an older Couchbase version to Scopes and Collections in Couchbase 7.0.

You only need the Data Service (or KV) and Eventing to migrate from buckets to collections. In a well-tuned, large Couchbase cluster, you can migrate over 1 million documents a second. No N1QL and no indexes are needed.

In the follow-up post (Part 2), I will provide a simple, fully automated methodology to do large migrations with dozens (or even hundreds) of data types via a simple Perl script.

Prerequisites: Learning about Eventing

In this article, we will use the latest version of Couchbase (7.0.2), but prior 7.0 versions work fine as well.

If you are not familiar with Couchbase or the Eventing service, please walk through the following resources, including at least one Eventing example:

Eventing Function: ConvertBucketToCollections

Eventing allows you to write pure business logic. The Eventing service takes care of the entire infrastructure needed to manage and scale your function (horizontally and vertically) across multiple nodes in a performant and reliable fashion.

All Eventing functions have two entry points – OnUpdate(doc, meta) and OnDelete(meta, options). Note that we’re not worried about the latter entry point in this example.

When a document changes or mutates (insert, upsert, replace, etc.), a copy of the document and some metadata about the document are passed to a small JavaScript entry point, OnUpdate(doc, meta).
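To make the two entry points concrete, here is a minimal sketch of an Eventing function that only logs what it receives. The log array and the log() definition are local stand-ins I added so the sketch also runs under Node.js; inside Eventing, log() is built in and those two lines should be dropped. The key and document body in the simulation are hypothetical.

```javascript
// Local stand-ins so the sketch runs outside Couchbase (assumption: Node.js).
// Inside Eventing, log() already exists; remove these two lines there.
const logged = [];
function log(...args) { logged.push(args.join(" ")); }

// Invoked for every insert, upsert, or replace on the source keyspace.
function OnUpdate(doc, meta) {
    // doc is a copy of the document body; meta.id is the document key.
    log("mutation", meta.id, "type=" + doc.type);
}

// Invoked when a document is deleted or expires; no document body is passed.
function OnDelete(meta, options) {
    log("removed", meta.id, "expired=" + Boolean(options && options.expired));
}

// Local simulation of one mutation and one deletion (hypothetical key and body).
OnUpdate({ type: "beer", name: "Example Ale" }, { id: "example_beer" });
OnDelete({ id: "example_beer" }, { expired: false });
console.log(logged.join("\n"));
```

The migration in this article only needs OnUpdate, which is why OnDelete is not discussed further.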

Eventing Functions can be deployed with two different Deployment Feed Boundaries, either “From now” or “Everything”. The latter allows access to every current document in a Bucket in Couchbase 6.6 or a Keyspace (Bucket/Scope/Collection) in Couchbase 7.0.

The scriptlet ConvertBucketToCollections from the main Eventing docs shows how to utilize Eventing to take data from a source bucket to a destination bucket and split your data into collections.

Step 1: Load Sample Data

In the Couchbase UI, select “Settings/Sample Buckets”. Check beer-sample and click on the button “Load Sample Data”.

Step 2: Make the Needed Keyspaces

This example requires three buckets: “beer-sample” (the document store to migrate), “rr100” (a scratchpad for Eventing that can be shared with other Eventing functions), and “bulk” (the bucket in which to create your migrated collections). The “rr100” and “bulk” buckets should each have a minimum size of 100 MB.

In the Couchbase UI, select “Buckets” and hit the “ADD BUCKET” link in the upper right.

Create two Buckets with size 100 MB, “rr100” (for the Eventing storage or scratch pad) and “bulk” (for the migration target).

In Bucket “rr100” create scope “eventing”.

In the Scope “rr100.eventing” create the collection “metadata”.

In Bucket “bulk” create scope “data”.

In the Scope “bulk.data” create the collections “beer” and “brewery”.
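If you would rather script these steps than click through the UI, the same scopes and collections can, to my knowledge, be created through the cluster manager REST API by posting a name field to the bucket's scopes endpoint and to the scope's collections endpoint. The sketch below only builds and prints the requests; it sends nothing. The helper name buildKeyspaceRequests is hypothetical, and the host, port (typically 8091), and credentials you would need to actually send them are left out on purpose, so check the paths against the REST documentation for your version before using them.

```javascript
// Layout taken from the steps above: bucket -> scope -> collections.
const keyspaceLayout = {
    rr100: { eventing: ["metadata"] },
    bulk: { data: ["beer", "brewery"] },
};

// Hypothetical helper: builds the REST calls (method, path, form body)
// for every scope and collection in the layout, without sending anything.
function buildKeyspaceRequests(layout) {
    const requests = [];
    for (const [bucket, scopes] of Object.entries(layout)) {
        for (const [scope, collections] of Object.entries(scopes)) {
            // The scope must exist before its collections can be created.
            requests.push({
                method: "POST",
                path: `/pools/default/buckets/${bucket}/scopes`,
                body: `name=${scope}`,
            });
            for (const collection of collections) {
                requests.push({
                    method: "POST",
                    path: `/pools/default/buckets/${bucket}/scopes/${scope}/collections`,
                    body: `name=${collection}`,
                });
            }
        }
    }
    return requests;
}

const keyspaceRequests = buildKeyspaceRequests(keyspaceLayout);
for (const r of keyspaceRequests) console.log(r.method, r.path, r.body);
```

Keeping the layout in one object makes it easy to extend when a migration involves more than two document types.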

At this point you should have three (3) buckets: “beer-sample”, “bulk”, and “rr100”. The “bulk” bucket contains the scope “data” with the collections “beer” and “brewery”, and the “rr100” bucket contains the scope “eventing” with the collection “metadata”.

Step 3: Create the Eventing Function

In the Couchbase UI, select “Eventing” and hit the “ADD FUNCTION” link in the upper right.

The settings for the Eventing Function are as follows:

Hit the button “Save” then paste this script in the Function Editor panel:

Hit the button “Save and Return”.

What ConvertBucketToCollections Does

The OnUpdate(doc, meta) logic will process all data in the beer-sample._default._default keyspace and will perform the following on any past (historical) and any new (future) mutations.

    • First, the doc.type property is checked in two nearly identical code blocks to see if it matches either beer or brewery. If there is a match, processing continues.
    • The global constant DO_COPY (a Constant Binding alias in the Function settings) is checked to see if the item should be copied.
    • If DO_COPY is true, the document is written to the target collection beer_col or brewery_col (Bucket Binding aliases in the Function settings), depending on which code block matched.
    • The global constant DO_DELETE (also a Constant Binding alias) is checked to see if the item should be removed from the source collection.
    • If DO_DELETE is true, the document is removed from the source collection src_col (a Bucket Binding alias in the Function settings).
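The steps above can be sketched roughly as follows. This is my reconstruction from the description, not the exact listing from the Eventing docs. The five declarations at the top are local stand-ins (plain objects and constants) so the sketch runs under Node.js; in a deployed function those names come from the Bucket Binding and Constant Binding aliases and must not be declared in the code. The values chosen for DO_COPY and DO_DELETE and the sample documents are assumptions for illustration.

```javascript
// Local stand-ins for the Function settings (assumption: running under Node.js).
// In Eventing, these names are provided by binding aliases; omit them there.
const src_col = {};      // stands in for beer-sample._default._default
const beer_col = {};     // stands in for bulk.data.beer
const brewery_col = {};  // stands in for bulk.data.brewery
const DO_COPY = true;    // assumed value: copy matching documents
const DO_DELETE = false; // assumed value: leave the source untouched

function OnUpdate(doc, meta) {
    if (!doc.type) return; // nothing to route without a type property

    if (doc.type === "beer") {
        if (DO_COPY) beer_col[meta.id] = doc;     // write to the target collection
        if (DO_DELETE) delete src_col[meta.id];   // remove from the source
    }
    if (doc.type === "brewery") {
        if (DO_COPY) brewery_col[meta.id] = doc;
        if (DO_DELETE) delete src_col[meta.id];
    }
}

// Local simulation: feed three hypothetical documents through OnUpdate.
Object.assign(src_col, {
    example_beer: { type: "beer", name: "Example Ale" },
    example_brewery: { type: "brewery", name: "Example Brewing" },
    example_other: { type: "note", text: "not migrated" },
});
for (const [id, doc] of Object.entries(src_col)) OnUpdate(doc, { id });
console.log(Object.keys(beer_col), Object.keys(brewery_col));
```

Because the bindings behave like JavaScript maps keyed by document ID, a write is a plain assignment and a removal is a plain delete, which is why the two code blocks differ only in the target alias.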

We could increase the workers from 1 to the number of vCPUs for better performance, but our dataset is trivial so we just leave the worker count as one (1). Note: The setting for workers is found in the expandable section Settings in the middle of the Function Settings dialog.

Deploying the Eventing Function

Now it’s time to deploy the Eventing function. We’ve reviewed a bit of the code and the design of the ConvertBucketToCollections migration script, and now it’s time to see everything working together.

At this point, we have a function in JavaScript so we need to add it to our Couchbase cluster and deploy it into an active state.

Hit the button “Deploy”.

The Eventing Service takes about 18 seconds to deploy your Eventing Function, at which point you should immediately see 7303 items processed. Since the dataset is static, you are finished as all items have been processed.

 

Hit the button “Undeploy”.

Looking at the Migrated Data

Now that we are done using the Eventing Function, we can inspect the Buckets and Collections to see what happened.

In the Couchbase UI, select “Buckets”.

Now select “Scopes & Collections” for the bucket “bulk”, then expand the scope “data”.

In the Couchbase UI, select “Documents”, then select the Keyspace “bulk.data.beer” and you will see the migrated documents in that collection.

In the Couchbase UI, select “Documents”, then select the Keyspace “bulk.data.brewery” and you will see the migrated documents in that collection.

 

Let’s Improve the Eventing Function

Remember, Eventing can enrich data on the fly, and if we are truly splitting up a bucket (circa Couchbase 6.x) into separate collections (circa Couchbase 7.0), we no longer need the type property.  So let’s modify our Function to transform our data, too.

For example, given the document with key “abhi_brewery” in our source data in beer-sample._default._default:

Here’s the modification to our Eventing Function:

And since we add one new global constant DROP_TYPE, we also modify the settings as follows:
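As a rough illustration of the change described in this section, the sketch below remembers the type, removes the property when DROP_TYPE is true, and then routes the document. It is a simplified reconstruction that covers only the copy path, not the exact listing. The declarations at the top are local stand-ins for the binding aliases so it runs under Node.js, and the sample document is hypothetical rather than the real contents of “abhi_brewery”.

```javascript
// Local stand-ins for the Function settings (assumption: running under Node.js).
// In Eventing, these names come from binding aliases; omit them there.
const beer_col = {};     // stands in for bulk.data.beer
const brewery_col = {};  // stands in for bulk.data.brewery
const DO_COPY = true;    // assumed value
const DROP_TYPE = true;  // the new Constant Binding described above

function OnUpdate(doc, meta) {
    if (!doc.type) return;
    const type = doc.type;           // remember the type before it is removed
    if (DROP_TYPE) delete doc.type;  // the collection name now carries this information

    if (type === "beer" && DO_COPY) beer_col[meta.id] = doc;
    if (type === "brewery" && DO_COPY) brewery_col[meta.id] = doc;
}

// Local simulation with a hypothetical brewery document.
OnUpdate({ type: "brewery", name: "Example Brewing" }, { id: "example_brewery" });
console.log(JSON.stringify(brewery_col.example_brewery));
// prints {"name":"Example Brewing"}
```

Reading the type into a local variable first matters: once the property is deleted, doc.type can no longer be used to pick the target collection.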

Final Thoughts

If you found this article helpful and want to keep learning about Eventing, see the Couchbase Eventing Service documentation.

Now that you understand the mechanics of using Eventing to migrate your buckets to scopes and collections, please explore the follow-up post (Part 2), where I provide a simple, fully automated methodology to do large migrations with dozens of data types via a simple Perl script.

I would love to hear from you on how you liked the capabilities of Couchbase and the Eventing service, and how they benefit your business going forward. Please share your feedback via the comments below or in the Couchbase forums.

Author

Posted by Jon Strabala, Principal Product Manager, Couchbase

Jon Strabala is a Principal Product Manager, responsible for the Couchbase Eventing Service. Before joining Couchbase, he spent more than 20 years building software products across various domains, starting with EDA in aerospace then transitioning to building enterprise software focused on what today is coined “IoT” and “at-scale data.” Jon worked for several small software consultancies until eventually starting and managing his own firm. He has extensive experience in NoSQL/NewSQL, both in contributing and commercializing new technologies such as compressed bitmaps and column stores. Jon holds a bachelor’s degree in electrical engineering and a master's in computer engineering, both from the University of Southern California, and an MBA from the University of California at Irvine.
