Migrating Buckets to Collections & Scopes via Eventing: Part 2

As in Part 1, I want to point out an excellent blog post by Shivani Gupta, How to Migrate to Scopes & Collections in Couchbase 7.0, which covers in great detail other methods of migrating bucket-based documents to Scopes and Collections in Couchbase. I encourage you to also read about the multiple non-Eventing methods that Shivani touches upon.

Whether you’re new to Couchbase or a seasoned vet, you’ve likely heard about Scopes and Collections. If you’re ready to try them, this article helps you make it happen.

Scopes and Collections are a new feature introduced in Couchbase Server 7.0 that allows you to logically organize data within Couchbase. To learn more, read this introduction to Scopes and Collections.

You should take advantage of Scopes and Collections if you want to map your legacy RDBMS to a document database or if you’re trying to consolidate hundreds of microservices and/or tenants into a single Couchbase cluster (resulting in much lower TCO).


Using Eventing for Scopes & Collections Migration

In the prior article (Part 1), I discussed the mechanics of a high-performance, Eventing-based method to migrate from an older Couchbase version to Scopes and Collections in Couchbase 7.0.

Only the Data Service (or KV) and the Eventing Service are required to migrate from buckets to collections. In a well-tuned, large Couchbase cluster, you can migrate over 1 million documents a second. No N1QL and no indexes are needed.

In this follow-up article, I provide a fully automated way to do large migrations with dozens (or even hundreds) of data types via a simple Perl script.

Recap of the final Eventing Function: ConvertBucketToCollections

In Part 1 we had the following settings for the Eventing Function. Note that for each unique type, “beer” and “brewery”, we had to add a Bucket binding alias to the target collection in “read+write” mode. In addition, we had to create the target collections, in this case “bulk.data.beer” and “bulk.data.brewery”.

In Part 1 we had the following JavaScript code in our Eventing Function. Note that for each unique type, “beer” and “brewery”, we had to replicate a JavaScript code block and update it to reference the corresponding binding alias for the target collection in the Couchbase Data Service.
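As a minimal sketch of that per-type pattern (not the exact code from Part 1), the handler could look like the following. The alias names `beer_col` and `brewery_col` and the sample documents are assumptions for illustration. In a deployed Eventing Function the aliases would be bucket bindings; here they are plain objects so the sketch can be run locally under Node.js.

```javascript
// Stand-ins for the Eventing bucket binding aliases (assumed names).
// In a deployed function these are "read+write" bindings to
// bulk.data.beer and bulk.data.brewery, not plain objects.
const beer_col = {};
const brewery_col = {};

// Eventing entry point: runs for every insert or update in the source bucket.
function OnUpdate(doc, meta) {
    // One hand-written block per type: this is the part that does not scale.
    if (doc.type === "beer") {
        beer_col[meta.id] = doc;      // write the document to bulk.data.beer
        return;
    }
    if (doc.type === "brewery") {
        brewery_col[meta.id] = doc;   // write the document to bulk.data.brewery
        return;
    }
    // Documents of any other type are not copied anywhere.
}

// Local dry run with hypothetical documents.
OnUpdate({ type: "beer", name: "Pale Ale" }, { id: "beer::1" });
OnUpdate({ type: "brewery", name: "Acme Brewing" }, { id: "brewery::1" });
OnUpdate({ type: "other" }, { id: "other::1" });
console.log(Object.keys(beer_col), Object.keys(brewery_col));
```

Every additional type needs another binding alias, another target collection, and another `if` block, which is what makes the approach tedious for buckets with many types.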

The technique in Part 1 works, but what if I have a lot of types?

Eventing can indeed do migrations as shown in Part 1, but it takes a fair amount of work to set things up.

If you have 80 different types, it would be an incredible amount of error-prone effort to use this technique (both creating the Eventing Function and creating the needed keyspaces). If I had 80 types in a bucket to migrate and split, I wouldn’t want to do all the work described above by hand for each type.

Automate via CustomConvertBucketToCollections.pl

To solve this problem, I wrote a tiny Perl script, CustomConvertBucketToCollections.pl, that generates two files:

  • CustomConvertBucketToCollections.json is a complete Eventing Function that does all of the work described above.
  • MakeCustomKeyspaces.sh is a shell script that builds all the needed keyspaces and imports the generated Eventing Function.
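
To give a feel for what such a generator has to produce, here is a minimal sketch in JavaScript. It is not the Perl script itself. The helper names `aliasFor`, `handlerSource`, and `keyspaceCommands`, the type names, and the host and credential placeholders are all assumptions for illustration. The `couchbase-cli collection-manage` subcommand and its `--create-scope` and `--create-collection` options exist in Couchbase Server 7.0, but the generated MakeCustomKeyspaces.sh may well create its keyspaces differently.

```javascript
// Hypothetical helper: derive a JavaScript identifier to use as a binding alias.
// Note: distinct types such as "a-b" and "a_b" would collide; a real tool must handle that.
function aliasFor(type) {
    return "col_" + type.replace(/[^A-Za-z0-9_]/g, "_");
}

// Generate the handler source: one routing line per type instead of hand-written blocks.
function handlerSource(types) {
    const blocks = types.map(t =>
        `    if (doc.type === ${JSON.stringify(t)}) { ${aliasFor(t)}[meta.id] = doc; return; }`);
    return ["function OnUpdate(doc, meta) {", ...blocks, "}"].join("\n");
}

// Generate shell commands that create the scope and one collection per type.
// Host and credentials are placeholders, not values from the article.
function keyspaceCommands(bucket, scope, types) {
    const cli = `couchbase-cli collection-manage -c localhost ` +
                `-u "$CB_USERNAME" -p "$CB_PASSWORD" --bucket ${bucket}`;
    return [
        `${cli} --create-scope ${scope}`,
        ...types.map(t => `${cli} --create-collection ${scope}.${t}`)
    ];
}

const types = ["t01", "t02", "t03"]; // hypothetical type names
console.log(handlerSource(types));
console.log(keyspaceCommands("output", "reorg", types).join("\n"));
```

With 80 types the same two functions would emit 80 routing lines and 81 keyspace commands, which is the kind of repetitive output that is better generated than typed by hand.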

You can find this script in GitHub at https://github.com/jon-strabala/cb-buckets-to-collections.

Note that CustomConvertBucketToCollections.pl requires both Perl (Practical Extraction and Report Language) and jq (a lightweight and flexible command-line JSON processor) to be installed on your system.

Example: Migrate 250M Records with 80 Different Types

We have 250M documents in keyspace “input._default._default” with 80 different types and want to reorganize the data into collections under the scope “output.reorg”, split by the property type. We have an AWS cluster of three r5.2xlarge instances, all running the Data Service and the Eventing Service.

The input bucket “input” in this example is configured with a memory quota of 16000 MB.

Below I use the CustomConvertBucketToCollections.pl Perl script from GitHub at https://github.com/jon-strabala/cb-buckets-to-collections. As you will see, migrations can be trivial with an automated script.

Step 1: One-time Setup

Step 2: Create 250M test documents

Running the interactive big_data_test_load.sh command:

Input configuration parameters:

There should now be 250M test documents in the keyspace input._default._default.

Step 3: Generate Eventing Function and Keyspace script

Running the interactive CustomConvertBucketToCollections.pl command:

Input configuration parameters:

In the interactive Perl script run above, four of the default choices were altered.

Step 4: Update MakeCustomKeyspaces.sh (as needed)

You can just “vi MakeCustomKeyspaces.sh” and alter any needed values. I chose to use the Unix sed command to increase the RAM size of the bucket “output” from 100 to 1600.

Step 5: Run the MakeCustomKeyspaces.sh script

Output below:

Step 6: Refresh your Couchbase UI on the Eventing Page

To find the new (or updated) Eventing Function in the Couchbase UI, go to the Eventing Page and refresh your web browser.

Step 7: Deploy CustomConvertBucketToCollections

In the Couchbase UI, go to the Eventing Page and deploy the Eventing Function “CustomConvertBucketToCollections”.

In about 45 minutes the reorganization should be completely done.

All the documents are indeed reorganized by type as collections. On this modest cluster, they were processed at 93K docs/sec.
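
As a quick sanity check, the quoted rate and the quoted duration agree with each other. The arithmetic can be sketched as follows, using only the document count and the observed rate given in this example:

```javascript
const totalDocs = 250e6;     // documents in input._default._default
const docsPerSecond = 93e3;  // observed processing rate on this cluster

const seconds = totalDocs / docsPerSecond;
const minutes = seconds / 60;

console.log(`${Math.round(seconds)} s, about ${minutes.toFixed(1)} min`);
// prints "2688 s, about 44.8 min"
```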

Final Thoughts

If you found this article series helpful and are interested in continuing to learn about Eventing, read more about the Couchbase Eventing Service.

I hope you find the CustomConvertBucketToCollections.pl Perl script from GitHub at https://github.com/jon-strabala/cb-buckets-to-collections a valuable tool in your arsenal when you need to migrate a bucket with many types into a collections paradigm.

Feel free to improve the CustomConvertBucketToCollections.pl script so that it first writes an intermediate config file in which all the parameters can be adjusted, and then uses that config file to create the Eventing Function and the setup shell script.

Example intermediate config file:

I would love to hear from you on how you liked the capabilities of Couchbase and the Eventing service, and how they benefit your business going forward. Please share your feedback via the comments below or in the Couchbase forums.

Author

Posted by Jon Strabala, Principal Product Manager, Couchbase

Jon Strabala is a Principal Product Manager, responsible for the Couchbase Eventing Service. Before joining Couchbase, he spent more than 20 years building software products across various domains, starting with EDA in aerospace then transitioning to building enterprise software focused on what today is coined “IoT” and “at-scale data.” Jon worked for several small software consultancies until eventually starting and managing his own firm. He has extensive experience in NoSQL/NewSQL, both in contributing and commercializing new technologies such as compressed bitmaps and column stores. Jon holds a bachelor’s degree in electrical engineering and a master's in computer engineering, both from the University of Southern California, and an MBA from the University of California at Irvine.