Scopes and Collections are a new feature in Couchbase 7 that lets you logically organize data within Couchbase. They simplify mapping from an RDBMS and let you consolidate hundreds of microservices and/or tenants in a single Couchbase cluster, which results in much lower operational cost. To learn more about Scopes and Collections, read the following introductory blog.

In this blog, I will go over how you can plan your migration from an older Couchbase version to using Scopes and Collections in Couchbase 7.0.

High Level Migration Steps

The following are the high level steps. Not all steps are essential – it will depend on your use case and requirements. In subsequent sections we will go through details of each of these steps.

    1. Upgrade to 7.0
    2. Plan your collections strategy: Determine what buckets, scopes, collections and indexes you will have. Determine the mapping from old bucket(s) to new bucket(s)/scope(s)/collections. Write scripts to create scopes, collections and indexes.
    3. Migrate your application code: This is your Couchbase SDK code including N1QL queries.
    4. Data migration: Determine if offline strategy works or online migration is necessary. Accordingly follow the steps for offline or online migration.
    5. Plan and implement your Security strategy: Determine what Users and Role assignments you will have. Create Scripts to manage these assignments.
    6. Go live with the new collections-aware application
    7. Set up XDCR and backup

Upgrade to Couchbase 7

  • Every Bucket in 7.0+ will have a _default scope with a _default collection in it 
  • Upgrading to 7.0 will move all data in the bucket to the _default collection of the bucket
  • There will be no impact to existing applications. E.g. an SDK 2.7 reference to Bucket B will automatically resolve to B._default._default
  • If you do not wish to use named scopes and collections, you can stop right here. But if you would like to use this new feature, read on.
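To make the default resolution concrete, here is a minimal Python sketch of how a bare bucket reference expands to its fully qualified path. The helper name `qualify` is hypothetical, and it assumes bucket, scope and collection names contain no dots; it only illustrates the naming rule and is not part of any Couchbase SDK.

```python
def qualify(path):
    """Expand a keyspace path to bucket.scope.collection form.

    Assumption for this sketch: names do not contain '.' themselves.
    """
    parts = path.split(".")
    if len(parts) == 1:
        # A bare bucket name means its default scope and default collection.
        return f"{parts[0]}._default._default"
    if len(parts) == 3:
        # Already fully qualified.
        return path
    raise ValueError(f"expected 'bucket' or 'bucket.scope.collection', got {path!r}")


print(qualify("B"))
print(qualify("travel.inventory.airport"))
```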

Plan your Collections Strategy

There are a couple of common migration scenarios that we have come across. Please feel free to comment here on the article or on our forum if your migration scenario is completely different.

Consolidation: from multiple buckets to collections in a single bucket

This is a common scenario when you are trying to lower your total cost of ownership (TCO) by consolidating multiple buckets into a single bucket. A cluster can have at most 30 buckets, whereas it can have up to 1000 collections, allowing for much higher density. This will be a common scenario for microservice consolidation.

The diagram above shows all target collections belonging to the same scope. But you could have a variation of it where the target collections are in different scopes.

Splitting: from single bucket to multiple collections in a bucket

Another common scenario is to split out data from a single bucket into multiple collections in a bucket. Different types of data may previously have been qualified with a “type = xxx” field or with a key prefix “xxx_key”. Now each type can live in its own collection, giving you the advantages of logical isolation, security isolation, and per-collection replication and access control.

This scenario may be a little more complex than the previous scenario especially if you want to get rid of the key prefix or type field. For a simpler migration, you may want to leave the key prefixes and type data fields as is, even though they may be somewhat redundant with collections.
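If you do decide to drop the key prefix and type field, the per-document transformation is small. The following Python sketch shows one way to express it; the function name, the `type` field name, the `_` separator and the sample document are all assumptions for illustration, so adapt them to your own data model.

```python
def to_collection_doc(key, doc, type_field="type", sep="_"):
    """Return (collection, new_key, new_doc) for a document from the old bucket.

    The collection name is taken from the type field, the matching key
    prefix (for example "airport_") is stripped, and the now redundant
    type field is removed from the document body.
    """
    doc_type = doc.get(type_field)
    if doc_type is None:
        raise ValueError(f"document {key!r} has no {type_field!r} field")
    prefix = f"{doc_type}{sep}"
    new_key = key[len(prefix):] if key.startswith(prefix) else key
    new_doc = {k: v for k, v in doc.items() if k != type_field}
    return doc_type, new_key, new_doc


collection, new_key, new_doc = to_collection_doc(
    "airport_1254", {"type": "airport", "city": "Calais"}
)
print(collection, new_key, new_doc)
```

Keeping the prefix and type field instead means this step can be skipped entirely, which is why it is the simpler migration.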

Creation of scopes, collections, and indexes

Once you have planned what scopes, collections and indexes you want, you will need to write scripts to create them. You can do so with the SDK of your choice, with couchbase-cli, with the REST APIs directly, or even with N1QL scripts.

Given below is an example of using the CLI (couchbase-cli and cbq) to create a scope, collection and an index.

Note that the index creation statement does not require you to qualify the data with a “type = xxx” or key-prefix qualification clause anymore.
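One way to keep such scripts repeatable is to generate the N1QL statements from a small plan. The Python sketch below only builds the statement text (it does not talk to a cluster); the function name and the bucket, scope, collection and index names are hypothetical. The resulting statements could then be run through cbq or an SDK.

```python
def ddl_statements(bucket, scope, collections):
    """Build N1QL DDL for one scope.

    `collections` maps a collection name to a list of
    (index_name, [field, ...]) tuples.
    """
    stmts = [f"CREATE SCOPE `{bucket}`.`{scope}`;"]
    for coll, indexes in collections.items():
        path = f"`{bucket}`.`{scope}`.`{coll}`"
        stmts.append(f"CREATE COLLECTION {path};")
        for index_name, fields in indexes:
            cols = ", ".join(f"`{f}`" for f in fields)
            # No WHERE type = "..." clause: the collection already scopes the data.
            stmts.append(f"CREATE INDEX `{index_name}` ON {path}({cols});")
    return stmts


plan = {"airport": [("idx_airport_city", ["city"])], "airline": []}
for stmt in ddl_statements("travel", "inventory", plan):
    print(stmt)
```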

Migrate your application code

In order to use named scopes and collections, your application code (including N1QL queries) will need to be migrated. 

If you were using type fields or key prefixes previously (as in the splitting scenario), you will not need them anymore.

SDK Code Sample

In your SDK code you have to connect to a cluster, open a bucket and obtain a reference to a collection object to store and retrieve documents. Prior to collections, all key-value operations were performed directly on the bucket.

Note: If you have migrated to SDK 3.0, you have already done some of the work of starting to use collections (though up until 7.0, you could only use the default collection).

The following is a simple Java SDK code snippet for storing and retrieving a document to a collection:

N1QL Queries

Now if you want to run a N1QL query on the collection in the above Java example you can do the following:

Notice that you can query directly on a scope.  The above query on the scope object automatically maps to “select * from bucket-name.scope-name.collection-name”.

Another way to provide path context to N1QL is to set it on QueryOptions. E.g.

A scope may have multiple collections and you can join those directly by referencing the collection name within the scope. If you need to query across scopes (or across buckets), then it is better to use the cluster object to query.

Note that the queries will no longer need to qualify with “type = xxx” field (or key_prefix qualifier) if they were doing that earlier.

Old N1QL query:

Now becomes:
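As an illustration of this kind of rewrite, the Python sketch below builds a bucket-level query with a type qualifier and its collection-level equivalent. The helper names, the `travel` bucket, the `inventory` scope and the `country` predicate are assumptions chosen for the example, not queries taken from a real application.

```python
def bucket_query(bucket, doc_type, predicate):
    """Pre-collections style: the type field narrows the bucket."""
    return f'SELECT * FROM `{bucket}` WHERE type = "{doc_type}" AND {predicate};'


def collection_query(bucket, scope, collection, predicate):
    """Collections style: the keyspace path replaces the type qualifier."""
    return f"SELECT * FROM `{bucket}`.`{scope}`.`{collection}` WHERE {predicate};"


print(bucket_query("travel-sample", "airport", 'country = "France"'))
print(collection_query("travel", "inventory", "airport", 'country = "France"'))
```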

Data Migration to Collections

You will need to migrate existing data to your new named scopes and collections. The first thing you have to determine is whether you can afford to do an offline migration (where your application is offline for a few hours), or if you need to do a mostly online migration with minimal application downtime.

Offline migration could be faster overall, and it requires fewer extra resources in terms of disk space or nodes.

Offline migration

If you choose to do offline migration, you can use N1QL or Backup/Restore.

Using N1QL

Prerequisite: cluster has spare disk space and query service is in use

This migration would look something like the following:

  1. Create new scopes, collections, indexes
  2. Take old application offline 
  3. For each named collection:
    • Insert-Select from _default collection to named collection (using appropriate filters)
    • Delete data from _default collection that was migrated in above step (to save space, or if space is not an issue this can be done at the end)
  4. Verify your migrated data
  5. Drop old buckets
  6. Online new application
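Step 3 above can be scripted by generating one INSERT-SELECT and one DELETE per target collection. The following Python sketch builds those statements as text; it assumes documents are distinguished by a `type` field and that each collection is named after its type, and the bucket and scope names are hypothetical. Run the DELETE only after you have verified the copied data.

```python
def migration_statements(source_bucket, target_bucket, scope, doc_types):
    """Build N1QL copy and cleanup statements, one pair per document type."""
    source = f"`{source_bucket}`._default._default"
    stmts = []
    for doc_type in doc_types:
        target = f"`{target_bucket}`.`{scope}`.`{doc_type}`"
        # Copy matching documents, keeping their original keys.
        stmts.append(
            f"INSERT INTO {target} (KEY k, VALUE v) "
            f"SELECT META(d).id AS k, d AS v FROM {source} AS d "
            f'WHERE d.type = "{doc_type}";'
        )
        # Remove the migrated documents from the _default collection.
        stmts.append(f'DELETE FROM {source} AS d WHERE d.type = "{doc_type}";')
    return stmts


for stmt in migration_statements("travel-sample", "travel", "inventory", ["airport", "hotel"]):
    print(stmt)
```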

Using Backup/Restore

Prerequisite: you need disk space to store backup files

  1. Create new scopes, collections, indexes
  2. Take application offline
  3. Take backup (cbbackupmgr) of 7.0 cluster 
  4. Restore using explicit mapping to named collections: use --filter-keys and --map-data (see examples below)
  5. Online new application

Example 1: No filtering during restore

This example moves the entire _default collection to a named collection (this is the likely case for scenario 1 of consolidation).

Example 2: Restore with filtering

This example moves portions of _default collection to different named collections (this is the likely case for scenario 2 of splitting).
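A restore script for both examples might assemble its cbbackupmgr command lines along the following lines. This Python sketch only builds and prints the argument lists. The archive path, repository name, credentials and bucket names are placeholders, and the `source=target` form passed to --map-data is my assumption about the mapping syntax, so check it against the cbbackupmgr restore documentation before relying on it.

```python
import shlex


def restore_command(archive, repo, cluster, mapping, filter_keys=None):
    """Build a cbbackupmgr restore command line as a list of arguments."""
    cmd = [
        "cbbackupmgr", "restore",
        "--archive", archive,
        "--repo", repo,
        "--cluster", cluster,
        "--username", "Administrator",  # placeholder credentials
        "--password", "password",
        "--map-data", mapping,  # assumed form: source=target
    ]
    if filter_keys is not None:
        # Regular expression that selects which document keys to restore.
        cmd += ["--filter-keys", filter_keys]
    return cmd


# Example 1 style: the whole _default collection goes to one named collection.
whole = restore_command(
    "/backups", "repo1", "couchbase://localhost",
    "orders._default._default=shop.sales.orders",
)

# Example 2 style: only keys with a given prefix go to a given collection.
airports = restore_command(
    "/backups", "repo1", "couchbase://localhost",
    "travel-sample._default._default=travel.inventory.airport",
    filter_keys="^airport_",
)

print(shlex.join(whole))
print(shlex.join(airports))
```

For the splitting scenario you would run one filtered restore per target collection.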

Online Migration Using XDCR

In order to do a mostly online migration, you will need to use XDCR.

Depending on your spare capacity in the existing cluster, you can do self-XDCR (where the source and destination bucket are on the same cluster), or you can set up a separate cluster to replicate to.

  1. Set up XDCR from source cluster to target cluster (can do self-XDCR if you have spare disk space and compute resources on the original cluster).
  2. Create new buckets, scopes, collections
  3. Set up replications either directly from a bucket to a bucket.scope.collection or using Migration Mode (details shown below) if a single bucket’s default collection has to be split to multiple collections.
  4. Explicit mapping rules can be specified for each destination to select a subset of the data
  5. Once replication destinations are caught up, offline old application 
  6. Online new application directing it to the new cluster (or new bucket if using self-XDCR)
  7. Delete old cluster (or old bucket if using self-XDCR).

Using XDCR to migrate from multiple buckets to a single bucket

This is the consolidation scenario.

The XDCR set up will look something like the following:

  • For each source bucket, set up a replication to the named collection in the destination bucket and scope

The following screenshot shows the XDCR set up for 1 source bucket:

Using XDCR to split from a single bucket to multiple collections

This is the splitting scenario. In order to map the source _default collection to multiple target collections, you should use the Migration Mode provided by XDCR.

The XDCR screens below show Migration Mode being used:

There are 4 filters set up:

Travel-sample._default._default is the source. A new bucket called ‘Travel’ is the target.

  • filter type="airport", replicate to Inventory:Airport
  • filter type="airline", replicate to Inventory:Airline
  • filter type="hotel", replicate to Inventory:Hotel
  • filter type="route", replicate to Inventory:Route
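If you prefer to keep these rules in a script rather than enter them in the UI, they can be held as plain data. The Python sketch below builds the four filter-to-target rules as a dictionary and prints them as JSON. The helper name is hypothetical, and the dotted `scope.collection` target form is an assumption; consult the XDCR reference for the exact rule syntax your version accepts.

```python
import json


def migration_rules(scope, doc_types, field="type"):
    """Map one filter expression per document type to a target scope.collection."""
    return {f'{field}="{t}"': f"{scope}.{t.capitalize()}" for t in doc_types}


rules = migration_rules("Inventory", ["airport", "airline", "hotel", "route"])
print(json.dumps(rules, indent=2))
```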

Plan and Implement your security strategy

Now that you have all your data in named scopes and collections, you have finer control over what data you can assign privileges to. Previously you could do so only at bucket level.

The following roles are available at Scope and Collection level (consult the documentation on RBAC for more details):

Admin Roles:

  • Scope Admin role will be available at scope level. A scope admin can administer collections in their scope. 

Data Roles:

  • Data Reader
  • Data Writer
  • Data DCP Reader
  • Data Monitoring

Query and Search Roles:

  • FTS Searcher
  • Query Select
  • Query Update
  • Query Insert 
  • Query Delete
  • Query Manage Index
  • Query Manage Functions
  • Query Execute Functions
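When scripting role assignments, it helps to generate the role strings from your collections plan. The Python sketch below formats roles as `role[bucket:scope:collection]`, which is my understanding of how scoped roles are written for couchbase-cli and the REST API; treat that format, the helper name and the sample names as assumptions and confirm them in the RBAC documentation.

```python
def scoped_roles(roles, bucket, scope=None, collection=None):
    """Format roles limited to a bucket, scope or collection.

    Assumed format: role[bucket], role[bucket:scope] or
    role[bucket:scope:collection], joined with commas.
    """
    if collection and not scope:
        raise ValueError("a collection needs a scope")
    target = ":".join(p for p in (bucket, scope, collection) if p)
    return ",".join(f"{role}[{target}]" for role in roles)


print(scoped_roles(["data_reader", "query_select"], "travel", "inventory", "airport"))
print(scoped_roles(["scope_admin"], "travel", "inventory"))
```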

Next Steps

I hope this migration guide is helpful to you in migrating to Couchbase 7 Scopes and Collections. Below is a list of resources to get you started, and we look forward to your feedback on the Couchbase Forums.

Documentation 

What’s new

Release notes

Get the Beta

Download

Blogs

Introducing Couchbase 7 Beta: Mapping RDBMS to NoSQL

Scopes and Collections for Modern Multi-Tenant Applications: Couchbase 7.0

Introducing RBAC Security for Collections

Author

Posted by Shivani Gupta

Shivani Gupta is Director of Product Management at Couchbase for the Core Server. Shivani has over 20 years of varied experience in Big Data, Distributed Systems, and Databases at different companies including Oracle, Microsoft, VMWare, Hortonworks and now Couchbase.

5 Comments

  1. @Shivani, Is it possible to configure the scope and collections to be sync’d in the sync gateway? I would like to restrict the documents sync’d only of a particular scope/collection?

  2. Sync Gateway support will come later. So with 7.0 you can also receive documents in the default collection using Sync Gateway.

    1. Sorry typo above. I meant with 7.0 you can only receive documents in the default collection using Sync Gateway.

      1. I planned to categorize a multi-tenant application using scope and collections. I would have liked sync gateway to allow syncing based on collections or scopes. When can we expect sync gateway support?
        Couple of more questions
        1. Can the same document be part of default collection and another custom collection?
        2. Can multiple scope refer to default collection?

  3. The timeframe for Sync Gateway support is TBD.
    1) The ‘same’ document in two different collections (default and custom, or two different custom) is essentially two different documents. If you are asking whether the same document key can be used in two different collections, then the answer is Yes.

    2) I don’t understand this question. The default collection only exists in the default scope. No other scope has a default collection. As a user you cannot create a default collection (it is created by Couchbase only).
