Scopes and Collections, introduced in Couchbase 7, let you logically organize data within Couchbase. They simplify mapping from an RDBMS and let you consolidate hundreds of microservices and/or tenants in a single Couchbase cluster, which results in much lower operational cost. To learn more about Scopes and Collections, read the following introductory blog.
In this blog, I will go over how you can plan your migration from an older Couchbase version to using Scopes and Collections in Couchbase 7.0.
High Level Migration Steps
The following are the high level steps. Not all steps are essential – it will depend on your use case and requirements. In subsequent sections we will go through details of each of these steps.
- Upgrade to 7.0
- Plan your collections strategy: Determine what buckets, scopes, collections and indexes you will have. Determine the mapping from old bucket(s) to new bucket(s)/scope(s)/collection(s). Write scripts to create scopes, collections and indexes.
- Migrate your application code: This is your Couchbase SDK code including N1QL queries.
- Data migration: Determine if offline strategy works or online migration is necessary. Accordingly follow the steps for offline or online migration.
- Plan and implement your security strategy: Determine what users and role assignments you will have. Create scripts to manage these assignments.
- Go live with the new collections-aware application
- Set up XDCR and backup
Upgrade to Couchbase 7
- Every Bucket in 7.0+ will have a _default scope with a _default collection in it
- Upgrading to 7.0 will move all data in the bucket to the _default collection of the bucket
- There will be no impact to existing applications. E.g. an SDK 2.7 reference to Bucket B will automatically resolve to B._default._default
- If you do not wish to use named scopes and collections, you can stop right here. But if you would like to use this new feature, read on.
Plan your Collections Strategy
There are a couple of common migration scenarios that we have come across. Please feel free to comment here on the article or on our forum if your migration scenario is completely different.
Consolidation: from multiple buckets to collections in a single bucket
This is a common scenario when you are trying to lower your costs (aka TCO) by consolidating multiple buckets into a single bucket. A cluster can only have up to 30 buckets, whereas you can have 1000 collections per cluster, allowing for much higher density. This will be a common scenario for microservice consolidation.
The diagram above shows all target collections belonging to the same scope. But you could have a variation of it where the target collections are in different scopes.
Splitting: from single bucket to multiple collections in a bucket
Another common scenario is to split out data from a single bucket into multiple collections in a bucket. Different types of data may previously have been qualified with a “type = xxx” field or with a key prefix “xxx_key”. Now these can each live in their own collection giving you advantages of logical isolation, security isolation, replication and access control.
This scenario may be a little more complex than the previous scenario especially if you want to get rid of the key prefix or type field. For a simpler migration, you may want to leave the key prefixes and type data fields as is, even though they may be somewhat redundant with collections.
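The choice described above can be sketched as a small per-document planning function. This is only an illustration: it assumes every document carries a "type" field that names its target collection and that keys look like "<type>_<id>". The function name plan_target and the keep_redundant flag are hypothetical and not part of any Couchbase API.

```python
from typing import Dict, Tuple


def plan_target(key: str, doc: Dict, keep_redundant: bool = True) -> Tuple[str, str, Dict]:
    """Return (collection, new_key, new_doc) for one document from the old bucket.

    Assumption: doc["type"] names the target collection, and keys may be
    prefixed with "<type>_". Both conventions are illustrative only.
    """
    collection = doc["type"]
    if keep_redundant:
        # Simpler migration: leave the key prefix and the type field as they are.
        return collection, key, dict(doc)
    prefix = collection + "_"
    # Drop the key prefix only when it is actually present.
    new_key = key[len(prefix):] if key.startswith(prefix) else key
    # Drop the type field, since the collection now carries that information.
    new_doc = {k: v for k, v in doc.items() if k != "type"}
    return collection, new_key, new_doc
```

With keep_redundant=True the document is copied unchanged, which matches the simpler migration path. With keep_redundant=False the key and body are rewritten, so application code that builds keys has to change as well.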
Creation of scopes, collections, and indexes
Once you have planned what scopes, collections and indexes you want to have, you will need to write scripts to create these entities. You can do so with the SDK of your choice, the couchbase-cli, the REST APIs directly, or even N1QL scripts.
Given below is an example of using the CLI (couchbase-cli and cbq) to create a scope, collection and an index.
# create a scope called 'myscope' using couchbase-cli
./couchbase-cli collection-manage -c localhost -u Administrator -p password --bucket testBucket --create-scope myscope
# create a collection called mycollection in myscope
./couchbase-cli collection-manage -c localhost -u Administrator -p password --bucket testBucket --create-collection myscope.mycollection
# create an index on mycollection using cbq
./cbq --engine=localhost:8093 -u Administrator -p password --script="create index myidx1 on testBucket.myscope.mycollection(field1,field2);"
Note that the index creation statement does not require you to qualify the data with a “type = xxx” or key-prefix qualification clause anymore.
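If you have many scopes and collections, you may prefer to generate the couchbase-cli commands from a plan rather than type them by hand. The following is a minimal sketch that only builds the command strings (it does not run them). It reuses the collection-manage flags from the CLI example in this section; the function name build_create_commands, the plan layout, and the default host and credentials are assumptions made for illustration.

```python
from typing import Dict, List


def build_create_commands(bucket: str, plan: Dict[str, List[str]],
                          host: str = "localhost",
                          user: str = "Administrator",
                          password: str = "password") -> List[str]:
    """Build couchbase-cli commands that create each scope, then its collections.

    `plan` maps a scope name to the list of collection names in that scope.
    The commands are returned as strings; nothing is executed here.
    """
    base = (f"./couchbase-cli collection-manage -c {host} -u {user} "
            f"-p {password} --bucket {bucket}")
    commands = []
    for scope, collections in plan.items():
        # A scope has to exist before collections can be created in it.
        commands.append(f"{base} --create-scope {scope}")
        for collection in collections:
            commands.append(f"{base} --create-collection {scope}.{collection}")
    return commands


if __name__ == "__main__":
    for line in build_create_commands("testBucket", {"myscope": ["mycollection"]}):
        print(line)
```

Printing the commands first gives you a dry run that can be reviewed before it is piped into a shell.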
Migrate your application code
In order to use named scopes and collections, your application code (including N1QL queries) will need to be migrated.
If you were using type fields or key prefixes previously (as in the splitting scenario), you will not need them anymore.
SDK Code Sample
In your SDK code you have to connect to a cluster, open a bucket and obtain a reference to a collection object to store and retrieve documents. Prior to collections, all key-value operations were performed directly on the bucket.
Note: If you have migrated to SDK 3.0, you have already done some of the work of starting to use collections (though up until 7.0, you could only use the default collection).
The following is a simple Java SDK code snippet for storing and retrieving a document to a collection:
Now if you want to run a N1QL query on the collection in the above Java example you can do the following:
// run a N1QL query using the context of the scope
scope.query("select * from collection-name");
Notice that you can query directly on a scope. The above query on the scope object automatically maps to “select * from bucket-name.scope-name.collection-name”.
Another way to provide path context to N1QL is to set it on QueryOptions. E.g.
cluster.query("select * from collection-name", qo);
A scope may have multiple collections and you can join those directly by referencing the collection name within the scope. If you need to query across scopes (or across buckets), then it is better to use the cluster object to query.
Note that the queries will no longer need to qualify with “type = xxx” field (or key_prefix qualifier) if they were doing that earlier.
Old N1QL query:
FROM Travel a
JOIN Travel r ON a.faa = r.sourceairport
Data Migration to Collections
You will need to migrate existing data to your new named scopes and collections. The first thing you have to determine is whether you can afford to do an offline migration (where your application is offline for a few hours), or if you need to do a mostly online migration with minimal application downtime.
Offline migration can be faster overall and requires fewer extra resources, such as disk space or nodes.
If you choose offline migration, you can use either N1QL or Backup/Restore.
N1QL prerequisite: the cluster has spare disk space and the Query service is in use.
This migration would look something like the following:
- Create new scopes, collections, indexes
- Take old application offline
- For each named collection:
  - Insert-Select from _default collection to named collection (using appropriate filters)
  - Delete data from _default collection that was migrated in above step (to save space, or if space is not an issue this can be done at the end)
- Verify your migrated data
- Drop old buckets
- Bring the new application online
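The Insert-Select and Delete steps in the list above can be sketched as generated N1QL statements. This is a hedged example, not a definitive script: it assumes documents are split by a "type" field, that the target collection is named after the type, and that the INSERT INTO ... (KEY ..., VALUE ...) SELECT form shown here suits your data. The helper names quote_path and migration_statements, and the bucket and scope names in the usage line, are hypothetical. Verify the statements against the N1QL documentation and on test data before running them.

```python
from typing import List


def quote_path(*parts: str) -> str:
    """Backtick-quote each part of a keyspace path so names with hyphens are valid."""
    return ".".join(f"`{p}`" for p in parts)


def migration_statements(src_bucket: str, dst_bucket: str,
                         dst_scope: str, doc_type: str) -> List[str]:
    """Build the INSERT-SELECT and DELETE statements for one document type.

    Assumption: documents of this type live in the source bucket's _default
    collection and go to a target collection named after the type.
    """
    src = quote_path(src_bucket, "_default", "_default")
    dst = quote_path(dst_bucket, dst_scope, doc_type)
    # Copy matching documents, keeping their original keys.
    insert = (f"INSERT INTO {dst} (KEY k, VALUE v) "
              f"SELECT META(d).id AS k, d AS v FROM {src} AS d "
              f'WHERE d.type = "{doc_type}";')
    # Remove the migrated documents from the _default collection.
    delete = f'DELETE FROM {src} AS d WHERE d.type = "{doc_type}";'
    return [insert, delete]


if __name__ == "__main__":
    for stmt in migration_statements("travel-sample", "travel", "inventory", "airport"):
        print(stmt)
```

Run the DELETE only after you have verified the copied data, or defer all deletes to the end if disk space is not a concern.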
Backup/Restore prerequisite: you need disk space to store the backup files. This migration would look something like the following:
- Create new scopes, collections, indexes
- Take application offline
- Take backup (cbbackupmgr) of 7.0 cluster
- Restore using explicit mapping to named collections: use --map-data, plus --filter-keys or --filter-values where needed (see examples below)
- Bring the new application online
Example 1: No filtering during restore
This example moves the entire _default collection to a named collection (this is the likely case for scenario 1 of consolidation).
# Backup the default scope of a bucket upgraded to 7.0
cbbackupmgr config -a backup -r test-01 --include-data beer-sample._default
cbbackupmgr backup -a backup -r test-01 -c localhost -u Administrator -p password
# Restore above backup to a named collection
cbbackupmgr restore -a backup -r test-01 -c localhost -u Administrator -p password --map-data beer-sample._default._default=beer-sample.beer-service.service_01
Example 2: Restore with filtering
This example moves portions of _default collection to different named collections (this is the likely case for scenario 2 of splitting).
# Backup the travel-sample bucket from a cluster upgraded to 7.0
cbbackupmgr config -a backup -r test-02 --include-data travel-sample
cbbackupmgr backup -a backup -r test-02 -c localhost -u Administrator -p password
# Restore type='airport' documents to a collection travel.booking.airport
cbbackupmgr restore -a backup -r test-02 -c localhost -u Administrator -p password --map-data travel-sample._default._default=travel.booking.airport --auto-create-buckets --filter-values '"type":"airport"'
# Restore key_prefix = 'airport' documents to a collection travel.booking.airport
cbbackupmgr restore -a backup -r test-02 -c localhost -u Administrator -p password --map-data travel-sample._default._default=travel.booking.airport --auto-create-buckets --filter-keys airport_*
Online Migration Using XDCR
In order to do a mostly online migration, you will need to use XDCR.
Depending on your spare capacity in the existing cluster, you can do self-XDCR (where the source and destination bucket are on the same cluster), or you can set up a separate cluster to replicate to.
- Set up XDCR from the source cluster to the target cluster (you can do self-XDCR if you have spare disk space and compute resources on the original cluster).
- Create new buckets, scopes, collections
- Set up replications either directly from a bucket to a bucket.scope.collection or using Migration Mode (details shown below) if a single bucket’s default collection has to be split to multiple collections.
- You can specify explicit mapping rules for each destination to select a subset of the data
- Once the replication destinations have caught up, take the old application offline
- Bring the new application online, directing it to the new cluster (or new bucket if using self-XDCR)
- Delete old cluster (or old bucket if using self-XDCR).
Using XDCR to migrate from multiple buckets to a single bucket
This is the consolidation scenario.
The XDCR set up will look something like the following:
- For each source bucket, set up a replication to the named collection in the destination bucket and scope
The following screenshot shows the XDCR set up for 1 source bucket:
Using XDCR to split from a single bucket to multiple collections
This is the splitting scenario. In order to map the source _default collection to multiple target collections, you should use the Migration Mode provided by XDCR.
The XDCR screens below show Migration Mode being used:
Travel-sample._default._default is the source. A new bucket called ‘Travel’ is the target.
There are 4 filters set up:
- filter type="airport", replicate to Inventory:Airport
- filter type="airline", replicate to Inventory:Airline
- filter type="hotel", replicate to Inventory:Hotel
- filter type="route", replicate to Inventory:Route
Plan and Implement your security strategy
Now that you have all your data in named scopes and collections, you have finer control over what data you can assign privileges to. Previously you could do so only at bucket level.
The following roles are available at Scope and Collection level (consult the documentation on RBAC for more details):
- Scope Admin: available at the scope level. A scope admin can administer the collections in their scope.
Data, FTS and Query roles:
- Data Reader
- Data Writer
- Data DCP Reader
- Data Monitoring
- FTS Searcher
- Query Select
- Query Update
- Query Insert
- Query Delete
- Query Manage Index
- Query Manage Functions
- Query Execute Functions
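When you script role assignments, it helps to build the scoped role strings in one place. The sketch below assumes the role[bucket:scope:collection] notation and the lowercase role identifiers (for example data_reader, query_select); both are assumptions here, so confirm them against the RBAC documentation for your version. The function name scoped_role and the bucket, scope and collection names in the usage line are hypothetical.

```python
from typing import Optional


def scoped_role(role: str, bucket: str,
                scope: Optional[str] = None,
                collection: Optional[str] = None) -> str:
    """Build a role string limited to a bucket, a scope, or a single collection.

    Assumed notation: role[bucket], role[bucket:scope], role[bucket:scope:collection].
    """
    if collection is not None and scope is None:
        raise ValueError("a collection can only be named together with its scope")
    # Keep only the path parts that were actually supplied.
    path = [part for part in (bucket, scope, collection) if part is not None]
    return f"{role}[{':'.join(path)}]"


if __name__ == "__main__":
    # Example: read access to a single collection, query access to a whole scope.
    print(scoped_role("data_reader", "travel", "inventory", "airport"))
    print(scoped_role("query_select", "travel", "inventory"))
```

Generating these strings from your collections plan keeps user and role scripts in step with the scopes and collections you create.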
I hope this migration guide is helpful to you in migrating to Couchbase 7 Scopes and Collections. Below is a list of resources for you to get started and we look forward to your feedback on Couchbase Forums.
Get the Beta