Couchbase, a document database, allows great flexibility in storing different types of documents in a single bucket (a bucket being the equivalent of a database). There is a frequent need to refer to documents of a similar type together; for example, an apparel retailer may want to separate all clothes from all shoes. Today this can be done in Couchbase with key prefixes or type fields, but doing so makes the application more cumbersome. Grouping similar items into containers at the database layer would not only make the application simpler but also allow for efficiencies at the lowest levels of data processing. Further, additional levels of containment under buckets allow access control at a finer granularity than buckets. This opens the door to a more scalable multi-tenant platform on Couchbase than buckets alone would allow. With these goals in mind, we developed the feature called ‘Collections’.
Couchbase Server 6.5 makes available a Developer Preview of Collections.
In this blog post, I will describe at a high level what collections are, the use cases they enable, and the functionality they provide. For sample code on how to use collections, read the blog post by Johan.
Note: A Developer Preview feature cannot be used in production. Detailed guidelines on Developer Preview features are available in the Developer Preview Documentation.
What are Collections?
Collections are logical data containers inside a Couchbase bucket that let you group similar data just like a ‘Table’ does in a relational database.
While the overall feature is referred to as Collections, there is another level of data organization, called a ‘Scope’, which is similar to a ‘Schema’ in a relational database. Each scope has its own namespace, so the same collection name can be used in different scopes. Similarly, document keys need to be unique only within a collection, so documents with the same key can exist in different collections.
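To make these namespace rules concrete, here is a minimal in-memory Python sketch of the bucket → scope → collection → key hierarchy. It is not Couchbase SDK code: the `Bucket` class, its methods, and the scope names `store_a`/`store_b` are hypothetical and exist only to show that collection names are unique per scope and keys are unique per collection.

```python
# Conceptual model of the Couchbase namespace hierarchy:
# bucket -> scope -> collection -> document key.
# NOT the Couchbase SDK; all class and method names here are hypothetical.
from typing import Any, Dict


class Bucket:
    def __init__(self, name: str) -> None:
        self.name = name
        # scope name -> collection name -> document key -> document
        self.scopes: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def create_collection(self, scope: str, collection: str) -> None:
        # Collection names only need to be unique within their scope.
        self.scopes.setdefault(scope, {}).setdefault(collection, {})

    def upsert(self, scope: str, collection: str, key: str, doc: Any) -> None:
        # Keys only need to be unique within a single collection.
        # The collection must already exist (KeyError otherwise).
        self.scopes[scope][collection][key] = doc

    def get(self, scope: str, collection: str, key: str) -> Any:
        return self.scopes[scope][collection][key]


products = Bucket("products")
# The same collection name "clothes" in two different scopes.
products.create_collection("store_a", "clothes")
products.create_collection("store_b", "clothes")
products.create_collection("store_a", "shoes")

# The same key "item::1" in three different collections.
products.upsert("store_a", "clothes", "item::1", {"name": "T-shirt"})
products.upsert("store_b", "clothes", "item::1", {"name": "Jacket"})
products.upsert("store_a", "shoes", "item::1", {"name": "Sneaker"})

print(products.get("store_a", "clothes", "item::1")["name"])  # T-shirt
print(products.get("store_b", "clothes", "item::1")["name"])  # Jacket
print(products.get("store_a", "shoes", "item::1")["name"])    # Sneaker
```

Because each lookup is qualified by scope and collection, the three `item::1` documents never collide.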
With the introduction of collections, role-based access control can be applied at the cluster, bucket, scope and collection level.
Note: The Developer Preview does not include scope- and collection-level RBAC, but it will be available in the production version of Collections.
For seamless upgrades and backward compatibility, every bucket has a ‘_default’ scope, which contains a ‘_default’ collection. The _default collection provides backward compatibility, because a direct reference to the bucket automatically maps to the _default collection. Also, on upgrade, all existing data automatically goes into the _default collection.
While the _default collection is provided as a backward compatibility mechanism, we recommend writing new applications using named collections.
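The _default mapping can be sketched the same way. The Python model below is hypothetical (again, not SDK code): a bucket is created with a `_default` scope holding a `_default` collection, a bucket-level access that names no scope or collection resolves to `_default._default`, and a simulated `upgrade` function places all pre-existing documents there.

```python
# Hypothetical model of the _default scope/collection; not Couchbase SDK code.
from typing import Any, Dict, Optional

DEFAULT = "_default"


class Bucket:
    """Every bucket starts with a _default scope containing a _default collection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.scopes: Dict[str, Dict[str, Dict[str, Any]]] = {DEFAULT: {DEFAULT: {}}}

    def _resolve(self, scope: Optional[str], collection: Optional[str]) -> Dict[str, Any]:
        # A plain bucket-level reference (no scope or collection given)
        # falls through to _default._default.
        return self.scopes[scope or DEFAULT][collection or DEFAULT]

    def upsert(self, key: str, doc: Any,
               scope: Optional[str] = None, collection: Optional[str] = None) -> None:
        self._resolve(scope, collection)[key] = doc

    def get(self, key: str,
            scope: Optional[str] = None, collection: Optional[str] = None) -> Any:
        return self._resolve(scope, collection)[key]


def upgrade(legacy_data: Dict[str, Any], name: str) -> Bucket:
    """Simulated upgrade: every pre-existing document lands in _default._default."""
    bucket = Bucket(name)
    for key, doc in legacy_data.items():
        bucket.upsert(key, doc)
    return bucket


bucket = upgrade({"user::1": {"name": "Ada"}}, "app")
# Bucket-only access and explicit _default access resolve to the same document.
print(bucket.get("user::1"))                                      # {'name': 'Ada'}
print(bucket.get("user::1", scope=DEFAULT, collection=DEFAULT))   # {'name': 'Ada'}
```

Existing code that only knows about the bucket keeps working, while new code can address named collections explicitly.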
Simplified Data Organization with Collections
As mentioned earlier, collections enable better data organization by keeping similar documents together in a collection, as one would do with a ‘Table’ in a relational database. Using collections to organize data has many benefits, including:
- Easier mapping of relational schemas to Couchbase by creating a collection for a corresponding relational table.
- Ability to refer to similar documents as a unit for purposes such as building an index, setting up replication, querying, and backup/restore.
- More scalable indexing, as the data service only has to send the documents in the relevant collection, rather than the indexer receiving the documents for the whole bucket and filtering them.
- Simpler N1QL, as statements will be able to access collections directly as tables instead of filtering on an attribute that holds the document type.
For example, without collections you would write:
SELECT * FROM products WHERE type = 'clothes';
With collections you can now write:
SELECT * FROM products.clothes;
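The difference between the two queries, and the indexing benefit listed above, can be illustrated with a toy Python analogy. It is not how the Query or Index services are implemented; the data and function names are hypothetical. The type-filter version must inspect every document in the bucket, while the collection version only touches the documents in `clothes`.

```python
# Toy analogy only; not the N1QL engine or the indexer.
from typing import Any, Dict, List, Tuple

# Without collections: one flat keyspace, documents tagged with a "type" field.
flat_bucket: Dict[str, Dict[str, Any]] = {
    "p1": {"type": "clothes", "name": "T-shirt"},
    "p2": {"type": "shoes", "name": "Sneaker"},
    "p3": {"type": "clothes", "name": "Jacket"},
    "p4": {"type": "shoes", "name": "Boot"},
}

# With collections: documents are grouped by container, so no type field is needed.
collections: Dict[str, Dict[str, Dict[str, Any]]] = {
    "clothes": {"p1": {"name": "T-shirt"}, "p3": {"name": "Jacket"}},
    "shoes": {"p2": {"name": "Sneaker"}, "p4": {"name": "Boot"}},
}


def select_by_type(bucket: Dict[str, Dict[str, Any]], doc_type: str) -> Tuple[List[str], int]:
    """Analogue of: SELECT * FROM products WHERE type = 'clothes'."""
    examined = 0
    names: List[str] = []
    for doc in bucket.values():
        examined += 1  # every document in the bucket must be inspected
        if doc.get("type") == doc_type:
            names.append(doc["name"])
    return sorted(names), examined


def select_collection(colls: Dict[str, Dict[str, Dict[str, Any]]], name: str) -> Tuple[List[str], int]:
    """Analogue of: SELECT * FROM products.clothes."""
    docs = colls[name]  # only this collection's documents are touched
    return sorted(d["name"] for d in docs.values()), len(docs)


print(select_by_type(flat_bucket, "clothes"))     # (['Jacket', 'T-shirt'], 4)
print(select_collection(collections, "clothes"))  # (['Jacket', 'T-shirt'], 2)
```

Both return the same results, but the number of documents examined scales with the bucket in the first case and only with the collection in the second.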
Running Multi-tenant Applications with Collections
Multi-tenant applications require varying levels of isolation between tenants and varying levels of resource sharing of the underlying infrastructure.
Within Couchbase today, there are three options, each with its own trade-off:
- Separate clusters provide complete physical, security and logical isolation, but the least sharing of resources.
- Multiple buckets per cluster provide security and logical isolation, but are limited by the overhead of each bucket.
- Multiple tenants in a single bucket provide the best sharing of resources, but require the application to handle any security or logical isolation.
With the introduction of collections (grouped into scopes), Couchbase can provide security and logical isolation at more granular levels within a bucket. A single bucket can hold thousands of collections, so a single cluster can host thousands of tenants. In contrast, the number of buckets a single cluster can host is limited (this limit has increased to 30 in Couchbase Server 6.5, with appropriate sizing), which is often not enough for multi-tenant applications.
Consolidating Microservices with Collections
Modern applications are often written as a suite of microservices, and a single application can consist of hundreds of microservices. While using a bucket, or even a cluster, per microservice is still an option, collections (and scopes) provide a more scalable way to consolidate more microservices into a single Couchbase cluster.
Multi-tenancy and microservice-based architecture are not mutually exclusive; many multi-tenant applications are written using a microservices architecture. With buckets, scopes, and collections, you now have several levels of containment, which gives you flexibility in how you map tenants, microservices and tables.
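As one possible mapping (an illustration, not a prescribed design), each tenant could get its own scope and each microservice its own collection within that scope. The Python sketch below is hypothetical: `plan_layout`, `can_access`, and the tenant and service names are made up. Because the Developer Preview has no scope- or collection-level RBAC, the isolation check is shown as application-level logic.

```python
# Hypothetical layout planner; not a Couchbase API.
from typing import Dict, List, Set


def plan_layout(tenants: List[str], services: List[str]) -> Dict[str, Set[str]]:
    """One possible mapping: a scope per tenant, a collection per microservice."""
    return {tenant: set(services) for tenant in tenants}


def can_access(tenant: str, scope: str, collection: str,
               layout: Dict[str, Set[str]]) -> bool:
    # Application-level check: a tenant may only touch collections in its own scope.
    return scope == tenant and collection in layout.get(scope, set())


layout = plan_layout(["acme", "globex"], ["orders", "inventory", "profiles"])
print(sum(len(c) for c in layout.values()))           # 6 collections in one bucket
print(can_access("acme", "acme", "orders", layout))    # True
print(can_access("acme", "globex", "orders", layout))  # False
```

Since collection names only need to be unique within a scope, every tenant can reuse the same service collection names, which keeps microservice code identical across tenants.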
Functionality Available in the Developer Preview
Once you have turned on the Developer Preview switch in a Couchbase Server 6.5 cluster (see the Developer Preview Documentation), you can start using scopes and collections. Some of the key features of the Developer Preview (DP) are listed below; the list is not exhaustive but covers the highlights:
- Both Ephemeral and Couchbase buckets support scopes and collections.
- All Couchbase SDKs support DDL and CRUD operations for collections and scopes.
- You can create and drop scopes and collections from the SDK, the REST API or couchbase-cli.
- You can perform all CRUD operations on a collection (including subdoc).
- The item count of each collection is available with cbstats.
- DCP protocol is enhanced to stream a single scope or a single collection (in addition to the existing ability to stream a single bucket).
Note: The DP is primarily for Key-Value access. Scope- and collection-level RBAC will be available later, and the integration of collections with XDCR, Indexing, N1QL, Eventing, Analytics and Mobile will be previewed later.
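The DCP enhancement in the list above narrows a stream from a whole bucket to a single scope or a single collection. The Python sketch below models only those filtering semantics; it is not the DCP wire protocol, and in the real system the data service filters before sending rather than the consumer filtering a list. The `Mutation` type and `stream` function are hypothetical. Because a collection name is unique only within its scope, the sketch requires a scope whenever a collection is given.

```python
# Models DCP stream-filtering semantics only; not the DCP protocol or any client API.
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Mutation:
    scope: str
    collection: str
    key: str


def stream(mutations: Iterable[Mutation], scope: Optional[str] = None,
           collection: Optional[str] = None) -> Iterator[Mutation]:
    """Yield mutations for the whole bucket, one scope, or one collection in a scope."""
    if collection is not None and scope is None:
        # Collection names are only unique within a scope, so a scope is required.
        raise ValueError("a collection name is only unique within its scope")
    return (m for m in mutations
            if (scope is None or m.scope == scope)
            and (collection is None or m.collection == collection))


log = [
    Mutation("store_a", "clothes", "item::1"),
    Mutation("store_a", "shoes", "item::2"),
    Mutation("store_b", "clothes", "item::3"),
]
print(len(list(stream(log))))                              # 3 (whole bucket)
print([m.key for m in stream(log, scope="store_a")])       # ['item::1', 'item::2']
print([m.key for m in stream(log, "store_a", "clothes")])  # ['item::1']
```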
Here are some resources for you to start using the Developer Preview of Collections. We look forward to your feedback.