Collections let you namespace data within a Couchbase bucket. Instead of keeping every document in a single shared namespace and telling them apart with a manual attribute such as “type”, users get a built-in way to group related documents together.

If you’re unfamiliar with Couchbase Collections, please feel free to read another well-written blog on collections before continuing.

Full-Text Search’s collections support is primarily driven by three design goals. 

    • Continuity.
    • Consistency.
    • Simplicity.

Let’s explore how the Search service indexes and searches data that lives in collections.

 

Indexing Collection’s Data

 

The Search service continues to let existing users operate and define new indexes in the same conventional way on documents residing in the bucket. All the existing documents in the bucket naturally fall into the _default scope and _default collection. Existing indexes continue to index newer mutations, and queries work as usual.

Once collections are adopted, users will typically have namespaced their existing multi-schema documents into various collections.

The Search service supports creating an index on a single source collection or on multiple source collections, as long as all the collections belong to a single scope.

Essentially, a search index can span multiple collections but not multiple scopes.
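The single-scope rule can be modelled in a few lines. The sketch below is only an illustration of the constraint, not how the Search service enforces it; the helper name `common_scope` is hypothetical, and the scope and collection names are the ones used in the example later in this post.

```python
from typing import Iterable, Tuple


def common_scope(sources: Iterable[Tuple[str, str]]) -> str:
    """Return the one scope shared by all (scope, collection) pairs.

    Raises ValueError when the pairs are empty or span more than one
    scope, mirroring the rule that an index cannot cross scopes.
    """
    scopes = {scope for scope, _collection in sources}
    if len(scopes) != 1:
        raise ValueError(
            f"index sources must share exactly one scope, got {sorted(scopes)}"
        )
    return next(iter(scopes))


# Two collections in the same scope are an acceptable set of sources.
print(common_scope([("emea", "customer1"), ("emea", "customer2")]))
```

Passing pairs from two different scopes, such as `("emea", "customer1")` and `("apac", "customer1")`, raises `ValueError`.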

 

Let’s delve into this with the help of an example.

Consider a CRM use case where customer details are captured in a Customers bucket and order details in an Orders bucket.

Let’s assume the user has grouped customers into different scopes based on their geographical regions, for example mapping all the customers from the APAC region to a scope named apac, and so on.

example scope and collection hierarchy

 

Single Collection Index 

 

With collections, indexing and searching data from a single source collection would be the most common and natural use case. It works almost the same way as the existing bucket-based index creation, except that the user has to specify the scope and collection details while creating the index definition.

If the user is indexing the default scope and collection, the index creation steps look exactly as they did before collections were introduced.

If the user wants to index a non-default scope and collection(s), they have to tick the checkbox for “Use non-default scope/collection(s)”. Once they do this, the index creation form adapts to let users enter the source scope and collection details.

 

non-default scope/collection(s)

 

Once users enable the non-default scope/collection(s) checkbox, they can choose the source scope for the documents. Note that the scope drop-down now lists all the available scopes from the chosen bucket (CRM).

 

scope drop downs

 

Users can then select the scope to which the source collection belongs from the drop-down list shown above.

 

 

Specifying Type Mappings

 

Once the scope is selected, the user is all set to specify the types of documents to index. The convention is to specify this through type mappings, and the same type mapping definition pattern continues to apply here.

 

Upon adding a new Type mapping, the user is given an option to specify the source collection as shown below.  

 

collection type mappings

 

The user should see all the available collections in the selected scope (emea) in a drop-down list.

 

Indexing all document types under a given collection

 

Just by selecting a collection name from the drop-down list as the type mapping name, the user can index every type of document under that collection.

 

collection dropdown for typemappings

 

 

Indexing multiple document types under a collection

 

If the collection hosts multiple document types, the user can add any number of type mappings of interest, each named by appending the document type to the collection name, as shown below.

 

multiple collection type mappings

 

The above example would index document types like deptOrders and inventoryOrders from the collection customer1 under the scope emea.
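As a rough sketch of what such an index definition might contain, the snippet below builds the type mapping section for the deptOrders and inventoryOrders example. It assumes the type mapping names take the form `scope.collection` or `scope.collection.type`, and that the `doc_config` mode is `scope.collection.type_field` with a document attribute called `type`; these JSON field names reflect my understanding of the 7.0 index definition format and should be checked against a definition exported from the web console. The helper names are hypothetical.

```python
import json


def type_mapping_name(scope, collection, doc_type=None):
    """Join scope, collection and an optional document type with dots."""
    parts = [scope, collection]
    if doc_type:
        parts.append(doc_type)
    return ".".join(parts)


def build_mapping_types(scope, collection, doc_types=()):
    """Build one type mapping per document type.

    With no document types, a single mapping named after the collection
    is returned, which covers every document type in that collection.
    """
    names = [type_mapping_name(scope, collection, t) for t in doc_types]
    if not names:
        names = [type_mapping_name(scope, collection)]
    return {name: {"enabled": True, "dynamic": True} for name in names}


# Assumed shape of the "params" section of an index definition.
index_params = {
    "doc_config": {"mode": "scope.collection.type_field", "type_field": "type"},
    "mapping": {
        "default_mapping": {"enabled": False},
        "types": build_mapping_types(
            "emea", "customer1", ["deptOrders", "inventoryOrders"]
        ),
    },
}

print(json.dumps(index_params, indent=2))
```

Calling `build_mapping_types("emea", "customer1")` with no document types yields the single mapping `emea.customer1`, which corresponds to indexing all document types under the collection.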

 

Scaling Notes

Since collections slice the bucket data at a finer granularity, each collection is likely to hold a smaller number of documents. So in many cases, users may not need the default setting of 6 partitions per index to serve a smaller data set.

An appropriate partition count for a given amount of data helps to support:

  • Better utilization of resources on a node.
  • A higher number of indexes on any given node.
  • Better search performance.

Hence it’s recommended to consider overriding the default partition count with a lower value during cluster sizing.
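For readers who manage index definitions as JSON, lowering the partition count might look like the sketch below. It assumes the count lives under `planParams.indexPartitions` in the index definition, which is my understanding of the format rather than something stated in this post; the index name and the helper `with_partition_count` are hypothetical.

```python
import copy


def with_partition_count(index_def, partitions):
    """Return a copy of the index definition with a new partition count."""
    if partitions < 1:
        raise ValueError("partition count must be at least 1")
    updated = copy.deepcopy(index_def)  # leave the caller's definition untouched
    updated.setdefault("planParams", {})["indexPartitions"] = partitions
    return updated


# Hypothetical definition using the default of 6 partitions.
base_def = {
    "name": "customer1-orders",
    "type": "fulltext-index",
    "planParams": {"indexPartitions": 6},
}

small_def = with_partition_count(base_def, 1)
print(small_def["planParams"])
```

The right value depends on the data size and query load, so treat any number here as a starting point for sizing tests.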

 

RBAC Notes

Role-Based Access Control for search indexes can now be applied at the Bucket, Scope, or Collection(s) level. A user with at least Search Reader permissions at the source collection level will be able to access the index.

An interesting read about the latest RBAC updates for Collections is here.

 

Multi Collection Index 

 

Multi-collection indexes let users index and search across multiple collections within a single scope from a single index. A few use cases that favor multi-collection indexes are:

  1. Users have sliced the data across many collections where each collection or namespace could be either a customer account or the brand of a product etc. (homogeneous data across collections)
  2. Users have a lot of relatively small-sized collections in their data set due to the logical partitioning of the data. (heterogeneous data across collections)

 

In all such cases, users may have to create numerous indexes to enable search on data spread across numerous collections. Creating and maintaining such a large number of indexes is both cumbersome and demanding.

 

Multi-collection indexes alleviate this overhead by letting the user create a single umbrella index covering many collections. These collections may contain homogeneous or heterogeneous data.

 

Specifying Type Mappings

 

In the example below, we define type mappings for indexing heterogeneous data types from different collections such as customer1, customer2, and customer3. The mappings could also cover similar data types from various collections, such as customer1.travels and customer3.travels.

 

multiple collection mapping

 

Lifecycle Notes

If any source collection of a multi-collection index is deleted, the index is deleted too. Hence multi-collection indexes are best suited for collections with similar lifespans.

RBAC Notes 

Accessing a multi-collection index requires the user to have Search Reader permissions on all the source collections in the index.
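The rule amounts to a subset check: every source collection of the index must be among the collections the user can read. The sketch below only models that rule and is not how the Search service implements its permission checks; the function name `can_access_index` is hypothetical.

```python
def can_access_index(readable_collections, index_source_collections):
    """True when the user can read every source collection of the index."""
    return set(index_source_collections) <= set(readable_collections)


index_sources = ["customer1", "customer2", "customer3"]

# Missing customer3, so access to the multi-collection index is refused.
print(can_access_index(["customer1", "customer2"], index_sources))

# All three source collections are readable, so access is granted.
print(can_access_index(["customer1", "customer2", "customer3"], index_sources))
```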

 

Searching Collection’s Data

 

Single Collection Index – Users can search and retrieve data from a single collection index in the same way as with a bucket-based index.

 

Multi-Collection Index – Users can search multi-collection indexes using the same search requests as before. Since the index now contains data from multiple source collections, it is useful for users to know the source collection of each relevant hit.

With multi-collection indexes, each hit in the search result contains information about the collection to which it belongs. This source collection detail is available in the Fields section of each hit under the key _$c.
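A client can use this key to group hits by their source collection. The sketch below runs against a hand-written sample result: the document IDs are made up, and the lowercase `hits`, `id`, and `fields` keys are assumed from the usual shape of a search response. Only the `_$c` key comes from the description above.

```python
from collections import defaultdict


def hits_by_collection(search_result):
    """Group hit IDs by the source collection recorded under "_$c"."""
    grouped = defaultdict(list)
    for hit in search_result.get("hits", []):
        # Hits without the key are grouped under a placeholder name.
        collection = hit.get("fields", {}).get("_$c", "<unknown>")
        grouped[collection].append(hit["id"])
    return dict(grouped)


# Hypothetical result from a multi-collection index.
sample_result = {
    "hits": [
        {"id": "order::1001", "fields": {"_$c": "customer1"}},
        {"id": "order::2001", "fields": {"_$c": "customer3"}},
        {"id": "order::1002", "fields": {"_$c": "customer1"}},
    ]
}

print(hits_by_collection(sample_result))
```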

multi collection search results

 

Users can also scope their search requests to only specific collection(s) within the multi-collection index. This helps them to narrow down and speed up their searches on a large index.

 

A collection-scoped search request names the collections to search, for example customer1 and customer3.
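The original sample request is not reproduced here, so the following is a reconstruction under stated assumptions. It builds a request body that restricts the search to customer1 and customer3 through a top-level `collections` array, which is my understanding of the request format and should be verified against the Search documentation for your server version. The query text and the helper name are hypothetical.

```python
import json


def collection_scoped_request(query_text, collections, size=10):
    """Build a search request body limited to the given collections."""
    if not collections:
        raise ValueError("name at least one collection to scope the search")
    return {
        "query": {"query": query_text},     # query string query
        "collections": list(collections),   # assumed field name
        "size": size,
    }


request_body = collection_scoped_request("travel", ["customer1", "customer3"])
print(json.dumps(request_body, indent=2))
```

The body would then be sent to the index's query endpoint in the same way as any other search request.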

 

Upgrade Notes

The Search service enables collections support only on a fully upgraded 7.0 cluster. In a mixed-version cluster, collections support is not enabled.

 

Happy searching with collections!

 

To learn more, please check the links below.

Beta Release notes

Get the beta? – download.

Web console(s) for Full-Text Indexes

 

Posted by Sreekanth Sivasankaran

Sreekanth Sivasankaran is a Software Engineer at Couchbase. He works on the design and development of distributed, high-performance full-text search functionality.
