Collections provide the ability to namespace data within a Couchbase bucket. Instead of all documents having to reside in a single shared namespace, collections provide users a built-in capability to group those documents together rather than having to add manual attributes like “type” to a document.
If you’re unfamiliar with Couchbase Collections, please feel free to read another well-written blog on collections before continuing.
Full-Text Search’s collections support is primarily driven by three design goals.
Let’s explore how the Search service lets one perform the indexing and searching of the collection’s data.
Indexing Collections’ Data
The Search service continues to let existing users define and operate indexes in the conventional way on documents residing in a bucket. All pre-existing documents in a bucket naturally fall into the _default scope and _default collection, so existing indexes continue to index newer mutations and queries work as usual.
Once collections are adopted, users will likely namespace their existing multi-schema documents into various collections.
The Search service supports index creation on a single source collection as well as on multiple source collections, as long as all the collections belong to a single scope. Essentially, a search index can span multiple collections but not multiple scopes.
Let’s delve into this with the help of an example. Consider a CRM use case where the customer details are captured in a Customers bucket along with the related order details. Let’s assume the user has name-scoped the various customers into different scopes based on geographical regions, for example mapping all the customers from the APAC region to a specific scope named apac, and so on.
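As a hedged sketch of this setup, the snippet below builds the REST calls for creating a region scope and per-customer collections on a bucket. The bucket name CRM and scope/collection names mirror the example in this post; the endpoint paths are based on the cluster-management REST API.

```python
# Hedged sketch: REST requests for creating a scope and collections on
# a bucket via the Couchbase cluster-management API (port 8091).
BUCKET = "CRM"

def scope_request(bucket, scope):
    """Return the (path, form-data) pair for creating a scope."""
    return (f"/pools/default/buckets/{bucket}/scopes", {"name": scope})

def collection_request(bucket, scope, collection):
    """Return the (path, form-data) pair for creating a collection."""
    return (f"/pools/default/buckets/{bucket}/scopes/{scope}/collections",
            {"name": collection})

# One scope per geographic region, each holding per-customer collections.
path, data = scope_request(BUCKET, "apac")
# These pairs can be POSTed with any HTTP client, e.g.:
#   curl -u Administrator:password -X POST http://localhost:8091<path> -d name=apac
```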
Single Collection Index
With collections, indexing and searching data from a single source collection is the most common and natural use case. It works almost the same as the existing bucket-based index creation, except that the user has to specify the scope and collection details while creating the index definition.
If the user is indexing the _default scope and collection, then the index creation steps look exactly the same as in the pre-collections days.
If the user wants to index a non-default scope and collection(s), then they have to tick the “Use non-default scope/collection(s)” checkbox. Once they do this, the index-creation screen adapts to let them enter the source scope and collection details. With the checkbox enabled, the scope drop-down lists all the available scopes from the chosen bucket (CRM), and the user can select the scope to which the source collection belongs.
Specifying Type Mappings
Once the scope is selected, the user is all set to specify which types of documents to index. The convention is to specify this via type mappings, and the same type-mapping definition pattern continues here. Upon adding a new type mapping, the user is given the option to specify the source collection, with all the available collections in the chosen scope (emea) offered as a drop-down list.
Indexing all document types under a given collection
Just by selecting a collection name from the drop-down list as the type mapping name, the user can index every type of document under that collection.
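The snippet below is a minimal sketch of what the resulting index definition payload might look like when indexing every document in the emea.customer1 collection. The index name, the "scope.collection.type_field" doc_config mode, and the "scope.collection" mapping-name convention are assumptions about the 7.0 index-definition JSON, not taken from this post.

```python
import json

# Hedged sketch: an FTS index definition covering every document type
# in the emea.customer1 collection (names and field layout assumed).
index_def = {
    "name": "customer1-idx",              # hypothetical index name
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "CRM",                  # the source bucket
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "default_mapping": {"enabled": False},
            "types": {
                # mapping name = "<scope>.<collection>": dynamically
                # index all document types under that collection
                "emea.customer1": {"enabled": True, "dynamic": True},
            },
        },
    },
}
print(json.dumps(index_def["params"]["mapping"]["types"], indent=2))
```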
Indexing multiple document types under a collection
If the collection hosts multiple document types, then the user can specify any number of type mappings of interest by appending each type name to the collection name.
The above example would index document types such as inventoryOrders from the collection customer1 under the scope emea.
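As a hedged sketch, the per-type mappings could look like the following; the "<scope>.<collection>.<type>" mapping-name convention is an assumption based on the UI behaviour described above, and the second mapping name is purely illustrative.

```python
# Hedged sketch: type mappings that index only specific document types
# within the emea.customer1 collection, distinguished by a type field.
types = {
    "emea.customer1.inventoryOrders": {"enabled": True, "dynamic": True},
    # Further document types from the same collection would be added as
    # additional mappings, e.g. a hypothetical "emea.customer1.invoices".
}
mapping_names = sorted(types)
```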
Since collections slice the bucket’s data at a finer granularity, each collection is likely to contain a smaller number of documents. So, in many cases, users may not need the default setting of 6 partitions per index to power a smaller data set.
An appropriate partition count for a given amount of data helps support:
- Better utilization of resources on a node.
- A higher number of indexes on any given node.
- Better search performance.
Hence it’s recommended to explore overriding the default partition count with a lower value during cluster sizing.
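As a minimal sketch, the partition count can be lowered through the planParams section of the index definition; the field names below follow the FTS plan-parameter payload, with the values chosen purely for illustration.

```python
# Hedged sketch: overriding the default partition count for an index
# over a small collection via planParams.
plan_params = {
    "indexPartitions": 1,   # default is 6; small collections may need fewer
    "numReplicas": 0,       # replica count, unchanged here
}
```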
Role-Based Access Control for search indexes can now be applied at the bucket, scope, or collection(s) level. A user with at least search reader permissions at the source-collection level will be able to access the index.
An interesting read about the latest RBAC updates for Collections is here.
Multi Collection Index
Multi-collection indexes help users index and search across multiple collections within a single scope from a single index. A few use cases that favor multi-collection indexes:
- Users have sliced the data across many collections where each collection or namespace could be either a customer account or the brand of a product etc. (homogeneous data across collections)
- Users have a lot of relatively small-sized collections in their data set due to the logical partitioning of the data. (heterogeneous data across collections)
In such cases, users would otherwise have to create numerous indexes to enable search across numerous collections, and creating and maintaining a large number of indexes is both cumbersome and demanding. Multi-collection indexes alleviate this overhead by letting the user create a single umbrella index covering many collections, whether those collections contain homogeneous or heterogeneous data.
Specifying Type Mappings
Type mappings can be defined for indexing heterogeneous data types from different collections, such as customer3; they could equally cover similar data types across various collections.
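A hedged sketch of such mappings is shown below: a single index whose type mappings span several collections under the emea scope. The "<scope>.<collection>" mapping-name convention is an assumption carried over from the single-collection case.

```python
# Hedged sketch: type mappings of one index spanning two collections
# under the emea scope.
types = {
    "emea.customer1": {"enabled": True, "dynamic": True},
    "emea.customer3": {"enabled": True, "dynamic": True},
}
# All mappings must share one scope: an index can span collections,
# but never scopes.
scopes = {name.split(".")[0] for name in types}
assert len(scopes) == 1
```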
If any of the source collections of a multi-collection index gets deleted, the index gets deleted too. Hence multi-collection indexes are best suited for collections with similar lifespans.
Accessing a multi-collection index requires the user to have search reader permissions on all the source collections in the index.
Searching Collections’ Data
Single Collection Index – Users can search and retrieve data from a single-collection index in the same way as with a bucket-based index.
Multi-Collection Index – Users can search multi-collection indexes using the same familiar search requests. Since the index now contains data from multiple source collections, it is useful for users to know the source collection of each relevant hit.
With multi-collection indexes, each hit in the search result contains information about the collection it belongs to; this source-collection detail is available in the Fields section of each hit.
Users can also scope their search requests to only specific collection(s) within the multi-collection index. This helps them to narrow down and speed up their searches on a large index.
A collection-scoped search request targeting a collection such as customer3 can use an ordinary query, for example:
"match_phrase": "exceeding budget"
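The snippet below sketches what a full collection-scoped request payload might look like. The top-level "collections" key used to restrict the search is an assumption about the 7.0 search-request JSON.

```python
import json

# Hedged sketch: a search request scoped to one of the index's source
# collections (assumed "collections" key in the request payload).
search_request = {
    "query": {"match_phrase": "exceeding budget"},
    "collections": ["customer3"],   # restrict hits to these collections
}
payload = json.dumps(search_request)
```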
The Search service enables collections support only on a fully upgraded 7.0 cluster; in a mixed-version cluster, collections support won’t be enabled.
Happy searching with collections!
Interested to know more? Please check the links below.
Get the beta? – download.