This article highlights new features and enhancements in the web console for configuring and searching indexes with Couchbase’s Full-Text Search service, as Couchbase Server moves to supporting collections with the 7.0 release.

Full-Text Search (FTS) refers to techniques for searching text content within a document or a collection of documents. Couchbase supports indexing and searching text in various languages, and offers customizable text analyzers to interpret that text in various ways. Here is an article that showcases the various components of a text analyzer (tokenizers, filters, etc.) and how to use them.

With all these moving parts going into a Full-Text index definition that caters to your search needs, designing a friendly interface for creating and searching an index is quite the challenge.

With the introduction of collections within a Couchbase bucket, we intend for the Search service to let users define indexes that subscribe to multiple collections.

Before getting into all of that, let’s take a look at how our current editor looks (a screenshot from Couchbase server 6.6).

With what I have set within the index definition above – I intend to index all the content in the JSON documents from the couchbase bucket beer-sample that have a field “type” whose value is “beer“. Note the “Index Definition Preview” on the right side of the screen which carries all the settings for the index “beers“. This preview adapts immediately to any changes made to any of the settings. 

As you can see, the number of settings within the index definition is already quite large. With 7.0, we intend to add more variations to support collections, and with this the index definition becomes a bit more daunting than it already is, especially to a new user.

So, to get a new user started with Couchbase’s Search service and for those who do not wish to delve into more advanced settings while setting up their Full-Text indexes, we’ll be offering a brand new editor (alongside the current editor) to define indexes – the quick editor.

This article will introduce to you the capability of this quick editor (debuting in 7.0), and extensions we’ve made to the current editor to accommodate collections.

But first, what are collections?

A Couchbase bucket (the document-based, partitioned, distributed database) is the central core of Couchbase Server. With 7.0, users will be able to categorize documents by configuring their buckets to form an organizational hierarchy. Each category is held within a sub-bucket (also partitioned) that we refer to as a collection, and collections are in turn grouped into scopes. The bucket is thus managed via a 3-layer hierarchy: bucket, scope, collection. You should be able to find other articles on this forum that go into depth on collections.

Here’s a sample bucket and its categorization with collections .. for a restaurant menu (I tried) ..

For this bucket hierarchy, a Full-text index can be defined to subscribe to and index data of ..

  • contents of all the vegetarian tacos within the food scope of the menu
  • the meats used in all the burgers within the food scope of the menu
  • the types of all the cocktails and coffees within the drink scope of the menu
  • … you get the idea.

A restriction

A Full-Text index can be defined to subscribe to several collections all right, but all those collections would need to belong to one scope. An index definition cannot transcend a scope.
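As a rough illustration of this restriction, the check below takes the collections an index would subscribe to and reports whether they all sit within one scope. The scope names (food, drink) come from the menu example; the collection names and the helper name single_scope are hypothetical.

```python
def single_scope(keyspaces):
    """Return the one scope shared by all (scope, collection) pairs.

    Raises ValueError when the pairs do not resolve to exactly one scope.
    """
    scopes = {scope for scope, _collection in keyspaces}
    if len(scopes) != 1:
        raise ValueError(f"expected exactly one scope, got: {sorted(scopes)}")
    return scopes.pop()


# Hypothetical collection names for the restaurant menu bucket.
print(single_scope([("food", "tacos"), ("food", "burgers")]))  # food

try:
    # Collections from two scopes cannot share one index.
    single_scope([("food", "tacos"), ("drink", "coffees")])
except ValueError as err:
    print(err)
```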

Let’s talk index definitions

Before I share with you all the screenshots I’ve collected, let’s quickly go over the updates we’ve made to the definition of a Full-Text index to accommodate collections. Note that we’ll continue to support older index definitions from current or older versions of Couchbase server.

Let’s pick up an example that I used earlier: the beers index on the Couchbase bucket beer-sample. Stripping off the default settings and holding on to only the relevant ones, here’s a minimal index definition for it ..
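A sketch of what such a minimal pre-7.0 definition could look like, written as a Python dict and printed as JSON. The values for sourceType, params.doc_config.mode and the type mapping name follow what this article describes; the remaining keys and their layout are an assumption about a typical definition, not a verbatim copy of the one in the screenshot.

```python
import json

# Sketch of a pre-7.0 definition for the "beers" index (assumed layout).
legacy_beers_index = {
    "name": "beers",
    "type": "fulltext-index",
    "sourceType": "couchbase",
    "sourceName": "beer-sample",
    "params": {
        "doc_config": {
            # Pick the type mapping from the value of the "type" field.
            "mode": "type_field",
            "type_field": "type",
        },
        "mapping": {
            # Disable the default mapping so only "beer" documents are indexed.
            "default_mapping": {"enabled": False},
            "types": {
                "beer": {"enabled": True, "dynamic": True},
            },
        },
    },
}

print(json.dumps(legacy_beers_index, indent=2))
```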

In 7.0 terms, all the content in the bucket beer-sample resides within the _default collection of the _default scope. That is, when you upgrade your server to 7.0, the data in your bucket will move into the _default collection within the _default scope.

The above index definition will continue to work with 7.0. Here’s an alternative though – a 7.0 index definition that will do the exact same thing as the one above ..
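A sketch of what the 7.0-style definition could look like, again as a Python dict printed as JSON. The source type, the doc_config mode and the type mapping name are the ones this article names; the other keys are an assumed minimal layout.

```python
import json

# Sketch of a collection-aware (7.0) definition for the "beers" index.
beers_index_7_0 = {
    "name": "beers",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "beer-sample",
    "params": {
        "doc_config": {
            # Type mapping names now read "<scope>.<collection>.<type>".
            "mode": "scope.collection.type_field",
            "type_field": "type",
        },
        "mapping": {
            "default_mapping": {"enabled": False},
            "types": {
                "_default._default.beer": {"enabled": True, "dynamic": True},
            },
        },
    },
}

print(json.dumps(beers_index_7_0, indent=2))
```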

Note the 3 differences between the two definitions above:

  • sourceType has changed from couchbase to gocbcore. We’ve changed the underlying SDK that the Full-Text index uses to communicate with a couchbase bucket to a newer, better supported one.
  • params.doc_config.mode has changed from just “type_field” to “scope.collection.type_field” indicating that the type mapping names will now follow that format.
  • the type mapping name has now become “_default._default.beer” indicating that it will index documents of “type”: “beer” from within the _default collection in the _default scope of the bucket beer-sample.

Collections will allow for users to model their data better.

Modeling your data into a single collection (mimicking pre-7.0 behavior) means that all the data within the bucket is shipped to the search nodes, and the index gets to filter out documents based on the definition.

With support for collections, you will now be able to model your data into categories – each of which resides in a separate collection. I’ll highlight one obvious advantage with this approach .. with an example.

The beer-sample bucket holds documents of type beer and brewery – all residing within the _default collection of the _default scope. Let’s change this model –

  • Set up a scope content within beer-sample
  • Within the scope, set up 2 collections beers, breweries
  • Load data of “type”: “beer” into beers and data of “type”: “brewery” into breweries

Now, here’s an index definition to hold the same data as the earlier ones ..
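A sketch of such a definition, built by a small helper so the naming pattern is visible. The scope and collection names (content, beers) are the ones set up in the steps above; the helper collection_index and the exact set of keys are assumptions for illustration.

```python
import json


def collection_index(name, bucket, scope, collections):
    """Build a minimal definition that indexes whole collections of one scope."""
    return {
        "name": name,
        "type": "fulltext-index",
        "sourceType": "gocbcore",
        "sourceName": bucket,
        "params": {
            "doc_config": {
                "mode": "scope.collection.type_field",
                "type_field": "type",
            },
            "mapping": {
                "default_mapping": {"enabled": False},
                # One type mapping per collection, named "<scope>.<collection>".
                # No type suffix: every document in the collection qualifies.
                "types": {
                    f"{scope}.{c}": {"enabled": True, "dynamic": True}
                    for c in collections
                },
            },
        },
    }


beers_index = collection_index("beers", "beer-sample", "content", ["beers"])
print(json.dumps(beers_index, indent=2))
```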

This time around, the bucket will only ship the documents in the beers collection (those of “type”: “beer”). So with the latest index definition, your search nodes would ..

  • consume less network bandwidth
  • observe faster index build times

** Here is a separate blog for more details on various nuances of Full-Text index definitions with Couchbase bucket collections.

Introducing the quick editor

Here’s a peek ..

On selecting a bucket, scope and collection within the “Keyspace” section above, a sample document from the selected bucket.scope.collection will appear within the “Select Fields” section. A refresh button in the top-right corner of the “Select Fields” section lets the user iterate through documents (at random) within the collection.

Now, the user will be able to select a field from the document (mouse-click on the field name/value). The selected field will show up for configuration within the “Configure Fields” section. The type of the field is detected automatically (for now, only text, number and boolean are recognized). If the field is a datetime (a string in ISO-8601 format) or a geopoint (an object, an array or a geohash), the user will need to explicitly select the type from the type dropdown.

When the configured field is “Add“ed, it would show up in the “Mapped Fields” section. A mapped field can be edited anytime by either selecting it again from the “Select Fields” section or from within the “Mapped Fields” section.

The “Create Index” button at the bottom of the page will let you create the index.

While configuring a field here are the available settings ..

  • Type .. This is the type of the field value. Supported types are: text, number, boolean, geopoint, datetime.
  • Only if the chosen field type is “text” will a checkbox saying “Index this field as an identifier” show up. If selected, this will enforce the keyword analyzer for the text.
  • If the field type is “text” and the field isn’t indexed as an identifier, the “Language” dropdown is available, where the analyzer can be chosen for the text field.
  • The next 4 checkboxes essentially translate to a single option or a combination of options (as in the current editor) supported for a field ..

You can find more documentation on these here.

  • The last section is for setting “Searchable As” that takes a text input which will serve as the alias for the field. This setting is optional and defaults to the name of the selected field. During search, the field to look within is the entry in this section.

An index set up from within the quick editor can be edited anytime using the quick editor or the current editor.

Limited options within the quick editor

  • The quick editor holds limited options to configure index definitions when compared to the current editor.
  • You will not be able to index a field that isn’t available in the sample document loaded in the “Select Fields” section.
  • Custom analyzers will not be supported with the quick editor.
  • Geopoint and Datetime fields are not recognized automatically – you will however be able to explicitly set the type of the field upon selection.
  • You will not* be able to edit an index created using the current editor with the quick editor. However, you will be allowed to edit an index created with the quick editor using the current editor.
  • While you can set up fields from within multiple collections, you will not be able to index the same field multiple times within a single collection.
  • Index Replicas, Index Type, Index Partitions cannot be set within the quick editor. They will assume default values when an index is created within the quick editor. However for an index created this way, you will be able to change these settings using the current editor and continue to edit the index definition using the quick editor subsequently for as long as the “params.mapping” and “params.doc_config” sections of the index definition aren’t altered within the current editor*.
  • Filtering documents within a scope.collection (to just index documents of a certain type) will not be supported within the quick editor.

* This behavior may be subject to change in the future as we extend support within the quick editor.

We’ve made some changes to the current editor as well

The first thing you’ll notice that’s different is a new checkbox that appears under the “Index Name” and “Bucket” entries, asking you if you’d like to set up the index to subscribe to a non-default scope or non-default collection(s) ..

Enabling the checkbox will prefix “scope.collection” to the “params.doc_config.mode” within the index definition implying that the index can subscribe to one or more collections from within a scope. This “scope.collection” prefix will work in combination with the existing settings: “type_field”, “docid_prefix” or “docid_regexp” for filtering documents to index.
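The effect of the checkbox on params.doc_config.mode can be sketched as a small helper. The three base modes are the ones named above; the helper name collection_aware_mode is hypothetical.

```python
# The three document-filtering modes named in this article.
LEGACY_MODES = ("type_field", "docid_prefix", "docid_regexp")
PREFIX = "scope.collection."


def collection_aware_mode(mode):
    """Return the doc_config mode with the "scope.collection." prefix applied."""
    # Accept a mode that already carries the prefix, so the call is idempotent.
    base = mode[len(PREFIX):] if mode.startswith(PREFIX) else mode
    if base not in LEGACY_MODES:
        raise ValueError(f"unknown doc_config mode: {mode!r}")
    return PREFIX + base


for mode in LEGACY_MODES:
    print(mode, "->", collection_aware_mode(mode))
```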

Upon checking this setting, you’ll first see a dropdown appear underneath it to select a scope from the available scopes for the bucket chosen.

Now within the type mappings, you’ll be asked to select a collection from a dropdown (note that the “default” type mapping needs to be unchecked when a non-default scope is selected, because an index definition cannot transcend a scope) ..

Once you make a collection selection, you will be allowed to optionally append a type name to the “<scope>.<collection>” based on the “Type Identifier” selected for filtering out documents to index from within the collection. This type mapping can be edited at any time later as well. Sub mappings and child fields can be added within the type mapping like before.
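The resulting type mapping names follow a simple pattern, sketched below. The helper name type_mapping_name is hypothetical; the two example names are ones that appear in this article.

```python
def type_mapping_name(scope, collection, type_name=None):
    """Compose "<scope>.<collection>" with an optional ".<type>" suffix."""
    parts = [scope, collection]
    if type_name:
        # The suffix narrows the mapping to documents of that type.
        parts.append(type_name)
    return ".".join(parts)


print(type_mapping_name("content", "beers"))              # content.beers
print(type_mapping_name("_default", "_default", "beer"))  # _default._default.beer
```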

The rest of the functionality within the editor remains unchanged.

Searching a collection-aware Full-Text Index

The UI for searching within a Full-Text index will remain the same as before for now. The text box will only support a query string. There are other types of queries that the Full-Text index supports, all documented here: https://docs.couchbase.com/server/current/fts/fts-query-types.html

A search request sent directly to the endpoint will now take a new optional argument, “collections”, to fetch results from only one or a set of the collections that the Full-Text index subscribes to.

Here’s a sample search request ..
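A sketch of such a request, built with Python’s standard library but not sent. The “collections” argument is the one described above; the host, the index name, the query text and the helper build_search_request are placeholders, the URL assumes the Search service’s default port (8094) and its /api/index/<name>/query endpoint, and authentication is left out.

```python
import json
import urllib.request


def build_search_request(host, index_name, query_string, collections=None):
    """Build (but do not send) a query request for a Full-Text index."""
    body = {"query": {"query": query_string}}
    if collections:
        # Optional: only return hits from these collections of the index.
        body["collections"] = list(collections)
    return urllib.request.Request(
        url=f"http://{host}:8094/api/index/{index_name}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("localhost", "beers", "pale ale", ["beers"])
print(request.full_url)
print(request.data.decode("utf-8"))
```

Leaving out collections yields a request that spans everything the index subscribes to.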

On the Couchbase web console, however, you will for now not be able to set the “collections” argument for a search request, and the request will span the indexed content of all the collections that the index subscribes to.

Let’s consider a sample index definition set up using the quick editor – this index subscribes to collections “beer” and “brewery” within the scope “content” of bucket “default”. Within these 2 collections, the name fields are indexed with the following options set ..

  • Include in search result(s)
  • Support Highlighting

Here’s the relevant content from the index definition ..
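A sketch of what the type mappings for this index could look like. The collection, scope and field names are the ones given above. How the two checkboxes translate into field flags (store for “Include in search result(s)”, include_term_vectors for “Support Highlighting”) is an assumption here, as is the rest of the layout.

```python
import json


def name_field_mapping():
    """Child-field settings for an indexed, stored, highlightable "name"."""
    return {
        "enabled": True,
        "dynamic": False,
        "fields": [
            {
                "name": "name",
                "type": "text",
                "index": True,
                "store": True,                 # assumed: "Include in search result(s)"
                "include_term_vectors": True,  # assumed: "Support Highlighting"
            }
        ],
    }


# One type mapping per subscribed collection within the "content" scope.
types = {
    f"content.{collection}": {
        "enabled": True,
        "dynamic": False,
        "properties": {"name": name_field_mapping()},
    }
    for collection in ("beer", "brewery")
}

print(json.dumps({"types": types}, indent=2))
```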

In case the index definition subscribes to more than one collection (like in the example above), for a search – the collection that the document (hit) belongs to will appear as a “stored” field with the key _$c.

Here’s a sample search results snippet for the above index definition ..
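To show how a client might use the _$c field, the sketch below groups hits by the collection they came from. The response fragment is made up for illustration (document IDs, scores and names are placeholders, and real responses carry more keys); only the _$c key under “fields” is taken from the description above.

```python
from collections import defaultdict

# Made-up response fragment; not output from a real cluster.
sample_response = {
    "hits": [
        {"id": "doc-1", "score": 1.2,
         "fields": {"_$c": "beer", "name": "Example Pale Ale"}},
        {"id": "doc-2", "score": 0.9,
         "fields": {"_$c": "brewery", "name": "Example Brewing Co."}},
        {"id": "doc-3", "score": 0.4,
         "fields": {"_$c": "beer", "name": "Example Stout"}},
    ]
}


def hits_by_collection(response):
    """Group hit IDs by the collection recorded in each hit's "_$c" field."""
    grouped = defaultdict(list)
    for hit in response.get("hits", []):
        # Hits without "_$c" are grouped under None.
        collection = hit.get("fields", {}).get("_$c")
        grouped[collection].append(hit["id"])
    return dict(grouped)


print(hits_by_collection(sample_response))
```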

Future

We’ve put out a developer preview of Couchbase Server 7.0 on November 17, 2020 for you to check out all these features and enhancements.

We’re continuing to better the user interface for creating, editing and searching a Full-Text Index as we go.

Cheers.

Posted by Abhinav Dangeti, Software Engineering, Couchbase inc.

Work on Couchbase's Distributed Full Text Search
