Couchbase Mobile 2.0 introduces powerful Full Text Search (FTS) capabilities on your JSON documents. This is part of the new Query interface based on N1QL, Couchbase’s declarative query language that extends SQL for JSON. If you are familiar with SQL, you will feel right at home with the semantics of the new API.

Full Text Search enables natural language querying. This is the third in a series of posts that discuss the query interface in Couchbase Lite. This post assumes you are familiar with the fundamentals, so if you haven’t done so already, be sure to review the earlier post first. Links to posts discussing other features of the Query interface are provided at the end of this post.

You can download the latest pre-release version of Couchbase Mobile 2.0 from here.

Background

If you were using 1.x versions of Couchbase Mobile, you are probably familiar with Map-Views for creating indexes and queries. In 2.0, you no longer have to create views and map functions! Instead, a simple interface allows you to create indexes, and you can use a Query Builder interface to construct your queries. The new query interface is simpler to use and much more powerful in comparison. We will discover some of its features in this post.

Sample Project

While the examples discussed here use Swift for iOS, note that barring some minor differences, the same query interface is supported on the Android and Windows platforms as well.

So with some minor tweaks, you should be able to reuse the query examples in this post when working with other platforms.

Follow the instructions below if you are interested in a sample Swift project:

  • Clone the iOS Swift Playground from GitHub
  • Follow the installation instructions in the corresponding README file to build and execute the playground.

Sample Data Model

We shall use the Travel Sample database located here. You can embed this pre-built database into your mobile application and start using it for your queries.

The sample data set includes several types of documents, identified by the type property in the document. We will focus on documents of type “landmark”. The JSON document model is shown below. For brevity, we have omitted the properties that are not relevant to this post.

** Refer to the model above for each of the query examples below. **

The Database Handle

In the queries below, we will use the Database API to open or create a Couchbase Lite database.
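
A minimal sketch of obtaining the handle, assuming the Couchbase Lite 2.x Swift API (Database, DatabaseConfiguration); names in pre-release builds may differ. The helper openDatabase and the database name "travel-sample" are our own assumptions for illustration.

```swift
import CouchbaseLiteSwift

// Opens the database with the given name, creating an empty one if none exists.
// The default name "travel-sample" is an assumption for illustration.
func openDatabase(named name: String = "travel-sample") throws -> Database {
    let config = DatabaseConfiguration()
    return try Database(name: name, config: config)
}
```

The returned handle is what the index and query examples in the rest of this post would operate on.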

The Basics

Full Text Search enables natural language querying. In our post on the Query Fundamentals, we discussed the like and regex expressions for pattern matching operations. FTS supersedes that capability by adding support for stemming, relevance-based ranking and locale-specific natural language querying.

Full Text Searches are case insensitive and use the match query expression. In order to perform FTS, you must create a Full Text Index on the appropriate properties. You can create an index on one or more properties.

Stemming

Before we proceed with the examples, first a word on Stemming. Stemming is the process of reducing words to their root stem word. So for instance, “catty”, “catlike” and “cats” are reduced to the word “cat”. So searching for the term “cats” would give us results that match “cat”, “catlike” and so on.

Couchbase Lite currently supports stemming in the following languages:
* danish
* dutch
* english
* finnish
* french
* german
* hungarian
* italian
* norwegian
* portuguese
* romanian
* russian
* spanish
* swedish
* turkish

If no specific language is used, the tokenizer will still break the text into words at Unicode whitespace characters. So it should work, although less well, with any language that puts spaces between words.

Full Text Index

The name that is associated with the index during creation is important. The query examples that we will see later refer to the appropriate index by this name.

Single Property Index

The following example creates a fullTextIndex on the “content” property of a Document. Stemming is enabled by default and the locale is assumed to be the locale of the device. While not shown below, you also have the option of specifying if “accents” have to be ignored or not via the ignoreAccents option. By default, accents are not ignored.
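
A minimal sketch of this index creation, assuming the Couchbase Lite 2.x Swift API (IndexBuilder, FullTextIndexItem, createIndex(_:withName:)); pre-release builds may name these differently. The function name createContentIndex is ours; the index name "ContentFTSIndex" is the one the later queries refer to.

```swift
import CouchbaseLiteSwift

// Creates a full text index named "ContentFTSIndex" on the "content" property.
// Language (and therefore stemming) is left at its default.
// ignoreAccents(false) spells out what the post states is the default behavior.
func createContentIndex(in db: Database) throws {
    let index = IndexBuilder
        .fullTextIndex(items: FullTextIndexItem.property("content"))
        .ignoreAccents(false)
    try db.createIndex(index, withName: "ContentFTSIndex")
}
```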

Multiple Property Index

The following example creates a fullTextIndex on the “content” and “name” properties of a Document.
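
A sketch of the multi-property index under the same assumption (Couchbase Lite 2.x Swift API names). The function name createContentAndNameIndex is hypothetical; "ContentAndNameFTSIndex" is the index name used by the multi-property query later in the post.

```swift
import CouchbaseLiteSwift

// Creates one full text index that covers both the "content" and "name" properties.
func createContentAndNameIndex(in db: Database) throws {
    let index = IndexBuilder.fullTextIndex(
        items: FullTextIndexItem.property("content"),
               FullTextIndexItem.property("name"))
    try db.createIndex(index, withName: "ContentAndNameFTSIndex")
}
```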

Index without stemming

The following example creates a fullTextIndex on the “content” property of a Document with stemming disabled. Stemming is enabled by default using the current device language settings. Setting language to nil will disable stemming.
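
A sketch of the same index with stemming turned off, assuming the language(_:) option of the Couchbase Lite 2.x Swift FullTextIndex accepts nil as described above. The function name createContentIndexWithoutStemming is ours for illustration.

```swift
import CouchbaseLiteSwift

// Creates a full text index on "content" with no language set.
// Passing nil for the language disables stemming for this index.
func createContentIndexWithoutStemming(in db: Database) throws {
    let index = IndexBuilder
        .fullTextIndex(items: FullTextIndexItem.property("content"))
        .language(nil)
    try db.createIndex(index, withName: "ContentFTSIndexNoStemming")
}
```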

FTS Search with Stemming

The query below fetches the id and content properties of “landmark” type documents containing the term “Mechanical” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Request
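
A sketch of what such a request could look like, assuming the Couchbase Lite 2.x Swift API (QueryBuilder, FullTextExpression, Meta.id) and that the "ContentFTSIndex" index already exists. The helper name landmarkIDs(matching:in:) is hypothetical, and it returns only the document IDs to keep the example short.

```swift
import CouchbaseLiteSwift

// Returns the IDs of "landmark" documents whose "content" matches `term`
// through the "ContentFTSIndex" full text index.
func landmarkIDs(matching term: String, in db: Database) throws -> [String] {
    let isLandmark = Expression.property("type").equalTo(Expression.string("landmark"))
    let matchesTerm = FullTextExpression.index("ContentFTSIndex").match(term)

    // The match expression is an operand of a top-level AND.
    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id), SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(isLandmark.and(matchesTerm))

    var ids: [String] = []
    for row in try query.execute() {
        if let id = row.string(forKey: "id") { ids.append(id) }
    }
    return ids
}
```

Calling landmarkIDs(matching: "Mechanical", in: db) would run the search described above.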

Sample Response

The response to the above query will include documents that contain the terms “mechanical”, “mechanism”, “mechanisms”, “mechanic” and so on.

FTS Search without Stemming

The query below fetches the id and content properties of “landmark” type documents containing the exact term “Mechanical” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier which specified the option to disable stemming.

Request

Sample Response

The response to the above query will include only documents that contain exactly the term “mechanical”. Note again that all searches are case insensitive.

FTS Search on Multiple Properties

The query below fetches the id, name and content properties of “landmark” type documents containing the term “Mechanical” in either the “name” or the “content” property. We use the “ContentAndNameFTSIndex” that was created earlier. This index covers both the “name” and “content” properties.

Request
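
A sketch of the multi-property request, assuming the Couchbase Lite 2.x Swift API and an existing "ContentAndNameFTSIndex" index. The helper name landmarkNames(matching:in:) is hypothetical; it collects the "name" of each matching document.

```swift
import CouchbaseLiteSwift

// Returns the names of "landmark" documents where `term` matches in any
// property covered by the "ContentAndNameFTSIndex" index.
func landmarkNames(matching term: String, in db: Database) throws -> [String] {
    let isLandmark = Expression.property("type").equalTo(Expression.string("landmark"))
    let matchesTerm = FullTextExpression.index("ContentAndNameFTSIndex").match(term)

    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id),
                SelectResult.property("name"),
                SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(isLandmark.and(matchesTerm))

    var names: [String] = []
    for row in try query.execute() {
        if let name = row.string(forKey: "name") { names.append(name) }
    }
    return names
}
```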

Sample Response

The response to the above query will include documents that contain the term “mechanical” (or variants of it derived through stemming) in either the “name” or “content” property.

FTS Search with Logical Expressions

In an earlier example, you saw that by disabling stemming, you can look for the exact search string. But what if you wanted to look for more than one search term? The match query expression accepts logical expressions including AND and OR.

The query below fetches the id and content properties of “landmark” type documents containing the term “Mechanical” or “Mechanism” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier with stemming disabled.

Request
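
A sketch of a request that places the logical operator inside the search string, assuming the Couchbase Lite 2.x Swift API and an existing "ContentFTSIndexNoStemming" index. The helper name landmarkIDsMatchingEither is hypothetical, and writing OR in upper case between the terms is our assumption about the search string syntax.

```swift
import CouchbaseLiteSwift

// Returns the IDs of "landmark" documents whose "content" contains either
// `first` or `second`, using the index that has stemming disabled.
func landmarkIDsMatchingEither(_ first: String, _ second: String,
                               in db: Database) throws -> [String] {
    let isLandmark = Expression.property("type").equalTo(Expression.string("landmark"))
    // The OR lives inside the search string, so match stays in a top-level AND.
    let matchesEither = FullTextExpression.index("ContentFTSIndexNoStemming")
        .match("\(first) OR \(second)")

    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id), SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(isLandmark.and(matchesEither))

    var ids: [String] = []
    for row in try query.execute() {
        if let id = row.string(forKey: "id") { ids.append(id) }
    }
    return ids
}
```

Calling landmarkIDsMatchingEither("mechanical", "mechanism", in: db) would correspond to the search described above.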

Sample Response

The response to the above query will include documents that contain exactly the term “mechanical” or “mechanism” in the “content” property.

FTS Search with Wildcard Expression

You can use the “*” character in the search string to represent zero or more character matches.

The query below fetches the id and content properties of “landmark” type documents containing the term “walt*” in the “content” property. This will match all terms that start with “walt” followed by zero or more characters. We use the “ContentFTSIndex” that was created earlier.

NOTE: One could argue that using a wildcard in the search term is a naive way of implementing stemming. But then you may end up with derived forms that do not correspond to the terms derived through stemming. So it is preferable to use stemming if that’s what you need.

Request
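
A sketch of the wildcard request, assuming the Couchbase Lite 2.x Swift API and an existing "ContentFTSIndex" index. The helper name landmarkIDs(withContentPrefix:in:) is hypothetical; it appends the “*” character to the prefix it is given.

```swift
import CouchbaseLiteSwift

// Returns the IDs of "landmark" documents whose "content" has a term that
// starts with `prefix`, for example "walt" for the search string "walt*".
func landmarkIDs(withContentPrefix prefix: String, in db: Database) throws -> [String] {
    let isLandmark = Expression.property("type").equalTo(Expression.string("landmark"))
    let matchesPrefix = FullTextExpression.index("ContentFTSIndex").match(prefix + "*")

    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id), SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(isLandmark.and(matchesPrefix))

    var ids: [String] = []
    for row in try query.execute() {
        if let id = row.string(forKey: "id") { ids.append(id) }
    }
    return ids
}
```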

Sample Response

The response to the above query will include documents that contain the terms “walt”, “Walter”, “Waltham”, “Walthamstow” and so on.

FTS Search with Stop Words

Stop words are the common words in a language. In English, these would be terms like “the”, “is”, “and”, “which” and so on.

Example 1: Search String contains stop words

Couchbase Lite ignores stop words that appear in the search string.

The query below fetches the id and content properties of “landmark” type documents containing the term “on the history” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Couchbase Lite ignores the stop words “on” and “the”, so the search is effectively for the term “history” and derived forms of its stem word.

Request

Sample Response

The response to the above query will include documents that contain the term “history” and derived forms of this word such as “historical”.

Example 2: Ignoring Stop Words while Searching

By default, Couchbase Lite ignores stop words within the search content.

The query below fetches the id and content properties of “landmark” type documents containing the terms “blue fin yellow fin” in the “content” property. We use the “ContentFTSIndex” that was created earlier.

Couchbase Lite ignores stop words during search, so you would fetch documents that include the terms “blue”, “fin” and “yellow” in that order, separated by any number of stop words.

Request

Sample Response

The response to the above query will include documents that contain the terms “blue”, “fin” and “yellow” separated by any number of stop words, such as “blue fin and yellow fin”.

FTS Search with Ranking

You can use FullTextFunction.rank to order the search results by rank. This is useful for listing the matches in order of best match.

The query below fetches the id and content properties of “landmark” type documents containing the term “attract” in the “content” property. The documents are ordered in descending order of rank, which means that the document with the maximum number of matches is sorted higher than the rest.

Request
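
A sketch of the ranked request, assuming the Couchbase Lite 2.x Swift API (FullTextFunction.rank, Ordering) and an existing "ContentFTSIndex" index. The helper name rankedLandmarkIDs(matching:in:) is hypothetical.

```swift
import CouchbaseLiteSwift

// Returns the IDs of "landmark" documents whose "content" matches `term`,
// ordered from the highest rank to the lowest.
func rankedLandmarkIDs(matching term: String, in db: Database) throws -> [String] {
    let isLandmark = Expression.property("type").equalTo(Expression.string("landmark"))
    let matchesTerm = FullTextExpression.index("ContentFTSIndex").match(term)
    let rank = FullTextFunction.rank("ContentFTSIndex")

    let query = QueryBuilder
        .select(SelectResult.expression(Meta.id), SelectResult.property("content"))
        .from(DataSource.database(db))
        .where(isLandmark.and(matchesTerm))
        .orderBy(Ordering.expression(rank).descending())

    var ids: [String] = []
    for row in try query.execute() {
        if let id = row.string(forKey: "id") { ids.append(id) }
    }
    return ids
}
```

Calling rankedLandmarkIDs(matching: "attract", in: db) would correspond to the search described above.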

Sample Response

The response to the above query will include documents that include the term “attract” or derived versions of it. Documents with the maximum number of matches are sorted higher.

Limitations

While the FTS capability in Couchbase Lite 2.0 is extremely powerful and should suffice for the use cases typical of an embedded database, there are a few limitations:

  • Match Expression
    A match expression can appear only at the top level of the query condition or as an operand of a top-level AND expression. This means that the following expression is not allowed: ftsExpression.match("attract").or(ftsExpression2.match("museum"))
  • Custom Language Tokenizers
    The list of supported languages was specified earlier. At the time of writing this post, you cannot plug in a custom tokenizer in order to extend support to other languages.
  • Fuzzy Search Support
    We cannot specify a “fuzziness” factor on the query that may result in less relevant matches being considered.
  • Facets
    There is no support for faceted search.

Bear in mind that Couchbase Lite is an embedded database. So one could argue that its FTS capabilities do not have to be as extensive as those of a server-side database implementation. Support for these features will be evaluated in future releases.

What Next

This blog post looked at how you can leverage the Full Text Search (FTS) capabilities in the new Query API in Couchbase Mobile 2.0. This is a start. Expect to see more functionality in future releases. You can download the latest release from our downloads page.

Here are a few other Couchbase Mobile Query related posts that may be of interest:
– This blog post discusses the fundamentals
– This blog post discusses how to query array collections
– This blog post discusses how to do JOIN queries

If you have questions or feedback, please leave a comment below or feel free to reach out to me on Twitter @rajagp or email me at priya.rajagopal@couchbase.com. The Couchbase Forums are another good place to reach out with questions.

 

Author

Posted by Priya Rajagopal, Senior Director, Product Management

Priya Rajagopal is a Senior Director of Product Management at Couchbase responsible for developer platforms for the cloud and the edge. She has been professionally developing software for over 20 years in several technical and product leadership positions, with 10+ years focused on mobile technologies. As a TISPAN IPTV standards delegate, she was a key contributor to the IPTV standards specifications. She has 22 patents in the areas of networking and platform security.
