With Couchbase v6.5, Full-Text Search (FTS) is integrated into N1QL queries, so customers can use FTS indexes directly from N1QL. This gives developers a single API that combines N1QL's exact predicate matching with the powerful search capabilities of FTS.

One constant challenge for many application developers working with relational databases is query performance. Resolving query performance issues is often limited to what relational databases offer: a larger database server or better indexes.

With Couchbase, N1QL query performance relies on similar components. But unlike a relational database, Couchbase's architecture isolates its services, so the Query and Index services can be scaled out independently. With appropriate sizing and capacity planning, Couchbase can deliver very fast performance, as shown in an Altoros NoSQL Benchmark report.

Beyond Query Predicates – N1QL & Search

Customers can achieve millisecond response times for queries with appropriate indexes. However, there are times when the query predicates that Couchbase GSI indexes must serve are not known ahead of time. The ideal solution is an indexing system that works with any combination of the available query predicates.

Couchbase Adaptive Indexing can address many of these use cases. Couchbase Full-Text Search is another approach to irregular query patterns. It offers text and fuzzy search capabilities on any field in the document.

Let’s consider the activity management document below. An activity:

  1. Always belongs to a customer (account).
  2. Can also have multiple contacts from the customer’s organization, which are represented by an array of contacts.
  3. May include multiple participants, represented by an array of users.
  4. May be of type appointment or task, each of which has its own attributes, such as title, start date, due date, etc.
  5. An activity of type Task has an array of ToDo items.

[Figure: sample JSON activity document used in the examples]
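Because the sample document is not reproduced in the text, here is a minimal sketch of what such an activity document might look like, written as a Python dictionary. Every field name and value (`account`, `contacts`, `participants`, `todos`, and so on) is an assumption made for illustration, apart from `type`, which the post relies on in a later query predicate. The real document may be shaped differently.

```python
import json

# Hypothetical activity document: field names and values are illustrative
# assumptions, not the schema from the original post.
activity = {
    "type": "activity",
    "activityType": "task",  # assumed discriminator: "appointment" or "task"
    "title": "Follow up on renewal quote",
    "startDate": "2019-06-03T09:00:00",
    "dueDate": "2019-06-07T17:00:00",
    # An activity always belongs to one customer (account).
    "account": {"id": "account::1001", "name": "Globex Corporation"},
    # Contacts from the customer's organization, stored as an array.
    "contacts": [
        {"name": "Jane Smith", "email": "jane.smith@example.com", "phone": "555-0100"},
        {"name": "Raj Patel", "email": "raj.patel@example.com", "phone": "555-0101"},
    ],
    # Participants (users), stored as a second, separate array.
    "participants": [
        {"userId": "user::42", "name": "Alex Morgan"},
    ],
    # Only activities of type task carry a ToDo array.
    "todos": [
        {"item": "Send revised quote", "done": False},
    ],
}

print(json.dumps(activity, indent=2))
```

The point to notice is that contacts and participants live in two separate arrays inside the same document, which matters for the index design discussed later.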

The use case

John, a service representative at a call center for Acme Ltd, needs to retrieve all of a customer's activities while he is on the phone with that customer. The customer may provide one or more of the values below for John to query the application:

  1. Activity title: The query should return all activities that have this text anywhere in the activity title.
  2. Customer name: The entered customer name may be incomplete, so the query needs to use a wildcard to match the customer name.
  3. Contact name, email, or phone contact point: The customer may also provide contact person details. These may also be incomplete.
  4. A participant name: The customer may also provide the name of the account manager, an Acme employee with whom the customer has interacted and who was part of the activity.
  5. Activity date: The customer may provide a range of dates and times for the activities.
  6. The service rep may receive any combination of the above information. The pattern is not fixed.
  7. The query response time needs to be about 1 second.
  8. Data volume is 3 million activities per year, with a retention period of 3 years (about 9 million in total).

What are the challenges in retrieving this information?

  1. There could be up to eight fields that the customer can provide, and none of them is mandatory. This poses a challenge for an efficient GSI index design, because the leading key of an index must be present in the query predicates for that index to be selected. As a result, GSI indexes cannot cover all cases.
  2. Wildcard matching: The provided activity title, customer and contact name, email, or phone can be incomplete, so an exact N1QL predicate matching technique will not work.
  3. Both contacts and participants are child objects of activities. In the JSON data model, contacts and participants are represented as two separate arrays. If we need a covering index, it needs to include one or more elements from both arrays.
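To get a feel for the first challenge, the short sketch below counts how many distinct combinations of inputs the service rep could end up with. The eight field names are hypothetical labels for the inputs listed in the use case, not names taken from the original data model.

```python
from itertools import combinations

# Hypothetical labels for the eight optional search inputs.
FIELDS = [
    "title", "customer_name", "contact_name", "contact_email",
    "contact_phone", "participant_name", "date_from", "date_to",
]

# Count every non-empty subset of inputs that might be supplied.
patterns = sum(
    1
    for r in range(1, len(FIELDS) + 1)
    for _ in combinations(FIELDS, r)
)

print(patterns)  # 255, i.e. 2**8 - 1
```

Each of these patterns needs an index whose leading key is among the supplied fields, which is why a small set of composite GSI indexes is unlikely to serve every pattern efficiently.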

The solutions

1. The simplest approach is to use N1QL predicates:

The following GSI indexes would also be required:

Note that the above query may use one or all of the available indexes to improve query performance. However, there could still be performance issues because the query plan needs to use an IntersectScan operation.
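The query and index listings for this approach are not shown here, so the following is only a rough sketch of what the N1QL-predicate approach could look like, not the author's original statements. It assumes a bucket named `crm` and the hypothetical field names `title`, `account.name`, `contacts`, `participants`, and `startDate`. The statement is assembled in Python so that only the supplied inputs become predicates, and values are passed as N1QL named parameters.

```python
# Assumed bucket name and field names; adjust to the real data model.
BUCKET = "crm"

# Map each optional input to a N1QL predicate using a named parameter.
PREDICATES = {
    "title": "a.title LIKE $title",
    "customer_name": "a.account.name LIKE $customer_name",
    "contact_name": "ANY c IN a.contacts SATISFIES c.name LIKE $contact_name END",
    "contact_email": "ANY c IN a.contacts SATISFIES c.email LIKE $contact_email END",
    "participant_name": "ANY p IN a.participants SATISFIES p.name LIKE $participant_name END",
    "date_from": "a.startDate >= $date_from",
    "date_to": "a.startDate <= $date_to",
}

# Illustrative GSI definitions: one plain index and one array index.
# A real deployment would need one index per leading key it wants to serve.
EXAMPLE_INDEXES = [
    f"CREATE INDEX idx_act_title ON `{BUCKET}`(title) WHERE type = 'activity'",
    f"CREATE INDEX idx_act_contact_name ON `{BUCKET}`"
    "(DISTINCT ARRAY c.name FOR c IN contacts END) WHERE type = 'activity'",
]


def build_n1ql(criteria):
    """Return (statement, params) using only the inputs that were provided."""
    where = ["a.type = 'activity'"]
    params = {}
    for key, predicate in PREDICATES.items():
        value = criteria.get(key)
        if value:
            where.append(predicate)
            params[key] = value
    statement = f"SELECT a.* FROM `{BUCKET}` AS a WHERE " + " AND ".join(where)
    return statement, params


statement, params = build_n1ql({"title": "%renewal%", "contact_name": "Jane%"})
print(statement)
print(params)
```

When several predicates are present and each is backed by a different index, the planner may combine the index scans, which is where the IntersectScan cost mentioned above comes from.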

2. Leverage FTS Index

Couchbase Full-Text Search could help with this use case, because of its non-exact search capability as well as the ability to search the fields in any order. Here is an FTS index that can cover the search criteria.

[Figure: Couchbase dialog for setting up a Full-Text Search index]

2.1 Using CURL – This is supported in Couchbase 5.5
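The original curl command is not shown here. As a hedged illustration, the sketch below builds the equivalent HTTP request in Python against the Search service REST endpoint (`/api/index/{indexName}/query`, default port 8094). The host, the index name `activity_fts`, and the field names are assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumed values: host, index name, and field names are placeholders.
SEARCH_HOST = "http://localhost:8094"  # 8094 is the default Search service port
INDEX_NAME = "activity_fts"


def build_fts_request(query, size=10):
    """Build (but do not send) a POST request for the FTS query endpoint."""
    body = json.dumps({"query": query, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SEARCH_HOST}/api/index/{INDEX_NAME}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A compound query: every sub-query in "conjuncts" must match.
query = {
    "conjuncts": [
        {"field": "title", "match": "renewal"},
        {"field": "account.name", "wildcard": "glob*"},
    ]
}

req = build_fts_request(query)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending the request would also need credentials (for example an
# Authorization header) before calling urllib.request.urlopen(req).
```

With curl, the same body would be posted to the same URL with `-XPOST`, a `Content-Type: application/json` header, and `-u` for credentials.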

2.2  With N1QL/FTS integration using SEARCH_QUERY

2.3 With N1QL/FTS integration using N1QL SEARCH predicate

Notes:

  1. The above example uses the FTS compound query with the conjuncts construct to combine all predicates into a single SEARCH(). Refer to the Couchbase FTS documentation for more detail on FTS query types.
  2. The above statement should be programmatically constructed to include only the required search predicates.
  3. The FTS index design must include the fields that are used in the SEARCH().
  4. The N1QL predicate a.type='activity' must be present in the query for the FTS index to be selected.
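Note 2 can be illustrated with a small builder. This is a minimal sketch under stated assumptions, not the statement from the post: the bucket `crm`, the FTS index `activity_fts`, and the field names are hypothetical, and wildcard inputs are assumed to be single terms. It builds an FTS `conjuncts` query from whichever inputs were supplied and embeds it in a N1QL statement through `SEARCH()`, keeping the `a.type = 'activity'` predicate from note 4.

```python
import json

# Assumed names: bucket, FTS index, and document fields are hypothetical.
BUCKET = "crm"
FTS_INDEX = "activity_fts"


def build_conjuncts(criteria):
    """Return one FTS sub-query per supplied input, skipping the rest."""
    conjuncts = []
    if criteria.get("title"):
        conjuncts.append({"field": "title", "match": criteria["title"]})
    # Wildcard terms are not analyzed, so this sketch lowercases them,
    # assuming the index was built with a lowercasing analyzer.
    wildcard_fields = [
        ("customer_name", "account.name"),
        ("contact_name", "contacts.name"),
        ("participant_name", "participants.name"),
    ]
    for key, field in wildcard_fields:
        if criteria.get(key):
            conjuncts.append(
                {"field": field, "wildcard": criteria[key].lower() + "*"}
            )
    # Date range query: either bound may be omitted.
    if criteria.get("date_from") or criteria.get("date_to"):
        date_range = {"field": "startDate"}
        if criteria.get("date_from"):
            date_range["start"] = criteria["date_from"]
        if criteria.get("date_to"):
            date_range["end"] = criteria["date_to"]
        conjuncts.append(date_range)
    return conjuncts


def build_search_statement(criteria):
    """Embed the conjuncts query in a N1QL statement with SEARCH()."""
    conjuncts = build_conjuncts(criteria)
    if not conjuncts:
        raise ValueError("at least one search input is required")
    fts_query = json.dumps({"conjuncts": conjuncts})
    options = json.dumps({"index": FTS_INDEX})
    return (
        f"SELECT a.* FROM `{BUCKET}` AS a "
        f"WHERE a.type = 'activity' AND SEARCH(a, {fts_query}, {options})"
    )


print(build_search_statement({"title": "renewal", "customer_name": "Glob"}))
```

Serializing the query with `json.dumps` keeps user input inside JSON string literals rather than splicing it into the statement text, and the optional third argument to `SEARCH()` names the FTS index the query is expected to use.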

The N1QL SEARCH_QUERY function and SEARCH predicate are part of the N1QL/FTS integration feature available in Couchbase v6.5. I will update the blog with the documentation when it becomes available.

For more detail on the FTS query syntax, see https://docs.couchbase.com/server/6.0/fts/full-text-intro.html

N1QL & Search Summary:

  1. N1QL/FTS integration allows a query to use FTS search constructs directly as search predicates
  2. The use of an FTS index in a N1QL query alleviates the need to have an exact index for each query pattern
  3. N1QL/FTS provides an additional option for developers to explore when dealing with query performance issues
  4. An FTS index is well suited for cases where you need to search on multiple fields in any order
  5. An FTS index is well suited for cases where you need to search for fields in multiple arrays

Resources

We would love to hear from you on how you liked the 6.5 features and how they’ll benefit your business going forward. Please share your feedback via the comments or in the forum.

Author

Posted by Binh Le

Binh Le is a Principal Product Manager for Couchbase Query service. Prior to Couchbase, he worked at Oracle and led the product management team for Sales Cloud Analytics and CRM OnDemand. Binh holds a Bachelor's Degree in Computer Science from the University of Brighton, UK.
