The one constant challenge for many application developers with relational databases is the query performance. It is by far the most elusive feature, because the means to resolve the query performance are often limited to what the relational databases can offer, which is either getting a larger database server and/or better indexes.
With Couchbase, N1QL query performance also relies on the similar components. But unlike relational RDBMS, Couchbase architecture of services isolation means both the Query and Index services can be scaled out independently. With appropriate sizing and capacity planning, the Couchbase data platform can deliver a blistering fast performance as reported in this Altoros NoSQL Benchmark report.

But while customers can achieve milliseconds response time for queries with appropriate indexes, there are times when the query predicates, the basis for the Couchbase GSI indexes, are not always known ahead of times. The ideal solution is to have an indexing system that could work with any combination of the available query predicates. Couchbase Adaptive Index can address many of these use cases. Couchbase Full Text Search is also another candidate that can address the irregular pattern use cases. i.e. offering text and fuzzy search capability on any field in the document.

With Couchbase v6.5, Full Text Search is now integrated into the Couchbase N1QL query construct. Customers can now leverage FTS index directly with N1QL, providing application developers a single API to combine both N1QL exact predicate matching and FTS powerful search capability.

Let’s consider the activity management document below:

  1. An activity always belong to a customer (account)
  2. An activity can also have multiple contacts from the customer’s organization and are represented by an array of contacts
  3. An activity can have multiples participants, represented by an array users.
  4. An activity can be of type appointment or a task, both of which have their specific corresponding attributes, such as title, start date, due date, etc..
  5. Activity of type Task has an array of ToDo list

The use case

John, a service representative  for a call center at Acme Ltd needs to retrieve all the customer activities while he is on the phone with a customer. The customer may provide one or many of the values below for John to query the application:

  1. Activity title:  The query should return all activities that have this text, anywhere in the activity title.
  2. Customer name: The entered customer name may be incomplete, thus the query needs to use wildcard to match with the customer name.
  3. Contact name, email or phone contact point: The customer may also provide a contact person details. These may also be incomplete.
  4. A participant name: The customer may also provide the name of the account manager, an employee of Acme whom the customer has been interacted with and had been part of the activity.
  5. Activity date: Customer may provide a range of date and time for the activities.
  6. The service rep may receive one of more of the above information. The pattern is not fixed.
  7. The query response time needs to be ~1 sec
  8. Data volume is 3millions per year, and a retention period of 3 years.

What are the challenges to retrieve this information

  1. There could be up to eight fields that the customer can provide, and none of them are mandatory. This would pose a challenge for an efficient GSI Index design because covering index leading key need to be present for the index selection. As the result, GSI indexes cannot cover all cases.
  2. Wildcard matching: The provided activity title, customer and contact name, email, or phone can be incomplete so an exact N1QL predicate matching technique will not work.
  3. Both contacts and participants are child objects for activities. In the JSON data model, contacts and participants are represented as two separate arrays. If we need a coverage index, it needs to include one or more elements from both arrays.

The solutions

1. The simplest approach is to use N1QL predicates:

The following GSI indexes would also be required:

Note that the above query may use one or all of the available indexes to improve query performance. However there could still be performance issues because of the need for the query plan to use IntersectScan operation.

2. Leverage FTS Index

Couchbase Full Text Search could help with this use case, because of its non-exact search capability as well as the ability search the fields in any order. Here is an FTS index that can cover the search criteria.

2.1 Using CURL – This is supported in Couchbase 5.5

2.2  With N1QL/FTS integration using SEARCH_QUERY

2.3 With N1QL/FTS integration using N1QL SEARCH predicate

Notes:

  1. The above example leverages FTS compound query with the conjunct construct to combine all predicates into a single SEARCH(). Refer to Couchbase FTS documentation for more detail on FTS Query type
  2. The above statement should be programmatically constructed to include only the required search predicates.
  3. The FTS index design must include the fields that are used in the SEARCH()
  4. The N1QL predicate a.type=’activity’ must be present in the query for the FTS index selection

N1QL SEARCH_QUERY and SEARCH predicate  is part of the N1QL/FTS Integration feature available in Couchbase v6.5 and I will update the blog with the documentation when it becomes available.

For more detail on the FTS query syntax https://docs.couchbase.com/server/6.0/fts/full-text-intro.html

Summary:

  1. N1QL/FTS integration allows query to use FTS search construct directly as search predicates
  2. The use of FTS index in N1QL query alleviates the need to have an exact index for each query pattern
  3. N1QL/FTS provides an additional option for developers to explore when dealing with query performance issues
  4. FTS index is well suited where you need to search on multiple fields in any order
  5. FTS index is well suited for cases where you need to search for fields in multiple arrays

Resources

We would love to hear from you on how you liked the 6.5 features and how it’ll benefit your business going forward. Please share your feedback via the comments or in the forum.

Posted by Binh Le

Binh Le is a Principal Product Manager for Couchbase Query service. Prior to Couchbase, he worked at Oracle and led the product management team for Sales Could Analytics and CRM OnDemand. Binh holds a Bachelor's Degree in Computer Science from the University of Brighton, UK.

Leave a reply