Overview

Index advisor is introduced in Couchbase server 6.5 as a developer preview feature. It targets at providing secondary index recommendation to help DBAs and developers optimize N1QL query performance. This version is rule-based, in which the index candidates will be generated following the design rules specified in https://www.couchbase.com/blog/create-right-index-get-right-performance/:

1. Leading array index key for unnest

2. Equality predicates

3. IN predicates

4. Not less than/between/not greater than predicates

5. Less than/greater than predicates

6. Array predicates

7. Derived join filter as leading key for left-hand-side keyspace

8. IS NOT NULL/MISSING/VALUED predicates

9. Functional predicates

10. Partial index condition

Index advisor is designed to work in two ways:

    • N1QL Statement to advise indexes for a single query.
    • Advisor function to advise on query workload and support session handling.

This article will focus on the first function and share some insights on how it works internally.

Details of ADVISE Statement

N1QL Statement Syntax:

ADVISE [INDEX ] <Query>

It supports query types of SELECT/DELETE/UPDATE/ANSI MERGE, and provides index recommendations for operations on a single keyspace, ANSI JOIN/NEST, INDEX JOIN/NEST, UNNEST and subquery in FROM clause.

The primary guidelines that Index Advisor follows are:

    1. Generate secondary index candidates based on indexable predicate, projection, GROUP BY/ORDER BY expressions with correct syntax.
    2. Sort the index keys following the rules w.r.t the priority order of the predicate types with adjustment on leading array index key for UNNEST and derived ISNOTNULL filter for ANSI JOIN.
    3. Make sure the recommended index be superior to all the current existing/deferred indexes and be selected by the current rule-based query optimizer in the query execution.

Let’s firstly take a look at the output of a simple example:

Query 1:

There are two parts in the advice output:

  • Current Indexes:

This session provides the information on the current used indexes by query optimizer to run the input query. It will also include the information on the status of the index iff it is identical to the recommended index or it is an optimal covering index. In both situations, no index recommendation will be provided.

  • Recommended Indexes:
    • Indexes:

This session lists the recommended indexes based on predicates in WHERE/ON clause, along with the corresponding recommendation rule that each index follows.

    • Covering Indexes:

This session lists the covering indexes that applicable to the input query.

Partial Indexes

Couchbase data model is JSON with flexible schema, and one bucket can be heterogeneous with multiple types of documents inside. Typically, data entries include a “type” field for categorization. When a query includes filters on a particular type of documents, a partial index with “type” field in the WHERE clause will be given. It can help to minimize the index size, shorten the index access path and execute the query more efficiently.

In the query example above, the recommended index is a partial index with “type = ‘route'” added in the WHERE clause, that will only create index on the documents of type “route”:

Internally, index advisor uses the “flavor” statistics from INFER statement to match the fields in predicate and add it to the partial index condition.

To get more ideas, let’s run a query with ANSI JOIN on multiple types of documents in which partial index with field “type” as index condition will be advised for each keyspace:

Query 2:

Array index for Array Predicate

JSON is a hierarchical structure comprised of nested scalar fields, objects, and arrays. Array index key can go deep into the nested array and clearly reference the elements or the fields of the object needed to be indexed. Index advisor will recommend array index by recursively traversing the bindings and expressions in ANY range predicate.

Query 3:

Since users have the flexibility to apply any additional logic and processing on the elements and fields of the object in the array to generate ANY expression, Index advisor will also make efforts to support those complex functions, array expressions, boolean conditions and etc. as shown in the query below:

Query 4:

Array index for UNNEST Operation

UNNEST is used to perform a join operation of the nested array with its parent object. When the predicate in this operation applies to the individual elements in the nested array,  an array index will help to optimize the execution of the query.

N1QL query optimizer supports unnest scan and unnest covering scan when there is an appropriate leading array key with ALL array elements indexed in the index definition. Index advisor follows this rule to collect unnest expressions and generate array index reversely for recommendation. Let’s take a look at the example below:

Query 5:

Covering Indexes

Covering indexes can provide all data required for a particular query and avoid the overhead in fetching documents from data service. Index advisor will advise this efficient covering index for the input query if applicable in the following steps:

    • Firstly collect index keys from all the predicates in WHERE/ON clause.
    • Then append the remaining projection and GROUP BY/ORDER BY expressions to the index definition and generate a tentative covering index candidate.
    • Last but not the least, check if the index expressions can provide all required data to cover the query using the same approach as for query optimizer.

The previous query examples have shown that covering indexes and covering array indexes are provided for single keyspace query, JOIN operation, ANY expressions and UNNEST predicates.

Index Naming Convention

In the Index advisor, index name is designed to meet three requirements:

    1. Reflect the index construction by connecting all the index keys with underscore.
    2. Differentiate from user-defined indexes by adding “adv-” prefix.
    3. Reasonable length by truncating and replacing with hash code.

The formats for regular index and array index are list below:

  • adv_field1_field2_field3…
  • adv_[DISTINCT/ALL]_level1_level2_level3…

Duplicate names are not supported in Couchbase indexing, and index advisor doesn’t guarantee the uniqueness of the index name in the recommendation.  Uses need to change the index name upon encountering this error.

Summary

Index advisor (ADVISE statement) provides index recommendations for a single query. It advises regular index, partial index, array index and covering index and provides information on the corresponding recommendation rule that each index key follows.  Moreover it evaluates the current used indexes by one query and avoids recommending unnecessarily.

Released as a DP feature, index advisor is still in its initial stages. The functionality and usability will be further enhanced to better meet customer requirements and contribute to query optimization.

Resources

We would love to hear from you on how you liked the 6.5 features and how it’ll benefit your business going forward. Please share your feedback via the comments or in the forum.

Author

Posted by Chang Liu

Software Engineer in N1QL Query Team

Leave a reply