With the Couchbase Server 4.0 GA – we gave you N1QL which was a key enabler to Develop with Agility, and Operate at Scale! Now, we are pleased to announce the developer preview release of 4.1 – which has valuable stability fixes. Embedded within that is a key feature called Covering Indexes.

Covering Index is an exciting new N1QL feature in 4.1 release which provides the ability for an application, that leverages it appropriately, to recognize a noticeable query performance improvement in many cases. A covered query is a N1QL query in which all the fields in the query are part of an index, and all the fields returned in the query are in the same index.

Prior to this improvement, if you take a look at the query execution flow, it is as follows:

  1. The application client issues a query request to the server via a Rest API

  2. The query service goes through a Parsing and Analysis phase, and generates a Query Plan.

  3. The query service issues scan requests to the Index Service if there is a valid index for the situation

  4. Gets back the qualified document keys from the Index service

  5. Sends the document keys to the Data Service (a ‘fetch request’)

  6. Gets the documents back from the Data Service

  7. Evaluates the documents (re-apply filter criteria etc)

  8. Sends the final filtered results back to the client

In a well designed application, the steps 5 and 6 can be avoided using a covering index (which we highly recommend as the best practice), and that can result in a demonstrable performance improvement.

The EXPLAIN statement in N1QL allows you to see the query execution plan. When a covering index is used for query execution, the EXPLAIN will now show that a covering index is used for data access (and key-value document fetches and associated overhead are avoided).

Let’s say that we create an index on the attribute “state” in the beer-sample bucket.

CREATE INDEX idxstate ON beer-sample(state) USING GSI

Now if we select state from beer-sample, since all the data that is to be returned is present in the index, we can avoid key-value data fetches, and just return the data based on the index.

EXPLAIN SELECT state FROM beer-sample’ WHERE state = “CA”

"results": [

      {

          "#operator": "Sequence",

          "~children": [

              {

                  "#operator": "IndexScan",

                  "covers": [

                      "cover((beer-sample.state))"

                  ],

                  "index": "idxstate",

                  "keyspace": "beer-sample",

                  "namespace": "default",

                  "spans": [

                      {

                          "Range": {

                              "High": [

                                  ""CA""

                              ],

                              "Inclusion": 3,

                              "Low": [

                                  ""CA""

                              ]

                          }

                      }

                  ],

                  "using": "gsi"

Note: The same index is NOT a covering index if the query were to be slightly modified.

EXPLAIN SELECT state, city FROM beer-sample WHERE STATE = "CA";

It still uses the idxstate index, however, since the index does NOT contain the city information, it uses the index for faster lookup, but not as a covering index.

{

    "requestID": "c60b10d2-2cab-4b6a-987c-e5e6ebe4a900",

    "signature": "json",

    "results": [

       {

           "#operator": "Sequence",

           "~children": [

               {

                   "#operator": "IndexScan",

                   "index": "idxstate",

                   "keyspace": "beer-sample",

                   "namespace": "default",

                   "spans": [

                       {

                           "Range": {

                               "High": [

                                   ""CA""

                               ],

                               "Inclusion": 3,

                               "Low": [

                                   ""CA""

                               ]

                           }

                       }

                   ],

                   "using": "gsi"

               },

               {

                   "#operator": "Parallel",

                   "~child": {

                       "#operator": "Sequence",

                       "~children": [

                           {

                               "#operator": "Fetch",

                               "keyspace": "beer-sample",

                               "namespace": "default"

                           },

                           {

                               "#operator": "Filter",

                               "condition": "((beer-sample.state) = "CA")"

                           },

                           {

                               "#operator": "InitialProject",

                               "result_terms": [

                                   {

                                       "expr": "(beer-sample.state)"

                                   },

                                   {

                                       "expr": "(beer-sample.city)"

                                   }

                               ]

                           },

                           {

                               "#operator": "FinalProject"

                           }

                       ]...rest omitted for brevity

How do we then convert this modified query to use a covering index?
Here you go,

CREATE INDEX idxstatecity ON beer-sample (state, city) USING GSI

Now, if we execute the query:

EXPLAIN SELECT state, city

FROM beer-sample USE INDEX (idxstatecity)

WHERE STATE = "CA" ;

Bingo! It becomes a query that uses a covering index

    

               {

                   "#operator": "IndexScan",

                   "covers": [

                       "cover((meta(beer-sample).id))",

                       "cover((beer-sample.state))",

                       "cover((beer-sample.city))"

                   ],

                   "index": "idxstatecity",

                   "keyspace": "beer-sample",

                   "namespace": "default",

                   "spans": [

                       {

                           "Range": {

                               "High": [

                                   "successor("CA")"

                               ],

                               "Inclusion": 1,

                               "Low": [

                                   ""CA""

                               ]

                           }

                       }

                   ],

                   "using": "gsi"

               },

  

EXPRESSIONS AND AGGREGATES

Queries with expressions and aggregates can benefit from covering index as well.

CREATE INDEX idxstatecountry ON beer-sample(state, country) USING GSI

SELECT country, max(state)

FROM beer-sample USE INDEX (idxstatecountry)

WHERE STATE LIKE '%'

GROUP BY country

UNION / INTERSECT / EXCEPT

These are supported as well for covering indexes

SELECT COUNTRY

FROM beer-sample

WHERE STATE = 'CA'

UNION ALL

SELECT COUNTRY

FROM beer-sample`

WHERE STATE = ‘Texas’

And since the index is already sorted, when you use ORDER BY in the case of a covering index, there is an added benefit of the query engine already being able to use a sorted index.

Please refer to our documentation for more information and more scenarios where covering indexes will be useful to you.

We strongly encourage you to try this feature in 4.1 developer preview, and give us your valuable feedback.

You may also want to try out N1QL using the Developer Preview of the Query Workbench,  which provides a nice GUI for writing and executing N1QL queries.

Author

Posted by Ilam Siva, SR. Product Manager, Couchbase

A former Senior Product Manager at Couchbase, Ilam Siva is responsible for various areas of Couchbase Server product development, roadmap, positioning, messaging and collateral. Ilam is passionate about Big Data and NoSQL and believes that it will push the frontiers of scale, performance and capabilities of next generation applications. Prior to Couchbase, he has worked on Hadoop, Cloud platform and database technologies at Yahoo!, Microsoft and Oracle.

Leave a reply