In Couchbase, data is always partitioned using the consistent hash value of the document key into vbukets which are stored on the data nodes.  Couchbase Global Secondary Index (GSI) abstracts the indexing operations and runs as a distinct service within the Couchbase data platform.  When a single index can cover a whole type of documents, everything is good. But, there are cases where you’d want to partition an index.

  1. Capacity: You want increased capacity because a single node is unable to hold a big index
  2. Queriability: You want to avoid rewriting the query to work with manual partitioning of the index using a partial index.
  3. Performance: Single index is unable to meet the SLA

To address this, Couchbase 5.5 introduces automatic hash partitioning of the index.  You’re used to having bucket data hashed into multiple nodes. Index partitioning enables you to hash the index into multiple nodes as well.  There is good symmetry.

Creating the index is easy.  Simply add a PARTITION BY clause to the CREATE index definition.

This as the following meta data in the system:indexes.  Note the new field partition with the hash expression. The HASH(state) is the basis on which the index logically named customer.ih is divided into a number of physical index partitions. By default, the number of index partitions are 16 and it can be changed by specifying num_partition parameter.  In the example above, we create 8 partitions for the index customer.ih.

Now, issue the following query. You don’t need additional predicate on the hash key for the query to use the index.   The index scan simply scans all of the index partitions as part of the index scan.

However, if you do have an equality predicate on the hash key, index scan detects the right index partition having the right range of data and prunes rest of the index nodes from the index scan.    This makes the index scan very efficient.

Now, let’s look at how this index helps you with three things we mentioned before: Capacity, Queriability and Performance.

Capacity

The query customer.ih will be partitioned to a specified number of partitioned with each partition stored on one of the index nodes on the cluster.  The indexer uses a stochastic optimization algorithm to determine how to distribute the partitions onto the set of indexer nodes, based on the free resource available on each node.  Alternatively, to restrict the index to a specific set of nodes, use the nodes parameter. This index will create eight index partitions and store four each on the four index nodes specified.  

 

So, with this hash partitioned index,  one logical index (customer.ih) will be partitioned into a number of physical index partitions (in this case, 8 partitions) and give the query an illusion of a single index.  

Because this index uses the multiple physical nodes, the index will have more disk, memory and CPU resources available. Increased storage in these nodes makes it possible to create larger indexes.

You write your queries, as usual, requiring predicates only the WHERE clause (type = “cx”) an at least on one of the leading index keys (e.g. name).  

Queriability

Limitations in the Couchbase 5.0 indexing:

Until Couchbase 5.0, you could manually partition the index like below. You had to partition them manually using the WHERE clause on the CREATE INDEX.  Consider the following indexes, one per state. By using the node parameter, you could place them in specific index nodes or the index will try to automatically spread out within the index nodes.

For a simple query with equalify predicate on state, it all works well.

There are two issues with this manual partitioning.   

  1. Consider the following with slightly complex predicate on the state.  Because the predicate (state IN [“CA”, “OR”]) is not a subset of any of the WHERE clauses of the index, none of the indexes can be used for the query below.

2. If you get data to a new state, you’d to be aware of it and create the index in advance.

   If the field numerical field, you can use the the MOD() function.

Even this work around each query block can only use one index and requires queries to be written carefully to match one of the predicates in the WHERE clause.

Solution:

As you see from the figure above, the interaction between the query and index goes through the GSI client sitting inside each query node.  Each GSI client gives the illusion of a single logical index (customer.ih) on top of eight physical index partitions.  

The GSI client takes all of the index scan request and then using the predicate, tries to see if it can identify which of index partitions has the data needed for the query.  This is the process of partition pruning (aka partition elimination). For the hash based partitioning scheme, equality and IN clause predicates get the benefit of partition pruning. All other expressions use the scatter-gather method. After the logical elimination, GSI client sends the request to the remaining nodes, gets the result, merges the result and sends the result back to query.  The big benefit of this is that queries can be written without worrying about the manual partitioning expression.

Example query below does not even have a predicate on the hash key, state.  The below query does not get the benefit of partition elimination. Therefore, the GSI client issues scan to every index partition in parallel and then merges the result from each of the index scan. The big benefit of this is that queries can be written without worrying about the manual partitioning expression to match the partial index expression and still use the full capacity of the cluster resources.

Additional predicate on the hash key (state = “CA”) in the query below will benefit from partition pruning.  For query processing, for simple queries with equality predicates on hash key, you get uniform distribution of the workload on these multiple partitions of the index. For complex queries including the grouping & aggregation we discussed above, the scans, partial aggregations are done in parallel, improving the query latency.  

You can create indexes by hashing on one or more keys, each of which could be an expression.  Here are some examples.

Performance

For majority of database features, performance is everything. Without a great performance, proven by good benchmarks, the features are simply pretty syntax diagrams!

Index partitioning gives you improved performance in two ways.

  1. Scale out. The partitions are distributed into multiple nodes, increasing the CPU and memory availability of for the index scan.  
  2. Parallel scan. Right predicate giving queries the benefit of partition pruning. Even after the pruning process, scans of all the indexes are done in parallel.
  3. Parallel grouping and aggregation. The DZone article Understanding Index Grouping and Aggregation in Couchbase N1QL Queries explains the core performance improvement of grouping and aggregation using indexes.  
  4. The parallelism of the index parallel scan (and grouping, aggregation) is determined by the max_parallelism parameter. This parameter can be set per query node and/or per query request.

Consider the following index and query:

The index is partitioned by HASH(state), but state predicate is missing from the query.  For this query, we cannot do partition pruning or create groups within individual scans of the index partitions.  Therefore, it will need a merge phase after the partial aggregation with the query (not shown in the explain). Remember, these partial aggregations happen in parallel and therefore reduces the latency of the query.

Consider the following index and query:

Example a:

In the above example, the group by is on the leading keys (state, city, zip) of the index and hash key (zip) is part of the group by clause. This will help the query to scan the index and simply created the required groups.

Example b:

In the above example, the group by is on the third key (zip) of the index and hash key (zip) is part of the group by clause. In the predicate clause (WHERE clause), there is single equality predicate on the leading index keys before the key zip (state and city).  Therefore, we implicitly include the keys (state, city) in the group by without affecting the query result. This will help the query to scan the index and simply created the required groups.

Example c:

 

In the above example, the group by is on the third key (zip) of the index and hash key (zip) is part of the group by clause. In the predicate clause (WHERE clause), there is range predicate on city. The index key (city) is before the hash key(zip).  So, we create partial aggregates as part of the index scan and then then query will merge these partial aggregates to create the final resultset.

Summary:

Index partition gives you increased capacity for your index, better queriability and higher performance for your queries.  By exploiting the Couchbase scale-out architecture, indexes improve your capacity, queriability, performance and TCO.

Reference:

  1. Couchbase documentation: https://developer.couchbase.com/documentation/server/5.5/indexes/gsi-for-n1ql.html
  2. Couchbase N1QL documentation: https://developer.couchbase.com/documentation/server/5.5/indexes/gsi-for-n1ql.html#

Posted by Keshav Murthy

Keshav Murthy is a Senior Director at Couchbase of N1QL R&D. Previously, he was a Senior Director at MapR, Senior Architect for IBM, with more than 20 years experience in database design & development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelors degree in Computer Science and Engineering from the University of Mysore, India, holds eight US patents and invented databases for systems of engagement.

Leave a reply