Couchbase 4.5 is released!! Part2 is a continuation of this blog and it includes covering array indexes, support for more operators such as UNNEST, ALL, ANY AND EVERY etc.,

Do you have documents with embedded arrays and need an efficient means to access the data stored in those arrays? Arrays are one of the powerful features of NoSQL/JSON data modeling. Arrays can store multi-valued data inside a document attribute, such as multiple phone numbers, children information, flight schedules, product reviews, blog comments and responses, etc.  Not only does it keep all relevant data together in the same document, but also improves query performance by avoiding unnecessary JOINs. Arrays in documents are not new to Couchbase or N1QL (SQL for JSON), it already has wide selection of constructs and operators to store, index, and process arrays.

Couchbase 4.5 Developer Preview adds support for Array Indexing with the capability to index and query individual array elements or any objects/attributes nested within the array. Array Indexing tremendously improves performance of queries involving arrays, especially accessing values and attributes within the array. It also simplifies the processing of array attributes in the N1QL queries. This is a huge leap from the previous versions where the secondary indexes could be created and subsequently queried, but only on the contents of the whole array.  

Let’s see how the magic works. Consider the ‘travel-sample’ bucket shipped with the product, and the schedule array attribute in the documents with type = route

In earlier versions, when an index is created on schedule the whole array is indexed with all of its content/elements as a single scalar value. That means that everything you see in the blue box in above example is considered as one single index value. So, later when you query on schedule, the whole array value needs to be provided in the WHERE-clause in order to use the index. This may be manageable for simple/small arrays, but in general has following issues:

  1. Indexing fine-grain data within the array: What if you want to index only subset of the array, say flight attribute within the schedule array or only for certain days?

  2. Index size and performance: When the whole array is indexed, it consumes more index storage space and takes longer to search the index. Figure 1 depicts the index created for attribute schedule with pre-4.5 version.

  1. Searching within Arrays: By indexing the whole array we cannot efficiently search for specific data within the array. The application would have to first fetch/SELECT the whole array value and then process it, to find the data of interest.

  2. Large size arrays: It’s not convenient (and sometimes impossible) to supply the whole array value in a WHERE clause when the array has lot of elements.

Array Indexing in Couchbase 4.5 addresses all these issues and has N1QL & indexer enhancements to leverage the indexed arrays. This includes support for partial indexes, nested arrays, composite indexes and operators such as  ANY, IN, WITHIN, DISTINCT ARRAY. Let’s look at some examples:

  1. First, create an Array Index on the schedule attribute, using the DISTINCT ARRAY operator which specifies the exact array elements or nested attributes to be used as index keys. The following statement indexes “all flights scheduled on weekdays (i.e day <= 5)”. Note that, we are able to create index on only a subset of the array elements that are of interest. And hence, the new array index is compact and efficient (see  Figure 2), and stores only the required information in the index.

  1. Now, the index can be used in a SELECT or any other DML statement, by specifying the index keys and index predicates in the WHERE clause. In current release, it is required to specify the exact variable names, index keys (such as v, v.day, v.flight), and predicates such as (type = “route”) and (v.day = 4) matching the ones specified in the CREATE INDEX definition. This index-selection-criteria is required for N1QL to automatically pick the matching index to process a query. As usual, when multiple matching indices are created, USE INDEX clause can be used to suggest N1QL to use a specific index.

For example, the following query finds the “number of UA flights scheduled on 4th day” using the array index. The query uses the isched index because it follows the index selection requirements:

  1. The variable v is used, which is the exact variable used in the index definition.

  2. The index key v.flight is used in the WHERE clause.

  3. Both the index predicates (type = “route”) and (v.day = 4) are used in the WHERE clause. Note that (v.day = 4)is considered matching, because that is “included” by (v.day <= 5)which is specified in the CREATE INDEX definition.

  1. Performance:  Array Indexes bring magnitude boost to the performance of queries that can leverage the index. For example, the above query in (2) which uses the array index ‘isched’ took roughly 256ms in my laptop. However, following query using index ‘def_type’ took almost 2.38seconds. That’s a whooping 9x better performance with array indexes.

You can learn more about array indexing and see more examples, such as composite and nested array indexes, in the Couchbase 4.5 documentation, and checkout the demo.

Give it a try, and let me know any questions/comments, or just how awesome it is ;-)

Cheers!!

Author

Posted by Prasad Varakur, Principal Product Manager, Couchbase

Prasad Varakur is a Principal Product Manager, Couchbase. Prasad is Product and Engineering leader in Databases(SQL, noSQL, Bigdata) & Distributed systems.

Leave a reply