June 30, 2014

Indexes and Selectivity

Indexes in Couchbase

Indexes in Couchbase are a flexible means of performing back end processing.   When utilizing indexes it's possible to create decision tree type aggregations and selectivity functionality, such as those found in an inventory control system.   By using a well defined and flexible data model in conjuction with Couchbase's indexing subsystem, it's possible to build applications that offer a high degree of selectivity.   Couchbase provides the ability to pass parameters in for a dynamic querying experience.    A quick review of the available parameters is as follows:

  • Key=""
  • Keys=[]
  • startkey/endkey=""
  • startkey_docID/endkey_docID=""
  • stale=[false,update_after,true]
  • group/group_by
  • include_docs=[true]

Building Selectivity

In the following example, an inventory control system will be utilized to demonstrate the selectivity power of indexes in Couchbase.  The example is for a clothing store that needs to be able to perform queries with selectivity for:

  • by item category
  • by item vendor
  • by item discount

The JSON Format for items stored in the inventory bucket is simplified for this example and utilizes a small test data set of Seven documents.  A sample document is as follows:

{
        "item": "KS-00114629 Dark Grey Glasses",
        "vendor": "kate spade",
        "category": "glasses"
}

With this bucket we can build an index called "catalogue" as follows:

function(doc,meta){
  if(doc.item){
        if(doc.vendor){
                emit(doc.vendor, doc.item);
        }
        if(doc.category){
                emit(doc.category, doc.item);
        }
        if(doc.discount){
                emit(doc.discount,doc.item);
        }
  }
}

This index performs some interesting functionality that allows for selectivity in querying.   The workflow for this index is as follows:

  • if(doc.item){  // Checks if a document has a field called "item"   
  • if(doc.vendor){  //Checks if a document has a field called vendor, and if it does emits the vendor name as the key, and the item field as the value
  • if(doc.category){  //Checks if a document has a field called category, and if it does emits the category name as the key, and the item field as the value
  • if(doc.discount){  //Checks if a document has a field called discount, and if it does emits the discount amount as the key, and the item field as the value

This index can potentially emit three rows within for each document, assuming a document has a vendor, category, and discount.   
    
Once deployed, if this index is called from any endpoint on the cluster (for this example a cluster node with ip of 192.168.60.101 is used) with:

http://192.168.60.101:8092/inventory/_design/inventory/_view/category?stale=update_after&connection_timeout=60000&skip=0

16 Rows are returned:

{"total_rows":16,"rows":[
{"id":"0000000007","key":"10","value":"CH-44308670 Two Tone Yellow Kremlin Bag"},
{"id":"0000000006","key":"15","value":"IH-00884501 Triple ring Shoulder bag"},
{"id":"0000000007","key":"Chrome","value":"CH-44308670 Two Tone Yellow Kremlin Bag"},
{"id":"0000000001","key":"glasses","value":"KS-00114629 Dark Grey Glasses"},
{"id":"0000000003","key":"glasses","value":"RL-00657038 Blue Rodeo Sunglasses"},
{"id":"0000000002","key":"handbag","value":"KS-00113108 Blue Shoulder Strap Handbag"},
{"id":"0000000004","key":"handbag","value":"RL-00332567 Yellow Messenger Bag"},
{"id":"0000000006","key":"handbag","value":"IH-00884501 Triple ring Shoulder bag"},
{"id":"0000000007","key":"handbag","value":"CH-44308670 Two Tone Yellow Kremlin Bag"},
{"id":"0000000005","key":"iron heart","value":"IH-00130628 Slim Cut Denim"},
{"id":"0000000006","key":"iron heart","value":"IH-00884501 Triple ring Shoulder bag"},
{"id":"0000000001","key":"kate spade","value":"KS-00114629 Dark Grey Glasses"},
{"id":"0000000002","key":"kate spade","value":"KS-00113108 Blue Shoulder Strap Handbag"},
{"id":"0000000005","key":"pants","value":"IH-00130628 Slim Cut Denim"},
{"id":"0000000003","key":"ralph lauren","value":"RL-00657038 Blue Rodeo Sunglasses"},
{"id":"0000000004","key":"ralph lauren","value":"RL-00332567 Yellow Messenger Bag"}
]
}

Selectivity in the Query

With the flexibility defined in the data model and deployed in the index, this index can be utilized to perform queries against the database with a high degree of selectivity.  A Range query can be utilized to select all items from a specific vendor.   For example, to see all items in the bucket from Kate Spade a startkey and endkey of "kate spade" could be passed in as follows:

http://192.168.60.101:8092/inventory/_design/inventory/_view/category?stale=update_after&connection_timeout=60000&skip=0&startkey=%22kate%20spade%22&endkey=%22kate%20spade%22

Even though the index contains 16 rows, only two would be returned to the client application: 

{"total_rows":16,"rows":[
{"id":"0000000001","key":"kate spade","value":"KS-00114629 Dark Grey Glasses"},
{"id":"0000000002","key":"kate spade","value":"KS-00113108 Blue Shoulder Strap Handbag"}
]
}

At this point the option also exists to return the entire document with this query--not just the items emitted in the index.   If this same query is run again, with the "include_docs" parameter set to "true" the as follows:

http://192.168.60.101:8092/inventory/_design/inventory/_view/category?stale=update_after&connection_timeout=60000&skip=0&startkey=%22kate%20spade%22&endkey=%22kate%20spade%22&include_docs=true

The same two id's are returned, this time with the entire document for each row in the index that matches the range:

{"total_rows":16,"rows":[
{"id":"0000000001","key":"kate spade","value":"KS-00114629 Dark Grey Glasses","doc":{"meta":{"id":"0000000001","rev":"6-00002ec467e06f2a0000000000000000","expiration":0,"flags":0},"json":{"item":"KS-00114629 Dark Grey Glasses","vendor":"kate spade","category":"glasses"}}},
{"id":"0000000002","key":"kate spade","value":"KS-00113108 Blue Shoulder Strap Handbag","doc":{"meta":{"id":"0000000002","rev":"6-00002ec9a2088ad70000000000000000","expiration":0,"flags":0},"json":{"item":"KS-00113108 Blue Shoulder Strap Handbag","vendor":"kate spade","category":"handbag"}}}
]
}

The power of this data model can be explained further in running a query to look at items that currently have a discount associated with them.   A query can be made with an endkey of "a" due to the unicode sort order that Couchbase utilizes and all numerical key values will be returned:

http://192.168.60.101:8092/inventory/_design/inventory/_view/category?stale=update_after&connection_timeout=60000&endkey=%22a%22

The two rows in the index that have been emitted due to a discount being present in the document for the item are returned

{"total_rows":16,"rows":[
{"id":"0000000007","key":"10","value":"CH-44308670 Two Tone Yellow Kremlin Bag"},
{"id":"0000000006","key":"15","value":"IH-00884501 Triple ring Shoulder bag"}
]
}

 

Comments