December 13, 2013

Couchbase 104: Q & A

A number of questions come up each time in our ongoing training series. I have listed them below with their answers!

Couchbase 104 - Views and Indexing

Q: Are wild cards like * and % supported in view queries?

A: Not exactly. In many cases a range query with partial text in the startkey can stand in for a wildcard, but not in all of them. For instance, to get all the keys that start with "user::" you can use startkey="user::" and endkey="user::\uefff". You cannot do things like startkey="u%er" to match indexed keys such as "user" and "uzer".
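As a rough sketch of the prefix range described above, the helper below builds the startkey and endkey parameters for a view query. The function name prefixRangeParams, the host, the bucket ("default"), the design document ("users"), and the view ("by_id") are all made up for illustration. View keys are JSON values, so string keys are JSON-encoded before being URL-encoded.

```javascript
// Build startkey/endkey query parameters that match every key beginning
// with `prefix`. "\uefff" is a high code point appended as an upper bound.
function prefixRangeParams(prefix) {
  // Keys are JSON values, so a string key must be sent with its quotes.
  const startkey = JSON.stringify(prefix);
  const endkey = JSON.stringify(prefix + "\uefff");
  return "startkey=" + encodeURIComponent(startkey) +
         "&endkey=" + encodeURIComponent(endkey);
}

// Hypothetical host, bucket, design document, and view names.
const viewUrl = "http://localhost:8092/default/_design/users/_view/by_id?" +
                prefixRangeParams("user::");
console.log(viewUrl);
```

The same two parameters can be passed through any of the SDKs' view query options instead of being assembled by hand.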

Q: How could I model a graph with views?

A: This is a bit involved to answer because it's a very open-ended question: it depends on what your "graph" actually is and how you will be querying it. If you are doing it "Twitter style", it's not really a graph but a list of followers and a list of people you follow. While technically it's a graph of limited depth, it's more like two lists. If you are querying the graph based on the relationship attributes between nodes, this can get very complex very quickly. If you are also querying across many levels of depth (or degrees of separation), this can also be quite complicated. Graph databases have data structures designed specifically for traversing and querying graphs. However, they are not as good when used as general-purpose data stores, so it's a good idea to use a polyglot combination of Couchbase and a graph database (I have used Couchbase and Neo4j together very nicely).

Q: Are there options to change the default order of collation?

A: Not out of the box, although I am sure there is a way to modify that and build it from source. You are welcome to post on the Couchbase Google group if this is important to you. In most cases, though, Unicode collation is far more favorable than byte order.
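To illustrate the difference between the two orderings, here is a small sketch that sorts the same keys both ways. It uses JavaScript's default sort as a stand-in for byte order and Intl.Collator as a stand-in for Unicode collation; neither is the view engine's actual collator, so treat the output as indicative only.

```javascript
const sampleKeys = ["b", "A", "a", "B"];

// Default sort compares code units, so every uppercase letter
// sorts before every lowercase letter.
const byteOrder = [...sampleKeys].sort();

// A Unicode collator keeps the same letter together regardless of case.
const collated = [...sampleKeys].sort(new Intl.Collator("en").compare);

console.log(byteOrder);
console.log(collated);
```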

Q: Using the dateToArray(timestamp) as the indexed key for the emit, how would you group just by month?

A: The B+ tree still orders compound keys starting from the leftmost element, and grouping works on key prefixes, so there is no way to group by month alone (across multiple years) without changing the indexed key in the emit statement. You could either take out the year by doing emit(dateToArray(timestamp).splice(1), output_val), or make a different View just for that if you need both. But if you are using the standard output of dateToArray(), which has yyyy first, you cannot group by just MM and ignore the yyyy.
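A minimal sketch of the month-first key follows, runnable outside the server. The dateToArray() and emit() functions here are local stand-ins for the view engine's built-ins (the stand-in uses UTC so the result is predictable), doc.timestamp is an assumed field name, and countByFirstElement() only imitates what a _count reduce queried with group_level=1 would return. It uses slice(1), which yields the same elements as splice(1) without modifying the array.

```javascript
// Stand-in for the built-in dateToArray(): [yyyy, MM, dd, hh, mm, ss] in UTC.
function dateToArray(ts) {
  const d = new Date(ts);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}

// Stand-in for the view engine's emit(): collect rows in memory.
const rows = [];
function emit(key, value) { rows.push({ key, value }); }

// Map function as it might appear in the View; the year is dropped
// so that the month becomes the first element of the key.
function map(doc, meta) {
  if (doc.timestamp) {
    emit(dateToArray(doc.timestamp).slice(1), null);
  }
}

// Imitates a _count reduce with group_level=1 (group on the first key element).
function countByFirstElement(allRows) {
  const counts = {};
  for (const row of allRows) {
    counts[row.key[0]] = (counts[row.key[0]] || 0) + 1;
  }
  return counts;
}

// Hypothetical documents spanning two years.
[
  { timestamp: "2012-03-05T10:00:00Z" },
  { timestamp: "2013-03-20T18:30:00Z" },
  { timestamp: "2013-12-13T09:15:00Z" },
].forEach((doc, i) => map(doc, { id: "doc" + i }));

const byMonth = countByFirstElement(rows);
console.log(byMonth);
```

Both March documents land in the same group even though they come from different years, which is exactly what the yyyy-first key cannot do.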

Q: What happens if there are multiple emits in a single map() function with different keys getting emitted?

A: They will be added to a single index and sorted together within it. You can use this to do "simulated" joins, as I showed in the video and as the Views that come with the beer-sample database do. Each View is a single index: the map function is the processing that creates that index, and each emit statement within it outputs an indexed key as a node, with the associated output value and meta.id attached. We also store in each B+ tree node the reduce value of that node and all the nodes underneath it (its children), which speeds up reduce queries because they do not have to traverse every node.
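Here is a rough sketch of such a "simulated" join, in the spirit of the beer-sample Views. The emit() function and the compareKeys() comparator are local stand-ins so the example runs outside the server, the documents are made up, and the comparator only approximates the real view collation for simple string keys.

```javascript
// Stand-in for the view engine's emit(): collect rows in memory.
const rows = [];
function emit(key, value) { rows.push({ key, value: value === undefined ? null : value }); }

// One map function, two different key shapes going into the same index.
function map(doc, meta) {
  switch (doc.type) {
    case "brewery":
      emit([meta.id]);
      break;
    case "beer":
      if (doc.brewery_id) {
        emit([doc.brewery_id, meta.id]);
      }
      break;
  }
}

// Approximation of compound-key ordering: compare element by element,
// and let a shorter key sort before a longer key that shares its prefix.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical documents, keyed by document ID.
const docs = {
  beer_2: { type: "beer", brewery_id: "brewery_b" },
  brewery_a: { type: "brewery" },
  beer_1: { type: "beer", brewery_id: "brewery_a" },
  brewery_b: { type: "brewery" },
};
for (const id of Object.keys(docs)) {
  map(docs[id], { id });
}

rows.sort((x, y) => compareKeys(x.key, y.key));
console.log(rows.map(r => JSON.stringify(r.key)));
```

Because each brewery's own row sorts directly ahead of the rows for its beers, a single range query over the index returns a brewery followed by its beers.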

Q: Is full-text search possible with Views?

A: No, that is not possible with Views as indexes, but you can use the Elasticsearch integration for this. There is also a Developer Preview of N1QL, the Couchbase Query Language, for ad-hoc querying of Couchbase. Right now it is software that is independent of Couchbase and runs in parallel with it. There is also an online tutorial: http://www.couchbase.com/communities/n1ql
