August 7, 2012

Couchbase 2.0 Views Documentation

We all think Views in Couchbase Server 2.0 are cool. Honestly, after 25 years of working with the fixed schemas of RDBMS and then SQL-based databases, once you see the flexibility and power of the document style during data creation and update, you begin to wonder why we went down that path for so long. Add in the power and flexibility of views for getting the data out the way you want…

I spoke last week about the work that has been done to get various parts of all our documentation up to spec, and as part of that, I mentioned the Views work. Here's a deeper dive into the Views chapter and what's been added, and where to go looking for information. 

Couchbase Views is a big chapter and you can find the entire content here.

View Writing

The basis of all views is the map and reduce functions that select, format, and if necessary summarise the information. There's a lot of power in those very basic components, and that's what makes Views so useful and, for people coming from SQL and RDBMS, more difficult to understand and to use to their full power.

These basic building blocks are described in detail at http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing.html
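To make the two building blocks concrete, here is a minimal sketch of a map and a reduce function that can be run outside the server. The emit() stand-in, the sample documents and the "type" and "title" fields are assumptions made for illustration only; inside Couchbase Server the view engine supplies emit() and calls the functions for you.

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: select recipe documents and emit the title as the key.
// The "type" and "title" fields are hypothetical.
function map(doc, meta) {
  if (doc.type === "recipe") {
    emit(doc.title, null);
  }
}

// Reduce: count the rows. On rereduce the values are earlier counts,
// so they are added together instead.
function reduce(key, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}

// Hypothetical documents, keyed by document ID.
var docs = {
  "recipe::1": { type: "recipe", title: "Carrot soup" },
  "recipe::2": { type: "recipe", title: "Apple pie" },
  "user::1": { type: "user", name: "Sam" }
};
Object.keys(docs).forEach(function (id) {
  map(docs[id], { id: id });
});
var total = reduce(null, rows.map(function (r) { return r.value; }), false);
console.log(rows.length + " rows emitted, count = " + total);
```

The map function does the selecting and formatting; the reduce function only ever sees what the map emitted.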

I've also added detailed information on writing custom reduce functions, and on how to handle the data and merging during incremental reduce and rereduce, as data is merged from multiple servers and collations: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-reduce.html#couchbase-views-writing-reduce-custom
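As an illustration of why rereduce needs care, here is a small sketch of a custom reduce that keeps a running sum and count so that an average can be derived later. The input values and the idea of two servers producing partial results are assumptions for the example, not taken from the manual.

```javascript
// Custom reduce that returns {sum, count} rather than a finished average,
// so partial results from different servers can be merged safely.
function reduce(key, values, rereduce) {
  var out = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values are earlier outputs of this same function
      out.sum += values[i].sum;
      out.count += values[i].count;
    } else {
      // values are the raw numbers emitted by the map function
      out.sum += values[i];
      out.count += 1;
    }
  }
  return out;
}

// Simulate two partial reductions followed by a rereduce.
var partA = reduce(null, [10, 20, 30], false);
var partB = reduce(null, [40], false);
var merged = reduce(null, [partA, partB], true);
var average = merged.sum / merged.count;
console.log(merged, average);
```

Returning the average directly from each partial reduction would give the wrong answer once the partial results are merged, which is why the sum and count are carried separately.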

Once you've got the data in, you need to know how that relates to querying the view and getting the data back out again. We've added a big section on the querying system, how to select information, and how to control the query arguments to get the information you want: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-querying.html
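For a feel of what the query arguments look like on the wire, here is a hedged sketch of a helper that builds a view query URL. The helper name viewUrl(), the host, the bucket and the design document names are all made up for the example; the point is only that key-style arguments are sent as JSON values while the others are plain strings.

```javascript
// Hypothetical helper: build a view query URL from a set of arguments.
// Key-style arguments must be JSON-encoded before being URL-escaped.
function viewUrl(base, bucket, ddoc, view, args) {
  var jsonArgs = ["key", "keys", "startkey", "endkey"];
  var parts = Object.keys(args).map(function (name) {
    var raw = jsonArgs.indexOf(name) !== -1
      ? JSON.stringify(args[name])
      : String(args[name]);
    return encodeURIComponent(name) + "=" + encodeURIComponent(raw);
  });
  return base + "/" + bucket + "/_design/" + ddoc +
    "/_view/" + view + "?" + parts.join("&");
}

// Assumed names: a local node, a "recipes" bucket, a "dd" design document.
var url = viewUrl("http://localhost:8092", "recipes", "dd", "by_title",
  { startkey: "a", limit: 10 });
console.log(url);
```

In practice the client libraries build this for you, so treat the helper as a way of reading query URLs rather than something you need to write.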

Examples

The best way to learn how to write and use views is to look at some samples and examples of both the map/reduce and querying phase. I've tried really hard to give a good overview of different techniques and information using views for different use cases. 

Currently I've explicitly called out the following:

Emitting Multiple Rows

If you use compound values (arrays or hashes) in your documents and want to be able to query on just one of their elements, then calling the emit() function multiple times is what you need. I use recipes (and ingredients) as an example here to show you how you can select information using this technique, but it can be equally applied elsewhere. See Emitting Multiple Rows.
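A minimal sketch of the idea, assuming recipe documents with a "title" field and an "ingredients" array of strings (both hypothetical), and using an in-memory emit() in place of the view engine:

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: one emitted row per ingredient, so each ingredient becomes a key.
function map(doc, meta) {
  if (doc.ingredients) {
    for (var i = 0; i < doc.ingredients.length; i++) {
      emit(doc.ingredients[i], doc.title);
    }
  }
}

var docs = {
  "recipe::1": { title: "Carrot soup", ingredients: ["carrot", "onion"] },
  "recipe::2": { title: "Carrot cake", ingredients: ["carrot", "flour"] }
};
Object.keys(docs).forEach(function (id) {
  map(docs[id], { id: id });
});

// Equivalent of querying the view with key="carrot".
var withCarrot = rows.filter(function (r) { return r.key === "carrot"; });
console.log(withCarrot.map(function (r) { return r.value; }));
```

Two documents produce four rows here, and a single-key query picks out every recipe that uses a given ingredient.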

Date and Time Selection

There are lots of different scenarios where date and time selection is useful, but one of the more common, particularly in big data and analysis situations, is a statistics or value rollup in combination with date output.

Within a view, splitting out the values (day, month, year, hour, minute, second) individually enables you to perform a rollup using the group_level query argument together with a reduce function. Using this combination you can count up items by day, hour, minute, and all range combinations thereof. Want data between 6:01 and 19:12 on a given day, aggregated every 5 minutes? Easy with a view.
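Here is a rough sketch of the splitting trick. It assumes each document has a "created" field holding an ISO 8601 timestamp (a made-up field name), and the groupLevel() function is only an in-memory imitation of what group_level does with a sum-style reduce.

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: split the timestamp into an array key, most significant part first.
function map(doc, meta) {
  var d = new Date(doc.created); // "created" is a hypothetical field
  emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
        d.getUTCHours(), d.getUTCMinutes()], 1);
}

// Imitation of group_level: keep the first `level` key elements and
// sum the values that share that prefix.
function groupLevel(allRows, level) {
  var totals = {};
  allRows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    totals[prefix] = (totals[prefix] || 0) + row.value;
  });
  return totals;
}

["2012-08-07T06:01:00Z", "2012-08-07T06:30:00Z",
 "2012-08-07T19:12:00Z", "2012-08-08T09:00:00Z"].forEach(function (ts, i) {
  map({ created: ts }, { id: "log::" + i });
});

var byDay = groupLevel(rows, 3);  // year, month, day
var byHour = groupLevel(rows, 4); // year, month, day, hour
console.log(byDay, byHour);
```

The same emitted rows serve both rollups; only the grouping depth changes at query time.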

Want to output the different error levels over a given date range? If you combine a map using the individual values trick with a custom reduce function you can output that information too. There's a full example of both the map and custom reduce functions required to achieve this.
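One way this could look, as a sketch rather than the example from the manual: the map is assumed to emit a date array as the key and the document's error level (a hypothetical "level" string) as the value, and the custom reduce tallies how often each level appears.

```javascript
// Custom reduce: count occurrences of each error level.
// First pass: values are level strings emitted by the map function.
// Rereduce: values are tally objects produced by earlier passes.
function reduce(key, values, rereduce) {
  var counts = {};
  values.forEach(function (v) {
    if (rereduce) {
      Object.keys(v).forEach(function (level) {
        counts[level] = (counts[level] || 0) + v[level];
      });
    } else {
      counts[v] = (counts[v] || 0) + 1;
    }
  });
  return counts;
}

var first = reduce(null, ["error", "warning", "error"], false);
var second = reduce(null, ["error", "info"], false);
var combined = reduce(null, [first, second], true);
console.log(combined);
```

Because the output is a small object with one entry per level, it merges cleanly on rereduce and stays compact however many rows are reduced.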

Selective Record Output

There are two high-level models for storing data in Couchbase Server when storing different record types. Either you use multiple buckets, and use your client to marshal the data to each bucket accordingly, or you use one bucket for your entire application and then use a field to highlight the record type. You can use this information in your view to select data and output accordingly.

A less obvious use of selective record output in a view is to speed up and simplify your views for specific query types. For example, if you are storing recipes and have a homepage that displays the top 20 recipes that can be prepared in under 20 minutes, you can use selective view output to generate the information, and then use the querying to otherwise select your data. Just put an if statement into the map to make the primary selection (under 20 minutes).

This is much more efficient, and means you can keep your view and query parameters simple. If you have very fixed information like this, you can even take it to extremes and create views where you never specify query selection arguments.
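A short sketch of that if statement, assuming hypothetical "type", "preptime" (in minutes) and "title" fields and an in-memory emit():

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: the if statement makes the primary selection, so the index only
// ever contains recipes that take under 20 minutes.
function map(doc, meta) {
  if (doc.type === "recipe" && doc.preptime < 20) {
    emit(doc.title, doc.preptime);
  }
}

var docs = {
  "recipe::1": { type: "recipe", title: "Omelette", preptime: 10 },
  "recipe::2": { type: "recipe", title: "Beef stew", preptime: 180 },
  "user::1": { type: "user", name: "Sam" }
};
Object.keys(docs).forEach(function (id) {
  map(docs[id], { id: id });
});
console.log(rows);
```

The selection happens once, when the index is built, instead of on every query.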

Solutions for Simulating Joins

Couchbase Server doesn't support joins, but there are ways in which you can emulate the data that would be output in a join by combining emitted rows through a common or shared parameter or value. For example, when outputting blog posts and their related comments, both share a blog post identifier. See Solutions for Simulating Joins.
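As a sketch of the shared-identifier idea: posts and comments are assumed to carry a "type" field, and comments a "post_id" field pointing at their post (all hypothetical names). Emitting a two-part key makes each post sort immediately ahead of its comments; the sort below is a simplified stand-in for the ordering the view index applies.

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: both record types emit the blog post ID as the first key element.
// The second element (0 or 1) places the post ahead of its comments.
function map(doc, meta) {
  if (doc.type === "post") {
    emit([meta.id, 0], doc.title);
  } else if (doc.type === "comment") {
    emit([doc.post_id, 1], doc.text);
  }
}

var docs = {
  "comment::1": { type: "comment", post_id: "post::1", text: "Nice!" },
  "post::2": { type: "post", title: "Second post" },
  "post::1": { type: "post", title: "First post" },
  "comment::2": { type: "comment", post_id: "post::1", text: "Agreed" }
};
Object.keys(docs).forEach(function (id) {
  map(docs[id], { id: id });
});

// Simplified key ordering: compare the post ID, then the 0/1 marker.
rows.sort(function (a, b) {
  if (a.key[0] !== b.key[0]) {
    return a.key[0] < b.key[0] ? -1 : 1;
  }
  return a.key[1] - b.key[1];
});
console.log(rows);
```

A range query over a single post ID would then return the post and its comments together, which is the part of a join most applications actually need.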

Simulating Transactions

For certain operations, you can simulate transactions by only writing a single record into the database and then using that to merge and update the output value that would otherwise have been created in each individual record. In the example in this section we use a bank transaction exchanging money between two accounts. See Simulating Transactions.
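Roughly, the single-record approach can be pictured like this. The "transfer" document type and its "from", "to" and "amount" fields are assumptions for the sketch, and the balances() function imitates a sum-style reduce grouped by key.

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: one transfer document yields two rows, a debit and a credit.
function map(doc, meta) {
  if (doc.type === "transfer") {
    emit(doc.from, -doc.amount);
    emit(doc.to, doc.amount);
  }
}

// Imitation of a sum reduce grouped by key: net movement per account.
function balances(allRows) {
  var totals = {};
  allRows.forEach(function (row) {
    totals[row.key] = (totals[row.key] || 0) + row.value;
  });
  return totals;
}

var docs = {
  "transfer::1": { type: "transfer", from: "alice", to: "bob", amount: 100 },
  "transfer::2": { type: "transfer", from: "bob", to: "carol", amount: 30 }
};
Object.keys(docs).forEach(function (id) {
  map(docs[id], { id: id });
});
var net = balances(rows);
console.log(net);
```

Because the debit and the credit come from the same stored document, there is never a moment where one exists without the other.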

Reduction and Summaries

Sums, counts, and custom reductions provide a wealth of possibilities, especially if you want to reduce and summarize very large volumes of data. With Views, the summary information and output from the reduce function is stored in the index, which means accessing that stored data is incredibly quick. Details on these functions, including complex custom builds, are available here.

Simulating Multi-phase Transactions

Multi-record updates and transactions can be simulated by storing and recording the progress of the transaction at each stage. It requires multiple record updates, but because each update also records the progress of the transaction, you can back it out or recover from failure at every single stage.
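To give a feel for the shape of such a scheme, here is a heavily simplified in-memory sketch. It is one possible arrangement, not necessarily the one described in the manual: the state names, the "pending" marker list on each account and the store object standing in for a bucket are all assumptions.

```javascript
// In-memory stand-in for a bucket; each property would be a stored record.
var store = {
  "acct::a": { balance: 100, pending: [] },
  "acct::b": { balance: 20, pending: [] },
  "txn::1": { from: "acct::a", to: "acct::b", amount: 30, state: "initial" }
};

// Move a transaction forward by one stage, recording progress as it goes.
function advance(txnId) {
  var t = store[txnId];
  if (t.state === "initial") {
    t.state = "pending";
  } else if (t.state === "pending") {
    [[t.from, -t.amount], [t.to, t.amount]].forEach(function (step) {
      var acct = store[step[0]];
      if (acct.pending.indexOf(txnId) === -1) { // skip if already applied
        acct.balance += step[1];
        acct.pending.push(txnId);
      }
    });
    t.state = "committed";
  } else if (t.state === "committed") {
    [t.from, t.to].forEach(function (id) {
      var i = store[id].pending.indexOf(txnId);
      if (i !== -1) { store[id].pending.splice(i, 1); }
    });
    t.state = "done";
  }
  return t.state;
}

// Back out a transaction that has not yet reached "done".
function rollback(txnId) {
  var t = store[txnId];
  if (t.state === "done") { return t.state; }
  [[t.from, t.amount], [t.to, -t.amount]].forEach(function (step) {
    var acct = store[step[0]];
    var i = acct.pending.indexOf(txnId);
    if (i !== -1) {
      acct.balance += step[1];
      acct.pending.splice(i, 1);
    }
  });
  t.state = "cancelled";
  return t.state;
}

// Recovery is just "keep advancing from whatever state was recorded".
while (advance("txn::1") !== "done") {}
console.log(store["acct::a"].balance, store["acct::b"].balance);
```

In a real deployment each of these updates would be a separate write to the server, which is exactly why the recorded state matters: after a failure, the stored state tells you which step to repeat or undo.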

Views for SQL Writers

Many of you may know that I worked at MySQL for many years, and I have 20 years' experience of writing SQL in Oracle, MySQL and many others under my belt. In this new section, I've tried to demonstrate the different SQL expressions and how they can be translated into a corresponding View and query.

Provided here is a combination of simple expressions (SELECT), selection (WHERE), ranges and paging (LIMIT/OFFSET), sorting (ORDER BY), and summary (GROUP BY) techniques.
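As one hedged illustration of the translation, consider a statement such as SELECT title FROM recipes WHERE preptime BETWEEN 10 AND 20 ORDER BY preptime LIMIT 2 OFFSET 1. In the sketch below the map emits the (hypothetical) "preptime" field as the key, and the query() function imitates the startkey, endkey, skip and limit arguments in memory for numeric keys only.

```javascript
// Stand-in for the view engine: emit() just collects rows in memory.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: the column used in WHERE and ORDER BY becomes the key,
// the selected column becomes the value.
function map(doc, meta) {
  if (doc.type === "recipe") {
    emit(doc.preptime, doc.title);
  }
}

// In-memory imitation of a view query over numeric keys.
// startkey/endkey act like WHERE ... BETWEEN (endkey inclusive),
// key order acts like ORDER BY, skip/limit like OFFSET/LIMIT.
function query(allRows, args) {
  var selected = allRows.filter(function (r) {
    return (args.startkey === undefined || r.key >= args.startkey) &&
           (args.endkey === undefined || r.key <= args.endkey);
  }).sort(function (a, b) { return a.key - b.key; });
  var skip = args.skip || 0;
  var limit = args.limit === undefined ? selected.length : args.limit;
  return selected.slice(skip, skip + limit);
}

[["Stew", 30], ["Toast", 5], ["Pasta", 15], ["Omelette", 10], ["Curry", 20]]
  .forEach(function (r, i) {
    map({ type: "recipe", title: r[0], preptime: r[1] }, { id: "recipe::" + i });
  });

var result = query(rows, { startkey: 10, endkey: 20, skip: 1, limit: 2 });
console.log(result.map(function (r) { return r.value; }));
```

The main shift in thinking is that the choice of key in the map function decides what you can later filter and sort on.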

Multi-user View access

Views create indexes and you want to ensure that your clients are able to get either the most up-to-date information, or information the fastest, or a combination of the two. You can control this through the stale parameter. But you need to keep in mind that you may have multiple clients querying the data at the same time and they may all have different accuracy and speed requirements.

Understanding the impact of the stale parameter and how it works both in isolation and when the view is being accessed by multiple clients is covered in detail.
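The interaction between clients can be pictured with a toy model. This sketch treats the index as a simple counter of indexed documents and performs the update_after refresh immediately rather than in the background, so it is a simplification of the real behaviour; the queryView() function and the data array are inventions for the example.

```javascript
// Toy model: `data` is what has been stored, `indexed` is how much of it
// the view index currently covers.
var data = [];
var indexed = 0;

// Returns how many documents the caller would see in the view.
function queryView(stale) {
  if (stale === "false") {
    indexed = data.length; // update the index first, then answer
  }
  var visible = indexed;
  if (stale === "update_after") {
    indexed = data.length; // answer from the old index, then update it
  }
  return visible;          // stale === "ok": answer as-is, no update
}

data.push("doc1", "doc2", "doc3");
var first = queryView("ok");            // fast, but sees nothing yet
var second = queryView("update_after"); // still the old index, triggers update
var third = queryView("ok");            // benefits from the other client's update
console.log(first, second, third);
```

The third query shows the multi-client effect: a client that never asks for an index update can still see fresher data because another client triggered one.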

Views for Administrators

Developers are obviously the main users of views both in terms of creating and consuming the content. But administrators need to know how views will impact the performance of the cluster, how views are built and updated, and how to keep the performance of the cluster at its highest level. The key elements of how views work are covered in this section.

Comments/Suggestions/Requests

As always, if there is some example, solution or problem that you would like to see described in detail in the documentation, either leave a comment through our comment system, or email us at editors@couchbase.com!