Introduction

Couchbase is capable of very high write rates and can scale out quickly by adding nodes, but a poor object model can hinder these qualities. In some databases, very high write rates come at the expense of read rates, but Couchbase has some fairly unique capabilities in the NoSQL space to support both effectively. In this blog post we will discuss what it takes to design an object model that plays to these capabilities for logging and event data while remaining easy to search with N1QL.

The example use case I came across in a conversation recently is to use Couchbase as the operational database to collect various types of events from external systems such as network gear, servers or even log data. The service then needs to show the number of events in a UI very quickly, as a sort of hourly rollup. The other need is to be able to click on that number and drill down to a list of events of that type. For example, show all of the RouterError events on June 22nd 2015 for the hour of 16:00.

One other thing to remember: this blog post is meant to show you a more advanced concept and how you might apply it. It does not mean it is the exact right way for your use case, even if yours is similar to what I am talking about. It is to get you thinking about advanced object modeling in Couchbase and how you might use its power more effectively, in ways that may not be obvious to everyone.

Incremental Counter to Read Events and Counts Per Hour

This approach lets you easily read the last N events of an event type for an hour without resorting to a view query. You can also read all events for a particular day. It is optimized for very high write rates and allows for easy lookup of data at a decent level of granularity. I will explain this at a high level and then dive into specific examples to explain the idea further.

We will need two object types for this in our bucket.

  1. The Counter Object – This is a key/value object and holds an integer. This integer represents the top end of the number of objects for that event type and hour or, put another way, the top end of the array of events. It is also the object you will read to show how many events there are for that type/hour combination. We will use specific methods of the Couchbase SDKs called Counter Operations. Each SDK has its own version of these methods; the Node.js SDK's version is one example.
  2. The Event Object – This is a JSON document object and has the actual data about the events we want to capture.

The Counter Object

You create a counter object for each event type and hour combination. Think of it as an operational object about the documents we will create. This may sound odd, but bear with me. This counter object will be a key/value object, not JSON, with the value being an integer. The Couchbase SDKs provide Counter Operations specifically for this type of object. A counter operation is a single, atomic operation, so it is very efficient, keeps the value consistent under concurrent reads and writes, and is easy to use. The Node.js SDK documentation has an example. In our case, this counter will be incremented every time we add a new event. Since each event type and hour has its own counter, we can easily read how many events there are, and that number becomes the upper bound of an array if we need to read all of the events for that type and hour.

The other important part is the object’s key. We want to choose a key so the application can easily construct the keys needed and then gather the data by key to display the number of events, but also the events for the given time period. Fetching objects by key will always be faster than querying. It is the difference between knowing the answer already and having to ask a question to retrieve the data that is the answer. By knowing the key, you simply tell the database to go retrieve the data. Simple, effective and very fast.

Here is an example of the counter’s key/value:

Object key:

An example ObjectID would be:
Value: 293

Where 293 is the value of the most recent increment of the counter.

For the timestamp in the key, I used the four-digit year, a two-digit month, the day, then the hour (in 24-hour time). I did not need to go down to the minute or second level, but you could. I also could have used a UNIX timestamp, which would work too, but again that was unnecessarily granular for this particular use case.

In the above example, 2015 is the year, February is the month, 20th is the day and 4pm for the hour. So if you wanted to read all of the counters for an event type and specific day, the application could easily assemble the objectIDs for those counters and bulk read them.

One other thing, I use double colons as my delimiter, but you can use whatever makes sense.
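As a rough illustration of how an application might assemble these keys, here is a minimal Python sketch. The exact key layout (`<event type>::<YYYYMMDDHH>`) and the helper names `hour_stamp`, `counter_key` and `counter_keys_for_day` are assumptions made for this example, based only on the description above; adjust them to whatever pattern you settle on.

```python
from datetime import datetime

DELIMITER = "::"  # double colons, as in this post; any unambiguous separator works


def hour_stamp(moment: datetime) -> str:
    """Format a datetime as YYYYMMDDHH using a 24-hour clock."""
    return moment.strftime("%Y%m%d%H")


def counter_key(event_type: str, moment: datetime) -> str:
    """Hypothetical counter key layout: <event type>::<YYYYMMDDHH>."""
    return f"{event_type}{DELIMITER}{hour_stamp(moment)}"


def counter_keys_for_day(event_type: str, day: datetime) -> list[str]:
    """Build the 24 hourly counter keys for one event type on one day."""
    return [counter_key(event_type, day.replace(hour=h)) for h in range(24)]


keys = counter_keys_for_day("RouterError", datetime(2015, 2, 20))
print(counter_key("RouterError", datetime(2015, 2, 20, 16)))  # RouterError::2015022016
print(len(keys), keys[0], keys[-1])
```

The resulting list of 24 keys is what the application would hand to a bulk read to populate a full day of hourly counts.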

The Event Object

For each event object, the object ID would look something like the following:


Where 293 is the value of the most recent increment of the counter object for that hour.

With this schema, for you to retrieve the count of items for that Event Type in a specific hour, just read the one operational object and there you have it.

Simply put, the value of the counter is the upper bound of that event type’s objects. If you wanted the last 10 events of that event type for an hour, you’d read that counter, subtract 9 and then do a parallelized bulk read in Couchbase for the following objects:

So you can read all 10 of these events very fast and with no querying, no indexes, no views, just raw speed via parallelized bulk read. A bulk read of the listed objects would be VERY fast in Couchbase even if you had >300 of them.
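The arithmetic for "the last 10 events" can be sketched as follows. This assumes event keys of the form `<event type>::<YYYYMMDDHH>::<counter value>` and that the first event in an hour gets the number 1; both are assumptions for illustration, and `last_event_keys` is a hypothetical helper, not an SDK call.

```python
def last_event_keys(event_type: str, hour_stamp: str, counter_value: int, n: int = 10) -> list[str]:
    """Return the keys of the most recent n events for one type and hour.

    counter_value is the current value of the hourly counter, which is also
    the number embedded in the newest event's key.
    """
    # Subtract n - 1 from the counter, but never go below the first event.
    first = max(1, counter_value - n + 1)
    return [f"{event_type}::{hour_stamp}::{i}" for i in range(first, counter_value + 1)]


keys = last_event_keys("RouterError", "2015022016", 293)
print(keys[0])    # RouterError::2015022016::284
print(keys[-1])   # RouterError::2015022016::293
print(len(keys))  # 10
```

With a counter value of 293, subtracting 9 gives 284, so the bulk read covers events 284 through 293.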

The one minor issue with this approach is that it is possible, though very unlikely, for the counter to become inconsistent with the actual number of event objects. For example, someone could increment the counter object but then not create an event document with that number. That being said, if you are using bulk operations and one requests an object that does not exist, it will simply receive a miss and the whole operation will not suffer for it. In my mind this trade-off is just fine considering the performance that a model like this can scale to. If you figure out a better way, please post in the comments as I’d love to hear about it.
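To make the "miss" behavior concrete, here is a small sketch that uses a plain Python dictionary as a stand-in for the bucket. It is not the Couchbase SDK; `bulk_get` is a hypothetical helper that only mimics the idea that a missing key yields a miss rather than failing the whole read, and the key layout is the same assumed one as elsewhere in this post.

```python
def bulk_get(store: dict, keys: list[str]) -> tuple[dict, list[str]]:
    """Fetch every key that exists and report the ones that do not."""
    found = {}
    missing = []
    for key in keys:
        if key in store:
            found[key] = store[key]
        else:
            missing.append(key)  # a miss: counter was incremented, no event written
    return found, missing


# Event 292 was never written even though the counter moved past it.
store = {
    "RouterError::2015022016::291": {"message": "link down"},
    "RouterError::2015022016::293": {"message": "link flapping"},
}
keys = [f"RouterError::2015022016::{i}" for i in range(291, 294)]
found, missing = bulk_get(store, keys)
print(sorted(found))  # the two events that exist
print(missing)        # ['RouterError::2015022016::292']
```

The application can ignore the missing keys or log them; either way the two real events still come back.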

The Application Code

Let’s look at how the application code might be laid out to read and write this object model. I am going to use pseudo-code to specifically not get into a particular language. I will leave the details of that to your language and Couchbase SDK of choice.
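As one concrete illustration of the write and read paths, here is a sketch in Python rather than pseudo-code. `InMemoryBucket` is a hypothetical in-memory stand-in written for this example, not a Couchbase SDK class; its `counter`, `upsert` and `get` methods only imitate the behavior described in this post (an atomic increment that returns the new value, a document write, and a key lookup). The key layout and the choice to start each counter at 1 are also assumptions.

```python
import threading


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; NOT the Couchbase SDK."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def counter(self, key, delta=1, initial=1):
        """Atomically create or increment an integer and return the new value."""
        with self._lock:
            if key in self._data:
                self._data[key] += delta
            else:
                self._data[key] = initial
            return self._data[key]

    def upsert(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        """Return the stored value, or None on a miss."""
        return self._data.get(key)


def record_event(bucket, event_type, hour_stamp, body):
    """Write path: bump the hourly counter, then store the event under that number."""
    counter_key = f"{event_type}::{hour_stamp}"
    sequence = bucket.counter(counter_key, delta=1, initial=1)
    event_key = f"{counter_key}::{sequence}"
    bucket.upsert(event_key, body)
    return event_key


def hourly_count(bucket, event_type, hour_stamp):
    """Read path for the rollup: one key lookup, no query."""
    return bucket.get(f"{event_type}::{hour_stamp}") or 0


bucket = InMemoryBucket()
for message in ("link down", "link up", "link flapping"):
    record_event(bucket, "RouterError", "2015022016", {"message": message})

print(hourly_count(bucket, "RouterError", "2015022016"))  # 3
print(bucket.get("RouterError::2015022016::3"))           # {'message': 'link flapping'}
```

With a real SDK, the increment and the document write would be two separate network operations, which is exactly where the small window for counter/event inconsistency discussed earlier comes from.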

Summary

By using the series of object modeling techniques outlined above, it’s possible to structure the data in such a way as to maximize throughput and performance. Although perhaps initially counter-intuitive, the use of additional key/value lookups instead of secondary index based queries for primary application functionality can give significant benefits. In many systems a complex index based lookup may take an order of magnitude longer to complete than the simple key-value lookups used throughout this design. In the right system architecture, Couchbase can easily provide consistent sub-millisecond response time for these lookups. Then when you need the power to really query, you use N1QL. You get to pick and choose where you utilize the power Couchbase gives you.

Furthermore, because of Couchbase’s automatic-sharding architecture, load for the queries and ingestion will be evenly spread across the cluster. As application usage and ops demand increases over time, additional Couchbase nodes can be added to scale out in an online operation, meeting demand without any changes at the application layer.

Post Script About Querying

One last thing if you have gotten this far. You may be saying, "All this and you are not querying the database. Why are you not using N1QL?" I did not say that in this use case I would not be using N1QL, and nothing prevents us from using N1QL on these documents. The way I look at N1QL is that it is yet another tool in the toolbox for interacting with Couchbase. Key/Value access will ALWAYS be faster. That is just how things are. So what I promote to people is to use the power and flexibility Couchbase offers to get the performance and functionality you need where and when you need it. This will be a mix of key/value, traditional Couchbase views, global secondary indexes (GSI) and N1QL.

In this specific use case, I need to be able to write data at very high velocity, have a way to look up just some of the data in a very specific way and scale the data tier linearly to handle this with minimal complexity. Key/Value did that for me with the right object key pattern. Nothing about the way we have designed this schema prevents me from whipping out N1QL to query these events or logs. Notice I did not even really talk much about the JSON document object modeling itself. For what I am trying to show it did not matter, and as for querying, I just did not need that tool from the toolbox.

All that being said, in the next blog post on this object model I will dive in and look at how we can shred through these same events with N1QL and GSI where it makes sense.

Author

Posted by Kirk Kirkconnell, Senior Solutions Engineer, Couchbase

Kirk Kirkconnell was a Senior Solutions Engineer at Couchbase working with customers in multiple capacities to assist them in architecting, deploying, and managing Couchbase. His expertise is in operations, hosting, and support of large-scale application and database infrastructures.
