
Calculating average document size of documents stored in Couchbase.

Alexis Roos Published

 

Couchbase offers a unique NoSQL database that combines integrated cache and storage technology. Many customers and prospects use it to store binary data and/or JSON documents. Often, it is necessary to determine the average document size in order to properly size a production system. See this great set of blogs from Perry to learn more about Couchbase sizing:

This blog entry details a simple way to calculate average document size (for binary data and JSON documents) using views.

This is meant to be used on a development or staging system only, not on a real production system, because the view creates overhead by indexing every document.


Views are functions that you write in JavaScript and that let you extract, filter, aggregate and find information in documents stored in Couchbase Server.


The first step in creating a view is to provide a map function, which filters entries for certain information and extracts some of it. The result of a map function is an ordered list of key/value pairs, called an index. The results of map functions are persisted to disk by Couchbase Server and are updated incrementally as documents change.

Additionally and optionally, it is possible to create a reduce function which can sum, aggregate, or perform calculations on information.

Views are explained in our development guide at:
http://www.couchbase.com/docs/couchbase-devguide-2.1.0/understanding-views.html


Now let’s take a look at what it will take to create the view:


Map: the following map function generates a simple key/value index in which the key is the key of the document and the value is the length of the document:


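function (doc, meta) {
  var size;
  if (meta.type == "json") {
    // JSON document: measure the length of its serialized form.
    size = JSON.stringify(doc).length;
    emit(meta.id, size);
  }
  if (meta.type == "base64") {
    // Binary document: it arrives base64-encoded, so decode it first.
    size = decodeBase64(doc).length;
    emit(meta.id, size);
  }
}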


The function works for JSON documents (using JSON.stringify) and binary documents (using decodeBase64).


The function could be extended to calculate the average size for specific kinds of documents as needed.
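As one possible way to restrict the map function to specific documents, here is a minimal sketch that only measures JSON documents carrying a "type" field equal to "user". The field name, its value and the function name mapUserSize are assumptions made for illustration; the emit stub only exists so the sketch can run outside Couchbase. In the view editor you would paste the body as an anonymous function (doc, meta) and rely on the built-in emit.

```javascript
// Stand-in for the view engine's emit(), so the sketch runs outside Couchbase.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function limited to JSON documents whose (hypothetical) "type"
// field equals "user". All other documents are skipped.
function mapUserSize(doc, meta) {
  if (meta.type == "json" && doc.type == "user") {
    emit(meta.id, JSON.stringify(doc).length);
  }
}

// Two sample documents: only the first one matches the filter.
mapUserSize({ type: "user", name: "Ann" }, { id: "u1", type: "json" });
mapUserSize({ type: "order", total: 12 }, { id: "o1", type: "json" });
console.log(emitted);
```

With this filter in place, the reduce function averages only the matching documents, since the others never reach the index.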



Reduce: finally, we create a reduce function for the view (thanks to Aaron at Couchbase (https://twitter.com/apage43) for helping create it!) that calculates the average document size by summing the values and dividing by the number of values. It follows the map/reduce construct, so it looks a little complicated at first, but it outputs the count, the sum and the average. The default built-in functions do not provide the average, and I wanted to make this function very easy to use.


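function (keys, values, rereduce) {
  var total = 0,
      count = 0,
      i;
  if (!rereduce) {
    // First pass: values are the sizes emitted by the map function.
    for (i = 0; i < values.length; i++) {
      total += values[i];
      count++;
    }
  } else {
    // Rereduce: values are the objects returned by earlier reduce calls.
    for (i = 0; i < values.length; i++) {
      total += values[i].total;
      count += values[i].count;
    }
  }
  // Guard against dividing by zero when no values were passed in.
  var average = count > 0 ? total / count : 0;
  return {count: count,
          total: total,
          average: average};
}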
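To see why the rereduce branch adds up totals and counts instead of averaging the partial averages, here is a small sketch that runs the same logic locally. The function name reduceSizes and the sample sizes are made up for illustration, and splitting the values into two groups only imitates how the server may reduce an index in pieces.

```javascript
// Same logic as the reduce function, given a name so it can be called directly.
function reduceSizes(keys, values, rereduce) {
  var total = 0, count = 0;
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is the result of an earlier reduce call
      total += values[i].total;
      count += values[i].count;
    } else {
      // values[i] is a document size emitted by the map function
      total += values[i];
      count++;
    }
  }
  return { count: count, total: total, average: count > 0 ? total / count : 0 };
}

// Hypothetical document sizes.
var sizes = [100, 200, 300, 400];

// Reduce everything in one call.
var oneStep = reduceSizes(null, sizes, false);

// Reduce two uneven groups separately, then combine them with rereduce.
var partA = reduceSizes(null, sizes.slice(0, 1), false);
var partB = reduceSizes(null, sizes.slice(1), false);
var twoStep = reduceSizes(null, [partA, partB], true);

console.log(oneStep, twoStep);
```

Both paths give the same count, total and average. Averaging the two partial averages (100 and 300) would give 200, which is wrong because the groups have different sizes.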




Output should look like this in the Couchbase view editor:




Before running the query, set reduce to true:
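The same setting applies if the view is queried over HTTP rather than in the editor. As a rough sketch, the helper below builds such a query URL with reduce=true. The helper name viewUrl, the host, the bucket name "default", the design document "dev_sizing" and the view name "avg_doc_size" are all hypothetical placeholders, and port 8092 is assumed to be the view API port of your node.

```javascript
// Builds a view query URL. Host, bucket, design document and view names
// are hypothetical placeholders; 8092 is assumed to be the view API port.
function viewUrl(host, bucket, designDoc, view, params) {
  // Turn { reduce: true } into "reduce=true", escaping keys and values.
  var query = Object.keys(params).map(function (name) {
    return encodeURIComponent(name) + "=" + encodeURIComponent(params[name]);
  }).join("&");

  return "http://" + host + ":8092/" + encodeURIComponent(bucket) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(view) + "?" + query;
}

var url = viewUrl("localhost", "default", "dev_sizing", "avg_doc_size", { reduce: true });
console.log(url);
```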

 

 



Once the view has been created and saved, click Show Results. This displays a single line with the average document size, similar to this:





Voila!


Please send comments and questions to alexis@couchbase.com