Caching queries in Couchbase for high performance

Alexis Roos Published

Starting with version 2.0, Couchbase Server offers a powerful way of creating indexes for JSON documents through the concept of views.


Using views, it is possible to define primary indexes, composite indexes and aggregations, which allow you to:

. query documents on different JSON properties

. create statistics and aggregates


    Views generate materialized indexes, so they provide a fast and efficient way of executing predefined queries.

    However, in Couchbase 2.x, indexes are stored on disk and read from disk for each query, which has performance implications.

    In the future, Couchbase will allow indexes to be cached in the managed cache, similar to what is done for JSON documents, to speed up queries.


    In the meantime, this post shows a simple example of how query results can be cached in Couchbase so that they are retrieved from the cache instead of being served from the index on disk.

    This is useful when a query result does not need to be immediately up to date (a delay of minutes or more is acceptable) but is read often (multiple times a second). In this case, the query results are recalculated only every so often, based on application needs, and read from the managed cache the rest of the time.

    A good example is a game leaderboard. A view can be used to create an index of the top scores for a particular game; that view can be queried every few minutes (say, every 5 minutes) and the result cached in Couchbase Server. All requests for the leaderboard then go against the cached value, so they take only milliseconds and require no index query on the server.
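
As a rough illustration of the refresh cycle described above, here is a minimal sketch using only the JDK scheduler. The Supplier and the Map are hypothetical stand-ins for "run the view query" and "the Couchbase bucket", and the class name and cache key are made up for this example; in a real application they would be replaced by calls to the Couchbase client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch only: the Supplier and the Map are hypothetical stand-ins for
// "run the view query" and "the Couchbase bucket"; they are not SDK types.
final class LeaderboardRefresher {

    private final Supplier<String> viewQuery;
    private final Map<String, String> cache;
    private final String cacheKey;

    LeaderboardRefresher(Supplier<String> viewQuery, Map<String, String> cache, String cacheKey) {
        this.viewQuery = viewQuery;
        this.cache = cache;
        this.cacheKey = cacheKey;
    }

    // Runs the query once and overwrites the cached value.
    void refreshOnce() {
        cache.put(cacheKey, viewQuery.get());
    }

    // Fills the cache immediately, then refreshes it once per period.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        refreshOnce();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Keep serving the previous value; an uncaught exception
                // would cancel all future runs of this task.
                System.err.println("Refresh failed: " + e.getMessage());
            }
        }, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger runs = new AtomicInteger();
        LeaderboardRefresher refresher = new LeaderboardRefresher(
                () -> "top scores from query run " + runs.incrementAndGet(),
                cache, "cachedTopScores");

        ScheduledExecutorService scheduler = refresher.start(5, TimeUnit.MINUTES);
        // Readers only ever touch the cached value, never the view.
        System.out.println(cache.get("cachedTopScores"));
        scheduler.shutdownNow();
    }
}
```

The refresh is wrapped in a try/catch so that a single failed query leaves the previous leaderboard in place instead of stopping the schedule.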


    Note that this method is independent of the automatic updating of indexes. By default, every index in Couchbase is updated every 5 seconds or 5000 updates; both values are tunable through the REST API. Learn more at: http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-views-operation-autoupdate.html


    This means that while the index is kept up to date, specific queries that do not need up-to-date results can be cached for higher throughput and lower latency. The only caveat is that the maximum length of a value in Couchbase is 20 MB, so cached queries should not be used for very large result sets, although it is always possible to split the results into multiple cached values for larger sets.
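
One possible way to split a large result across several cached values is sketched below. This is not taken from the Couchbase SDK: the Map stands in for the bucket, and the key naming scheme (key:0, key:1, ... plus key:count) and the class name are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the Map is a hypothetical stand-in for the Couchbase bucket,
// and the "key:N" / "key:count" naming scheme is an assumption.
final class ChunkedCacheValue {

    // Stores the value as chunks of at most maxChunkBytes and returns the chunk count.
    static int put(Map<String, byte[]> cache, String key, String value, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int chunkCount = (bytes.length + maxChunkBytes - 1) / maxChunkBytes;
        for (int i = 0; i < chunkCount; i++) {
            int from = i * maxChunkBytes;
            int to = Math.min(from + maxChunkBytes, bytes.length);
            cache.put(key + ":" + i, Arrays.copyOfRange(bytes, from, to));
        }
        // The count is written last so readers only see it once all chunks exist.
        cache.put(key + ":count", Integer.toString(chunkCount).getBytes(StandardCharsets.UTF_8));
        return chunkCount;
    }

    // Reassembles the value, or returns null if the count or any chunk is missing.
    static String get(Map<String, byte[]> cache, String key) {
        byte[] countBytes = cache.get(key + ":count");
        if (countBytes == null) {
            return null;
        }
        int chunkCount = Integer.parseInt(new String(countBytes, StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunkCount; i++) {
            byte[] chunk = cache.get(key + ":" + i);
            if (chunk == null) {
                return null; // treat an incomplete set as a cache miss
            }
            out.write(chunk, 0, chunk.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = new HashMap<>();
        String result = "result-row-1,result-row-2,result-row-3"; // made-up payload
        int chunks = put(cache, "cachedQuery", result, 16);
        System.out.println("Stored in " + chunks + " chunks");
        System.out.println("Round trip ok: " + result.equals(get(cache, "cachedQuery")));
    }
}
```

The chunks are split and joined as bytes, so a multi-byte character that straddles a chunk boundary is still restored correctly. Note that the writes are not atomic: a reader that overlaps with a refresh could combine old and new chunks, so a real implementation would need some versioning of the chunk keys.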


    This is fairly simple to implement. Let’s take a look at how we can do this in Java.


    I will use the beer-sample database, which comes with Couchbase Server. If you have not installed it already, go into Settings, select beer-sample and then click on Create.

    The sample comes with a brewery_beers view (in the beer design document), which I will use to build the caching example.


    Now let’s take a look at a simple Java application that can be used to execute and cache a query and compare against executing the query every time.


    The Java code below first connects to the beer-sample database and then runs two tests:

    . execute the query once and read the result from the cache for the remaining n - 1 requests

    . execute the query n times


    In both cases, the time is recorded before and after the test to measure the execution time.


    The code is very straightforward. It does not filter the query, but it sets includeDocs to retrieve all JSON documents associated with the query results rather than just the document IDs.


    To learn more about views and queries in Couchbase, read: http://www.couchbase.com/docs/couchbase-devguide-2.1.0/indexing-querying-data.html



    The full source code is:


    // @author Alexis Roos
    package com.couchbase.dev.examples;

    import com.couchbase.client.CouchbaseClient;
    import com.couchbase.client.protocol.views.*;

    import java.net.URI;
    import java.util.LinkedList;
    import java.util.List;

    public class CachedQuery {

        private static final String CACHE_KEY = "cachedBrewery_beersQuery";

        public static void main(String[] args) {
            List<URI> uris = new LinkedList<URI>();
            uris.add(URI.create("http://127.0.0.1:8091/pools"));

            CouchbaseClient client = null;
            boolean failed = false;
            try {
                client = new CouchbaseClient(uris, "beer-sample", "");
                int requestCount = 100;

                long t1 = System.currentTimeMillis();
                View view = client.getView("beer", "brewery_beers");
                Query query = new Query();
                query.setIncludeDocs(true).setLimit(10000);
                query.setStale(Stale.FALSE);

                // Run the query a single time and cache the result (expiry 0 = no expiration)
                ViewResponse result = client.query(view, query);
                client.set(CACHE_KEY, 0, result.toString());

                // Serve the remaining requests from the cache
                for (int i = 0; i < requestCount - 1; i++) {
                    String cachedIndex = (String) client.get(CACHE_KEY);
                }
                long t2 = System.currentTimeMillis();
                System.out.println("Test with cache finished in " + (t2 - t1) / 1000.0 + " seconds");

                // Run the query for every single request
                t1 = System.currentTimeMillis();
                for (int i = 0; i < requestCount; i++) {
                    result = client.query(view, query);
                }
                t2 = System.currentTimeMillis();
                System.out.println("Test without cache finished in " + (t2 - t1) / 1000.0 + " seconds");
            } catch (Exception e) {
                System.err.println("Error running the test against Couchbase: " + e.getMessage());
                failed = true;
            } finally {
                // Always release the client's connections, even after a failure
                if (client != null) {
                    client.shutdown();
                }
            }
            if (failed) {
                System.exit(1); // non-zero exit code signals the error to the caller
            }
        }
    }


    Running the code prints both test results. For 100 serial requests, the output is:


    Test with cache finished in 3.755 seconds

    Test without cache finished in 19.835 seconds


    Not only is the test with cache a lot faster, it also requires fewer resources on the Couchbase server.

    The following graph shows the ops per second metric for the beer-sample bucket. The first small bump corresponds to the test with cache (essentially the number of brewery and beer documents, since the query is run only once), whereas the rest of the larger curve shows the query being executed many times, which results in many more operations per second.

     





    Caching view query results is easy, and it is simple to set up a program that periodically queries the view and stores the result in Couchbase Server, where it will be cached. Applications can then use this cached value for efficiency.

    Whether to use this approach depends on the use cases of the application.
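
On the application side, the cached key may not exist yet, for example before the first refresh has run. A minimal sketch of a read path that falls back to the view in that case could look like the following. As before, the Map and the Supplier are hypothetical stand-ins for the bucket and the view query, not SDK types, and the class name is made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch only: the Map and the Supplier are hypothetical stand-ins for
// the Couchbase bucket and the view query.
final class CachedQueryReader {

    // Returns the cached result, running the query only when nothing is cached yet.
    static String read(Map<String, String> cache, String key, Supplier<String> viewQuery) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = viewQuery.get();
        cache.put(key, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger queryRuns = new AtomicInteger();
        Supplier<String> viewQuery = () -> "result of query run " + queryRuns.incrementAndGet();

        for (int i = 0; i < 100; i++) {
            read(cache, "cachedQuery", viewQuery);
        }
        System.out.println("Requests: 100, view queries executed: " + queryRuns.get());
    }
}
```

This version does not prevent two concurrent readers from both running the query on a miss; for the leaderboard case that only costs one extra query.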