When you need to query documents using SQL, there are two options available in Couchbase. The Query service and the Analytics service. Our blog, N1QL: To Query or To Analyze? provides a detailed overview of both services. I highly recommend reading it before this one. This article aims to expand on the previous blog by adding some concrete, hands-on examples. For each example, we’ll cover how to write the query in both services and we’ll look at the performance differences. The goal is that readers will walk away with more knowledge to help quickly identify patterns and use cases that best fit each service.

Summary

Before jumping into examples. Let’s refresh ourselves on the high-level key characteristics of the two services.

Query Service
Analytics Service
Used for data manipulation within the application logic. Used for reports, analysis (historical, interactive), and dashboards.
Most efficient for short, operational queries that retrieve or manipulate smaller amounts of data. Most efficient for longer, complex, ad-hoc queries that typically retrieve and process large amounts of data.
Supports SELECT, INSERT, UPDATE, DELETE, MERGE operations. Only supports SELECT operations.

See https://www.couchbase.com/blog/n1ql-to-query-or-to-analyze/ for a complete table.

Setup

For this tutorial we will use Couchbase 6.5 and the sample data provided in the Couchbase Admin UI. My environment is a 3 node Couchbase 6.5 cluster with 1536 MB allocated to the Analytics service. All other settings are the defaults. If you don’t have access to a cluster, you can quickly run Couchbase 6.5 in a Docker container by running the following command:

If you’re going the Docker route, go to http://localhost:8091 in your browser after the container starts and setup your Couchbase instance using the default options at each step. It does not matter what name you give your Couchbase instance.

Performance Disclaimer

We will look at the response times for the examples that follow. It is important to note that your performance may vary greatly depending on how your Couchbase environment is setup. However, you should still be able to observe similar differences between the Query and Analytics services regardless of your environment.

Install Travel Sample Bucket

From the Couchbase Admin Dashboard, navigate to Settings -> Sample Buckets. Install the Travel Sample bucket. Detailed documentation on how to do this can be found over at our documentation site.

Query Service Setup

Installing the Sample Bucket also creates the necessary indexes. This means no additional setup is needed for the Query Service.

Analytics Service Setup

For the Analytics Service, we need to populate Datasets for each “type” of the document in our bucket. Navigate to the Analytics Workbench (http://localhost:8091/ui/index.html#!/cbas/workbench) and execute the following queries to create the Datasets:

This will create Datasets for Routes, Landmarks, Hotels, Airlines, and Airports using the travel-sample bucket. Finally executing the CONNECT statement will begin populating each of the Datasets.

In the examples that follow, we’ll be using simple N1QL queries. For a detailed breakdown of the N1QL language differences between Query and Analytics see the N1QL for Analytics vs. N1QL for Query reference page in our docs.

Use Case: Get All Routes For LAX to SFO

Now we are ready to write our first query. For this use case, we want to find all routes available for a given source and destination airport.

Which service is best?

This is definitely an operational query that will return a limited amount of data. It is a simple query that does not perform any aggregations or complex functions over our data. There’s only a simple filter on source and destination. Therefore the Query Service (http://localhost:8091/ui/index.html#!/query/workbench) is the obvious choice.

Performance:  4 milliseconds

This is a simple query and it also returns only 7 documents. This is a typical operational query that an application might send to Couchbase. Performance is reliably fast and consistent.

Analytics Service Equivalent

Let’s craft the same query for the Analytics Service for demonstration’s sake. The Analytics Service is overkill for a simple query like this. Therefore, if we were building an application for this use case, we wouldn’t choose the Analytics service. We’d expect it to underperform the Query Service.

Performance: ~36 milliseconds

As you can see from this example. The Query Service performs best for this use case as expected. Under heavy load, we’d expect the Query Service to perform even better than the 30+ millisecond difference that this simple test shows.

Use Case: Get the Cities With the Most Hotels

Which service is best?

For this use case, we want to figure out the number of hotels available in each city and sort the results by the cities with the most hotels first. This will require us to scan through all of our hotels and collect counts by Country and City and then sort it. Following the logic we laid out in the beginning, the Analytics Service (http://localhost:8091/ui/index.html#!/cbas/workbench) should perform better for this use case. Let’s test this theory.

Performance: ~36 milliseconds

Interestingly, the performance of this query is almost the same as our previous Analytics example (36ms) even though the previous query was much simpler and smaller computationally. This tells us the baseline performance in my 3 node environment may be around 36 milliseconds for Analytics queries. Even though this query is more complex than our first example, it is still relatively simple for the Analytics service.

Query Service Equivalent

Let’s craft the same query for the Query Service. In theory, this is a heavier query than our previous Query Service example. It’s also processing and returning much more data than the first example. We’d expect that the Query Service would not perform as well as the Analytics Service.

Performance: ~90 milliseconds

Here we have a truly large divergence in performance. As expected, the Analytics service is able to process the query 2x faster on average than the Query Service.

Use Case: Get The Airlines With the Most Routes

Which service is best?

This query is asking a similar question to our previous example. But the twist here is that we will need a join since the Airline data resides in a separate type of document in our bucket from the routes data. We expect the Analytics Service to perform better with this query because it is doing an aggregation and a JOIN.

Performance: 82 milliseconds

Here we can see that this query is actually starting to push the Analytics service a little bit. Our first Analytics query took 36 milliseconds on average and this one is pushing it up to 82 milliseconds. The major difference with this query is the addition of a JOIN.

Query Service Equivalent

Remember in the beginning we created separate Analytics “Datasets” for each type of document in the Travel Sample data. Each Dataset functions like its own table. So joining them in a query is simple if you’ve ever written SQL joins before. The Query Service doesn’t have any concept of “Datasets” in the same way as the Analytics service does. Therefore we have to write the query a little differently to account for all of the data residing in the same bucket. We need to join documents of type = “airline” to documents of type = “route”. We need a subquery to do this.

Performance: 2 seconds

The performance of the Analytics Service is drastically better for this use case due to the JOIN. Another added benefit of the Analytics Service for this case is that writing the JOIN is simpler since we didn’t need to create a subquery to join against.

Use Case: Get the Percentile Rank of Airlines With the Most Routes

Which service is best?

This example builds on the previous one by adding the percentile rank of route counts. We expect that the performance will be better with Analytics since we are adding even more complexity and computation to the query. Let’s see how big of an impact adding a Window function to our query has.

Performance: 85 milliseconds

Adding a Window function was no sweat for the Analytics service as we can barely see a performance difference.
Query Service Equivalent

Performance: 2 seconds

When we add the Window function to our Query service query, we are also unable to detect a performance hit. The major takeaway that we can infer from these results is that the JOIN is the biggest performance factor and the aggregation (COUNT in this case) is the second biggest.

Conclusion

We hope that this article has helped you achieve a better understanding of the two SQL options in Couchbase and when to apply them. Be sure to check out the following resources on Query and Analytics.

 

Author

Posted by Evan Pease

Leave a reply