In business applications, the data is often modeled for a large number of concurrent low latency queries. If you want to gain insight by looking at trends, however, you end up wishing you had an entirely different data model. The traditional means for squaring this circle had been to move, transform, and load the data elsewhere, but this introduces its own raft of problems, including unacceptable latency, multiple sources of truth, and a lot of expense.

Couchbase customers know that the Analytics service provides an easy way to handle real-time analytical and trend reporting on the data they have in production right now. An example of this recently came up when we worked with a customer looking to identify high-end customer activity associated with a corporate partner loyalty program. The underlying document model was clearly designed with the interactive application, rather than reporting, in mind. (This is not at all uncommon, as you may know from painful experience.) Let’s take a quick look at the problem and how we solved it.

Example document

The document model in our case (supporting an online booking application) is comprised of four sections. The first section includes basic document and app identifiers. The second describes the booking information about an excursion. The third contains details on one or more itineraries associated with the booking, along with passenger requirements for one or more passengers. The final section describes the corporate loyalty programs to which each of the passengers might belong.


 

Query elements

In order to complete the analysis, my customer was required to pull or filter on the following fields:

  • status, equipment, embarking (converted to human-readable format), line, _type, daysOnboard, passengerNumber, loyaltyId, partnerId

The problem, of course, is that these fields exist in entirely different hierarchical levels within the document model. Some are scalar values, readily accessible from a simple query:

  • status, equipment, embarking, line, _type

Another is an element within an array (comprised of trip itineraries), which must be unnested:

  • daysOnboard

Within this same array is a second array (comprised of passenger details), an element of which must be used as a join key:

  • passengerNumber

This join key is used to access elements from within a third array, which for business application reasons is not nested within the second:

  • loyaltyId, partnerId

These different levels equate to different access paths, adding some complexity to the analysis. Fortunately N1QL for Analytics provides the syntactic tools we need. Below is a step-by-step description of the process you might use to build your query.

Step 1 – simple select of one scalar element

This step ought be fairly clear to people with SQL experience. We use a select statement to retrieve a scalar value from the lines bucket. We qualify the status field as part of the booking section and we limit the number of records to return.


 

Query results:


 

Step 2 – Unnest and add element from first array

Next we add data from the itinerary section of the document. Because these elements are embedded within an array, however, we must first unnest them.


 

Query results:


 

Step 3 – Unnest and add element from second (within first) array

Now we add elements from the embedded passengers array. (Note that we increase our limit to make sure that we really are accessing more than one element from the array.)


 

Query results:


 

Step 4 – Unnest and add element from third array, accessible via join

The elements from the third array (passengerDetails) must be unnested and tied to the elements of the passengers array above. We do this via the where clause.


 

Query results:


 

Step 5 – Add remaining query elements

Other fields are required to complete the query. Note especially the _type field added to the where clause. In all likelihood in a production system, a bucket will contain documents of multiple types. Query results might be filtered in the query itself (as in the example below) or as part of the creation of the Analytics dataset.


 

Query results:


 

Further reading

Unnest is a powerful feature of N1QL for Analytics, providing you with the means to master multiple embedded elements within your data. Further reading on its syntax can be found here: https://docs.couchbase.com/server/6.0/analytics/3_query.html#Unnest_clauses

A complete guide to N1QL for Analytics–I’m proud to own a signed copy of this one–can be found here: https://www.amazon.com/SQL-Users-Tutorial-Don-Chamberlin/dp/0692184503/

Try it out for yourself

Head straight to https://docs.couchbase.com/server/6.0/analytics/quick-start.html#Using_docker and get started right away with a Docker-based tutorial. Or if you prefer, download Couchbase Server 6 Enterprise from this page: https://www.couchbase.com/downloads

Posted by Peter Reale

Peter Reale is a Senior Solutions Engineer at Couchbase and has been in the data business since 1984. He is based in Los Angeles.

Leave a reply