Analytics - Love the Doc Model You’re With

In business applications, data is often modeled to serve a large number of concurrent, low-latency queries. If you want to gain insight by looking at trends, however, you end up wishing you had an entirely different data model. The traditional way to square this circle has been to move, transform, and load the data elsewhere, but this introduces its own raft of problems, including unacceptable latency, multiple sources of truth, and considerable expense.

Couchbase customers know that the Analytics service provides an easy way to handle real-time analytical and trend reporting on the data they have in production right now. An example of this recently came up when we worked with a customer looking to identify high-end customer activity associated with a corporate partner loyalty program. The underlying document model was clearly designed with the interactive application, rather than reporting, in mind. (This is not at all uncommon, as you may know from painful experience.) Let’s take a quick look at the problem and how we solved it.

Example document

The document model in our case (supporting an online booking application) comprises four sections. The first section includes basic document and app identifiers. The second describes the booking information about an excursion. The third contains details on one or more itineraries associated with the booking, along with passenger requirements for one or more passengers. The final section describes the corporate loyalty programs to which each of the passengers might belong.

{
  "_type": "booking",
  "_header": {
    "created": 1562888960,
    "source": "app",
    "version": "v1.1"
  },
  "booking": {
    "status": "BOOKED",
    "bookingType": "agency",
    "details": {
      "agent": "FBL33",
      "contact": "Arlene",
      "seats": 2,
      "excursion": {
        "embarking": 1562958000,
        "equipment": "123X",
        "line": "SRF",
        "fromStation": {
          "code": "LAX",
          "facilityType": 1
        },
        "toStation": {
          "code": "SOL",
          "facilityType": 2
        }
      },
      "bookingAgency": "PC",
      "agencyType": "3"
    }
  },
  "itinerary": [
    {
      "daysOnboard": 1,
      "passengers": [
        {
          "passengerNumber": 1,
          "specialAccomodations": false
        },
        {
          "passengerNumber": 2,
          "specialAccomodations": false
        }
      ],
      "itineraryType": "business"
    }
  ],
  "passengerDetails": [
    {
      "loyaltyId": "aaaabbbbccccdddd",
      "passId": 1,
      "programType": {
        "corporatePartner": true,
        "partnerId": 1
      }
    },
    {
      "loyaltyId": "eeeeffffgggghhhh",
      "passId": 2,
      "programType": {
        "corporatePartner": false
      }
    }
  ]
}

Query elements

In order to complete the analysis, my customer was required to pull or filter on the following fields:

status, equipment, embarking (converted to human-readable format), line, _type, daysOnboard, passengerNumber, loyaltyId, partnerId

The problem, of course, is that these fields exist in entirely different hierarchical levels within the document model. Some are scalar values, readily accessible from a simple query:

status, equipment, embarking, line, _type

Another is an element within an array (comprised of trip itineraries), which must be unnested:

daysOnboard

Within this same array is a second array (the passengers on each itinerary), an element of which must be used as a join key:

passengerNumber

This join key is used to access elements from within a third array, which for business application reasons is not nested within the second:

loyaltyId, partnerId

These different levels equate to different access paths, adding some complexity to the analysis. Fortunately, N1QL for Analytics provides the syntactic tools we need. Below is a step-by-step description of the process you might use to build your query.

Step 1 – simple select of one scalar element

This step ought to be fairly clear to people with SQL experience. We use a select statement to retrieve a scalar value from the lines bucket. We qualify the status field as part of the booking section and we limit the number of records to return.

select booking.status

from lines

limit 1;

Query results:

				

						[
  {
    "status": "BOOKED"
  }
]

Step 2 – Unnest and add element from first array

Next we add data from the itinerary section of the document. Because these elements are embedded within an array, however, we must first unnest them.

select l.booking.status,
       i.daysOnboard
from lines l
unnest l.itinerary i
limit 1;

Query results:

[
  {
    "status": "BOOKED",
    "daysOnboard": 1
  }
]

Step 3 – Unnest and add element from second (within first) array

Now we add elements from the embedded passengers array. (Note that we increase our limit to make sure that we really are accessing more than one element from the array.)

select l.booking.status,
       i.daysOnboard,
       p.passengerNumber
from lines l
unnest l.itinerary i
unnest i.passengers p
limit 2;

Query results:

				
					

						[
  {
    "status": "BOOKED",
    "daysOnboard": 1,
    "passengerNumber": 1
  },
  {
    "status": "BOOKED",
    "daysOnboard": 1,
    "passengerNumber": 2
  }
]
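
For readers who think in application code, the two chained unnest clauses can be pictured as nested loops: one output row for every passenger of every itinerary of every document. The Python sketch below is only an illustration of that semantics on a trimmed copy of the sample document; it is not how the Analytics service executes the query, and the function and variable names are made up for this example.

```python
# Trimmed copy of the sample booking document (only the fields the query reads).
booking = {
    "booking": {"status": "BOOKED"},
    "itinerary": [
        {
            "daysOnboard": 1,
            "passengers": [
                {"passengerNumber": 1, "specialAccomodations": False},
                {"passengerNumber": 2, "specialAccomodations": False},
            ],
        }
    ],
}


def unnest_passengers(docs):
    """Emulate: from lines l unnest l.itinerary i unnest i.passengers p."""
    rows = []
    for doc in docs:                                 # from lines l
        for itin in doc.get("itinerary", []):        # unnest l.itinerary i
            for pax in itin.get("passengers", []):   # unnest i.passengers p
                rows.append({
                    "status": doc["booking"]["status"],
                    "daysOnboard": itin["daysOnboard"],
                    "passengerNumber": pax["passengerNumber"],
                })
    return rows


rows = unnest_passengers([booking])
for row in rows:
    print(row)
```

One consequence of the loop picture: a document whose itinerary array is empty or absent contributes no rows at all, which is worth keeping in mind when counts look lower than expected.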

					

			

Step 4 – Unnest and add element from third array, accessible via join

The elements from the third array (passengerDetails) must be unnested and tied to the elements of the passengers array above. We do this via the where clause.

				
					

						select l.booking.status,
       i.daysOnboard,
       p.passengerNumber,
       pd.loyaltyId
from lines l
unnest l.itinerary i
unnest i.passengers p
unnest l.passengerDetails pd
where p.passengerNumber = pd.passId
limit 2;

					

			

Query results:

				
					

						[
  {
    "status": "BOOKED",
    "daysOnboard": 1,
    "passengerNumber": 1,
    "loyaltyId": "aaaabbbbccccdddd"
  },
  {
    "status": "BOOKED",
    "daysOnboard": 1,
    "passengerNumber": 2,
    "loyaltyId": "eeeeffffgggghhhh"
  }
]
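
It may help to see why the where clause matters here. Unnesting passengerDetails alongside passengers pairs every passenger with every loyalty record in the same document; the where clause then keeps only the pairs whose keys match. The Python sketch below imitates that behavior on a trimmed copy of the sample document. It is an illustration under that assumption, with hypothetical function and parameter names, not a description of the Analytics query plan.

```python
# Trimmed copy of the sample booking document (only the fields the query reads).
booking = {
    "booking": {"status": "BOOKED"},
    "itinerary": [
        {
            "daysOnboard": 1,
            "passengers": [{"passengerNumber": 1}, {"passengerNumber": 2}],
        }
    ],
    "passengerDetails": [
        {"loyaltyId": "aaaabbbbccccdddd", "passId": 1},
        {"loyaltyId": "eeeeffffgggghhhh", "passId": 2},
    ],
}


def loyalty_rows(doc, apply_where=True):
    """Pair each passenger with loyalty records, optionally filtering on the join key."""
    rows = []
    for itin in doc["itinerary"]:                   # unnest l.itinerary i
        for pax in itin["passengers"]:              # unnest i.passengers p
            for detail in doc["passengerDetails"]:  # unnest l.passengerDetails pd
                # where p.passengerNumber = pd.passId
                if apply_where and pax["passengerNumber"] != detail["passId"]:
                    continue
                rows.append({
                    "status": doc["booking"]["status"],
                    "daysOnboard": itin["daysOnboard"],
                    "passengerNumber": pax["passengerNumber"],
                    "loyaltyId": detail["loyaltyId"],
                })
    return rows


unfiltered = loyalty_rows(booking, apply_where=False)
joined = loyalty_rows(booking)
print(len(unfiltered), len(joined))  # 4 pairings before the filter, 2 after
```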

					

			

Step 5 – Add remaining query elements

Other fields are required to complete the query. Note especially the _type field added to the where clause. In a production system, a bucket will in all likelihood contain documents of multiple types. Query results might be filtered in the query itself (as in the example below) or as part of the creation of the Analytics dataset.

				
					

						select l.booking.status, l.booking.details.excursion.equipment, l.booking.details.excursion.line,
       i.daysOnboard,
       p.passengerNumber,
       pd.loyaltyId, pd.programType.partnerId,
       millis_to_str(l.booking.details.excursion.embarking*1000) embarking
from lines l
unnest l.itinerary i
unnest i.passengers p
unnest l.passengerDetails pd
where p.passengerNumber = pd.passId
  and l._type = "booking"
  and str_to_millis("2019-07-12T19:00:00Z") = l.booking.details.excursion.embarking*1000;

					

			

Query results:

				
					

						[
  {
    "embarking": "2019-07-12T19:00:00Z",
    "status": "BOOKED",
    "equipment": "123X",
    "line": "SRF",
    "daysOnboard": 1,
    "passengerNumber": 1,
    "loyaltyId": "aaaabbbbccccdddd",
    "partnerId": 1
  },
  {
    "embarking": "2019-07-12T19:00:00Z",
    "status": "BOOKED",
    "equipment": "123X",
    "line": "SRF",
    "daysOnboard": 1,
    "passengerNumber": 2,
    "loyaltyId": "eeeeffffgggghhhh"
  }
]
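
The query multiplies embarking by 1000 because the document stores the timestamp in epoch seconds, while millis_to_str and str_to_millis work in milliseconds. If you want to sanity-check the values outside the query, the conversion can be reproduced with the Python standard library. The helper names below are made up for this sketch and only mimic the two query functions for this one UTC timestamp format.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def millis_to_iso(millis):
    """Format epoch milliseconds as a UTC ISO 8601 string (whole seconds only)."""
    moment = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return moment.strftime(ISO_FORMAT)


def iso_to_millis(text):
    """Parse a UTC ISO 8601 string (whole seconds only) into epoch milliseconds."""
    moment = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(moment.timestamp()) * 1000


embarking_seconds = 1562958000            # value stored in the sample document
embarking_millis = embarking_seconds * 1000

print(millis_to_iso(embarking_millis))    # 2019-07-12T19:00:00Z
print(iso_to_millis("2019-07-12T19:00:00Z") == embarking_millis)  # True
```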

					

			

Try it out for yourself

Head straight to https://docs.couchbase.com/server/6.0/analytics/quick-start.html#Using_docker and get started right away with a Docker-based tutorial. Or if you prefer, download Couchbase Server 6 Enterprise from this page: https://www.couchbase.com/downloads

Peter Reale
