Data analytics capabilities in Couchbase just got more robust with the 7.0 release.

I’m excited to announce two new powerful capabilities now available as part of Couchbase Analytics: 1) the CUBE functionality, and 2) ROLLUP aggregation, both to help you aggregate data across multiple document attributes.

The addition of the CUBE extension functionality to the Couchbase Analytics Service is part of the Couchbase Server 7.0 release. (We covered the ROLLUP functionality in an article last week.)

A Walkthrough Example of the New CUBE Functionality

As in SQL, the CUBE aggregation capability is an extension of the GROUP BY clause.

The CUBE extension sub-clause allows you to generate subtotals for all combinations of grouping columns specified in the GROUP BY clause. Specifically, the result includes extra rows representing subtotals in your data – commonly referred to as super-aggregate rows – along with the usual grand total row.

For example, let’s say you have a sample analytics collection called Orders that tracks ecommerce information. The key data elements you’re most interested in tracking are region/state, order date, quantity and price. (And let’s say you calculate Sales as Sales = quantity * price.)

Below is a sample of the ecommerce data for our example:

Now, let’s say your business users would like to know the following from your ecommerce data:

1) Total sales for all years by region
2) Total sales for all regions by year
3) Total sales for each region by year
4) Total sales for all regions and all years

The CUBE extension is a good use case to answer these aggregation-based data analysis questions. Your data engineer or data analyst simply uses the N1QL query shown below within the Couchbase Analytics Service.

In the above N1QL query, we’ve used year and region as variables for convenience. As a result, CUBE(region, year) generates the following query results for our sample data:

Note that the output is exactly what your users were originally looking for:

  1. At the top of the results, a grand total shows the total sales over all years and regions. In the query output, the NULL values in the country and region attributes are transformed to “All Years” and “All Regions” respectively to make the output more readable.
  2. Then, you’ll see the result of another subtotal summarizing All Regions by each Year, e.g., 2020 and 2021, displaying the total sales. In these JSON fields, values in the region attribute are transformed from NULL to “All Regions”.
  3. After that, we see another subtotal summary for “All Years” displaying the total sales for each region, e.g., Alabama & California. In these fields, values in the region attribute are transformed from NULL to “All Regions”.
  4. Lastly, the output shows the total sales rolled up for each region for each year. In this example, Sales for California for the Year 2020

Conclusion

I hope you are excited about this new aggregation CUBE capability that extends the powerful analytical expressions available to you in Couchbase Analytics. For more information, check out the CUBE documentation. I hope you find this new capability useful for future data analytics projects at your enterprise.

I look forward to your feedback on the Couchbase Forums – let us know what you think!

Take Couchbase Analytics for a test drive:<br/ >Download Couchbase Today

 

Author

Author

Posted by Idris Motiwala

Idris is a Principal Product Manager, Analytics at Couchbase with 20+ years experience in design, development and execution of software products at both Fortune 500s and startups leading teams in digital transformation, cloud and analytics. Idris holds an MS in Technology Management and certifications in product management .

Leave a reply