I’m excited to announce two new powerful data analysis capabilities now available as part of Couchbase Analytics: 1) ROLLUP aggregation, and 2) CUBE functionality, both to help you aggregate data across multiple document attributes.

The addition of the ROLLUP aggregation functionality to the Couchbase Analytics Service is part of the Couchbase Server 7.0 release. (We’ll cover the CUBE functionality in an article next week.)

ROLLUP Aggregation Example: Ecommerce Data

The ROLLUP aggregation function is an extension of the GROUP BY clause.

The ROLLUP sub-clause allows you to include extra rows – commonly referred to as super-aggregate rows – that represent subtotals in your data along with the grand total row.

For example, let’s say you have an analytics collection called Sales that tracks ecommerce information for your business. The key data elements you’re most interested in tracking are country, region/state, product, quantity and price. (And let’s say you calculate Sales as Sales = quantity * price.)

Below is a subset of our example ecommerce data collection:

Now, let’s say your users need to determine the following information using the data above:

1) Total sales for all countries and regions
2) Total sales for all regions by country
3) Total sales for each region

The ROLLUP aggregation sub-clause is an excellent choice for answering these sorts of data analysis questions. Your data engineer or data analyst simply uses the N1QL query shown below.

In the above N1QL query, we use country and region as variables. The ROLLUP sub-clause assumes a hierarchy among the specified data attributes.

For example, if the input column is (country, region), the hierarchy country is higher or greater than region. This is why it makes sense to use ROLLUP to generate the subtotals and the grand total for reporting purposes.

Based on the query above, ROLLUP(country, region) generates the following results:

Note that the output is exactly what your example users were originally looking for:

  1. At the top of the results, a grand total summary JSON field shows the total sales of all countries and regions. In these rows, the NULL values in the country and region attributes are transformed to “All Countries” and “All Regions” respectively to make the output more readable.
  2. After that, for each country (in this case, the U.S.), an extra subtotal summary JSON data element appears displaying the total sales. In these, values in the region attribute is transformed from NULL to “All Regions”
  3. Lastly, the output shows the total sales rolled up for each region within the country. In this example, sales for California, Colorado and Connecticut are shown within the U.S.

Conclusion

This has been just a quick look at the new ROLLUP aggregation feature in the Couchbase Analytics Service. For more information, check out the ROLLUP documentation. I hope you find this new capability useful for future data analysis projects at your enterprise.

I look forward to your feedback on the Couchbase Forums – let us know what you think!

You don’t have to take my word for it: Take Couchbase Analytics for a spin today<br/ >Give Couchbase a Test Drive

 

Author

Posted by Idris Motiwala

Idris is a Principal Product Manager, Analytics at Couchbase with 20+ years experience in design, development and execution of software products at both Fortune 500s and startups leading teams in digital transformation, cloud and analytics. Idris holds an MS in Technology Management and certifications in product management .

Leave a reply