A SQL++ Implementation

Summary

With the proliferation of JSON databases in recent years, a new query language SQL++ has begun to emerge that could standardize on how to access these data sources.  In fact the language has been around for a number of years now, but has existed mostly in the academic realm. Couchbase N1QL, a SQL language for JSON data, was released in 2015. The company has continued to develop its N1QL/SQL++ language and in October 2018, it announced the release of Couchbase Analytics.  Don Chamberlin, co-inventor of the original SQL, has also authored a SQL++ tutorial book with practical examples that work with Couchbase Analytics.

Overview

SQL++ is a database query language that is designed to work with both structured and semi-structured data. The language is based on the original SQL with extensions for it to work with JSON document database. In relational databases, data is represented in a tabular fashion. The rows in a table each have the same flat record structure, with identical field names and field types (according to the table’s schema). Semistructured databases relax these constraints, allowing records to be nested, to have different field names and types, and do not require a schema. Accordingly, SQL++ “extends” SQL, the query language standard used in the relational world, by relaxing its restrictions on the data model. By doing so, SQL++ retains the benefits of SQL, including its high-level declarative nature, while allowing it to handle the more flexible structures commonly found in the semi-structured world. Relational database vendors like IBM , Microsoft, and Oracle, as well as open-source systems like PostgreSQL and MySQL, have extended their own versions of SQL to work with JSON data. They add (often system-specific) extensions for JSON as a column type as well as new functions, and in some cases new syntax, to enable the manipulation of JSON documents[1][2][3][4][5] The ANSI/ISO SQL standard itself was extended in a similar fashion in 2016[6].

In contrast, as explained above, SQL++ was developed by relaxing SQL’s target data model in order to arrive at a query language where JSON data is treated as “first class” data and where relational data is a special case whose records are regular, identically typed, and not nested.[7]

History

SQL++ was originally developed by Yannis Papakonstantinou and others at the University of California, San Diego[8].  SQL++ was also used in the NSF funded FORWARD project[9] at the UCSD.

Applications

One of the early adopters of SQL++ was Apache AsterixDB, an open source Big Data Management System, originally co-developed by a team of faculty, staff, and students at UC Irvine and UC Riverside in 2009. Another early SQL++ adopter is Couchbase, Inc., a scalable JSON database vendor whose 6.0 release has adopted SQL++ for its Couchbase Analytics [8] offering.

Books

Donald D Chamberlin, one of the principal designers of the original SQL language specification, authored a tutorial for the SQL++ language[11] that includes practical examples to show how the language works with Couchbase Analytics.

Examples

SELECT & SELECT VALUE

One of the key differences between SQL and SQL++ is in the format of the result. Standard SQL, designed for row and table, returns the result set in a table format. SQL++, on the other hand, returns the result set in JSON format.

(Q1) List the customer id, name, zipcode, and credit rating of all customers, in order by customer id.

(Q2) Find the names of customers with a rating greater than 650.

(Q3) Shows the effects of SELECT VALUE (compare to Q2).

A SELECT VALUE query can be used with an object constructor to create labels or to give some structure to a query result, as in the following example.

(Q4) List customers with credit rating greater than 650, in order by descending credit rating, and again in ascending order by zipcode.

GROUPING, AGGREGATION and UNNEST

SQL++ supports the same SQL concept of grouping and aggregation. The UNNEST takes the contents of nested arrays, i.e. orders and join them with their parent object, i.e. customers.

(Q6) List the first order by order number and item number, together with total quantity for all the orders made on 2017-05-01.

GROUP AS

A query can generate output data at summary level. The level definition is provided in the GROUP BY clause. The Q6 query generates a summary of orders at the order number and order item number level. Often you will want to generate output that includes both summary data and line items within the summaries. For this purpose, SQL++ supports several important extensions to the traditional grouping features of SQL. The familiar GROUP BY and HAVING clauses are still there, and they are joined by a new clause called GROUP AS.

(Q7) List all orders by order number and item number, together with total quantity for all the orders made on 2017-05-01, also include all the orders and order items that made for each summary line. (Compare to Q6)