Overview

The main purpose of this article is to show how Couchbase ANSI JOINs are tested. In Couchbase version 5.5, you can now execute joins using the ANSI standard syntax. This addition to joins makes them much more flexible, but also much more complex. This complexity makes it difficult for us to test in our normal framework, there are simply too many possibilities to manually write out for automation. Luckily one of our testing frameworks allows us to set the parameters of a query, and then generates random sets of queries based on those parameters. This framework is known as the Random Query Generator (RQG). In this article we will discuss how RQG is used to test the ANSI JOIN functionality that was implemented. First, let us take a brief look into what is different with ANSI JOINs.

In this article we will go over the following topics:

1. Introduction to ANSI JOINs

2. Overview of Random Query Generator(RQG)

3. USING RQG to test ANSI JOINs

4. Key Points

5. List of additional ANSI JOIN bugs found

6. Appendix

Introduction to ANSI JOINs

Previously in Couchbase you could only execute lookup joins and index joins. This was good for queries in which one side of the join can produce a document key of the other side of the join. However, the problem with this is that joins could not be executed on other fields of a key space. ANSI joins fix this problem and take it a step further, now joins can be executed on arbitrary fields or expressions that result in fields, or on multiple join conditions.

Old Join syntax:

        Lhs-keyspace [join-type] JOIN rhs-keyspace

ON KEYS [expression resulting in a meta().id of one of the keyspaces]

ANSI JOIN syntax:

lhs-expression [join-type] JOIN rhs-keyspace ON [join-condition]

Old Join Syntax Query:

Same query w/ ANSI JOINs Syntax:

The join types that are supported in N1QL are either INNER JOIN or LEFT/RIGHT OUTER JOIN. If you simply put JOIN this will be interpreted as INNER JOIN, and LEFT/RIGHT JOIN will be interpreted as OUTER JOIN. In the new join syntax, the left hand side of the join can be a keyspace, a nested n1ql query, a n1ql expression, or a join itself. The right hand side of the join must be a keyspace in the current implementation of ANSI JOINs inside of Couchbase. The ON-clause contains the conditions the join must meet, these conditions can be any predicate expression so long as there is a proper index on the right hand side keyspace. So you can do equality comparisons and non-equality comparisons (b1.field = b2.field or b1.field</>/=/!=/b2.field or b2.field is NOT NULL or complex expression = b2.field) This is a brief look into the new feature, for a deeper dive into the feature please check out this blog post written by Bingjie Miao(the developer who implemented the feature): https://blog.couchbase.com/ansi-join-support-n1ql/

Now we begin to understand that ANSI JOINs have very few limitations in comparison to our old joins. The left hand side of an ANSI join can be very complex, it can even be several joins itself, or a nested query that results in some json data, or several nested queries. Because of this level of complexity, it is a better idea to use something we call RQG, rather than coming up with individual queries to test. Here is a brief overview of how RQG works.

Overview of Random Query Generator(RQG)

RQG is a framework that was written by Couchbase that leverages another framework called Randgen. Basically we specify a set of parameters in a file called a grammar, and from those parameters Randgen will create SQL query templates (queries with placeholders for the field names and table names), then our framework processes these templates and converts them to N1QL formatted queries based off the dataset that is loaded into couchbase.

Example Query Template:

This template is generated from the grammar file that contains the parameters, basically the grammar file tells randgen to generate a query that selects a list of numeric type data from a join of two arbitrary buckets on an equality comparison. The grammar will have various types of fields to be selected from and various join conditions that can be slotted into the ON/ WHERE predicates. Once the template is generated as shown above, our framework will go through and replace these arbitrary values with values based on the data loaded in. For NUMERIC_FIELD_LIST it will select a field that is a list of ints/bigints/floats/decimals/etc depending on what is available. The framework will replace BUCKET_NAME with the specific name of a bucket that exists on the server. PREVIOUS_TABLE.FIELD will be a field from the first bucket and CURRENT_TABLE.FIELD will be the same field from the second bucket. After all this is done, it will go through the query and create a copy of the query where any differences between N1QL and SQL are ironed out. For example, in the old join syntax we could not use the keyword ON, so instead RQG would convert ON to ON KEYS for the n1ql queries.

Example Converted Query:

Then both the n1ql queries and the sql queries are run against the couchbase server and an equivalent MySQL server. The results of the N1QL queries are compared to the results of the equivalent MySQL queries. Our goal is to have our results match up with MySQL results because we want N1QL to be functionally similar to SQL. If you want to read more about how RQG is laid out or how it generates templates and converts them to n1ql please check out this 5-part series on RQG: https://dzone.com/articles/testing-n1ql-sql-for-json

USING RQG to test ANSI JOINs

The way RQG is setup allows us to generate a variety of queries we may not have otherwise thought of ourselves. Here is an extremely complicated template generated by RQG:

As you can see by the sheer size of this query, it would have been very difficult to create this query on our own, and many such queries are generated and executed through RQG. This ability to create highly complex queries makes it perfectly suited to tackle the arbitrary complexity of ANSI JOINs. The grammar that we have defined to test ANSI JOINs attempts to capture the complexity of the left-hand side of a join as well as the complexity of the ON clause. We tell RQG to generate the left hand side of a join as one of three things, a normal keyspace, a n1ql subquery, or several nested joins. We tell RQG to generate the right hand side of a join as a keyspace. Then we tell RQG to generate the ON clause of a join as mostly equality predicates between two fields one from the left hand keyspace or left hand alias, and one from the right hand keyspace. The other things that can be generated in the ON clause could be non-equality comparisons (</>/=/!=) or simply checking attributes of the right hand keyspace field (b2.field is not null, b2.field is not missing, b2.field is in a list, etc). For ANSI JOINs, there is a small difference between n1ql and sql. That difference is that N1QL supports the comparison IS (NOT) MISSING, this is because of the non-relational data model. Some buckets can have documents that have a field that other documents in that bucket do not have. So IS (NOT) MISSING means that the field does not appear in the document(or in the case of IS NOT MISSING means that the field appears in the document). Technically, if we wanted we could also tell RQG to generate n1ql subqueries inside the ON clause and compare those to the right hand keyspace values, however this would exponentially increase the run-time of our queries and is not a common use case. Here are some sample ANSI JOIN queries generated from our set of parameters above:

As you can see by these examples the ON clause can be multiple chained predicates, chained by AND/OR. We can also observe that multiple joins can be executed in one query, and that each join can have its own ON clause.

Now that it is understood how RQG generates the templates and what those templates look like, let us look at an example of a bug that we found in ANSI JOIN functionality that we probably would not have found otherwise. MB-27483, in this bug we had the following query:

This query should have executed successfully given the way our server was setup, however instead of executing it was throwing an error that the proper index did not exist on the right hand side keyspace. The problem here was the WHERE clause was interfering with the index selection of the ON clause, causing the query to think it did not have the index it needed when that index did in fact exist. If we look into the WHERE clause we see it is very complex, this is not the level of complexity that we would normally have in the functional side of testing. Thus we would not have spotted this bug without using RQG.

Key Points

ANSI JOINs are quite flexible in terms of what is possible to join together, you can join arbitrary expressions with keyspaces and the ON clause can contain predicates that compare arbitrary expressions to fields. This level of complexity makes it impractical to try and come up with specific queries to test every scenario on our own through functional testing. Luckily we have a framework called RQG that is capable of generating queries based on a rule set that we can define. This functionality is very powerful and has allowed us to test ANSI JOINs to our satisfaction. It has even helped us find bugs that would be hard to find otherwise, for example if a large number of complex queries were being run, ANSI JOIN results would be off by 1. However, if any of those individual queries was run in isolation by itself, the results were correct. That is why we are confident that ANSI JOINs are in good shape and ready for customer use.

List of additional ANSI JOIN bugs found with RQG:

Bug Number Bug Description
MB-27834This query was not properly using the LIMIT query hint, it would execute the whole query before applying the LIMIT
MB-27763This query was not properly using the LIMIT query hint, similar to the above bug
MB-27483The query thought that the appropriate index didn’t exist, prompting the query to throw an error. However the index was actually present in the system
MB-27230The wrong error was being returned by the query
MB-27201This query was running into an error with backfill not working properly
MB-27109Another query that thought the index it needed was unavailable, even though the index was present in the system
MB-27028When a bunch of queries were executed in succession some of the query results were off by 1. When the queries were executed by themselves all the results were correct.
MB-26649Query would intermittently hang

Appendix

  1. Couchbase Server 5.5 GA Blog https://blog.couchbase.com/announcing-couchbase-server-5-5/
  2. Index Joins https://dzone.com/articles/join-faster-with-couchbase-index-joins
  3. ANSI JOINs Couchbase Blog https://blog.couchbase.com/ansi-join-support-n1ql/
  4. Randgen https://launchpad.net/randgen
  5. RQG Dzone article series https://dzone.com/articles/testing-n1ql-sql-for-json

Posted by Ajay Bhullar

Leave a reply