One of the most frequent questions I receive when it comes to NoSQL is on the subject of joining data from multiple documents into a single query result. While this question is brought up more frequently by RDBMS developers, I also receive it from NoSQL developers.

When it comes to data joining, every database does it differently, and some require it to be done in the application layer rather than the database layer. We’re going to explore some data joining options between database technologies.

MongoDB is a popular NoSQL technology, so we’ll be seeing how much easier it is to join documents in Couchbase by comparison.

The Sample Data

For this example, we’ll be basing both MongoDB and Couchbase off two sample documents. Assume we’re working with a classic order and inventory example. For inventory, our documents might look something like this:
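As a minimal sketch, a product document could look like the one below. The field names and values (`_id`, `type`, `name`, `price`) are assumptions made for illustration. In MongoDB `_id` is the primary key, while in Couchbase the same value would serve as the document key.

```javascript
// Hypothetical inventory document; every field name and value is illustrative.
const product = {
    _id: "product-1",   // unique id that order documents will reference
    type: "product",    // lets products and orders share one Couchbase Bucket
    name: "Example Widget",
    price: 29.99
};

console.log(JSON.stringify(product, null, 4));
```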

While flat, the above document properly describes one particular product. It has a unique id which will be involved during the join process. For orders, we might have a document that looks like the following:
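A matching order document might be sketched as follows. The `products` array, `product_id`, and `quantity` come from the discussion in this article, while `_id`, `type`, and the sample values are assumptions.

```javascript
// Hypothetical order document; the values are illustrative.
const order = {
    _id: "order-1",
    type: "order",
    products: [
        // Each entry references an inventory document by its unique id.
        { product_id: "product-1", quantity: 2 }
    ]
};

console.log(JSON.stringify(order, null, 4));
```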

The goal here will be to join these two documents in a single query using both MongoDB and Couchbase. However, query language aside, these documents can always be joined via the application layer through multiple queries. This is not the result we’re after though.

Joining Documents with MongoDB and the $lookup Operator

In recent versions of MongoDB there is a $lookup operator that is part of the aggregation queries. Per the MongoDB documentation, this operator does the following:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.

To use the $lookup operator, you’d have something like this:
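Here is a rough sketch of the general shape, written as a plain JavaScript array that could be passed to `db.orders.aggregate()` in the mongo shell. The collection names `orders` and `products` are assumptions, and so is the idea that an order carries a single top-level `product_id` field.

```javascript
// Pipeline for db.orders.aggregate(lookupPipeline); all names are assumptions.
const lookupPipeline = [
    {
        $lookup: {
            from: "products",          // collection to join against
            localField: "product_id",  // scalar field on the order (hypothetical)
            foreignField: "_id",       // field on the product documents
            as: "productsObject"       // array that receives the matches
        }
    }
];

console.log(JSON.stringify(lookupPipeline, null, 4));
```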

Now this is great, but at the time of writing it doesn’t work on relationships found in arrays. This means that the $lookup operation cannot join the product_id found in the products array to another document. Instead the array must be “unwound” or “unnested” first, which adds extra complexity to our query:
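A sketch of that two-stage pipeline might look like this, again as a JavaScript array intended for `db.orders.aggregate()`. The collection names are assumptions.

```javascript
// Unwind the products array first, then join each flattened element.
const unwindLookupPipeline = [
    { $unwind: "$products" },  // one output document per array element
    {
        $lookup: {
            from: "products",                   // assumed inventory collection
            localField: "products.product_id",  // now a scalar after $unwind
            foreignField: "_id",
            as: "productsObject"
        }
    }
];

console.log(JSON.stringify(unwindLookupPipeline, null, 4));
```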

The $unwind operator flattens the array, and the $lookup stage then joins each of the flat objects that were produced. The result of such a query would look like this:
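To make the shape of that result concrete without a running server, the two stages can be imitated in plain JavaScript. This is only a simulation under assumed sample data, not MongoDB code, and the helpers `unwind` and `lookup` are hypothetical.

```javascript
// Assumed sample data; names and values are illustrative.
const inventory = [
    { _id: "product-1", type: "product", name: "Example Widget", price: 29.99 }
];
const orderDocs = [
    { _id: "order-1", type: "order", products: [{ product_id: "product-1", quantity: 2 }] }
];

// Imitates $unwind: emit one copy of the document per array element.
function unwind(docs, field) {
    return docs.flatMap(doc =>
        (doc[field] || []).map(item => ({ ...doc, [field]: item })));
}

// Imitates $lookup: attach every foreign document whose key equals the local value.
function lookup(docs, foreignDocs, localValue, foreignField, as) {
    return docs.map(doc => ({
        ...doc,
        [as]: foreignDocs.filter(f => f[foreignField] === localValue(doc))
    }));
}

const joined = lookup(
    unwind(orderDocs, "products"),
    inventory,
    doc => doc.products.product_id,
    "_id",
    "productsObject"
);

console.log(JSON.stringify(joined, null, 4));
```

Each result keeps the order fields, products becomes a single object rather than an array, and the matching product document sits inside the productsObject array.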

Had there been more than one reference in the array, more results would have been returned. However, what is returned isn’t very attractive. We still have the old products object and now a productsObject array. Further manipulation of the data stream needs to happen.

The productsObject array should be “unwound” and then reconstructed to how we want it. This can be accomplished by doing the following:
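One way this could be sketched is shown below. The collection names, the product fields `name` and `price`, and the exact shape rebuilt by `$project` are assumptions chosen for illustration.

```javascript
// Join, unwind the joined array, then rebuild a single products object.
const reshapePipeline = [
    { $unwind: "$products" },
    {
        $lookup: {
            from: "products",
            localField: "products.product_id",
            foreignField: "_id",
            as: "productsObject"
        }
    },
    { $unwind: "$productsObject" },  // turn the one-element array into an object
    {
        $project: {
            // Merge the joined product fields with the quantity from the order.
            products: {
                product_id: "$products.product_id",
                name: "$productsObject.name",
                price: "$productsObject.price",
                quantity: "$products.quantity"
            }
        }
    }
];

console.log(JSON.stringify(reshapePipeline, null, 4));
```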

Notice that the aggregate query is now getting more complex. After doing the join, the result is “unwound” and then the result is reconstructed using the $project operator.

At this point further manipulations to the result can be made such as grouping the results so that the products objects become a single array again. Each manipulation to the data set requires more aggregation code which can easily become messy, complicated, and difficult to read.
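As a sketch of that regrouping step, a `$group` stage like the following could be appended to the pipeline. It assumes the earlier stages left a rebuilt `products` object on every document and kept the order’s `_id`.

```javascript
// Extra stage that packs the per-product results back into one array per order.
const regroupStage = {
    $group: {
        _id: "$_id",                      // one output document per original order
        products: { $push: "$products" }  // collect the rebuilt product objects
    }
};

console.log(JSON.stringify(regroupStage, null, 4));
```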

This is where Couchbase N1QL becomes so much more pleasant to work with.

Using Couchbase and N1QL to Join NoSQL Documents

Let’s use the same document example that we used for MongoDB. This time we’re going to write SQL queries with N1QL to get the job done.

The first thing that comes to mind might be to use a JOIN in SQL. Our query might look something like this:
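A sketch of such a query is held in a JavaScript string below so it can be handed to whichever Couchbase client you use. The Bucket name `example` and the `type` field are assumptions, and the documents are assumed to be stored under keys such as `order-1` and `product-1`.

```javascript
// N1QL statement; the Bucket name and the type field are assumptions.
const joinQuery = `
    SELECT orders.*, product
    FROM example AS orders
    JOIN example AS product ON KEYS orders.products[*].product_id
    WHERE orders.type = "order"
`;

console.log(joinQuery.trim());
```

The `orders.products[*].product_id` expression collects every product_id in the array, and ON KEYS joins against the documents stored under those keys.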

In the above example, both the documents exist in the same Couchbase Bucket. A JOIN against document ids happens based on the product_id values found in the products array. The above query would yield results that look like this:
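The shape of the result can be sketched by hand as follows. This assumes the query selects `orders.*` together with the joined document under the alias `product`, and it uses assumed sample values rather than output captured from a live cluster.

```javascript
// Hand-written illustration of the result shape; values are assumptions.
const joinResult = [
    {
        type: "order",
        products: [{ product_id: "product-1", quantity: 2 }],
        // The joined inventory document, nested under the alias used in the query.
        product: { type: "product", name: "Example Widget", price: 29.99 }
    }
];

console.log(JSON.stringify(joinResult, null, 4));
```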

Like with MongoDB, there will be a result for every item of the products array that matches. In fairness, while the N1QL version was easier to write, it wasn’t necessarily any more difficult than the MongoDB Query Language at this point. As we manipulate the data more, Couchbase becomes a lot easier by comparison.

For example, let’s say we wanted to clean up the results:
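One possible cleaned-up query is sketched below as a JavaScript string. The Bucket name `example`, the aliases `item` and `product`, and the `order_id` output field are assumptions.

```javascript
// N1QL statement that flattens the array before joining; names are assumptions.
const unnestQuery = `
    SELECT META(orders).id AS order_id, product.*, item.quantity
    FROM example AS orders
    UNNEST orders.products AS item
    JOIN example AS product ON KEYS item.product_id
    WHERE orders.type = "order"
`;

console.log(unnestQuery.trim());
```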

There are some major differences in what we’re doing in the above, but minor differences in how we’re doing them. Instead of joining directly on the array, we are first flattening or “unnesting” the array, as we saw with the MongoDB $unwind operator. The join is now happening on each of the flattened results. Finally, the quantity from the original object is added to the new object.

The result to the above query would look something like this:
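A hand-written sketch of that flattened shape follows. It assumes the query returns the order’s key as `order_id`, every field of the joined product, and the `quantity` from the array element. The values are assumed sample data.

```javascript
// Hand-written illustration of the flattened result; values are assumptions.
const unnestResult = [
    {
        order_id: "order-1",
        type: "product",
        name: "Example Widget",
        price: 29.99,
        quantity: 2   // carried over from the order's products array
    }
];

console.log(JSON.stringify(unnestResult, null, 4));
```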

Let’s say that the original products array had more than one product reference in it. Instead of returning several objects based on the JOIN criteria we saw above, it might make sense to re-pack that original array.
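A sketch of such a re-packing query might look like the following. The Bucket name `example` and the fields placed in each rebuilt object (`name`, `price`, `quantity`) are assumptions.

```javascript
// N1QL statement that re-packs the joined products into one array per order.
const regroupQuery = `
    SELECT META(orders).id AS order_id,
           ARRAY_AGG({
               "name": product.name,
               "price": product.price,
               "quantity": item.quantity
           }) AS products
    FROM example AS orders
    UNNEST orders.products AS item
    JOIN example AS product ON KEYS item.product_id
    WHERE orders.type = "order"
    GROUP BY META(orders).id
`;

console.log(regroupQuery.trim());
```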

In the above query we’ve only added ARRAY_AGG and a GROUP BY, but as a result, each joined document shows up in the products array instead of the id value.

Don’t want to use an actual JOIN operator? Try using a SQL subquery instead.
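As a rough sketch, a correlated subquery with USE KEYS could fetch the referenced products for each order. The Bucket name `example` and the output names are assumptions.

```javascript
// N1QL statement using a subquery in place of a JOIN; names are assumptions.
const subquery = `
    SELECT META(orders).id AS order_id,
           (SELECT product.*
            FROM example AS product
            USE KEYS orders.products[*].product_id) AS products
    FROM example AS orders
    WHERE orders.type = "order"
`;

console.log(subquery.trim());
```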

Conclusion

Joining data in NoSQL is a very common concern for developers who are seasoned RDBMS veterans. Because MongoDB is a very popular NoSQL technology, I thought it would be good to use as a comparison to how Couchbase handles document joining. For light operations, MongoDB’s $lookup operator is tolerable, but as queries become more complex, you may need to take a step back. With N1QL, writing complex queries that include joining operations becomes very easy and stays easy regardless of how complex the query is.

For more information on N1QL and Couchbase, visit the Couchbase Developer Portal.

Posted by Nic Raboy, Developer Advocate, Couchbase

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in Java, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Apache Cordova. Nic writes about his development experiences related to making web and mobile development easier to understand.
