Since we've announced that our new Java SDK is completely based on reactive and asynchronous components, our users keep asking what the benefits are over their tried-and-true synchronous access patterns. While "performance" is very often one of the driving factors, there is much more to consider. Besides error handling (which will be covered in a separate post), the following features stand out:

  1. Building data flows with streams of data
  2. Better resource utilization without headaches

The purpose of this blog post is to convert common data access patterns from synchronous to reactive and, as a result, to give you a glimpse of what's possible with this new approach. Don't worry if you've never written "reactive" code before – you will pick up some basics along the way. Also, keep in mind that we still support a synchronous API. The synchronous API is just a thin wrapper around the reactive one, allowing you to gradually migrate your application to a more powerful approach without having to jump in the deep end right away.

Oh, and by the way, we have a webinar coming up on January 22nd where Simon Basle, one of our SDK engineers, is going to introduce the new Java SDK and also showcase some of the reactive bits. Join him and ask questions!

The Lookup Pattern

A very common pattern when accessing a database or a key/value store is the lookup pattern. You load a document, and based on the content of the document you load more documents. Let's consider the classical “blog post and comments” example. Imagine you have the following blog post stored in Couchbase:
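For illustration, such a post document might look like the following (the field names and IDs are hypothetical):

```json
{
  "type": "post",
  "title": "My first blog post",
  "content": "Hello World!",
  "comments": ["comment_1", "comment_2", "comment_3"]
}
```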

It contains a list of comment IDs which can be used to load the full details of each comment as needed. A comment might look like this:
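Again with hypothetical field names, a comment document might be:

```json
{
  "type": "comment",
  "author": "Jane Doe",
  "content": "Nice post!",
  "published": true
}
```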

Now let's say we want to load the first two published comments when the blog post is loaded, so they can be displayed below the actual post. Here is one way to achieve this with synchronous access and the new SDK:
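A minimal sketch of that synchronous flow, with a plain in-memory map as a hypothetical stand-in for the bucket (in the real SDK, every `get` is a blocking network round-trip; all names here are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SyncLookup {
    // Stand-in for the bucket: in real code each get() blocks on the network.
    static final Map<String, Map<String, Object>> bucket = new HashMap<>();

    static Map<String, Object> get(String id) {
        return bucket.get(id); // one network round-trip in the real SDK
    }

    public static void main(String[] args) {
        bucket.put("post_1", Map.of("title", "My first post",
                "comments", List.of("comment_1", "comment_2", "comment_3")));
        bucket.put("comment_1", Map.of("content", "Nice!", "published", true));
        bucket.put("comment_2", Map.of("content", "Draft", "published", false));
        bucket.put("comment_3", Map.of("content", "+1", "published", true));

        // 1) Load the post, then 2) load each comment serially until we have
        // two published ones. Every get() waits for the previous one to finish.
        Map<String, Object> post = get("post_1");
        List<Map<String, Object>> found = new ArrayList<>();
        for (Object id : (List<?>) post.get("comments")) {
            Map<String, Object> comment = get((String) id);
            if (Boolean.TRUE.equals(comment.get("published"))) {
                found.add(comment);
                if (found.size() == 2) break;
            }
        }
        System.out.println(found.size()); // 2
    }
}
```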

Leaving error handling aside, this approach has a few drawbacks:

  • We need to wait at least 3 times for the network to respond, keeping our thread idle while it could potentially perform valuable work instead.
  • It is very hard to apply a global timeout to the whole process, since every operation has an individual timeout.


Let's convert this to a reactive approach, also using Java 8 lambdas (Java 6/7 are also supported, you just need to use anonymous classes instead of lambdas):
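The SDK's reactive API is built on RxJava's `Observable`, which is not part of the JDK, so here is a stdlib-only sketch of the same fan-out-and-aggregate shape using `CompletableFuture` (the in-memory stub and all names are assumptions; note that unlike Rx's `take(2)` with unsubscription, this variant waits for every lookup to complete):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;
import java.util.stream.Collectors;

public class ReactiveLookup {
    static final Map<String, Map<String, Object>> bucket = new ConcurrentHashMap<>();
    static final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Async stand-in for an async bucket get: completes on another thread.
    static CompletableFuture<Map<String, Object>> get(String id) {
        return CompletableFuture.supplyAsync(() -> bucket.get(id), pool);
    }

    public static void main(String[] args) throws Exception {
        bucket.put("post_1", Map.of("comments",
                List.of("comment_1", "comment_2", "comment_3")));
        bucket.put("comment_1", Map.of("published", true));
        bucket.put("comment_2", Map.of("published", false));
        bucket.put("comment_3", Map.of("published", true));

        List<Map<String, Object>> published = get("post_1")
            // Fan out: one concurrent get per comment ID.
            .thenCompose(post -> {
                List<CompletableFuture<Map<String, Object>>> gets =
                    ((List<?>) post.get("comments")).stream()
                        .map(id -> get((String) id))
                        .collect(Collectors.toList());
                return CompletableFuture.allOf(gets.toArray(new CompletableFuture[0]))
                    .thenApply(v -> gets.stream().map(CompletableFuture::join)
                        .filter(c -> Boolean.TRUE.equals(c.get("published")))
                        .limit(2)                      // like take(2)
                        .collect(Collectors.toList()));
            })
            // One global timeout for the whole chain, then block once at the edge.
            .get(1, TimeUnit.SECONDS);

        System.out.println(published.size()); // 2
        pool.shutdown();
    }
}
```

The real SDK code would instead chain `flatMap`, `filter`, `take`, `toList`, and `timeout` on the returned `Observable`, which is considerably terser.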

Based on the naming of the operators, it's quite clear what's going on. The code loads the blog post. Once it arrives, it extracts the list of comment IDs and passes it on to the next method, which actually loads the comment documents. The filter method only lets through the comments that are published. Afterwards, we take only the first 2 comments and then unsubscribe to avoid further work (like loading more documents when we already have enough). Finally, we aggregate all found comments into a list, apply a global timeout, and then block.

This code "fans out" to load all needed comments in parallel, hitting more nodes in the cluster at the same time and, as a result, returning the desired results more quickly. Also, the code applies a global timeout to the whole "operation", which is very hard to get right with synchronous code. Finally, the whole code is composable. You can encapsulate the logic in a method without applying a timeout or specifying how many comments to "take". The upper layer can then chain in the operators it needs to perform the work as desired.

Also, you can see that it is very easy to go from the more powerful concept (async, reactive) to the less powerful one (sync). The other way around is not possible.

Note that we still block at the very end, and that's okay. Most applications will block at some point (maybe even right before returning a response to the user). While going "reactive" across the full stack gives you the best performance and resource utilization, you can still make larger portions of your code execute asynchronously and benefit hugely.

Query Execution

Naturally, every database allows you to query its stored documents based on some criteria. Most of the time, more than one record will be returned, sometimes even thousands in one batch. Once the data is returned, you very often also need to modify, combine, or filter the content based on your application's requirements.

Let's consider the following example: you are a telecommunications provider and you are storing user records in your bucket. At the end of the month you want to make sure that all of the new customers who signed up this month actually got their new phones delivered. Your contracted parcel shipping company provides a web service where you can query the status of a delivery.

Here is the mock implementation of the provider:
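A sketch of such a mock, assuming the provider answers asynchronously after an artificial delay (class and method names are hypothetical, and a `CompletableFuture` stands in for the SDK's `Observable`):

```java
import java.util.UUID;
import java.util.concurrent.*;

public class ParcelService {
    static final ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);

    // Given a parcel UUID, answer "delivered" or "shipping" after a random
    // delay, simulating a (sometimes slow) remote web service.
    public static CompletableFuture<String> deliveryStatus(UUID parcelId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        long delayMs = ThreadLocalRandom.current().nextLong(10, 100);
        String status = ThreadLocalRandom.current().nextBoolean() ? "delivered" : "shipping";
        timer.schedule(() -> { result.complete(status); }, delayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String status = deliveryStatus(UUID.randomUUID()).get(1, TimeUnit.SECONDS);
        System.out.println(status);
        timer.shutdown();
    }
}
```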

Given a UUID which represents the parcel ID, it randomly returns whether the package has already been delivered or not.

Here is what a user record could look like:
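A hypothetical shape for such a record (field names and the parcel ID are illustrative):

```json
{
  "type": "user",
  "name": "John Doe",
  "signedUp": "2015-01-15",
  "parcelId": "55f9e532-bd1a-4f67-9f0b-7f1e8b6b56a6"
}
```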

We use N1QL to grab all users who signed up in the last month and then pass each parcel ID to our provider. Here is the synchronous version:
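A sketch of the synchronous flow, with a plain list standing in for the N1QL result rows and a blocking call standing in for the provider (all names are hypothetical):

```java
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

public class SyncParcelCheck {
    // Stand-in for the parcel provider's web service: blocks the calling
    // thread for up to 50ms here (in the worst case, the real one blocks
    // for 10 seconds).
    static String deliveryStatus(UUID parcelId) {
        try {
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, 50));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ThreadLocalRandom.current().nextBoolean() ? "delivered" : "shipping";
    }

    public static void main(String[] args) {
        // Stand-in for the N1QL result: in real code all rows would be
        // fetched and fully buffered from the server before this loop starts.
        List<UUID> parcelIds = new ArrayList<>();
        for (int i = 0; i < 20; i++) parcelIds.add(UUID.randomUUID());

        // One serial, blocking web service call per new customer.
        Map<String, Integer> byStatus = new HashMap<>();
        for (UUID id : parcelIds) {
            byStatus.merge(deliveryStatus(id), 1, Integer::sum);
        }
        int total = byStatus.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(total); // 20
    }
}
```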

This code doesn't look so bad, right? But what if I tell you that your parcel provider has an incredibly bad web server which sometimes takes 10 seconds to answer a query instead of 500 milliseconds? Also, we lose the ability to stream our results and perform actions right as they arrive.

Imagine we got lucky and 2000 new customers signed up. We first need to wait until all 2000 users are returned by the query, and then we need to perform 2000 serial requests against our parcel provider's web server. Of course we can fan them out to an executor service, but then we need to do the orchestration and aggregation on our own.

We can do much better with the reactive version:
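The real code would flatMap each streamed query row into a provider call and then `groupBy` the shipping state on the `Observable`; as a stdlib-only sketch of the same shape (all names and the stubbed provider are assumptions):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

public class ReactiveParcelCheck {
    static final ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);

    // Async stand-in for the provider: completes after a random delay
    // instead of blocking the caller.
    static CompletableFuture<String> deliveryStatus(UUID parcelId) {
        CompletableFuture<String> f = new CompletableFuture<>();
        String status = ThreadLocalRandom.current().nextBoolean() ? "delivered" : "shipping";
        long delayMs = ThreadLocalRandom.current().nextLong(1, 50);
        timer.schedule(() -> { f.complete(status); }, delayMs, TimeUnit.MILLISECONDS);
        return f;
    }

    public static void main(String[] args) throws Exception {
        List<UUID> parcelIds = new ArrayList<>();
        for (int i = 0; i < 20; i++) parcelIds.add(UUID.randomUUID());

        // Fan out: all 20 requests are in flight at once, so slow responses
        // do not hold up the fast ones.
        List<CompletableFuture<String>> inFlight = parcelIds.stream()
                .map(ReactiveParcelCheck::deliveryStatus)
                .collect(Collectors.toList());

        // Aggregate and group by shipping state, with one overall timeout.
        Map<String, Long> byStatus = CompletableFuture
                .allOf(inFlight.toArray(new CompletableFuture[0]))
                .thenApply(v -> inFlight.stream().map(CompletableFuture::join)
                        .collect(Collectors.groupingBy(s -> s, Collectors.counting())))
                .get(5, TimeUnit.SECONDS);

        long total = byStatus.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(total); // 20
        timer.shutdown();
    }
}
```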

Here, we stream the results as they arrive from the server. Once we have a result, we extract the parcel ID and send it over to the parcel service. Once a response arrives, we take it and group it by shipping state. Finally, we apply some nice formatting and print it out. Since we never block, we send all requests to the parcel server and group the responses as they come back (in any order). If some of the requests take longer, we don't really care, because we can finish processing the others first and are not stuck. We can also apply an overall timeout.

Summary

These two examples clearly show how your application can benefit today from asynchronous and reactive execution. It provides a clear path away from blocking, resource-hogging database drivers that slow down your application servers and your response times.

We barely scratched the surface of what's possible. We are working on sample projects and extended documentation to give you a clear path into this new way of writing database-accessing applications. Let us know if you want to see your use case covered and/or how it can be transformed.

Finally, again a quick reminder that in a few days, there will be a webinar about the new Java SDK. Join in, listen to the introduction and then don't hesitate to ask your questions!

Author

Posted by Michael Nitschinger

Michael Nitschinger works as a Principal Software Engineer at Couchbase. He is the architect and maintainer of the Couchbase Java SDK, one of the first completely reactive database drivers on the JVM. He also authored and maintains the Couchbase Spark Connector. Michael is active in the open source community and a contributor to various other projects like RxJava and Netty.

One Comment

  1. Jacek Laskowski March 10, 2015 at 8:23 pm

    A very useful blog post. I wish there were more about what the respective methods are doing under the covers so that the entire data flow is async and non-blocking. How is it that once a row/record arrives it is passed on to the following method call? I can barely imagine it, hence my questions about the internals of the SDK.
