Note from March 2017: detailed information on the event bus and metrics collection can be found in the official documentation. Some information in this article may be outdated.

While Europe was melting away in the summer heat, Simon (from Paris), Sergey (from Minsk) and I (from Vienna) put the heat to good use and baked a new release for you. It is the second developer preview of the upcoming 2.2.0 release. Apart from bugfixes (which also made it into the 2.1.4 release), it brings the following enhancements and new features:

  • Extended support for N1QL and Multi-Dimensional Scaling (MDS)
  • Sync and Async API enhancements
  • Supportability enhancements with metrics
  • Various dependency upgrades and DCP changes

Here is how you can get it right now:

https://gist.github.com/daschl/5b32706a0a4fe50cfaa4.js

Extended N1QL and MDS Support

The N1QL DSL functionality has been further extended to support a variety of N1QL functions, including (but not limited to) aggregation, array, comparison, date, meta, pattern matching and string functions. All of those functions are located under the “com.couchbase.client.java.query.dsl.functions” namespace and should be imported as static helper methods for convenience.

Since Multi-Dimensional Scaling also affects Memcached buckets (not every node has to be a data node), the SDK now automatically makes sure that only data nodes are used in the ketama hashing algorithm. This is completely transparent to the user, but it is important to pick 2.2.0 or later if you want to use Couchbase Server 4.0 with MDS and Memcached buckets. The 1.4.x SDK is not affected and will continue to work without issues.
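The following minimal sketch makes the MDS change more concrete. It builds a ketama-style ring that skips non-data nodes and uses only the JDK. ClusterNode, DataNodeKetamaRing, the "kv" service label and the "hostname-i" point keys are assumptions for illustration, and the SDK's internal implementation differs in its details. The ketama part follows the classic layout: 40 MD5 digests per node, each split into four 32-bit ring points.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical cluster node: a hostname plus the MDS services it runs ("kv" = data service). */
final class ClusterNode {
    final String hostname;
    final Set<String> services;

    ClusterNode(String hostname, Set<String> services) {
        this.hostname = hostname;
        this.services = services;
    }

    boolean isDataNode() {
        return services.contains("kv");
    }
}

/** Ketama-style consistent hash ring that only places data nodes on the ring. */
final class DataNodeKetamaRing {
    private static final int DIGESTS_PER_NODE = 40; // 40 digests x 4 points = 160 points per node
    private final TreeMap<Long, ClusterNode> ring = new TreeMap<>();

    DataNodeKetamaRing(List<ClusterNode> nodes) {
        for (ClusterNode node : nodes) {
            if (!node.isDataNode()) {
                continue; // query/index-only nodes hold no Memcached data, so they get no ring points
            }
            for (int i = 0; i < DIGESTS_PER_NODE; i++) {
                byte[] digest = md5(node.hostname + "-" + i); // assumed point-key format
                for (int h = 0; h < 4; h++) {
                    ring.put(point(digest, h), node);
                }
            }
        }
        if (ring.isEmpty()) {
            throw new IllegalStateException("cluster has no data nodes");
        }
    }

    /** Picks the first ring point at or after the key's hash, wrapping around at the end. */
    ClusterNode nodeFor(String key) {
        Map.Entry<Long, ClusterNode> entry = ring.ceilingEntry(point(md5(key), 0));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Reads bytes [4h, 4h+3] of the digest as an unsigned little-endian 32-bit value. */
    private static long point(byte[] d, int h) {
        return ((long) (d[3 + 4 * h] & 0xFF) << 24)
                | ((long) (d[2 + 4 * h] & 0xFF) << 16)
                | ((long) (d[1 + 4 * h] & 0xFF) << 8)
                | (long) (d[4 * h] & 0xFF);
    }

    private static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A query-only or index-only node never gets a ring point, so no key can hash to it. For the same reason, adding such a node to the cluster leaves the existing key-to-node mapping untouched.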

Finally, to make sure all the APIs are consistent, we’ve chosen to rename the “parametrized” queries to “parameterized”, which is considered to be the correct form across Couchbase documentation and SDK APIs.

Sync and Async API Enhancements

One of the common pitfalls of the asynchronous API is that the returned Observables were “hot” instead of “cold”. This has subtle implications for retry semantics and reusability. Especially if you want to use the retry operator, you need to “defer” the returned Observable so that a new one is generated on every resubscription. In 2.2.0 we decided to make every API call cold by wrapping it for you out of the box. Existing code will continue to work, and even double defers won’t do any harm.

Compare this retry code against 2.1.4:

https://gist.github.com/daschl/0666841cb2bd69e53d17.js

with the slightly simpler one against 2.2.0:

https://gist.github.com/daschl/2b63c5193bac4dd11713.js
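If the hot/cold distinction feels abstract, the following JDK-only analogy may help. It uses CompletableFuture instead of RxJava, and FlakyService and RetrySketch are hypothetical names. Both retry methods share the same loop. The only difference is what each attempt does. The hot version re-reads a call that already ran. The cold version starts a fresh one through a Supplier, which is roughly what Observable.defer does for you.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical service whose first call fails with a transient error. */
final class FlakyService {
    private final AtomicInteger calls = new AtomicInteger();

    CompletableFuture<String> get(String id) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (calls.incrementAndGet() == 1) {
            result.completeExceptionally(new IllegalStateException("transient error"));
        } else {
            result.complete("value-of-" + id);
        }
        return result;
    }

    int calls() {
        return calls.get();
    }
}

final class RetrySketch {
    /** "Hot": the call already ran once; every retry re-reads the same completed result. */
    static String retryHot(CompletableFuture<String> alreadyStarted, int maxAttempts) {
        return retry(() -> alreadyStarted, maxAttempts);
    }

    /** "Cold"/deferred: every attempt invokes the supplier, which starts a brand-new call. */
    static String retryCold(Supplier<CompletableFuture<String>> deferred, int maxAttempts) {
        return retry(deferred, maxAttempts);
    }

    private static String retry(Supplier<CompletableFuture<String>> attempt, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        CompletionException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get().join();
            } catch (CompletionException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Suppose the service fails on its first call. retryHot keeps seeing that same failure no matter how many attempts it is given. retryCold succeeds on the second attempt because it actually calls the service again.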

Since the getFromReplica calls serve as a way to favour availability over consistency, it very often makes sense to just take the first N documents that are returned. While this is quite easy to do in the asynchronous API with the “take()” operator, the synchronous API only exposed a List-returning version. To make it more flexible for users working with the blocking API, new overloads have been added which return an Iterator instead. If you only care about the first document returned, here is how you can do it:

https://gist.github.com/daschl/0c7c92642d8c0af7dae3.js
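An Iterator is more flexible than a List here because it can hand out responses as they arrive, so the caller is free to stop after the first one. Below is a minimal JDK-only sketch of that idea. ArrivalOrderIterator and ReplicaReadDemo are hypothetical and not the SDK's implementation. The sleeping Callables merely simulate slow replicas.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Starts all reads at once and yields their results in the order they complete. */
final class ArrivalOrderIterator<T> implements Iterator<T> {
    private final CompletionService<T> completion;
    private int remaining;

    ArrivalOrderIterator(ExecutorService pool, List<Callable<T>> reads) {
        this.completion = new ExecutorCompletionService<>(pool);
        for (Callable<T> read : reads) {
            completion.submit(read); // all reads run concurrently
        }
        this.remaining = reads.size();
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        remaining--;
        try {
            return completion.take().get(); // blocks only until the next read finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a replica", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("replica read failed", e.getCause());
        }
    }
}

final class ReplicaReadDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<String>> reads = List.of(
                    () -> { Thread.sleep(300); return "from replica 1"; },
                    () -> "from active",
                    () -> { Thread.sleep(300); return "from replica 2"; });
            Iterator<String> responses = new ArrivalOrderIterator<>(pool, reads);
            // Take only the first arrival and ignore the rest
            System.out.println(responses.next());
        } finally {
            pool.shutdownNow(); // interrupts the slower reads we no longer care about
        }
    }
}
```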

Previously it was not possible to make a counter operation fail if the document did not exist: it was always initialized with 0. Since this feature was available in the 1.x series, we decided to bring it back for the method overload where no initial value is specified.

So in 2.2.0, this method overload “Observable counter(String id, long delta)” will fail with a “DocumentDoesNotExistException” if the document does not exist. If you want the previous behaviour, just use the overload with the initial value and set it to 0.
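A small in-memory sketch summarizes the difference between the two overloads. CounterSketch is hypothetical, and the exception class is a local stand-in for the SDK's DocumentDoesNotExistException. The sketch follows the usual Memcached counter semantics: a missing document is created with the initial value as-is, without applying the delta.

```java
import java.util.HashMap;
import java.util.Map;

/** Local stand-in for the SDK exception of the same name. */
class DocumentDoesNotExistException extends RuntimeException {
    DocumentDoesNotExistException(String id) {
        super("Document does not exist: " + id);
    }
}

/** In-memory model of the two counter overloads (not the SDK implementation). */
final class CounterSketch {
    private final Map<String, Long> store = new HashMap<>();

    /** No initial value: fails if the document is missing (the 2.2.0 behaviour). */
    long counter(String id, long delta) {
        Long current = store.get(id);
        if (current == null) {
            throw new DocumentDoesNotExistException(id);
        }
        long updated = current + delta;
        store.put(id, updated);
        return updated;
    }

    /** With initial value: a missing document is created with that value; otherwise delta is applied. */
    long counter(String id, long delta, long initial) {
        if (!store.containsKey(id)) {
            store.put(id, initial);
            return initial;
        }
        return counter(id, delta);
    }
}
```

On an empty store, counter("hits", 1) throws. counter("hits", 1, 0) then creates the document with 0, and a subsequent counter("hits", 5) returns 5.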

Finally, the SDK now supports more environment configuration options (including making TCP_NODELAY configurable), and design documents can now be configured with options on creation. Here is how you can create a design document and change the default minimum update interval:

https://gist.github.com/daschl/ddec1089eed01285d0c4.js

Supportability Enhancements with Metrics

A common question developers and operations teams ask themselves is: what is going on inside my application? And, very often related: why do I get a TimeoutException? We’ve been debugging production deployments for a few years now and learned a thing or two while doing that. One of the most important things is information: the more information you can get out of your application, the better you can understand it.

For this exact reason, we’ve added always-on latency and runtime metrics to the SDK, which are published over the event bus and can be consumed as messages. There is a big difference between just logging data and actually exposing it over an event bus (even if it is logged afterwards too): the bus lets you consume it and, more importantly, react to it instantly. You can take the data and send it to your favourite monitoring system, like Nagios, Graphite or Logstash. No need to parse log files in the aftermath of a system outage you have been asked to analyze.

By default, the SDK transparently collects latencies for operations running through it and writes them onto the event bus every hour. The emit interval, as well as many other settings, is fully customizable through the environment. Here is a simple example which listens on the event bus and prints only metric events to stderr (we publish many more events than these on the bus to provide maximum flexibility):

https://gist.github.com/daschl/51e5193a57d909fd072b.js

You can spot two events here. The first one prints runtime information like GC stats, memory and thread usage. The other one is slightly larger, containing collected latency (and throughput) statistics in internal histograms. The information printed contains minimum and maximum latencies, the number of operations as well as percentiles. Under the covers we are using the excellent HdrHistogram library and the related LatencyUtils package.

Note how it not only prints metrics on a per-operation basis, but also lets you identify the target node as well as the status code on return. This allows you to build a tree-shaped view of system state and derive insight into how individual nodes or services are performing (node A is slower than the others, replace is faster than insert, many errors against node B,…).
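The tree view can be sketched as follows. The code groups raw samples by node, then operation, then status code, and computes a nearest-rank percentile for each leaf. LatencySample and LatencyTreeSketch are hypothetical. The SDK uses HdrHistogram to keep memory bounded, whereas this sketch keeps every sample and sorts it, which is only reasonable for small amounts of data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One recorded operation: target node, operation name, response status and latency. */
final class LatencySample {
    final String node;
    final String operation;
    final String status;
    final long micros;

    LatencySample(String node, String operation, String status, long micros) {
        this.node = node;
        this.operation = operation;
        this.status = status;
        this.micros = micros;
    }
}

final class LatencyTreeSketch {
    /** Builds node -> operation -> status -> latencies (TreeMaps keep the output sorted). */
    static Map<String, Map<String, Map<String, List<Long>>>> group(List<LatencySample> samples) {
        Map<String, Map<String, Map<String, List<Long>>>> tree = new TreeMap<>();
        for (LatencySample s : samples) {
            tree.computeIfAbsent(s.node, k -> new TreeMap<>())
                .computeIfAbsent(s.operation, k -> new TreeMap<>())
                .computeIfAbsent(s.status, k -> new ArrayList<>())
                .add(s.micros);
        }
        return tree;
    }

    /** Exact nearest-rank percentile: smallest value with at least p% of samples at or below it. */
    static long percentile(List<Long> latencies, double p) {
        if (latencies.isEmpty()) throw new IllegalArgumentException("no samples");
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Because each leaf is keyed by node, operation and status, comparisons fall out directly. Examples include the 99th percentile of "get" on one node versus another, or how many samples ended with an error status.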

Based on user feedback, we are considering adding out-of-the-box consumers that log those metrics or send them to Graphite or Logstash. We’ll also add more sophisticated output formats, including nicely formatted JSON which is both human- and machine-parsable. Please let us know what target format you’d like to see built into the driver.
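Until such consumers exist, writing one yourself is not much work. The sketch below models a tiny synchronous event bus and a subscriber that renders metric events in Graphite's plaintext format (metric path, value, Unix timestamp in seconds). Like the stderr example earlier, it ignores everything except metric events. SimpleEventBus, BusEvent and the two event classes are hypothetical stand-ins for the SDK's RxJava-based bus and its event types. The metric path naming is an assumption. Actually shipping the lines to Graphite (typically over TCP port 2003) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

/** Marker for anything published on the (hypothetical) bus. */
interface BusEvent {}

/** A non-metric event that metric consumers should ignore. */
final class NodeConnectedEvent implements BusEvent {
    final String node;

    NodeConnectedEvent(String node) {
        this.node = node;
    }
}

/** A per-node, per-operation latency summary (hypothetical shape). */
final class LatencyMetricEvent implements BusEvent {
    final String node;
    final String operation;
    final long p99Micros;
    final long epochSeconds;

    LatencyMetricEvent(String node, String operation, long p99Micros, long epochSeconds) {
        this.node = node;
        this.operation = operation;
        this.p99Micros = p99Micros;
        this.epochSeconds = epochSeconds;
    }
}

/** Minimal synchronous bus: every subscriber receives every published event. */
final class SimpleEventBus {
    private final List<Consumer<BusEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<BusEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(BusEvent event) {
        for (Consumer<BusEvent> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}

final class GraphiteFormatter {
    /** Graphite plaintext line "<path> <value> <unix-seconds>\n"; dots in the host become '_'. */
    static String toLine(LatencyMetricEvent e) {
        String path = String.format(Locale.ROOT, "couchbase.%s.%s.p99_us",
                e.node.replace('.', '_'), e.operation);
        return path + " " + e.p99Micros + " " + e.epochSeconds + "\n";
    }

    /** Subscriber that drops non-metric events and hands formatted lines to lineWriter. */
    static Consumer<BusEvent> sink(Consumer<String> lineWriter) {
        return event -> {
            if (event instanceof LatencyMetricEvent) {
                lineWriter.accept(toLine((LatencyMetricEvent) event));
            }
        };
    }
}
```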

Dependency Upgrades and DCP Changes

Since we are bumping the minor version, we are upgrading dependencies to their latest bugfix versions as well. Keep in mind that we only expose RxJava as an explicit dependency; all the others are repackaged so they cause no hassle if your environment contains conflicting versions.

Here is the full list of dependency changes in 2.2.0-dp2 compared to 2.1.4:

  • RxJava from 1.0.4 to 1.0.13
  • Netty from 4.0.25.Final to 4.0.29.Final
  • LMAX Disruptor from 3.3.0 to 3.3.2
  • Jackson from 2.4.2 to 2.5.4
  • LatencyUtils 2.0.2 (new dependency)

In addition, Sergey is busy working on extending the Kafka Connector, which also resulted in DCP enhancements in the core-io library. It is still heavily experimental, but we are getting it closer to a point where it can be consumed by a broader audience.

The Road towards GA

Another N1QL feature which is still incubating is (named) prepared statements. The code has been updated in this second developer preview but is still subject to change, so we’ve commented out that API for now. Please be patient until we approach GA for full-featured support and extensive documentation.

Other than that, there are no big features left on the to-do list for 2.2.0, so we are shifting gears towards smaller fixes, stability enhancements and, most importantly, documentation. To make this the best release we’ve shipped so far, we need your input! Please kick the tires and provide feedback, especially on the new features and N1QL support. Let us know what’s missing or broken, either through a comment here, on the forums or through the bug tracker!

Author

Posted by Michael Nitschinger

Michael Nitschinger works as a Principal Software Engineer at Couchbase. He is the architect and maintainer of the Couchbase Java SDK, one of the first completely reactive database drivers on the JVM. He also authored and maintains the Couchbase Spark Connector. Michael is active in the open source community and contributes to various other projects such as RxJava and Netty.
