In this second instalment of “Inside the Java SDK” we are going to take an in-depth look at how the SDK manages and pools sockets to the various nodes and services. While not strictly necessary to follow along, I recommend you check out the first post on bootstrapping as well.

Note that this post was written with the Java SDK 2.5.9 / 2.6.0 releases in mind. Things might change over time, but the overall approach should stay mostly the same.

In the spirit of the OSI and TCP/IP models, I propose a three-layer model that represents the SDK’s connection stack: the Channel layer at the bottom, the Endpoint layer in the middle and the Service layer on top.

Higher levels build on top of lower levels so we’ll start with the Channel layer and work our way up the stack.

The Channel Layer

The channel layer is the lowest level at which the SDK deals with networking and is built on top of Netty, an excellent, fully asynchronous IO library. We’ve been extensive Netty users for years and have also contributed patches, as well as the memcache codec, back to the project.

Every Netty Channel corresponds to a socket and is multiplexed on top of event loops. We’ll cover the threading model in a later blog post, but for now it’s important to know that instead of the “one thread per socket” model of traditional blocking IO, Netty takes all open sockets and distributes them across a handful of event loops. It does this in a very efficient way, so it’s no wonder that Netty is used all over the industry for high performance and low latency networking components.

Since a channel is only concerned with bytes going in and out, we need a way to encode and decode application-level requests (like a N1QL query or a Key/Value get request) into their proper binary representation. In Netty this is done by adding handlers to the channel pipeline. All network write operations work their way down the pipeline and server responses come back up the pipeline (outbound and inbound, respectively, in Netty terminology).

Some handlers are added independent of the service used (like logging or encryption) and others depend on the service type (for example for a N1QL response we have JSON streaming parsers customized to the response structure).

If you ever wondered how to get packet-level logging output during development or debugging (for production use tcpdump, wireshark or similar), all you need to do is enable the TRACE log level in your favourite logging library and the SDK will log the contents of every packet going in and out.

This works because we only add Netty’s LoggingHandler to the pipeline when tracing is enabled, so you are not paying its overhead when you are not using it (which is most of the time).
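
To make that concrete, here is a minimal, hedged sketch of how such a conditional pipeline setup can look in plain Netty – it is not the SDK’s actual bootstrap code, and the tracingEnabled/sslEnabled flags and the timeout parameter are assumptions standing in for the real environment settings:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;
import io.netty.handler.ssl.SslHandler;

public class ChannelSetupSketch {

    public static Bootstrap configure(final boolean tracingEnabled, final boolean sslEnabled,
                                      final int connectTimeoutMillis) {
        return new Bootstrap()
            .group(new NioEventLoopGroup())
            .channel(NioSocketChannel.class)
            // socket-level tweaks similar to what the SDK applies
            .option(ChannelOption.TCP_NODELAY, true)
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutMillis)
            .handler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) throws Exception {
                    if (sslEnabled) {
                        // encryption sits at the very front of the pipeline
                        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
                        engine.setUseClientMode(true);
                        ch.pipeline().addLast(new SslHandler(engine));
                    }
                    if (tracingEnabled) {
                        // only pay for packet logging when TRACE output is actually wanted
                        ch.pipeline().addLast(new LoggingHandler(LogLevel.TRACE));
                    }
                    // service-specific handlers (KV, N1QL, ...) are added after this point
                }
            });
    }
}
```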

You can also see that, depending on the environment configuration, we make other adjustments like adding an SSL/TLS handler to the pipeline or configuring TCP nodelay and the socket timeouts.

The customEndpointHandlers method is overridden for each service; the KV layer is a good example of what such a pipeline looks like.
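
What follows is a simplified, hedged reconstruction of that setup based on the handler names discussed below – the Netty memcache codec classes are real, but the Couchbase-specific handlers are only referenced in comments because their constructor signatures are internal details:

```java
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.memcache.binary.BinaryMemcacheClientCodec;
import io.netty.handler.codec.memcache.binary.BinaryMemcacheObjectAggregator;
import io.netty.handler.timeout.IdleStateHandler;

// Simplified stand-in for the SDK's KV endpoint pipeline setup.
public class KeyValuePipelineSketch extends ChannelInitializer<SocketChannel> {

    @Override
    protected void initChannel(SocketChannel ch) throws Exception {
        ch.pipeline()
          // fires idle events that the SDK turns into application-level keepalives
          // (the 30 second interval is just for illustration)
          .addLast(new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS))
          // encode/decode the binary memcache protocol and aggregate full messages
          .addLast(new BinaryMemcacheClientCodec())
          .addLast(new BinaryMemcacheObjectAggregator(Integer.MAX_VALUE));

        // The Couchbase-specific handlers would follow here, in roughly this order:
        // KeyValueFeatureHandler, KeyValueErrorMapHandler, KeyValueAuthHandler,
        // KeyValueSelectBucketHandler (all remove themselves after the connect phase)
        // and finally the KeyValueHandler that does the request/response heavy lifting.
    }
}
```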

Lots going on here! Let’s go through the handlers one by one:

  • The IdleStateHandler is used to trigger application-level keepalives.
  • The next two handlers, BinaryMemcacheClientCodec and BinaryMemcacheObjectAggregator, deal with encoding memcache request and response objects into their byte representations and back.
  • KeyValueFeatureHandler, KeyValueErrorMapHandler, KeyValueAuthHandler and KeyValueSelectBucketHandler all perform handshaking, authentication, bucket selection and so forth during the connect phase and remove themselves from the pipeline once complete.
  • Finally, the KeyValueHandler does most of the work and “knows” all the different request types going in and out of the system.

If you want to take a look at a different one, here is the N1QL pipeline for example.

Before we move up one layer there is one more important bit: the completion of the RxJava Observable also happens at this layer. Once a response is decoded, it is completed either directly on the event loop or on a separate thread pool (the default configuration).
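
As a purely conceptual RxJava 1.x sketch (AsyncSubject and Schedulers.computation() stand in for the SDK’s internal response subject and its configured scheduler; this is not the SDK’s code):

```java
import rx.Scheduler;
import rx.schedulers.Schedulers;
import rx.subjects.AsyncSubject;

public class CompletionSketch {

    public static void main(String[] args) throws InterruptedException {
        // stand-in for the scheduler the SDK uses for user-facing callbacks
        Scheduler callbackScheduler = Schedulers.computation();

        AsyncSubject<String> response = AsyncSubject.create();
        response
            .observeOn(callbackScheduler) // hop off the IO thread before user callbacks run
            .subscribe(decoded -> System.out.println("completed on " + Thread.currentThread().getName()));

        // what would happen on the event loop once the response bytes are decoded:
        response.onNext("decoded-response");
        response.onCompleted();

        Thread.sleep(200); // give the asynchronous callback a moment to run before exiting
    }
}
```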

It is important to know that once a channel goes down (because the underlying socket is closed) all state at this level is gone. On a reconnect attempt a fresh channel is created. So who manages a channel? Let’s move up a layer.

The Endpoint Layer

The Endpoint  layer is responsible for managing the lifecycle of a channel including bootstrap, reconnect and disconnect. You can find the code here.

There is always a 1:1 relationship between the Endpoint and the channel it manages, but if a channel goes away and the socket needs to be reconnected, the endpoint stays the same and gets a fresh channel internally. The endpoint is also the place where a request is handed over to the event loops.
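
Here is a hedged sketch of that hand-off – a plain Object request and a no-op retry hook stand in for the SDK’s real types:

```java
import io.netty.channel.Channel;

public class EndpointSendSketch {

    private volatile Channel channel;

    /** Hand the request over to the event loop if the channel can take it right now. */
    public void send(Object request) {
        Channel ch = channel;
        if (ch != null && ch.isActive() && ch.isWritable()) {
            ch.writeAndFlush(request);
        } else {
            scheduleForRetry(request);
        }
    }

    private void scheduleForRetry(Object request) {
        // the real SDK re-queues the request here so it can be attempted again later
    }
}
```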

If our channel is active and writable, we write the request into the pipeline; otherwise it is sent back and re-queued for another attempt.

Here is a very important aspect of the endpoint to keep in mind: if a channel is closed, the endpoint will keep trying to reconnect (with the configured backoff) until it is explicitly told to stop. It stops when the manager of the Endpoint calls disconnect on it, which ultimately happens when the respective service/node is no longer part of the cluster config. So at the end of a rebalance, or during a failover, the client receives a new cluster config, infers from it that the endpoint can be terminated and shuts it down accordingly. If, for whatever reason, there is a delay between a socket disconnect and this information propagating, you might see some reconnect attempts that will eventually stop.
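
The behaviour can be pictured with a small sketch like the one below; the names, the backoff cap and the scheduling details are invented for illustration and do not mirror the actual Endpoint implementation:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative reconnect loop with capped backoff.
public class ReconnectSketch {

    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Called by the owning service once the node/service disappears from the cluster config. */
    public void disconnect() {
        stopped.set(true);
    }

    void scheduleReconnect(long delayMs) {
        if (stopped.get()) {
            return; // explicitly told to stop, so give up for good
        }
        timer.schedule(() -> {
            if (!tryConnect()) {
                // back off (capped here at 32 seconds) and try again until disconnect() is called
                scheduleReconnect(Math.min(delayMs * 2, 32_000));
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    private boolean tryConnect() {
        return false; // placeholder for the actual channel bootstrap attempt
    }
}
```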

One endpoint is all very well, but more is always better, right? So let’s go up one more layer to figure out how endpoints are pooled to create sophisticated connection pools on a per-node and per-service basis.

The Service Layer

The Service layer manages one or more endpoints per node. Each service is only responsible for one node, so if you have a Couchbase cluster of 5 nodes with only the KV service enabled on each, inspecting a heap dump will show 5 instances of KeyValueService.

In older client versions you could only configure a fixed number of endpoints per service through methods like kvEndpoints, queryEndpoints and so forth. Due to more complex requirements we’ve deprecated this “fixed” approach in favour of a more powerful connection pool implementation. This is why, instead of e.g. queryEndpoints, you should now use queryServiceConfig and its equivalents.
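
As a rough example of the newer configuration style – hedged, since the exact package of QueryServiceConfig and the available factory methods can differ between SDK versions:

```java
import com.couchbase.client.core.env.QueryServiceConfig;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class PoolConfigSketch {

    public static void main(String[] args) {
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
            // pool between 0 and 12 query endpoints per node (the 2.5.9/2.6.0 default mentioned below);
            // the other services have equivalent *ServiceConfig builder methods
            .queryServiceConfig(QueryServiceConfig.create(0, 12))
            .build();

        CouchbaseCluster cluster = CouchbaseCluster.create(env, "127.0.0.1");
        // ... work with the cluster, then tear everything down
        cluster.disconnect();
        env.shutdown();
    }
}
```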

Here are the current default pools per service in 2.5.9 and 2.6.0:

  • KeyValueService: 1 endpoint per node, fixed.
  • QueryService: from 0 to 12 endpoints per node, dynamic.
  • ViewService: from 0 to 12 endpoints per node, dynamic.
  • AnalyticsService: from 0 to 12 endpoints per node, dynamic.
  • SearchService: from 0 to 12 endpoints per node, dynamic.

The reason KV is not pooled by default is that its connection handshake is much more costly (remember all the handlers in the pipeline) and its traffic pattern is usually very different from the heavier query-based services. Experience from the field has shown that increasing the number of KV endpoints only makes sense in “bulk load” scenarios and for very spiky traffic where the “pipe” of a single socket is just too small. If not properly benchmarked, adding more sockets to the KV layer can even degrade your performance instead of improving it – I guess more is not always better.

The pooling logic can be found here if you are curious, but it’s worth examining certain semantics in there.

During the connect phase of the service, it ensures that the minimum number of endpoints is established up front. If the minimum equals the maximum, dynamic pooling is effectively disabled and the code simply picks one of the endpoints for each request.
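
In sketch form it boils down to something like this, where Endpoint is a stand-in interface and the round-robin pick is purely illustrative (the real code delegates to a selection strategy):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative-only pool skeleton; Endpoint is a stand-in, not the SDK's type.
public abstract class PooledServiceSketch {

    interface Endpoint {
        void connect();
        void send(Object request);
    }

    private final List<Endpoint> endpoints = new CopyOnWriteArrayList<>();
    private final AtomicLong roundRobin = new AtomicLong();
    private final int minEndpoints;

    PooledServiceSketch(int minEndpoints) {
        this.minEndpoints = minEndpoints;
    }

    /** During service connect, the minimum number of endpoints is established up front. */
    void connect() {
        for (int i = 0; i < minEndpoints; i++) {
            Endpoint endpoint = newEndpoint();
            endpoint.connect();
            endpoints.add(endpoint);
        }
    }

    /** With a fixed pool (min == max), every request simply picks one of the endpoints. */
    Endpoint select() {
        int index = (int) (roundRobin.getAndIncrement() % endpoints.size());
        return endpoints.get(index);
    }

    /** How a concrete endpoint is created is left out of the sketch. */
    abstract Endpoint newEndpoint();
}
```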

This can be observed in the logs right away during bootstrap.

When a request comes in, it is either dispatched to a suitable endpoint or, if there is still room in the pool, a new endpoint is created first (slightly simplified).
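
Here is a standalone sketch of that decision, again with stand-in types rather than the SDK’s real ones:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration of the "dispatch or grow" decision; Endpoint and the retry hook
// do not mirror the SDK's real types.
public abstract class DispatchSketch {

    interface Endpoint {
        boolean isFree();
        void connect();
        void send(Object request);
    }

    private final List<Endpoint> endpoints = new CopyOnWriteArrayList<>();
    private final int maxEndpoints;

    DispatchSketch(int maxEndpoints) {
        this.maxEndpoints = maxEndpoints;
    }

    void dispatch(Object request) {
        // 1) try to find an endpoint that can take the request right away
        for (Endpoint endpoint : endpoints) {
            if (endpoint.isFree()) {
                endpoint.send(request);
                return;
            }
        }
        // 2) no free endpoint: grow the pool if there is still room below the ceiling
        if (endpoints.size() < maxEndpoints) {
            Endpoint endpoint = newEndpoint();
            endpoint.connect();
            endpoints.add(endpoint);
            endpoint.send(request);
            return;
        }
        // 3) fixed pool or ceiling reached: schedule the operation for retry
        scheduleForRetry(request);
    }

    abstract Endpoint newEndpoint();

    void scheduleForRetry(Object request) {
        // the real SDK hands the request to its retry machinery at this point
    }
}
```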

Note that if we can’t find a suitable endpoint and the pool is fixed or we have reached our ceiling, the operation is scheduled for retry – very similar to the endpoint logic when the channel is not active or writable.

For the pooled HTTP-based services we don’t want to keep those sockets around forever, so you can configure an idle time (300 seconds by default). Each pool runs an idle timer that regularly checks whether an endpoint has been idle for longer than the configured interval and, if so, disconnects it. Note that the logic always ensures that we do not fall below the minimum number of endpoints.
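
A rough model of such an idle reaper is shown below; the interface, the field names and the check cadence are assumptions made for the sketch:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleReaperSketch {

    interface Endpoint {
        long lastResponseMillis();
        void disconnect();
    }

    private final List<Endpoint> endpoints = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    void start(int minEndpoints, long idleTimeMillis) {
        timer.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            for (Endpoint endpoint : endpoints) {
                boolean idleTooLong = now - endpoint.lastResponseMillis() > idleTimeMillis;
                // disconnect idle endpoints, but never fall below the configured minimum
                if (idleTooLong && endpoints.size() > minEndpoints) {
                    endpoints.remove(endpoint);
                    endpoint.disconnect();
                }
            }
        }, idleTimeMillis, idleTimeMillis, TimeUnit.MILLISECONDS);
    }
}
```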

Common Connection-Related Errors

Now that you have a good idea of how the SDK handles and pools sockets, let’s talk about a couple of error scenarios that can come up.

Request Cancellations

Let’s talk about the RequestCancelledException  first.

If you are performing an operation and it fails with a RequestCancelledException  there are usually two different causes:

  • The operation circled around inside the client (without being sent over the network) for longer than the configured maxRequestLifetime .
  • A request has been written to the network but before we got a response the underlying channel was closed.

There are other, less common reasons (e.g. issues during the encoding of a request), but for the purpose of this blog we will focus on the second cause.

So why do we have to cancel the request instead of retrying it on another socket that is still active? The reason is that we don’t know whether the operation has already caused a side effect on the server (for example a mutation that was applied). If we retried non-idempotent operations, there would be weird effects that are hard to diagnose in practice. Instead, we tell the caller that the request has failed and then it’s up to the application logic to figure out what to do next. If it was a simple get request and you are still within your timeout budget, you can retry on your own. If it’s a mutation, you either need to put some logic in place to read the document and figure out whether the mutation has been applied, or you know it is safe to send again right away. And then there is always the option to propagate the error back to the caller of your API. In any case it’s predictable from the SDK side and won’t cause any more harm in the background.
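
For the idempotent case, the application-side handling can be as simple as the following sketch (it assumes the synchronous 2.x API surfaces the cancellation as a RequestCancelledException from com.couchbase.client.core; adapt it to your own retry and timeout strategy):

```java
import com.couchbase.client.core.RequestCancelledException;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;

// A get has no side effects on the server, so retrying it once on cancellation is safe;
// a mutation would need the extra verification logic described above.
public class CancellationHandlingSketch {

    static JsonDocument getWithOneRetry(Bucket bucket, String id) {
        try {
            return bucket.get(id);
        } catch (RequestCancelledException ex) {
            // the channel went away mid-flight; simply try again on the (re)connected socket
            return bucket.get(id);
        }
    }
}
```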

Bootstrap Issues

The other source of errors worth knowing about is the socket connect phase. Usually you’ll find descriptive errors in the logs that tell you what is going on (for example wrong credentials), but there are two which might be a little harder to decipher: the connect safeguard timeout and select-bucket errors during rebalance.

As you’ve seen before, the KV pipeline contains many handlers which work back and forth with the server during bootstrap to figure out all kinds of config settings and negotiate supported features. At the time of writing, these individual operations do not have their own timeouts; instead, the connect safeguard timeout kicks in if the connect phase as a whole takes longer than its total budget allows.

So if you see the ConnectTimeoutException in the logs with the message “Connect callback did not return, hit safeguarding timeout.”, it means that one operation, or the sum of all of them, took longer than the available budget and another reconnect attempt will be performed. This is not harmful in general, since we will reconnect, but it is a good indication that there might be some slowness on the network or somewhere else in the stack that should be looked at more carefully. A good next step is to fire up wireshark / tcpdump and record the bootstrap phase to figure out where the time is spent, and then pivot to either the client or the server side depending on the recorded timings. By default the safeguard timeout is the socketConnectTimeout plus the connectCallbackGracePeriod, which is set to 2 seconds and can be tuned via the com.couchbase.connectCallbackGracePeriod system property.
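
If you need to tune it, setting the system property before the environment is created is enough; the snippet below assumes the value is interpreted in milliseconds:

```java
public class TuneGracePeriod {

    public static void main(String[] args) {
        // Assumed to be in milliseconds; must be set before the CouchbaseEnvironment
        // is created so the SDK picks it up (5 seconds here instead of the default 2).
        System.setProperty("com.couchbase.connectCallbackGracePeriod", "5000");

        // ... create the environment and cluster afterwards
    }
}
```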

One of the steps during bootstrap, added along with support for RBAC (role-based access control), is called “select bucket” and is performed through the KeyValueSelectBucketHandler. Since there is a disconnect between authentication and having access to a bucket, it is possible that the client connects to a KV service but the KV engine itself is not yet ready to serve it. The client will gracefully handle the situation and retry – no impact on an actual workload is observed – but since log hygiene is also a concern we are currently improving the SDK’s algorithm here. If you want, you can follow the progress at JVMCBC-553.

Final Thoughts

By now you should have a solid understanding of how the SDK manages its underlying sockets and pools them at the service layer. If you want to go digging into the codebase, start here and then look at the respective namespaces for service  and endpoint . All the Netty channel handlers are below the endpoint  namespace as well.

If you have further questions please comment below! The next post will discuss the overall threading model of the SDK.

Author

Posted by Michael Nitschinger

Michael Nitschinger works as a Principal Software Engineer at Couchbase. He is the architect and maintainer of the Couchbase Java SDK, one of the first completely reactive database drivers on the JVM. He also authored and maintains the Couchbase Spark Connector. Michael is active in the open source community and a contributor to various other projects like RxJava and Netty.
