June 19, 2014

Introducing libcouchbase 2.4

libcouchbase 2.4 (Developer Preview 1) is here. It brings large architectural improvements and several new features over previous versions.

Note: Builds for the developer preview are no longer listed in the archive. You probably want the beta build (or the release build, when that becomes available). As always, the latest builds are found at http://packages.couchbase.com/clients/c/index.html

API Documentation for the DP1 version may be found at http://docs.couchbase.com/sdk-api/couchbase-c-client-2.4.0-dp1

Internal Improvements

Packet Structures

Codenamed packet-ng, this version of libcouchbase started out as an attempt to refactor packet handling in such a way that packets were considered first class objects. The request packet is the core currency of the library as it binds the user requested cookie together with the server reply. 

In 2.4, request packets are encapsulated in the mc_PACKET structure which contains information about the cookie, the buffers for the packet itself, and the state of the packet (i.e. received, flushed, retried, errored, pending). The packet structure comes along with the mcreq module which provides a unified API for allocating, freeing, analyzing, and rescheduling packets to individual servers.

Packet Queues and Buffers

Packets are now inserted into a queue (an mc_PIPELINE structure), which keeps the ordering of the packets as a linked list. Packets are added to the queue in the order they are scheduled.

Since I/O efficiency is better with contiguous buffers, the mc_PACKET structure itself does not contain the buffer within its own object, but rather a special pointer to a region within a contiguous buffer managed by a special in-order contiguous allocator. This allows packets to live as "independent" objects while having their actual network data be tightly packed in sequence. Like the network buffers themselves, each packet object is also allocated using a separate instance of this allocator.

The allocator lives in the netbuf system which also contains structures and routines for efficiently handling buffer fragments and properly preparing them for being sent to the network, while handling conditions such as partial sends.
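
To make the idea of an in-order contiguous allocator more concrete, here is a minimal sketch. It is not the netbuf code: arena_t, arena_reserve, arena_release, and packet_span_t are hypothetical names invented for this illustration, and the fixed block size is an assumption. It only shows how packets can stay separate objects while their bytes sit back to back in one block.

```c
#include <stddef.h>

/* Hypothetical names: this is NOT the netbuf API, only a sketch of the idea. */
#define ARENA_SIZE 256

typedef struct {
    char   data[ARENA_SIZE]; /* one contiguous block */
    size_t head;             /* offset of the oldest live region */
    size_t tail;             /* offset where the next region starts */
} arena_t;

/* Reserve n contiguous bytes at the tail; returns NULL when the block is full. */
static char *arena_reserve(arena_t *a, size_t n)
{
    if (n > ARENA_SIZE - a->tail) {
        return NULL;
    }
    char *p = a->data + a->tail;
    a->tail += n;
    return p;
}

/* Release the oldest n bytes. Regions are released in the order they were
 * reserved; once everything is released the block is reused from the start. */
static void arena_release(arena_t *a, size_t n)
{
    a->head += n;
    if (a->head == a->tail) {
        a->head = a->tail = 0;
    }
}

/* A packet only refers to its bytes; it does not own a buffer. */
typedef struct {
    char  *buf;
    size_t len;
} packet_span_t;
```

Two packets reserved one after the other end up adjacent in memory, so a single write can cover both, while each packet_span_t can still be tracked on its own.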

I/O Improvements

The I/O system has been refactored and modularized within the lcbio module (src/lcbio). The notable addition is that of the lcbio_CTX structure which contains efficient and unified routines for socket reads, writes, and error handling, abstracting the underlying I/O model (e.g. completion-based like IOCP or libuv; or event-based like select or libevent) from its API.

Robustness during Configuration Changes and Failures

Configuration changes and failures are now handled gracefully. When a new configuration is received and the related server object (mc_SERVER) needs to change positions, its TCP connection is kept intact, and its queue is traversed for any commands (packets) which are no longer mapped to it. For each of those commands, the mc_PACKET structure is duplicated and placed in the proper queue, while any underlying send buffer is still sent out to the network and its response ignored. This allows us to keep the TCP stream intact and simply swallow the related (and anticipated) error response coming from the server.
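
The relocation step can be sketched roughly as follows. This is a simplified illustration, not the library's code: pkt_t, queue_t, queue_push, and remap_queues are hypothetical names, queues are fixed-size arrays rather than linked lists, and the sketch moves packets instead of duplicating them and ignoring the old response.

```c
#include <stddef.h>

#define MAX_PKTS 8

/* Hypothetical stand-ins, not the library's mc_PACKET / mc_PIPELINE. */
typedef struct {
    int vbucket; /* partition the key maps to */
    int opaque;  /* identifies the request */
} pkt_t;

typedef struct {
    pkt_t  pkts[MAX_PKTS];
    size_t count;
} queue_t;

/* Append a packet; returns 1 on success, 0 if the fixed-size queue is full. */
static int queue_push(queue_t *q, pkt_t p)
{
    if (q->count >= MAX_PKTS) {
        return 0;
    }
    q->pkts[q->count++] = p;
    return 1;
}

/* Walk every server queue and move packets whose vbucket now belongs to a
 * different server. vbmap[vb] is the owning server index under the new
 * configuration. A packet stays put if its new queue is full (a sketch
 * simplification). Returns the number of packets relocated. */
static size_t remap_queues(queue_t *queues, size_t nservers, const int *vbmap)
{
    size_t moved = 0;
    for (size_t s = 0; s < nservers; s++) {
        queue_t *q = &queues[s];
        size_t n = q->count;
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            pkt_t p = q->pkts[i];
            int owner = vbmap[p.vbucket];
            if ((size_t)owner != s && queue_push(&queues[owner], p)) {
                moved++;
            } else {
                q->pkts[kept++] = p; /* still mapped here: keep in order */
            }
        }
        q->count = kept;
    }
    return moved;
}
```

Packets that keep their mapping retain their relative order, and relocated packets are appended to the end of their new queue.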

If a TCP connection is suddenly broken and no new configuration has arrived, the related packets may be placed inside a retry queue or immediately failed. Which commands are retried and which commands are failed can be configured by the user.
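
As a rough illustration of such a policy decision, consider the sketch below. The enum names and the on_connection_lost function are hypothetical and do not correspond to the library's actual settings; the sketch only shows the shape of a rule that retries some commands and fails others.

```c
/* Hypothetical policy model; the real library exposes its own settings. */
typedef enum { CMD_GET, CMD_STORE } cmd_kind_t;
typedef enum { RETRY_NONE, RETRY_READS_ONLY, RETRY_ALL } retry_policy_t;
typedef enum { ACTION_FAIL, ACTION_RETRY } action_t;

/* Decide what happens to a pending command when its connection breaks. */
static action_t on_connection_lost(retry_policy_t policy, cmd_kind_t kind)
{
    switch (policy) {
    case RETRY_ALL:
        return ACTION_RETRY;
    case RETRY_READS_ONLY:
        /* Reads are safe to repeat; a store may already have been applied. */
        return kind == CMD_GET ? ACTION_RETRY : ACTION_FAIL;
    case RETRY_NONE:
    default:
        return ACTION_FAIL;
    }
}
```

Under the assumed RETRY_READS_ONLY policy a pending get goes to the retry queue while a pending store is failed immediately.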

More Tests

Because this version of the library has been refactored to modularize as many systems as possible, testing each module becomes simpler: the modules have better-defined behavior and fewer dependencies. Many new tests have been added for buffer management, packet handling, and raw I/O handling. None of these tests use the CouchbaseMock server or any external resources; they are entirely self-contained and deterministic.

New API Documentation

API documentation is now generated via Doxygen, an open source, cross-platform documentation generator which generates API documentation from source code comments. This keeps our API documentation up to date: when a new API is added with comments, it appears in the documentation, and when an older API is removed, it disappears from it.

Additionally, we've formally added interface attributes to all of our APIs to help you determine the stability and roadmap of a particular API call. This allows us to clearly convey whether a specific interface is experimental (or volatile), or whether it may be used in production code with confidence that it will not be modified or removed in later versions.

New Features

SSL Support for Couchbase Enterprise 3.0

Version 2.4 contains support (via OpenSSL) for communicating with the server using the SSL protocol. SSL support is implemented entirely in one of the layers inside lcbio and thus resides underneath lcbio_CTX. As such, SSL support is virtually transparent to most systems in the library. By default the library will still connect in a non-encrypted mode (your SASL password will still be encrypted if possible, though).

Connection String ("DSN") Support

Also new is support for a new way of specifying how to connect to the cluster. As more and more connection options are added to the library it was necessary to provide a uniform format for users to declare how and what they want to use when connecting to the cluster. Brett Lawson proposed a new URI-like format which allows one to specify connection options in a clear, concise, and unambiguous format. Using a URI format allows things such as being able to specify these settings inside a configuration file (so you don't have to manually parse multiple settings and then match them to appropriate struct fields).

Since libcouchbase is mostly used as a core layer of higher level libraries (such as Python, Node.js and Ruby), exposing a string connection option makes it easy for all these languages to share a common interface and a common codebase when specifying how to connect to the cluster.

As a demonstration, a connection string like couchbase://foo.com,bar.com,baz.com/mybucket?operation_timeout=5000000&detailed_errcodes=true will use foo.com, bar.com, and baz.com as nodes to connect to the bucket mybucket, applying an operation timeout of 5 seconds and enabling detailed error codes (another new feature in the library).
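
To show why a single string is convenient to handle, here is a small standalone sketch that splits such a string into hosts, bucket, and options. It is not the parser used by the library: connspec_t, copy_field, and parse_connstr are hypothetical names, the fixed array sizes are assumptions, and details such as ports or escaping are left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HOSTS 8
#define MAX_OPTS  8
#define FIELD_LEN 64

/* Hypothetical result structure for this sketch. */
typedef struct {
    char   hosts[MAX_HOSTS][FIELD_LEN];
    size_t nhosts;
    char   bucket[FIELD_LEN];
    char   opt_keys[MAX_OPTS][FIELD_LEN];
    char   opt_vals[MAX_OPTS][FIELD_LEN];
    size_t nopts;
} connspec_t;

/* Copy at most FIELD_LEN-1 bytes of [s, s+n) into dst and terminate it. */
static void copy_field(char *dst, const char *s, size_t n)
{
    if (n >= FIELD_LEN) {
        n = FIELD_LEN - 1;
    }
    memcpy(dst, s, n);
    dst[n] = '\0';
}

/* Returns 0 on success, -1 if the string does not start with couchbase://. */
static int parse_connstr(const char *str, connspec_t *out)
{
    static const char scheme[] = "couchbase://";
    memset(out, 0, sizeof(*out));
    if (strncmp(str, scheme, sizeof(scheme) - 1) != 0) {
        return -1;
    }
    const char *p = str + sizeof(scheme) - 1;

    /* The host list runs up to the first '/' or '?' (or end of string). */
    const char *hosts_end = p + strcspn(p, "/?");
    while (p < hosts_end && out->nhosts < MAX_HOSTS) {
        size_t n = strcspn(p, ",/?");
        if (n > 0) {
            copy_field(out->hosts[out->nhosts++], p, n);
        }
        p += n;
        if (*p == ',') {
            p++;
        }
    }
    p = hosts_end;

    /* Optional bucket name after '/'. */
    if (*p == '/') {
        p++;
        size_t n = strcspn(p, "?");
        copy_field(out->bucket, p, n);
        p += n;
    }

    /* Optional key=value pairs after '?', separated by '&'. */
    if (*p == '?') {
        p++;
        while (*p != '\0' && out->nopts < MAX_OPTS) {
            size_t pair = strcspn(p, "&");
            const char *eq = memchr(p, '=', pair);
            if (eq != NULL) {
                size_t klen = (size_t)(eq - p);
                copy_field(out->opt_keys[out->nopts], p, klen);
                copy_field(out->opt_vals[out->nopts], eq + 1, pair - klen - 1);
                out->nopts++;
            }
            p += pair;
            if (*p == '&') {
                p++;
            }
        }
    }
    return 0;
}
```

Fed the example string above, the sketch yields three hosts, the bucket mybucket, and two options, which a wrapper in any language could then map to its own settings.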

New Request APIs

A new set of (volatile) request APIs was added in this version to form the basis of the APIs of the next major release of the library. These APIs operate on a single command at a time and follow an enter/leave pattern, where a user "enters" a scheduling context, schedules a bunch of commands, and then "leaves". In contrast to the 2.x APIs, where each command would implicitly schedule a flush to the network, these new APIs only schedule a flush when "leaving" their current context. This allows efficient construction of multiple batched operations without having to allocate an array of command structures to do so. For example:

lcb_sched_enter(instance);
char buf[4096];
lcb_error_t err = LCB_SUCCESS;
for (size_t ii = 0; ii < 10; ii++) {
  lcb_CMDGET cmd = { 0 };
  // size_t is unsigned, so cast explicitly for the format specifier
  sprintf(buf, "Key_%lu", (unsigned long)ii);
  LCB_KREQ_SIMPLE(&cmd.key, buf, strlen(buf));
  err = lcb_get3(instance, NULL, &cmd);
  if (err != LCB_SUCCESS) {
    break;
  }
}
if (err == LCB_SUCCESS) {
  lcb_sched_leave(instance); // Schedules a flush
  lcb_wait(instance);
} else {
  lcb_sched_fail(instance); // None of the commands are flushed
}

 

You may now give libcouchbase raw memcached packets to dispatch to a server and receive a raw memcached packet in reply. This allows lower level access to packet functionality and allows you to build a proxy server. The feature is implemented in such a way that the response buffers are not copied over to the callback and may be kept alive outside the callback, so that you do not need to copy over GET responses into a temporary buffer for processing. Likewise the request packet itself can also optionally not be copied, but have a callback invoked when it is no longer needed by the library.

 

Additionally, value data (for lcb_store3() in the new scheduling API) may optionally be placed in the library without copying. This allows large creation requests to be more efficient in terms of memory allocation.
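
The difference between copying a buffer and borrowing it with a release callback can be sketched as follows. This is a generic illustration and not the library's API: value_ref_t, value_copy, value_borrow, value_done, and count_release are hypothetical names made up for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Called when a borrowed buffer is no longer needed (hypothetical signature). */
typedef void (*release_fn)(const void *buf, void *ctx);

typedef struct {
    const char *data;
    size_t      len;
    int         owned;   /* 1 = bytes were copied, 0 = borrowed from caller */
    release_fn  on_done; /* only used for borrowed buffers */
    void       *ctx;
} value_ref_t;

/* Copying mode: the caller may reuse src as soon as this returns. */
static int value_copy(value_ref_t *v, const char *src, size_t len)
{
    char *dup = malloc(len ? len : 1);
    if (dup == NULL) {
        return -1;
    }
    memcpy(dup, src, len);
    v->data = dup;
    v->len = len;
    v->owned = 1;
    v->on_done = NULL;
    v->ctx = NULL;
    return 0;
}

/* Borrowing mode: no allocation, but src must stay valid until on_done runs. */
static void value_borrow(value_ref_t *v, const char *src, size_t len,
                         release_fn on_done, void *ctx)
{
    v->data = src;
    v->len = len;
    v->owned = 0;
    v->on_done = on_done;
    v->ctx = ctx;
}

/* Invoked once the bytes have been fully sent. */
static void value_done(value_ref_t *v)
{
    if (v->owned) {
        free((void *)v->data);
    } else if (v->on_done != NULL) {
        v->on_done(v->data, v->ctx);
    }
    v->data = NULL;
    v->len = 0;
}

/* Example callback: counts how many borrowed buffers were handed back. */
static void count_release(const void *buf, void *ctx)
{
    (void)buf;
    ++*(int *)ctx;
}
```

Borrowing avoids one allocation and one copy per value, at the cost of the caller having to keep the buffer alive until the callback fires.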

 

New Cluster Configuration APIs

 

A new callback has been added to the library notifying the user whether the initial bootstrap has succeeded or failed. This was previously done using the error callback (lcb_set_error_callback()) and the configuration callback (lcb_set_configuration_callback()), where the error callback would be invoked upon an initial error, and the configuration callback invoked when the client received a new configuration. The error callback, however, would also be invoked each time a specific node failed, making clients fail prematurely if multiple nodes were passed and only the first one in the list failed. The new bootstrap callback is invoked only once, and only during the initial creation, with a definite error code indicating either bootstrap success or failure. For non-asynchronous clients you can simply use lcb_get_bootstrap_status() and do not need to rely on a callback:

lcb_t instance;
struct lcb_create_st cropt = {
  .version = 3,
  .v.v3.dsn = "couchbase://cbnode1,cbnode2/mybucket"
};
lcb_error_t err = lcb_create(&instance, &cropt);
if (err != LCB_SUCCESS) {
  // handle error;
}
#if I_AM_BLOCKING
err = lcb_connect(instance);
lcb_wait(instance);
if ((err = lcb_get_bootstrap_status(instance)) != LCB_SUCCESS)
{
  printf("Failed to bootstrap: %s\n", lcb_strerror(instance, err));
}
// do commands
#else /* I AM ASYNC */
// In real code, define the callback at file scope
static void bootstrap_callback(lcb_t instance, lcb_error_t err) {
  if (err != LCB_SUCCESS) {
    printf("Couldn't bootstrap\n");
  } else {
    lcb_CMDGET gcmd = { 0 };
    LCB_KREQ_SIMPLE(&gcmd.key, "foo", 3);
    lcb_sched_enter(instance);
    lcb_get3(instance, NULL, &gcmd);
    lcb_sched_leave(instance);
  }
}
lcb_set_bootstrap_callback(instance, bootstrap_callback);
lcb_connect(instance); // Return to event loop, or call lcb_wait()
#endif

Additionally, an lcb_refresh_config() function has been added to forcefully make the client request a new configuration from the cluster. This is useful to "force" a reconfiguration in cases where many timeouts are being encountered, or to enforce a customized refresh policy within the application.

 

Finally, the vbucket API has been exposed, allowing inspection of the current configuration being used by the library. The new API is located in libcouchbase/vbucket.h (inside the headers directory). 

#include <libcouchbase/vbucket.h>
lcbvb_CONFIG *config;
lcb_error_t err;
err = lcb_cntl(instance, LCB_CNTL_GET, LCB_CNTL_VBCONFIG, &config);
// Check error
printf("Revision of current config is %d\n", lcbvb_get_revision(config));
printf("Cluster has %u servers\n", lcbvb_get_nservers(config));

You may also use the revision to determine if the client has received a new configuration.

 
