March 15, 2010

Want to know what your memcached servers are doing? Tap them.

It is possible to dump parts of the cache by using “stats dump ...” to list some of the keys in the cache, and then fetch each value. Here is an example of how to do that:

    stats dump 1 10
    xxxx
    get <item>

Wouldn't it be better if you could eavesdrop on the memcached server? People running memcached on Solaris can already do this with DTrace (see http://blog.northscale.com/northscale-blog/2009/08/mrroboto-the-memcached-ami-story.html for an example), but it would be nice to have a solution that works for everyone else as well.

Dustin Sallings and I teamed up and started to design an interface that would allow you to get a stream of notifications from your memcached server. Among the requirements for the solution were:

  • Engine neutral. People may have their own specialized memcached storage engines, and what good is a new interface if it cannot work with those engines? This means that the protocol has to be extensible.
  • Use the binary protocol for maximum portability. The binary protocol is unambiguous; hardware, operating system and programming language neutral. By using the binary protocol we can tap remote servers.
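
To make the portability point concrete, here is a minimal sketch of how a client could lay out a binary protocol request byte by byte, so that the result is the same on any hardware or operating system. The 24-byte header layout and the 0x80 request magic come from the memcached binary protocol; the opcode value 0x40 for “TAP Connect”, the use of the key field to carry a client name, and the helper names are assumptions made for illustration and are not taken from this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    BIN_REQUEST_MAGIC = 0x80,          /* marks a request packet */
    BIN_HEADER_LEN = 24,               /* fixed size of every header */
    ASSUMED_TAP_CONNECT_OPCODE = 0x40  /* assumption, not given in the post */
};

/* Write integers in network (big-endian) byte order, whatever the host is. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build a request that carries a key but no extras and no value.
 * Returns the packet length, or 0 if it does not fit in buf. */
size_t build_request(uint8_t *buf, size_t buflen, uint8_t opcode,
                     const char *key, uint16_t nkey, uint32_t opaque)
{
    assert(buf != NULL);
    if (buflen < (size_t)BIN_HEADER_LEN + nkey) {
        return 0;
    }
    /* Extras length, data type, reserved bytes and CAS all stay zero. */
    memset(buf, 0, BIN_HEADER_LEN);
    buf[0] = BIN_REQUEST_MAGIC;
    buf[1] = opcode;
    put_u16(buf + 2, nkey);      /* key length */
    put_u32(buf + 8, nkey);      /* total body length = extras + key + value */
    put_u32(buf + 12, opaque);   /* client-chosen tag for matching replies */
    if (nkey > 0) {
        memcpy(buf + BIN_HEADER_LEN, key, nkey);
    }
    return (size_t)BIN_HEADER_LEN + nkey;
}
```

A client would call something like build_request(buf, sizeof(buf), ASSUMED_TAP_CONNECT_OPCODE, "tap1", 4, 1) and write the returned number of bytes to the socket. Because every multi-byte field is written one byte at a time, no struct packing or host byte order leaks into the wire format.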

The tap interface opens up a lot of possibilities for memcached software developers:

  • Observability – We can attach a tap client to the server and see when items are added, modified, or deleted.
  • Replication – Why not let a memcached server listen for tap events from another server? Why stop with one? Let's create a circle of 10 servers and let each one tap the next one in the circle (please note that this creates a window of inconsistency from the time you perform an operation on the first node until it is “replicated” throughout the circle).
  • Persistence – This is one of the most popular requests I see these days. Instead of creating a specialized engine that does persistence, we can connect a tap stream that receives all of the mutation events from the memcached server and stores the data on persistent storage.
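
As a rough illustration of the persistence idea, a tap client could append every mutation it receives to a log file. The sketch below is not part of the TAP project; persist_mutation and the record format (two big-endian 32-bit lengths followed by the key and the value) are hypothetical choices made only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Store a 32-bit length in big-endian order so the log is host independent. */
static void put_len(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Append one mutation as: key length, value length, key bytes, value bytes.
 * Returns 0 on success and -1 on a write error. */
int persist_mutation(FILE *log, const void *key, uint32_t nkey,
                     const void *data, uint32_t ndata)
{
    uint8_t hdr[8];

    assert(log != NULL);
    put_len(hdr, nkey);
    put_len(hdr + 4, ndata);
    if (fwrite(hdr, 1, sizeof(hdr), log) != sizeof(hdr)) {
        return -1;
    }
    if (nkey > 0 && fwrite(key, 1, nkey, log) != nkey) {
        return -1;
    }
    if (ndata > 0 && fwrite(data, 1, ndata, log) != ndata) {
        return -1;
    }
    /* fflush only hands the data to the operating system; a real
     * implementation would also need something like fsync for durability. */
    return fflush(log) == 0 ? 0 : -1;
}
```

An append-only log keeps the client simple: replaying the file from the start and applying each record in order would rebuild the last known value for every key.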

So how does the Tap interface work? It takes advantage of the memcached engine implementation. In this implementation the memcached core is responsible for the protocol handling and network IO, and the engine is responsible for feeding the core with a stream of events.

Let's walk through an example. When the server receives a “TAP Connect” message from the client, the memcached core will try to get a tap stream from the storage engine by calling get_tap_iterator(). With the iterator in place, the memcached core will try to get the next event to send to the tap client whenever the socket is writable. If the engine doesn't have any events to send to the client, the iterator should return TAP_PAUSE (and the engine should call notify_io_complete to tell the core to start walking the iterator again).

So how does the iterator look? At first glance you may think that it looks too complex, but let's take a closer look at the prototype:

    tap_event_t tap_iterator(ENGINE_HANDLE* handle, const void *cookie,
                             item **item, void **engine_specific,
                             uint16_t *nengine_specific, uint8_t *ttl,
                             uint16_t *flags, uint32_t *seqno);

The API documentation gives detailed information about each parameter, but I would like to highlight one of them. The engine_specific parameter allows each storage engine to pass on extra information in each TAP message so that the receiving engine can recreate the event (e.g., this could be bucket information used by the bucket engine, or extra security information if you create a security-aware engine).

The memcached server is also capable of receiving TAP messages and passing them on to the storage engine. Whenever a TAP message is received, the server will call the tap_notify function in the engine interface:

    ENGINE_ERROR_CODE tap_notify(ENGINE_HANDLE* handle, const void *cookie,
                                 void *engine_specific, uint16_t nengine,
                                 uint8_t ttl, uint16_t tap_flags,
                                 tap_event_t tap_event, uint32_t tap_seqno,
                                 const void *key, size_t nkey,
                                 uint32_t flags, uint32_t exptime,
                                 uint64_t cas, const void *data,
                                 size_t ndata);

This sounds easy, doesn't it? If this sounds interesting you should check out the TAP project.
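
To show how small an iterator can be, here is a sketch of a toy engine that keeps pending events in a fixed-size queue and returns TAP_PAUSE when the queue is empty. It follows the tap_iterator prototype quoted above, but everything else is an assumption: the typedefs are stand-ins so the example compiles without the memcached headers, the enum values are made up, and toy_enqueue and toy_tap_iterator are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8

/* Stand-ins for the real memcached types (hypothetical definitions). */
typedef struct { const char *key; const char *value; } item;
typedef enum { TAP_MUTATION = 1, TAP_DELETION, TAP_PAUSE } tap_event_t;

struct pending { tap_event_t event; item *it; };

typedef struct toy_engine {
    struct pending queue[QUEUE_LEN];
    size_t head;          /* index of the next event to hand out */
    size_t tail;          /* index of the next free slot */
    uint32_t next_seqno;  /* sequence number for the next event */
} ENGINE_HANDLE;

/* Record an event for the tap client. Returns 1 on success, 0 if full. */
int toy_enqueue(ENGINE_HANDLE *e, tap_event_t event, item *it)
{
    if (e->tail - e->head == QUEUE_LEN) {
        return 0;
    }
    e->queue[e->tail % QUEUE_LEN] = (struct pending){ event, it };
    e->tail++;
    return 1;
}

/* Called by the core whenever the tap client's socket is writable. */
tap_event_t toy_tap_iterator(ENGINE_HANDLE *handle, const void *cookie,
                             item **it, void **engine_specific,
                             uint16_t *nengine_specific, uint8_t *ttl,
                             uint16_t *flags, uint32_t *seqno)
{
    struct pending *p;

    (void)cookie;  /* a real engine would keep it for notify_io_complete */
    assert(handle != NULL);
    if (handle->head == handle->tail) {
        /* Nothing to send; output parameters are left untouched. */
        return TAP_PAUSE;
    }
    p = &handle->queue[handle->head % QUEUE_LEN];
    handle->head++;
    *it = p->it;
    *engine_specific = NULL;  /* this toy engine sends no extra data */
    *nengine_specific = 0;
    *ttl = 1;                 /* placeholder; see the API documentation */
    *flags = 0;
    *seqno = handle->next_seqno++;
    return p->event;
}
```

In this sketch the engine only has to answer one question each time it is called: is there another event, or should the core pause? A real engine would also have to wake the core up after queuing a new event for a paused connection, which the toy version leaves out.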
