April 20, 2010

Tuning Memcached Timeouts for a Cloud Environment

These days, more and more apps are running in the cloud, and they're starting to take memcached with them. For example, as we announced earlier this week, nearly 300 applications are using NorthScale's memcached as a service on Heroku's Ruby-based PaaS cloud platform.

In the past, most environments running memcached have kept it on a single, controlled LAN: usually on the frontend web servers in the DMZ, without even the usual firewall or router between the DMZ and the database. In that environment, server failures are far more likely than even a single dropped packet, and waiting for a retransmit is likely to take longer than a hit to the database, so it makes sense to set extremely aggressive timeouts for memcached operations, on the order of 100-250ms or less.

In contrast, cloud networking environments tend to be far less controlled, since they're shared with other customers, and even the location of a given service is not necessarily under the control of the user. In these environments, it's not uncommon to have three or more hops between nodes, even in the same datacenter. And with other customers on the same switch or same physical node, one can expect to see the occasional burst of packet loss or high latency.

In such an environment, with even mildly aggressive timeouts, a single dropped packet can cause a query to fail. TCP's initial retransmit timer is 3 seconds, so the only way to try again faster than that is to give up and retry right away. Unless your database is slower than this or particularly expensive, it may make sense to leave your client's timeouts at their (probably relatively aggressive) defaults. However, since timeouts are now more likely to be caused by network issues than by server failures, if your client marks servers dead after repeated failures, that threshold should be loosened a bit. The default of 2 in Fauna is probably too tight; 3-5 failures before marking a server dead is a better range.
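Whatever the timeout values, the application should treat a cache timeout as a miss and fall through to the database rather than fail the request. A minimal sketch of that pattern, where FakeCache and db_lookup are stand-ins for illustration (not real APIs; Fauna's client raises an exception on timeout):

```ruby
# FakeCache simulates a memcached client whose #get raises on a timeout.
class FakeCache
  def get(key)
    raise "simulated dropped packet"  # hypothetical timeout failure
  end
end

# Placeholder for the real database query.
def db_lookup(key)
  "value-from-db"
end

# Treat any cache error (including a timeout) as a miss: degrade to a
# database hit instead of surfacing the error to the application.
def fetch_with_fallback(cache, key)
  cache.get(key)
rescue StandardError
  db_lookup(key)
end

puts fetch_with_fallback(FakeCache.new, "user:42")  # => value-from-db
```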

A lot of applications that weren't born in the cloud won't handle timeouts from memcached very well. For such applications, you might consider tuning the timeouts to be able to handle at least a single dropped packet. The tunables for Fauna (the Ruby client NorthScale recommends since it supports SASL) can be passed as hash parameters to the client constructor. I've included recommendations for handling a single dropped packet:

:connect_timeout - 4.0 - initial connect, TCP will take 3 seconds if a packet is dropped
:rcv_timeout - not set - libmemcached equivalent is MEMCACHED_BEHAVIOR_RCV_TIMEOUT - time to get a response from the server
:poll_timeout - not set - libmemcached equivalent is MEMCACHED_BEHAVIOR_POLL_TIMEOUT - time we wait for the poll call to return
:timeout - 4.0 - default for :rcv_timeout and :poll_timeout if they aren't separately specified; this is usually what's used
:retry_timeout - 30 - how long to leave a server marked dead when it hits :server_failure_limit
:server_failure_limit - 3-5 - how many times a server can fail before it's marked dead

All these are in seconds; Fauna does the conversion to the right value for libmemcached internally.
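Putting the recommendations above together, a sketch of the options hash as it would be passed to Fauna's constructor (the option names come from the list above; the hash itself needs no gem installed, and the commented constructor call assumes a memcached server at localhost:11211):

```ruby
# Timeout tunables for a cloud environment, sized to survive a single
# dropped packet (TCP's initial retransmit fires at ~3 seconds).
options = {
  :connect_timeout      => 4.0,  # initial connect: 3s retransmit + margin
  :timeout              => 4.0,  # covers :rcv_timeout and :poll_timeout defaults
  :retry_timeout        => 30,   # seconds a server stays marked dead
  :server_failure_limit => 5     # failures before marking a server dead
}

# With the memcached gem installed, these would be passed to the client:
#   require 'memcached'
#   cache = Memcached.new("localhost:11211", options)
puts options[:connect_timeout]  # => 4.0
```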
