November 24, 2009

memcached and the client: Database UDFs

NorthScale's own Patrick Galbraith has, for many years now, authored and maintained the MySQL, and now Drizzle, UDFs for memcached.  Last week, Patrick took this one step further with the latest release, version 1.1, which now includes support for "check and set" (a.k.a. CAS) operations.

User-defined functions (UDFs) are available for a number of different databases.  They let stored procedures and triggers call external code that has been loaded into the DB.  In the case of the memcached UDFs, this means giving stored procedures and triggers the ability to call memcached operations.

The general idea here is pretty simple.  Most applications start with a database, though it's always possible to use web services or flat files.  Regardless of where the data is persisted, one really simple way to keep the cache up to date with the System of Record (SoR) is to propagate invalidations (i.e. deletes) to the cache whenever you update a record in the SoR.  Because databases, whether single or sharded, are so popular for managing app data, they have a natural role in this pattern.  In the diagram below, when the application updates a record in response to user interaction (#1), the database can, if it has the UDFs installed and has been told how to use them, invalidate that data in the cache (#2).

UDF Pattern Diagram

This approach doesn't fit every case, since the application may not wrap multiple related operations in a single transaction, but it's simple to set up and works for a great many apps.
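To make the invalidation pattern concrete, here is a minimal sketch in Python.  It is not the MySQL/Drizzle UDF itself: it uses SQLite (through the standard `sqlite3` module) as a stand-in database, a plain dict as a stand-in for memcached, and a hypothetical `cache_delete` function registered as a database function and called from a trigger.  The table name, key format, and helper names are all assumptions made for illustration.

```python
import sqlite3

# Stand-in for a memcached server; a real setup would talk to memcached.
cache = {}

def cache_delete(key):
    """Hypothetical UDF body: drop a key from the cache (a no-op if absent)."""
    cache.pop(key, None)
    return 1

def make_db():
    db = sqlite3.connect(":memory:")
    # Allow application-defined functions inside triggers on builds
    # where this is not already the default.
    db.execute("PRAGMA trusted_schema = ON")
    # Register the Python function so SQL (and triggers) can call it.
    db.create_function("cache_delete", 1, cache_delete)
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    # Step #2 in the diagram: the database invalidates the cached copy
    # whenever the record is updated.
    db.execute("""
        CREATE TRIGGER users_invalidate AFTER UPDATE ON users
        BEGIN
            SELECT cache_delete('user:' || NEW.id);
        END
    """)
    return db

def get_user_name(db, user_id):
    """Read-through lookup: serve from the cache, else load from the SoR."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    cache[key] = row[0]
    return row[0]
```

The application only issues an ordinary `UPDATE` (step #1); it never has to remember to delete the cache entry, because the trigger does that on its behalf.  The next read misses the cache and reloads the fresh value from the database.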

In addition to Patrick's excellent UDFs for MySQL and Drizzle, there is pgmemcache for PostgreSQL, and even a prototype of UDFs for Apache Derby (a.k.a. JavaDB).

Oh, and about that new CAS feature Patrick added to the MySQL/Drizzle UDFs.  Most memcached users start with the small stuff: gets and sets.  They then find uses for operations like add.  Before long, they're wrestling with how to deal with multiple distributed clients wanting to update the same item.  At a high level, this is where "check and set" (a.k.a. CAS) operations come in.  Have a look at the original protocol.txt (or the binary protocol doc) to see how you may use this.  In particular, CAS makes it possible to implement the lock-free algorithms that are frequently required when multiple systems want to update an item in a distributed system.
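As a rough illustration of the lock-free update that CAS enables, here is a small Python sketch.  It does not use a real memcached client or the UDFs: `FakeCache` is a hypothetical, single-process stand-in whose `gets` and `cas` methods only mimic the semantics of the text protocol's `gets` and `cas` commands, and `cas_update` is an assumed helper name.

```python
import itertools

class FakeCache:
    """Hypothetical in-memory stand-in mimicking memcached gets/cas semantics."""

    def __init__(self):
        self._data = {}                  # key -> (value, cas_id)
        self._ids = itertools.count(1)   # every write gets a new unique id

    def set(self, key, value):
        self._data[key] = (value, next(self._ids))

    def gets(self, key):
        """Return (value, cas_id), or (None, None) if the key is missing."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, cas_id):
        """Store value only if the item is unchanged since cas_id was read."""
        current = self._data.get(key)
        if current is None or current[1] != cas_id:
            return False
        self._data[key] = (value, next(self._ids))
        return True

def cas_update(cache, key, fn, max_retries=10):
    """Apply fn to the stored value without locks, retrying on conflict."""
    for _ in range(max_retries):
        value, cas_id = cache.gets(key)
        if cas_id is None:
            return False                 # nothing to update
        if cache.cas(key, fn(value), cas_id):
            return True                  # no one else wrote in between
        # Another client changed the item first: re-read and try again.
    return False
```

A client would call something like `cas_update(cache, "counter", lambda v: v + 1)`.  If a second client writes the item between the read and the write, the `cas` fails and the loop simply re-reads the newer value, so no update is silently overwritten and no lock is held.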

Jump in on the mailing list linked from memcached.org if you're looking for more information.
