March 16, 2010

How I Learned to Stop Worrying and Love Dynamically Loadable Modular Engines

Memcached Keeps You Wanting More

Memcached is a pretty simple system with pretty simple semantics. Many users have wished for just a little more functionality than it provides out of the box, which has led to several forks and related projects.

To accommodate what are really just minimal differences, lots of projects have spun up as either forks of memcached, or entirely new projects:

• dbcached
• depcached
• memagent
• memcached-pro
• memcachedb
• moxi
• redis
• repcached
• spcached
• tokyo tyrant
• tugela cache
… the list goes on much further than that, not even including in-house forks we know exist inside many organizations.

Most of these are doing the same thing from the client perspective (basic sets and gets of keyed data), but want to do something a little different with the data that comes in. Some provide persistence, some replication, some proxying and some provide novel new operations on your data.

Memcached Storage Engines

Having seen the same pattern over and over, we introduced a storage engine framework to memcached, allowing you to change the way data is handled without breaking compatibility with clients or neglecting important new features or bug fixes in the front-end protocol handler.

You can sort of think of this as a limited version of what Apache web server modules provide you. You don't need to change the whole server to get something different.
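To make the split between front end and engine concrete, here is a rough sketch in Python. It is not the actual memcached engine API (which is C); the names Engine, DictEngine and FrontEnd are made up for illustration, and the command syntax is a simplified stand-in for the real text protocol. The point is only that the front end never touches storage directly.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Engine(ABC):
    """Hypothetical storage interface; the front end only ever calls these."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def store(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def remove(self, key: str) -> bool: ...


class DictEngine(Engine):
    """Simplest possible engine: a plain in-memory dictionary."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def store(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def remove(self, key: str) -> bool:
        return self._items.pop(key, None) is not None


class FrontEnd:
    """Stand-in for the protocol handler: parses commands, delegates storage."""

    def __init__(self, engine: Engine) -> None:
        self._engine = engine

    def handle(self, line: str) -> str:
        parts = line.split()
        if not parts:
            return "ERROR"
        command, args = parts[0], parts[1:]
        if command == "set" and len(args) == 2:
            self._engine.store(args[0], args[1].encode())
            return "STORED"
        if command == "get" and len(args) == 1:
            value = self._engine.get(args[0])
            if value is None:
                return "END"
            return f"VALUE {args[0]} {value.decode()}"
        if command == "delete" and len(args) == 1:
            return "DELETED" if self._engine.remove(args[0]) else "NOT_FOUND"
        return "ERROR"
```

Swapping DictEngine for anything else that implements the same three methods changes how data is stored without touching FrontEnd, which is the property the engine framework is after.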

What kinds of things can you do with the memcached storage engine framework? Look at the list above - that's the set of problems we were trying to solve.

New In-Memory Storage Formats
You can write an engine to use an alternate memory allocator, or change the item's in-memory structure if you think you can do better. Facebook's implementation of memcached, for example, reduces the accuracy of the LRU in order to save a considerable amount of space at their scale. We've ensured that this can be easily implemented as an engine.
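As an illustration of trading LRU accuracy for space, here is a small sketch of the textbook CLOCK (second-chance) policy in Python. This is not Facebook's implementation and not memcached's item layout; it is a different, well-known technique chosen to show the idea. Instead of keeping every item in an exactly ordered list, it keeps one "recently used" flag per slot. The ClockCache class and its method names are hypothetical.

```python
from typing import Dict, List, Optional


class ClockCache:
    """Fixed-size cache with approximate LRU eviction (CLOCK / second chance)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._keys: List[Optional[str]] = [None] * capacity   # None marks an empty slot
        self._values: List[Optional[bytes]] = [None] * capacity
        self._referenced: List[bool] = [False] * capacity     # one flag per slot
        self._index: Dict[str, int] = {}                      # key -> slot number
        self._hand = 0                                        # next slot to inspect

    def get(self, key: str) -> Optional[bytes]:
        slot = self._index.get(key)
        if slot is None:
            return None
        self._referenced[slot] = True   # cheap: no list reordering on a hit
        return self._values[slot]

    def store(self, key: str, value: bytes) -> None:
        slot = self._index.get(key)
        if slot is None:
            slot = self._claim_slot()
            self._keys[slot] = key
            self._index[key] = slot
            self._referenced[slot] = False   # new items start unreferenced
        else:
            self._referenced[slot] = True    # an update counts as a use
        self._values[slot] = value

    def _claim_slot(self) -> int:
        """Sweep the hand until an empty or unreferenced slot is found."""
        while True:
            slot = self._hand
            self._hand = (self._hand + 1) % self._capacity
            if self._keys[slot] is None:
                return slot
            if self._referenced[slot]:
                self._referenced[slot] = False   # second chance
                continue
            del self._index[self._keys[slot]]    # evict the current occupant
            return slot
```

The eviction order is only roughly least-recently-used, but a hit costs a single flag write rather than pointer updates on a linked list.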

If you're a fan of C++, you can quite quickly make an engine that stores item objects in an STL map. Trond Norbye used this as an example in a blog post last year.

Persistent Storage
One of the first proper engine tests was a hybrid RAM/flash storage mechanism. Trond also did a small demo SQLite storage engine.
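For a feel of how small a persistent engine can be, here is a sketch of an SQLite-backed store in Python using the standard sqlite3 module. It is not Trond's demo engine and not the C engine API; the SQLiteEngine class, the items table and the method names are assumptions made for this example.

```python
import sqlite3
from typing import Optional


class SQLiteEngine:
    """Hypothetical key/value engine that keeps its items in an SQLite file."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._db.commit()

    def store(self, key: str, value: bytes) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self._db.execute(
            "SELECT value FROM items WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def remove(self, key: str) -> bool:
        with self._db:
            cursor = self._db.execute("DELETE FROM items WHERE key = ?", (key,))
        return cursor.rowcount > 0

    def close(self) -> None:
        self._db.close()
```

Pointing the constructor at a file path instead of ":memory:" means stored items survive closing and reopening the engine.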

Zynga is running a persistent storage engine in production behind its FarmVille and Cafe World applications using NorthScale Membase Server, which is now in private beta.

Multi-Tenancy
Combined with SASL, we were able to make a pretty simple multi-tenant engine allowing completely isolated logical caches to be created within a single memcached instance.

This engine is particularly interesting because it also handles protocol extensions for securely managing the logical containers independently on your server instances.
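The isolation idea can be sketched in a few lines of Python. This assumes the front end has already authenticated the connection (for example via SASL) and hands the engine a tenant name; the BucketEngine class and its create_bucket and delete_bucket management calls are hypothetical names, not the real protocol extensions.

```python
from typing import Dict, Optional


class BucketEngine:
    """Hypothetical multi-tenant engine: one isolated key space per tenant."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    # Management operations (stand-ins for the protocol extensions).
    def create_bucket(self, tenant: str) -> None:
        self._buckets.setdefault(tenant, {})

    def delete_bucket(self, tenant: str) -> bool:
        return self._buckets.pop(tenant, None) is not None

    def _bucket(self, tenant: str) -> Dict[str, bytes]:
        try:
            return self._buckets[tenant]
        except KeyError:
            raise PermissionError(f"no bucket for tenant {tenant!r}") from None

    # Data operations always go through the caller's own bucket.
    def store(self, tenant: str, key: str, value: bytes) -> None:
        self._bucket(tenant)[key] = value

    def get(self, tenant: str, key: str) -> Optional[bytes]:
        return self._bucket(tenant).get(key)
```

Two tenants can use the same key without ever seeing each other's values, and a tenant without a bucket cannot read or write anything.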

Engines Without Performance Guarantees
Some of the memcached projects mentioned above act as proxies where they will be communicating with other memcached instances, injecting intelligence between the client and the server.

An author writing a proxy as an engine has it easy. An engine can tell the server that it can't handle a request instantly and go off and do the work on its own thread until it's ready. The client will be suspended without affecting any other clients. When the result is ready, the engine notifies the server that the IO is complete, and the server prepares the response.

This is useful for many things other than proxies, of course. I'd expect any persistent storage engine to implement the same kind of thing.
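Here is a rough Python sketch of that "can't answer yet" handshake. All names (SlowEngine, Connection, serve_get, the WOULD_BLOCK status and the notify_io_complete callback parameter) are assumptions for this example rather than the C engine API, and for simplicity the stand-in server blocks a thread on an event where a real server would park the connection in its event loop.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

SUCCESS = "success"
WOULD_BLOCK = "would_block"


class Connection:
    """Stand-in for a client connection the server can suspend and resume."""

    def __init__(self) -> None:
        self.resumed = threading.Event()


class SlowEngine:
    """Hypothetical engine whose data lives somewhere slow (disk, another server)."""

    def __init__(self, notify_io_complete: Callable[[Connection], None]) -> None:
        self._notify = notify_io_complete
        self._backing = {"answer": b"42"}   # pretend this is slow to reach
        self._ready: Dict[Tuple[Connection, str], Optional[bytes]] = {}
        self._lock = threading.Lock()

    def get(self, conn: Connection, key: str) -> Tuple[str, Optional[bytes]]:
        with self._lock:
            if (conn, key) in self._ready:
                # Second call, after the background fetch has finished.
                return SUCCESS, self._ready.pop((conn, key))
        # First call: start the slow work on the engine's own thread.
        threading.Thread(target=self._fetch, args=(conn, key), daemon=True).start()
        return WOULD_BLOCK, None

    def _fetch(self, conn: Connection, key: str) -> None:
        time.sleep(0.01)                    # simulated IO latency
        value = self._backing.get(key)
        with self._lock:
            self._ready[(conn, key)] = value
        self._notify(conn)                  # tell the server to resume this client


def serve_get(engine: SlowEngine, conn: Connection, key: str) -> Optional[bytes]:
    """Server side: retry the request once the engine signals completion."""
    status, value = engine.get(conn, key)
    if status == WOULD_BLOCK:
        if not conn.resumed.wait(timeout=5):   # only this connection waits
            raise TimeoutError("engine never completed the request")
        conn.resumed.clear()
        status, value = engine.get(conn, key)
    return value
```

A caller would wire it up with something like `SlowEngine(lambda conn: conn.resumed.set())`; the engine never needs to know how the server suspends or resumes a client.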

New Storage Concepts
It should be quite easy to create an engine that stores data considerably differently from the key/value stuff you're used to as well.

If you want to build an abstract data type server, we have the framework for you already and keep the core tested, so you just need to build your part and go.
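As one example of what "not just key/value" might look like, here is a small Python sketch of an engine whose values are queues rather than opaque blobs. The ListEngine class and its push, pop and length operations are invented for illustration; they are not commands memcached provides.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ListEngine:
    """Hypothetical engine where each key names a FIFO queue of values."""

    def __init__(self) -> None:
        self._lists: Dict[str, Deque[bytes]] = {}

    def push(self, key: str, value: bytes) -> int:
        """Append to the queue at key and return its new length."""
        items = self._lists.setdefault(key, deque())
        items.append(value)
        return len(items)

    def pop(self, key: str) -> Optional[bytes]:
        """Remove and return the oldest value, or None if the queue is empty."""
        items = self._lists.get(key)
        if not items:
            return None
        value = items.popleft()
        if not items:
            del self._lists[key]   # drop empty queues so they use no memory
        return value

    def length(self, key: str) -> int:
        return len(self._lists.get(key, ()))
```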

Where Can We Go From Here?

The one topic we haven't covered here in the engine framework is how an engine might support replication (i.e. handle the repcached case). We've got ideas and prototypes that allow engine writers to do all the things they need to do without adding unnecessary overhead to the rest of the world.

And that's the whole point of the engine branch - you only pay for the features you need.
