A question came up today from a developer looking to migrate to Couchbase from something else. That “something else” had JSON documents with some metadata in them, specifically an “_id” field. Couchbase separates data from metadata for some good reasons, so we need to strip this “_id” field out of the document. Fortunately, it’s fairly easy to write an extension method to do this with the .NET SDK or, if you are using POCOs (Plain Old CLR Objects), to use a custom ContractResolver.

The Scenario

Assume you have a JSON document with an “_id” field at the top level, perhaps stored on disk.

What you want to do is painlessly remove the id from the document itself and make it the key for the document that you will insert into Couchbase. Once this is done, Couchbase will store two things: the document content and the document metadata.
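As a rough illustration of that goal, the sketch below uses Json.NET's JObject to pull the “_id” value out of a JSON string and leave the remaining content behind. The sample document, its field values, and the KeySplitter class are made up for this example, and the actual insert into Couchbase is left out.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class KeySplitter
{
    // Hypothetical sample document; only the "_id" field matters here.
    public const string SampleJson =
        "{ \"_id\": \"beer_123\", \"name\": \"Pale Ale\", \"abv\": 5.2 }";

    // Returns the value of idField and removes that field from the document.
    public static string ExtractKey(JObject document, string idField)
    {
        JToken id = document[idField];
        if (id == null)
        {
            throw new ArgumentException("Document has no '" + idField + "' field.");
        }

        document.Remove(idField);
        return id.ToString();
    }

    public static void Main()
    {
        JObject document = JObject.Parse(SampleJson);
        string key = ExtractKey(document, "_id");

        Console.WriteLine(key);                  // the future document key
        Console.WriteLine(document.ToString());  // the content, without "_id"
    }
}
```

The key and the stripped content are then the two pieces you would hand to the client when inserting.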

Document metadata? Content? What’s the difference?

Metadata is data about the document itself, but not about the content of the document. It contains the following values:

  • TTL – expiration time of the document
  • CAS – compare and swap value for ensuring optimistic concurrency on a key
  • Flags – SDK-specific metadata for transcoding
  • Sequence number – a value used internally within Couchbase for conflict resolution for keys that are updated on different clusters – think cross data center replication (XDCR)
  • Key – the unique identifier for the document itself

All of this information is useful independently of the content, and it is important enough that it is kept separate and held in memory. The metadata size varies between Couchbase versions; as of 2.1.0 it is 56 bytes per document, which is fairly small. The content of the document is the actual JSON or binary data itself.

Using Custom Contract Resolvers w/Extension Methods

There are two things we need to do: first, get the key value from the “_id” field of the document; second, ensure that the “_id” value is not persisted with the content during serialization. The former requires that we parse the JSON string, extract the “_id”, and assign it to the new document that we will insert into Couchbase. The latter can be done in one of two ways: by using a custom ContractResolver or by manipulating the JSON as a JObject itself. It turns out that to support both POCOs and the dynamic keyword, you need to do both.

The IgnoreFieldContractResolver

The Couchbase .NET SDK uses the Newtonsoft Json.NET framework by default. When you are configuring your client, there is a hook for assigning a custom contract resolver. A contract resolver maps the fields of your JSON to your object model. A custom resolver allows you to do things like ignore or modify fields within your JSON; it works sort of like a filter.

Here is the listing for a custom resolver which ignores whatever fieldname you pass into the constructor:
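A resolver of this kind could be sketched roughly as follows. This is a minimal version built only on Json.NET's DefaultContractResolver; the exact-match comparison, the Beer POCO, and the ResolverDemo helper are assumptions made for illustration and may differ from the code in the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

public class IgnoreFieldContractResolver : DefaultContractResolver
{
    public IgnoreFieldContractResolver(string fieldToIgnore)
    {
        FieldToIgnore = fieldToIgnore;
    }

    // The JSON property name that is left out of the contract.
    public string FieldToIgnore { get; private set; }

    protected override IList<JsonProperty> CreateProperties(
        Type type, MemberSerialization memberSerialization)
    {
        // Build the default property list, then drop the one to ignore.
        return base.CreateProperties(type, memberSerialization)
            .Where(p => !string.Equals(p.PropertyName, FieldToIgnore, StringComparison.Ordinal))
            .ToList();
    }
}

// Hypothetical POCO used only to exercise the resolver.
public class Beer
{
    [JsonProperty("_id")]
    public string Id { get; set; }

    [JsonProperty("name")]
    public string Name { get; set; }
}

public static class ResolverDemo
{
    public static string Serialize(object value, string fieldToIgnore)
    {
        var settings = new JsonSerializerSettings
        {
            ContractResolver = new IgnoreFieldContractResolver(fieldToIgnore)
        };
        return JsonConvert.SerializeObject(value, settings);
    }

    public static void Main()
    {
        var beer = new Beer { Id = "beer_123", Name = "Pale Ale" };
        Console.WriteLine(Serialize(beer, "_id")); // prints {"name":"Pale Ale"}
    }
}
```

Because the property is removed from the contract itself, a resolver like this also skips the field when it is used for deserialization.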

There is not a whole lot going on here: basically, we derive from DefaultContractResolver and override the CreateProperties method. In this case we omit the JsonProperty whose name matches FieldToIgnore so that it is not serialized. If you now set the ClientConfiguration to use this resolver, all JSON documents that are serialized will have their FieldToIgnore stripped; in our case we used the “_id” field, since we do not want it persisted (it will become the metadata key).

Extracting the Id and Inserting the JSON w/an Extension Method

Now that we have a contract resolver which will strip the “_id” field from any JSON we insert using the client, we can extract the id for the document (the value of “_id”) and use it as the key for the insert.

Note that there are two (main) cases for storing JSON (from an SDK perspective) in Couchbase. You can store a POCO which represents the JSON document, or you can insert the JSON document as a dynamic type. Each requires special consideration, but it’s pretty easy to write an extension method which abstracts this.

The “special” consideration here for dynamic types is that you cannot rely on reflection over T, since T will be an object. You need to create a JObject first and then use that to get the value of “_id”.
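The difference between the two cases can be sketched as below. This is not the extension method from the project: KeyExtractor, GetKey, and the Beer POCO are hypothetical names, the helper only returns the key, and the call that inserts the document through the Couchbase client is deliberately omitted.

```csharp
using System;
using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical POCO for the typed case.
public class Beer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class KeyExtractor
{
    // Returns the value of fieldName to use as the document key.
    public static string GetKey<T>(T document, string fieldName)
    {
        object boxed = document;
        JObject json = boxed as JObject;

        // Dynamic case: T is object, so reflection over T finds nothing useful.
        // Go through a JObject and read the field by its JSON name instead.
        if (json == null && typeof(T) == typeof(object))
        {
            json = JObject.FromObject(boxed);
        }

        if (json != null)
        {
            JToken token = json[fieldName];
            if (token == null)
            {
                throw new ArgumentException("No JSON field named '" + fieldName + "'.");
            }
            return token.ToString();
        }

        // POCO case: look the property up on T by its C# name.
        PropertyInfo property = typeof(T).GetProperty(fieldName);
        if (property == null)
        {
            throw new ArgumentException("No property named '" + fieldName + "'.");
        }
        object value = property.GetValue(document, null);
        return value == null ? null : value.ToString();
    }

    public static void Main()
    {
        var poco = new Beer { Id = "beer_123", Name = "Pale Ale" };
        Console.WriteLine(GetKey(poco, "Id"));

        // Json.NET hands back a JObject when asked for a dynamic.
        object dynamicDoc = JsonConvert.DeserializeObject<dynamic>(
            "{ \"_id\": \"beer_456\", \"name\": \"Stout\" }");
        Console.WriteLine(GetKey(dynamicDoc, "_id"));
    }
}
```

The POCO call names the C# property, while the dynamic call names the field exactly as it appears in the JSON.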

Once you have this extension method in place, you can write simple code to pull a JSON file from disk, extract the key, and insert it into Couchbase.

Notice that for the POCO you target the “Id” field and for the dynamic you target “_id”. That is simply because for the dynamic we pull the value directly from the JObject, so it reflects the casing and conventions of the original JSON.

Now if you look at the JSON document in the Couchbase Management Console, you'll see that the “_id” field was stripped from the document and used for the key.

Getting the Source:

If you want to play around with the source I used for this post, it's in Couchbase Labs on GitHub. The intention of the project (couchbase-net-contrib) is to provide extensions and plugins that are commonly used when working with the Couchbase SDK, but probably won't make it into the actual SDK. It's intended to be community driven, so feel free to send a pull request with any contributions you feel would be useful for others!

Posted by Jeff Morris, Software Engineer, Couchbase
