The non-digital part of your life is full of documents. Your desk is covered in them. Some of them are structured (invoices, business cards), some of them are less structured (notes, sketches), and some are somewhere in between or a collection of both (notebooks, sketchpads).

Most database technologies break with this common experience, usually because they force our natural concepts into unnatural digital holes. Relational databases, for instance, were designed in part to store each piece of data only once, avoiding duplication when the technology of the time made storage space scarce.

Relational and other technology-focused abstractions are good for what they are, but what if you didn’t have to change how you thought and worked when you moved from natural concepts (like documents) to building software?

Let’s take a look at how natural we can be when designing documents for Couchbase.

Open Beer Data

The dataset we’ll take a look at is originally from OpenBeerDB.com and is licensed under the Open Database License. Originally the data was relational, and the downloads from the site contain SQL dumps of the original databases.

Thankfully, it’s easy enough to reassemble the data into its more natural form. We’ll dig into the migration of the data from SQL to NoSQL later in a “Migrating to Couchbase” post.

First, let’s look at the two types of documents we have.

Beer (of course)
{
  "_id": "beer_1554_Enlightened_Black_Ale",
  "_rev": "1-191ae52a6c773fd7749b65ffd9ae8044",
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": "5.5",
  "description": "Born of a flood and centuries-old Belgian text, 1554 Enlightened Black Ale uses a light lager yeast strain and dark chocolaty malts to redefine what dark beer can be. In 1997, a Fort Collins flood destroyed the original recipe our researcher, Phil Benstein, found in the library. So Phil and brewmaster, Peter Bouckaert, traveled to Belgium to retrieve this unique style lost to the ages. Their first challenge was deciphering antiquated script and outdated units of measurement, but trial and error (and many months of in-house sampling) culminated in 1554, a highly quaffable dark beer with a moderate body and mouthfeel.",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20"
}

Here’s the document describing New Belgium Brewing’s 1554 Enlightened Black Ale. Of course, you probably already knew that because you read the document. You didn’t (thankfully!) have to read something like this:

id  name                        brewery_id  abv  style_id
1   1554 Enlightened Black Ale  2           5.5  3

Which, while highly normalized and relational, is not terribly informative by itself. Even with the addition of the two other tables in this relationship, it’s still a bit of a pain to read and mentally reassemble.
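
To make that mental reassembly concrete, here is a minimal Python sketch that joins the row above with two lookup tables to rebuild a readable document. The `breweries` and `styles` tables, their layout, and the `row_to_document` helper are assumptions made for illustration; only the beer row comes from this post, and the names the numeric keys resolve to are inferred from the beer document shown earlier.

```python
# Hypothetical in-memory stand-ins for the three relational tables.
# Only the beers row is taken from the post; the lookup tables are assumed.
beers = [
    {"id": 1, "name": "1554 Enlightened Black Ale",
     "brewery_id": 2, "abv": 5.5, "style_id": 3},
]
breweries = {2: "New Belgium Brewing"}
styles = {3: "Other Belgian-Style Ales"}


def row_to_document(row):
    """Replace the numeric foreign keys with the readable values they point to."""
    return {
        "name": row["name"],
        "brewery": breweries[row["brewery_id"]],
        "abv": str(row["abv"]),
        "style": styles[row["style_id"]],
    }


doc = row_to_document(beers[0])
print(doc["brewery"], "/", doc["style"])
```

The document form does this lookup work once, at write time, so that nobody reading the data has to repeat it.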

In addition to the beer’s name and who brews it, we also find its ABV, a description, and a category and style. The only “id” mentioned in the document is a natural-ish, constructed ID for the beer itself. There are relationships mentioned here, but they’re still humanly readable, and still (as we’ll see later) retrievable.

Now let’s take a look at New Belgium Brewing’s own document…

Brewery
{
  "_id": "brewery_New_Belgium_Brewing",
  "_rev": "1-e405d6f86ec028a4fe0d18be0a6d4fa1",
  "name": "New Belgium Brewing",
  "address": [
    "500 Linden Street"
  ],
  "city": "Fort Collins",
  "state": "Colorado",
  "code": "80524",
  "country": "United States",
  "phone": "1-888-622-4044",
  "website": "http://www.newbelgium.com/",
  "description": "We’ll set the scene: 1989. Belgium. Boy on bike. (OK, make that a young man of 32). As our aspiring young homebrewer rides his mountain bike with fat tires through European villages famous for beer, New Belgium Brewing Company was but a glimmer in his eye. Or basement. For Jeff Lebesch would return to Fort Collins with a handful of ingredients and an imagination full of recipes. And then there was beer. Jeff’s first two basement-brewed creations? A brown dubbel with earthy undertones named Abbey and a remarkably well-balanced amber he named Fat Tire. To say the rest was history would be to overlook his wife’s involvement. Kim Jordan was New Belgium’s first bottler, sales rep, distributor, marketer and financial planner. And now, she’s our CEO.",
  "geo": {
    "loc": [
      "-105.07",
      "40.5929"
    ],
    "accuracy": "RANGE_INTERPOLATED"
  },
  "updated": "2010-07-22 20:00:20"
}

You’ll notice the ID for this document is similar, but now we’re using a “brewery_” prefix. The prefixes on both of these document IDs (along with the name-based IDs themselves) help human beings find the documents again. It is possible to use UUIDs or numeric or date-based IDs instead. It can be helpful, however, to be able to “hand craft” an ID for a lookup rather than always having to load the results of a View Query to find what you’re looking for in the database.
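
As a rough illustration of hand-crafting an ID, the Python sketch below builds document IDs from a type prefix and a name, then uses one to go straight from a beer to its brewery. The `make_doc_id` helper and its spaces-to-underscores rule are assumptions inferred from the two IDs shown in this post, and the plain dictionary merely stands in for a key-based lookup against the database; it is not the Couchbase API.

```python
def make_doc_id(doc_type, name):
    """Build a readable ID such as 'brewery_New_Belgium_Brewing'.

    Assumed rule: type prefix, underscore, name with spaces turned into
    underscores. Real names may need extra handling (punctuation, duplicates).
    """
    return doc_type + "_" + "_".join(name.split())


# A dictionary standing in for the database's key-to-document lookup.
store = {
    "beer_1554_Enlightened_Black_Ale": {
        "name": "1554 Enlightened Black Ale",
        "brewery": "New Belgium Brewing",
    },
    "brewery_New_Belgium_Brewing": {
        "name": "New Belgium Brewing",
        "city": "Fort Collins",
    },
}

beer = store[make_doc_id("beer", "1554 Enlightened Black Ale")]
# The beer names its brewery in plain text, which is enough to rebuild the key.
brewery = store[make_doc_id("brewery", beer["brewery"])]
print(brewery["city"])  # prints "Fort Collins"
```

Because the key can be rebuilt from values a person already knows, both lookups here are direct fetches by ID rather than queries.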

This document looks much like you would expect a structured version of a business card to look. In fact, there are likely JSON-based contact formats we could have used instead.

Conclusion

Couchbase is schema-less, so getting these two documents into the database takes no more time than typing them up and saving them. The true value of the structure, values, and key names comes into play later in the process, rather than being required before we’re even quite sure what we’re building.

In the next installment, we’ll look at some considerations for determining when to use one document for a collection of data and when to use multiple documents.

Author

Posted by Benjamin Young

Benjamin Young is a User Experience Engineer at Couchbase specializing in cushion and seat cover design for Apache CouchDB and bucket juggling for Membase.