How to Move from MySQL to Couchbase Server 2.0: Part 1

MC Brown on December 16, 2014

If you have a database built using MySQL, you might be wondering if, and more importantly how, that database (and your application) can be moved to Couchbase. The biggest stumbling block is not the technical aspects of creating the Couchbase or storing your information (although they are important), instead it’s about looking at your data in a different way, and then knowing how that changes how your application works.

We’re going to start by looking at how you can turn your MySQL database structure into Couchbase Server, and how querying your database in Couchbase Server differs from the methods used in MySQL.

Think about the data structure first

MySQL (and other SQL and table based databases) force you to think about your data in terms of tables. All your data is a table, and when storing complex structures, an individual piece of data may be split across one or more tables. For some applications and datatypes this is a perfectly logical and reasonable way of approaching and storing your data. But for some applications the table structure does not map very well to the data you want to store.

Let’s take a typical example, a recipe database. This is something I know well, since Cheffy.com is built on top of MySQL. The basic table structure is a core table, called recipe, which contains the recipe name, subtitle, description, and servings. Other relevant information about the recipe, such as the list of ingredients, method steps, metadata, and keywords are stored in other tables, linked to the original recipe by a unique recipe ID. You can see this, very basically, in the figure below.

???

There are some potential benefits to this structure — certain operations for example can be very straightforward and easy. For example, want to find all of the recipes that have the ingredient ‘carrot’? You can perform a query on the ingredients table looking for ‘carrot’, and from that, get a list of matching recipes. By using joins you can get a list of the recipes, their title and other info from the recipe table, by searching on the ingredient table, using the join to connect the recipe ID in the ingredients table to the recipe ID in the recipe title.

???

While this kind of search is easy, collecting all of the information about a recipe, for example when you want to display the recipe to the user, can be really complicated. You could do it with a single query, but it can sometimes be easier to do it with multiple queries, one to get the recipe data, one for the ingredients, another for the metadata and so on. Within the application layer, this is done automatically to build up an object, which is then used as the basis for displaying the recipe to the user in a formatted way.

For many users and applications, either a special layer is built to do this, or one of the many object-relational mapping systems to used to map the data from the underlying tabular structure, into the top-level object that the application (and user) is familiar with. Recipes here are one example, but it extends to a wide variety of other applications, including invoicing (invoice, supplier, destination, invoice lines) and blog posts (post content, keywords, creator, comments).

Such table-based solutions are not inherently bad, but for this type of situation, the key information is being stored across multiple tables, and that means keeping the multiple tables in synchronization with each other. For example, what happens when a record is deleted? You have to delete all the other records that might have linked to the original, either manually, or using cascading deletes). Similarly, when it comes to loading the information about a recipe, you end up running 5-10 queries the information together.

Couchbase takes a different approach. Instead of multiple tables in which you can store different pieces of information, within Couchbase you store a single structure (in JavaScript Object Notation JSON) format. The JSON format allows for a complex structure of fields, arrays, objects, and scalar types, that can be combined together into an entire record. That means that you can now represent your old, multiple-table, entity (recipe, blog post), as a single ‘document’.

{
“title” : “Carrot and Coriander Soup”
“servings” : 4,
“subtitle” : “Delicious with wholemeal bread”,
“preptime” : 8,
“cooktime” : 12,
“totaltime” : 20,
“ingredients” : [
{
“amount” : 250,
“ingredient” : “Carrots”,
“measure” : “g”
},
{
“amount” : 75,
“ingredient” : “Coriander”,
“measure” : “g”
},
{
“amount” : 250,
“ingredient” : “Vegetable Stock”,
“measure” : “ml”
}
],
“method” : [
“Chop carrots”,
“Cook all ingredients in pan”,
“Liquidize”
}
],
}

Now everything to do with your recipe is in one place, and we can load the recipe in one operation from your Couchbase database.

There is no structure or definition to the content; any document in a Couchbase database can contain anything and any structure. You can, however, apply a validation routine that checks that the structure of the document supplied to the database. Validation can cover both the fields and their contents.

Also, bear in mind that because there is no strict structure, there is additional flexibility in what, and how, you store the information. Adding a new section to your recipe document that holds data about who supplied the recipe to your database is a case of extending the document structure.

There is also no notion of multiple tables. There is just the database, and documents contained within that database. If you want to support different types of information within the same database then you can add a field to the document. For example, to identify a recipe you might do:

{
“type” : “recipe”,
“title” : “Carrot and Coriander Soup”
“servings” : 4,
“subtitle” : “Delicious with wholemeal bread”,

}

The identification of the record type can be used in other areas of the database system to help you recognize (and select) the data you are loading.

If everything is a document, how do I get a list of records?

In the first section I talked about how with MySQL you could construct a simple SQL statement to get a list of recipes that include carrots. In MySQL this works by searching for values in the ingredients table to find recipe IDs, and then use a join to get the recipe titles from the recipe table. For speed you would ordinarily use an index to improve the response times of the query, to save individually looking at each record.

In Couchbase everything is a document, and there is no built-in method to search the field of a table, since there are no fields and no tables either. Because there is no rigid structure in place (and no way for the database engine to identify the fields in a free-form document), how do we do any operations that normally operate on a list (or table)? Everything from a simple ‘list all recipes’ to ‘find all recipes with an ingredient of carrot’ rely on the use of a list somewhere.

Couchbase supports a construct called a view. Views are similar in principle to the views within MySQL, except that in Couchbase, views are the only way to get lists of documents out of your database, not a method of getting an alternative .

A view in fact defines three things:

  • The structure of the information included in the view. You can think of this as defining the structure of the table, just as you would define a table within MySQL.
  • The fields or information that can be searched. A view outputs two elements, a key and a value. The keys form the method by which you can search, or more specifically, select, the database content that you want.
  • An index on the structure and keys. The index is used to improve the searching of the data reported in the view.

Views are defined within a design document, in JavaScript, using a function that accepts a a document. When the view is constructed, every document in the database is supplied to the view, and the view then emits the information that you want to appear in the output. Don’t worry about the JavaScript, the JavaScript is executed on the server, not the client.

So, returning to MySQL for a moment, a query (without a WHERE clause), selects the fields to be returned the query output, and constructs a list of matching rows in the output. So, looking at an SQL statement:

When running a query on MySQL, the MySQL server takes the information in the table or tables, and then constructs the list of records (and corresponding fields) to be returned, like this:

On Couchbase, Views construct a list of records from the individual document information, creating an index as a side effect of that process. The result is a list of all the documents generated by the view.

???

In MySQL, when you execute a query with a WHERE clause, the index (hopefully) is used to help select the records you want to choose:

???

In Couchbase, the generated view is your table, and when you query the view, Couchbase uses the key values (and the associated index that was generated), with the query values that you specify filtering out the returned information to generate the final list of matching records.

This method of specifying the tables that you want to query enables you to simplify and optimize the way you extract information from the database. But it also means you need to give some more thought to how you want to query the information.

Next Time

Now you know how MySQL stores and queries information, and how that knowledge that can be translated when you migrate information to Couchbase Server 2.0. Next time, we’ll start building some queries based on what we’ve learned, and start looking at more advanced queries and the migration process.

Author

Posted by MC Brown

Mc Brown is a Vice President, Technical Publications & Education at Couchbase. Responsible for building the documentation & supporting information.

7 Comments

  1. ThankGod Adeyi June 23, 2013 at 1:45 pm

    Hi,

    1. is it possible to connect to Mysql database while working with CouchBase from a java application

    2. is it possible to demonstrate the scalability claim of CouchBase on a single system?

    1. Matt Ingenthron June 24, 2013 at 5:14 am

      On #1, it\’s definitely possible to connect to both MySQL and Couchbase simultaneously, no problem.

      On #2, generally yes. Couchbase has a lot of benefits over other types of databases (both SQL and other NoSQL) when deployed on a distributed system but even some of the aspects around managing memory, including the cache, actively show scalability benefits on a single system.

      1. ThankGod Adeyi June 24, 2013 at 9:18 am

        Hi,
        thanks for your response.The reason why i ask is that, We are looking at possbility of Migrating a university\’s portal records from Mysql to Cauchbase after my thesis. SO they want to me to demonstrate practically, scalability on a single or distributed system. Do you have any paper or link that explain the memory management on a single system that you said in your post.
        thanks

      2. ThankGod Adeyi June 26, 2013 at 9:07 am

        Ok. apart from installing cauchbase server and jdk, what environment can i use to develop a GUI java program that can do such a migration. I am a newbe in java but with much passion for this implementation. Please help me with some guide

  2. ThankGod Adeyi June 24, 2013 at 9:18 am

    Hi,
    thanks for your response.The reason why i ask is
    that, We are looking at possbility of Migrating a university\’s portal
    records from Mysql to Cauchbase after my thesis. SO they want to me to
    demonstrate practically, scalability on a single or distributed system.
    Do you have any paper or link that explain the memory management on a
    single system that you said in your post.
    thanks

    1. As you have discussed see before with Matt, you can connect you Java program to MySQL and Couchbase in the same time.

      Couchbase has now an integration with Talend ( http://www.couchbase.com/press… ). Talend is an ETL (Extract Transform Load) that will allow you to design graphically the flow to move your data from MySQL to Couchbase.

      The Couchbase team is current working on a tutorial that shows exactly that, stay tuned.

  3. Is the index kept? What is the life cycle of the view and its index?
    If new documents are added after the view is constructed, are they also added to the index, or is the index created for a single \”query\”, then discarded?
    What is the Couchbase name for a query (vs a view or index)?
    Is a couch base view created once on startup?
    Is a couchbase view the same as a map?

Leave a reply

Stay informed about Couchbase Blogs