Given the architecture and design of NoSQL systems, especially the document family databases like Couchbase that do not enforce schema on write, data modeling for NoSQL can often be challenging when migrating from relational systems to NoSQL or when building applications ground up using NoSQL. Infact, data modeling is often a critical success factor to Couchbase deployments, and progressively optimizing data models can be a huge application performance booster given the rapidly evolving nature of big data applications.

Traditionally, data modeling for relational systems was the science of identifying the data objects, their relationships with one another, and their accurate representation, that laid the foundation for a good database design. Modeling for NoSQL on the other hand, explores application specific access patterns e.g., “what are the types of questions users would like to answer with this data?”  This in turn dictates the kind of queries that need to be supported and focuses on how best to lay out the data for performance optimization. These considerations cause us to change our data modeling approach from the traditional constraints of RDBMS(schema on write) to modeling data for the specific application (schema on read).

Another differentiator between RDBMS and NoSQL systems which impact modeling paradigms is the concept of normalization and denormalization of data. While the RDBMS systems emphasized data normalization to understand strict relationships and meet rigid storage limitations back in the day, flexible NoSQL systems lean towards denormalization of data since data is distributed across clusters and redundancy can facilitate the scaling of data reads. Often, the ideal data model is a combination of both approaches, depending on the use case.

Hence, accurate data modeling continues to remain a core discipline for success with NoSQL databases.

Now, let’s dive into modeling using erwin DM NoSQL. erwin DM NoSQL offers three major functionalities:

Forward engineering

The process of converting your relational models to Couchbase compatible JSON models.

Transformation

Ability to choose your desired form of transformation (normalized, denormalized, custom) for your models.

Reverse engineering

Ability to import the schema from production data in Couchbase to the erwin environment.

The following is the step-by-step guide to modeling data for the Couchbase Data Platform using erwin DM NoSQL:

Set-up:

Step 1:

Request for an account here and login to erwin DM NoSQL using your credentials.

Step 2:

Push an Entity-Relationship model generated from erwin DM or any other relational modeling tool in “XML” format using the import option to erwin DM NoSQL.

Step 3:

Visualize your ER diagrams in erwin environment.

Transformation

Step 1:

Choose to transform the models using normalized, denormalized or custom transformation.

Step 2:

a.Normalized transformation:

Normalization is typically the process of organizing data in the database by creating separate tables and establishing relationships to remove duplication. The objects or entities in this process are usually referenced. When creating JSON models, referenced tables would typically be separate documents.

b.Denormalized transformation:

In this process, the data objects are usually embedded. Since, similar objects are embedded in many documents this organization exhibits redundancy. Denormalization is often found to improve performance significantly as we don’t need joins to fetch the data needed. This is often adopted in NoSQL systems. In this tool, the transformed model generated using denormalization is a combination of embedded and referenced objects.

In the fig above, purchase order, order item are embedded objects whereas Publisher, Store name and BookReturn are referenced objects.

c.Custom transformation:

The erwin DM NoSQL typically analyses the data organization using your E-R diagram and transforms your models to Couchbase compatible JSON models based on certain rules. However, since you as an app developer or an application owner know your application the best, we provide you the ability to customize your models. You have the option to reference certain embedded objects and we provide you some guidelines within the tool to help you make a right choice.

Note: You can clone any of these models and tweak them by adding or removing attributes, properties etc.,

Forward engineering:

Step 1:

Download the model created on your local system.

Step 2:

Set up Couchbase clusters on AWS and access web console using port 8091


Step 3:

Use secure copy and copy the downloaded file from your local machine to your Couchbase EC2 instance.

Syntax (OSX): scp -i path to key -r path to directory ec2-user@hostname:~

Ex:  scp -i /Users/chaitra.ramarao/Downloads/US_East.pem -r /Users/chaitra.ramarao/Downloads/Cl_De_EMOVIES\ r9.64_20180329185059  ec2-user@ec2-54-152-108-80.compute-1.amazonaws.com:~

Step 4:

Login to your AWS instance using ssh

Syntax(OSX):ssh -i path to key.pair ec2-user@remote ip

Ex: ssh -i /Users/chaitra.ramarao/Downloads/US_East.pem ec2-user@ec2-34-203-230-73.compute-1.amazonaws.com

Step 5:

Verify if the files are copied by listing them using “ls”

Step 6:

Navigate to the file bulkInsert.sh

Ex:Cd /De_movies../scripts/5.x to locate bulkInsert.sh

Step 7:

Make bulkInsert file an executable and set the path

Chmod +x bulkInsert.sh

PATH=/opt/couchbase/bin:$PATH

Step 8:

Execute bulkInsert script with following syntax

./bulkInsert.sh  <hostname> <portnum>  <couchbase instance username> <password> <bucket name to load the data>

Ex : ./bulkInsert.sh localhost 8091 Administrator password sample

You can see the modeled sample data loaded into Couchbase with bucket name specified by you.

The script typically creates models on junk data, it can be modified by the application developers to create models for actual data samples and use them to deploy to Couchbase using the deployment scripts provided.

You can also query the data once loaded into Couchbase using N1QL (SQL for JSON) and test the accuracy and efficiency of your models using Query workbench and query planner.

Reverse engineering:

Step 1:

Set up connection to Couchbase using connection manager as shown below:

Step 2:

Choose reverse engineer Couchbase

Step 3:

Schema Import of Couchbase production data

Step 4:

Visualize the reverse engineered Couchbase models in erwin environment  

Continue to tweak your models periodically to ensure your models are optimized to provide the best performance.

Data modeling using erwin will help you to accelerate your time to market, provide you the interface for visualizing your modeling process and tremendously improve the accuracy of your models. Better data models ensure better performance and stronger success with Couchbase.

Do share your comments here or reach out to us at forums.couchbase.com

Posted by Chaitra Ramarao, Sr. Product Manager, Couchbase Inc.

Leave a reply