So you downloaded the new 4.5 release of Couchbase and now you are ready to get started. If you are a developer or even a DBA, you will definitely enjoy some of the major new features rolled into 4.5. So you start, and just when you feel that all is well with the world, you get a sudden burst of content from a different team that has access to Couchbase Server. You think for a moment: wait, how am I supposed to have visibility into this data? The other team has given me nothing yet, and I have to create some initial queries for my ad hoc reporting team tomorrow! I cannot stress the importance of this need enough. JSON documents do not carry an inherent schema, so being able to infer one from schema-less documents is a very big thing! So what to do...
Automatic schema discovery to the rescue
In 4.5 there is a new tab that was not in past versions, and you may have noticed it already. So let us take a closer look at the new Query Workbench tab. At first glance it all seems empty: there is a textbox to enter your query and an output window where your results can be displayed in JSON, Table or Tree format, depending on the view you choose. However, note the area titled Bucket Analysis. When you first create a bucket and add some JSON data, you may see something like below, indicating that your new bucket “travel-sample” does not have any indexes, not even the lowly primary index.
All this means is that your bucket has zero indexes and it is ready for you to add an index, at least a primary index to identify your data.
So let us go ahead and create at least one index! Enter the following:
Create Primary Index ON
Once you do this, you will see the bucket shift from the Non-indexed Bucket section to the Fully queryable Bucket section. This indicates that you can now automatically discover what is inside your JSON documents. By default, the analysis uses a sample of 1000 documents. If the bucket contains fewer documents than the sample size, then all documents are used.
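To make the sampling rule concrete, here is a minimal sketch in plain Python. It is not Couchbase's implementation: the function name pick_sample and the in-memory lists standing in for buckets are assumptions made purely for illustration. It only shows the rule described above, which is to take a random sample of 1000 documents, or every document when the bucket holds fewer than that.

```python
import random

def pick_sample(documents, sample_size=1000, seed=None):
    """Return up to sample_size documents chosen at random.

    Hypothetical helper: 'documents' is an in-memory list that stands in
    for the contents of a bucket.
    """
    if len(documents) <= sample_size:
        # Fewer documents than the sample size: use all of them.
        return list(documents)
    rng = random.Random(seed)
    return rng.sample(documents, sample_size)

small_bucket = [{"id": i} for i in range(250)]
large_bucket = [{"id": i} for i in range(5000)]

print(len(pick_sample(small_bucket)))  # 250
print(len(pick_sample(large_bucket)))  # 1000
```

Because the schema comes from a sample, a rare attribute that only appears in a handful of documents may not show up in every analysis.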
You will see the names of the buckets that you can query. You will also see a list of any indexed fields that someone may have added.
Keep in mind that within Couchbase you can have many different types of JSON documents, or different entities of data, within a single bucket. Examples of this variety include customer details, user profiles, product details, etc. In Couchbase there is no direct concept of tables. Instead, you can use the JSON document itself to provide a table-like representation, for example with a ‘type’ attribute whose value is equivalent to a table name. This naming convention allows you to group your JSON documents easily.
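As a rough illustration of how a ‘type’ attribute can play the role of a table name, here is a small Python sketch. The helper group_by_type and the sample documents are made up for this example and do not come from any Couchbase API; the sketch simply buckets plain dictionaries by the value of their type field.

```python
from collections import defaultdict

def group_by_type(documents, type_field="type"):
    """Group documents by the value of their type attribute."""
    groups = defaultdict(list)
    for doc in documents:
        # Documents without the attribute land in a catch-all group.
        groups[doc.get(type_field, "(no type)")].append(doc)
    return dict(groups)

# Made-up documents, loosely modeled on airline and airport records.
docs = [
    {"type": "airline", "name": "Example Air", "country": "United States"},
    {"type": "airport", "airportname": "Example Field", "city": "Exampleville"},
    {"type": "airline", "name": "Sample Airways", "country": "France"},
]

for type_name, members in group_by_type(docs).items():
    print(type_name, len(members))
# airline 2
# airport 1
```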
If we expand the bucket as shown below, we can now see these automatically discovered schema types. In the following screenshot the data is organized in groups by type, or “flavor”. Each flavor corresponds to a different value of the type attribute and would be equivalent to a table name in a relational DB.
As you can see, the inferred flavors are distinguished by an attribute called type with 3 different values (airport, airline, route). For each type you can see the attributes that belong to that kind of JSON document. The visualization lists each attribute along with its data type, whether it's a date, a string, an integer or an array.
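To give a feel for what listing attributes with their data types involves, here is a hedged Python sketch. It is a simplification and not how Couchbase derives its schema: json_type and describe_attributes are hypothetical helpers, the documents are invented, and only the basic JSON value types are recognized (JSON itself has no date type, so a date stored as text would be reported as a string here).

```python
def json_type(value):
    """Name the JSON type of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"

def describe_attributes(documents):
    """Map each top-level attribute to the sorted list of types seen for it."""
    seen = {}
    for doc in documents:
        for key, value in doc.items():
            seen.setdefault(key, set()).add(json_type(value))
    return {key: sorted(types) for key, types in seen.items()}

# Invented route-like documents; 'stops' deliberately has mixed types.
routes = [
    {"type": "route", "stops": 0, "schedule": [{"day": 0}]},
    {"type": "route", "stops": "none", "distance": 1234.5},
]

for attribute, types in describe_attributes(routes).items():
    print(attribute, types)
# type ['string']
# stops ['number', 'string']
# schedule ['array']
# distance ['number']
```

An attribute that reports more than one type, like stops above, is a hint that the documents of that flavor do not all share the same shape.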
This helps you decide which additional indexes you could create on attributes of your choosing. Notice below that some attributes are already displayed in bold; these are the ones that have an existing index on them.
If you want to dive deeper, get some sample items and visualize the output in a table-like structure, then the infer results are the perfect way to see the automatically derived schema along with sample documents that go with it. They can also serve as quick analytics over your data set, possibly including document counts broken down by values and percentages, down to the attribute level!
That's all great, but I’m picky. I want an alternate way to derive my schema!
We can auto-discover or infer the schema in one of two ways, either via Bucket Analysis or through the query command of
Notice that once this is executed you will see the results in JSON, Table or Tree view. We have chosen the Table view to visualize this better.
This allows you to quickly see the flavors, or distinct groupings of data (based on the type attribute in this case), and the variations in your data in one snapshot. It also shows you whether all of your data has exactly the same schema or what percentage varies, and which particular attributes differ within your data set.
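The idea of seeing what percentage of your data varies can be sketched in a few lines of Python. This is an illustrative approximation and not the output format of Couchbase itself: attribute_coverage is a hypothetical helper and the airline documents are invented. It reports, for each top-level attribute, the percentage of documents that contain it.

```python
from collections import Counter

def attribute_coverage(documents):
    """Return {attribute: percentage of documents that contain it}."""
    total = len(documents)
    if total == 0:
        return {}
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())  # each key counts once per document
    return {key: round(100.0 * n / total, 1) for key, n in counts.items()}

# Invented documents: one of the four is missing 'callsign'.
airlines = [
    {"type": "airline", "name": "A", "callsign": "ALPHA"},
    {"type": "airline", "name": "B", "callsign": "BRAVO"},
    {"type": "airline", "name": "C", "callsign": "CHARLIE"},
    {"type": "airline", "name": "D"},
]

print(attribute_coverage(airlines))
# {'type': 100.0, 'name': 100.0, 'callsign': 75.0}
```

Any attribute below 100% marks a place where documents of the same flavor differ, which is exactly the kind of variation worth knowing about before you write queries against it.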
So what does this mean for me?
This means that you now have a NoSQL data platform with a built-in auto schema discovery feature that can quickly show you the shape of your data. So imagine generating very quick and simple analytics that could integrate into your BI tools, such as Tableau or Informatica to name a few, and getting insight into your data in mere milliseconds!
Visit www.couchbase.com/download to get Couchbase 4.5 and use auto-schema discovery
Until next time….