Couchbase 6.6 comes with a much-needed feature: importing documents using the Couchbase Admin Web Console. This provides an easy way to quickly import small datasets in a variety of formats, and it complements cbimport, the more comprehensive command-line tool with many more data import options.

In this blog post, we will look at some use cases and some gotchas when importing data.

Checking out the feature

Import Documents is accessed by clicking the Documents link in the left panel and then the Import Document button in the blue panel at the top of the page.

The fields are all self-explanatory, but I’ll walk through importing a small dataset (just 5 lines) to demonstrate the feature. I have created 4 files in different formats on my laptop for this purpose. Note that the Destination Bucket does not need to be empty, but for this test I created a bucket named test that does not have any documents yet.

Let’s import our JSON List dataset.

  1. I clicked on the Select File to Import button and selected airport.json, a file with 5 JSON documents on my laptop.
  2. Before actually importing the data, the screen shows a sample of the File Contents in different formats. This serves as a quick check.
  3. Next, I had a choice of how to set the document key during import, using the Import With Document ID radio buttons. The choices are UUID or, where possible, Value of Field.
    • Note when choosing Value of Field:
      • This field has to be in every document with a non-null, unique value. 
      • The tool does not ensure that candidate ID fields have unique values across every document. The Import UI only checks to ensure that the field is present in every document.
      • If you select an ID field with duplicate values, then older documents will get overwritten by new documents with the same ID.
    • For now, I will stick with the UUID choice and go ahead with the Import.
  4. Next, I selected test as my Destination Bucket.
  5. Then, I clicked the Import Data button at the bottom of the screen.
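
Because the Import UI does not check that a candidate ID field is unique, it can be worth checking the file yourself before choosing Value of Field. The following is a minimal sketch of such a check, not part of Couchbase. The function name check_id_field and the sample documents are hypothetical stand-ins for a file such as airport.json.

```python
import json
from collections import Counter


def check_id_field(docs, field):
    """Return (missing, duplicates) for a candidate document ID field."""
    # Positions of documents where the field is absent or null.
    missing = [i for i, doc in enumerate(docs) if doc.get(field) is None]
    # Count each non-null value as a string, since document keys are strings.
    counts = Counter(str(doc[field]) for doc in docs if doc.get(field) is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return missing, duplicates


# Hypothetical sample data; replace with json.load() on your own file.
sample = json.loads("""
[
  {"id": 1, "type": "airport", "airportname": "A"},
  {"id": 2, "type": "airport", "airportname": "B"},
  {"id": 2, "type": "airport", "airportname": "C"},
  {"type": "airport", "airportname": "D"}
]
""")

missing, duplicates = check_id_field(sample, "id")
print("documents missing the field:", missing)
print("duplicated values:", duplicates)
```

If either list is non-empty, importing with that field as the key would skip or overwrite documents, so UUID is the safer choice.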

The import was successful, and a pop-up box displayed the number of documents imported.
The screen also shows a cbimport command, which is helpful when the dataset is too large for the UI.
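
The exact command shown by the UI is not reproduced here. As a rough sketch, a cbimport invocation for a JSON List file could be assembled as below. The cluster address, credentials, and file path are placeholder assumptions, and build_cbimport_command is a hypothetical helper. The flags used (-c, -u, -p, -b, -d, -f, -g) are standard cbimport json options, but check them against the documentation for your version.

```python
import shlex


def build_cbimport_command(dataset_path, bucket, key_field=None,
                           cluster="couchbase://127.0.0.1",
                           username="Administrator", password="password"):
    """Build an argument list for `cbimport json` (placeholder defaults)."""
    # Key generator: the value of a field such as %id%, or a generated UUID.
    key_expr = f"%{key_field}%" if key_field else "#UUID#"
    return [
        "cbimport", "json",
        "-c", cluster,                   # cluster to connect to (assumed local)
        "-u", username,                  # placeholder credentials
        "-p", password,
        "-b", bucket,                    # destination bucket
        "-d", f"file://{dataset_path}",  # local dataset, file:// prefix required
        "-f", "list",                    # dataset is a JSON List
        "-g", key_expr,                  # how to generate each document key
    ]


# Hypothetical absolute path to the dataset.
args = build_cbimport_command("/data/airport.json", "test", key_field="id")
print(shlex.join(args))
```

Passing the list to subprocess.run(args) would execute it; printing it first makes it easy to review before running against a real cluster.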

Let’s import the same set of documents again, but this time choose Value of Field:


I will choose id.
Now that I have imported the same set of documents twice, with different keys, the bucket should contain 10 documents. Let’s check this by clicking the Document Editor button in the blue panel at the top.

Here we see the 10 documents: a set of 5 keyed by the value of the id field and another set of 5 keyed by a server-generated UUID.

Let’s check out 1 document:

Looks good.

Further tests

You can also test importing the same set of data in different formats.

File Formats

JSON List

A JSON list is a list (indicated by square brackets) of any number of JSON objects (indicated by curly braces) separated by commas.
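
As a small illustration of this format, the sketch below parses a JSON List with Python's standard json module. The two documents and their field names are made up for the example.

```python
import json

# Hypothetical two-document dataset in JSON List form.
json_list_text = """
[
  {"id": 1, "airportname": "Alpha Field"},
  {"id": 2, "airportname": "Bravo Field"}
]
"""

# The whole file is a single JSON value: a list of objects.
docs = json.loads(json_list_text)
for doc in docs:
    print(doc["id"], doc["airportname"])
```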

JSON Lines

A JSON Lines file contains one complete JSON object on each line.
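
Unlike a JSON List, such a file is not one JSON value, so it has to be parsed line by line. A minimal sketch with Python's json module, using made-up documents:

```python
import json

# Hypothetical two-document dataset in JSON Lines form.
json_lines_text = """\
{"id": 1, "airportname": "Alpha Field"}
{"id": 2, "airportname": "Bravo Field"}
"""

# Each non-empty line is parsed as its own JSON object.
docs = [json.loads(line) for line in json_lines_text.splitlines() if line.strip()]
for doc in docs:
    print(doc["id"], doc["airportname"])
```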

CSV (Comma-Separated Values)

Note:

  • The CSV format “flattens” JSON data and does not support arrays or nested values.
  • The CSV format doesn’t have a well-defined way to support null values. String values in CSV are optionally quoted, so there is no standard way to distinguish the string “null” from the value null. As a result, when importing from a CSV dataset, the value null is imported as the string “null”.
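
The null ambiguity can be reproduced outside Couchbase. The sketch below uses Python's csv module, not the Couchbase importer, and assumes a hypothetical exporter that writes JSON null as the text null. After a round trip, a real null and the string "null" are indistinguishable.

```python
import csv
import io

# Two hypothetical documents: one with a real null, one with the string "null".
docs = [
    {"id": 1, "city": None},
    {"id": 2, "city": "null"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "city"])
writer.writeheader()
for doc in docs:
    # Assumption: the exporter writes JSON null as the text null.
    writer.writerow({k: ("null" if v is None else v) for k, v in doc.items()})

# Read the CSV text back; every cell comes back as a plain string.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
for row in rows:
    print(row)
```

Both rows come back with the same city value, and the numeric id comes back as text as well, since CSV carries no type information.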

TSV (Tab-Separated Values)

Note:

  • The TSV format “flattens” JSON data and does not support arrays or nested values.
  • The TSV format doesn’t have a well-defined way to support null values. String values in TSV are optionally quoted, so there is no standard way to distinguish the string “null” from the value null. As a result, when importing from a TSV dataset, the value null is imported as the string “null”.
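
To illustrate why nested values do not survive a tab-separated file, the sketch below reads a TSV row whose geo cell contains JSON text. It uses Python's csv module and made-up data, so it shows a general property of the format, not the behavior of the Couchbase importer.

```python
import csv
import io

# Hypothetical TSV data: a header row and one record with JSON text in a cell.
tsv_text = 'id\tairportname\tgeo\n1\tAlpha Field\t{"lat": 1.5, "lon": 2.5}\n'

# QUOTE_NONE treats quote characters as ordinary text inside a cell.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
rows = list(reader)

# The cell is just a string; the format itself has no notion of nesting.
print(type(rows[0]["geo"]).__name__, rows[0]["geo"])
```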

Author

Posted by Prasad Doddi

Prasad is a Senior Product Manager for Couchbase Supportability, Manageability and Tools. Prior to Couchbase, he worked at IBM in various departments including Development, QA, Support and Technical Sales. Prasad holds a master’s degree in Chem. Engg. from Clarkson University, NY.
