Six thousand years ago, the Sumerians invented writing for transaction processing — Gray & Reuter

By any measure, MongoDB is a popular document-oriented JSON database. In the last dozen years, it has grown from its humble beginnings of a single lock per database to a modern database with multi-document transactions and snapshot isolation. MongoDB University has trained a large number of developers to develop on the MongoDB database.

There are many JSON databases now. While it’s easy to start with MongoDB to learn NoSQL and flexible JSON schema, many customers choose Couchbase for performance, scale, and SQL. As you progress in your database evaluation and evolution, you should learn about other JSON databases. We’re working on an online training course for MongoDB experts to learn Couchbase easily. Until we publish that, you’ll have to read this article. 🙂

If you know an RDBMS like Microsoft SQL Server or Oracle, we have two easy-to-follow courses that map your existing database knowledge to Couchbase:

  1. CB116m – Intro to Couchbase for MSSQL Experts
  2. CB116o – Introduction to Couchbase for Oracle Experts

SUMMARY

MongoDB and Couchbase have many things in common. Both are NoSQL distributed databases; both use the JSON model; both have high-level query languages with support for select-join-project operations; both have secondary indexes; both have an optimizer that chooses the query plan automatically; and both support intra-cluster and inter-cluster replication.

As you’d expect, there are differences. Some are more significant than others. Couchbase is designed to be distributed from the get-go. For example, the data container, the Bucket, is always distributed, so there is nothing to shard. Simply add new nodes and the system will automatically redistribute the data. Intra-cluster replication requires no new servers: simply set the number of replicas and you’re all set. From the developer interaction perspective, the big difference is the query language itself: MongoDB has a proprietary query language and Couchbase has N1QL, SQL for JSON. MongoDB uses its B-Tree based index for search as well and recently released $searchbeta for the Atlas service using Apache Lucene; Couchbase has a built-in Full Text Search.

Hopefully, the differences in Couchbase are the ones that make your life easier. Let’s dive deep.

HIGH-LEVEL TOPICS

  1. Resources
  2. Architecture
  3. Database Objects
  4. Data Types
  5. Data Model
  6. SDK
  7. Query Language
  8. Indexes
  9. Optimizer
  10. Transactions
  11. Analytics

RESOURCES

ARCHITECTURE

Laptop Version: 

MongoDB: Simply install mongod on your laptop and start it with the right parameters; you’re up and running. A single process handles the whole database. This changed a little in 4.2, where you need mongos to run your transactions. All of the MongoDB features (data, indexing, query) are available here, except full-text search, which is available only on the Atlas service.

Couchbase: Couchbase is different. It separates the services (data, index, query, search, analytics, eventing) and lets you choose which of them to run on your instance to optimize resources. A typical installation has data, index, and query. Search, eventing, and analytics will also run on your laptop; install and use them per your use case.

Cluster deployment: As with most NoSQL databases, both MongoDB and Couchbase can scale out. In MongoDB, you scale by sharding a collection across multiple nodes. You can shard by hash or by range. Without explicit sharding, each collection remains in a single shard. The config servers store the metadata and configuration for the cluster. MongoDB is uniformly distributed and Couchbase is multi-dimensionally distributed. The mongod process (service) manages data, index, and query on every shard (node), whereas mongos does the distributed query processing, merges the intermediate results, and does not manage any data or index. mongos acts as the coordinator and mongod is the worker bee.

Couchbase can be deployed in a uniform distribution, with each node managing the data and all services: data, index, query, analytics, and eventing. Each service corresponds to a layer in a traditional database. These services are loosely coupled: they run in separate process spaces and communicate over the network. Hence they can be deployed uniformly on a single node or distributed multi-dimensionally across a cluster. The choice depends on your workload and SLAs. The data itself is stored in buckets. All the buckets are hash partitioned among the given nodes; this is automatic and doesn’t require any specification. When the application has the document keys, it can operate directly on the data without any intervening nodes. This is one of the key architectural differences contributing to the high performance and scale-out of Couchbase. In addition, there are no config servers. The metadata and its management are built into the core database. The data service manages data, cluster, and replication within a Couchbase cluster. Replication between multiple Couchbase clusters is managed by XDCR. Read this article to understand the replication mechanisms in MongoDB and Couchbase: Replication in NoSQL document databases (Mongo DB vs Couchbase)
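To make the automatic hash partitioning concrete, here is a minimal Python sketch of how a document key could be mapped to a partition and then to a node. It is an illustration only: the plain CRC32-modulo hash, the round-robin node placement, and the names `vbucket_for_key`, `build_vbucket_map`, and `node_for_key` are assumptions for this example, not Couchbase's actual implementation. The partition count of 1024 matches the vbucket count mentioned later in this article.

```python
import zlib

NUM_VBUCKETS = 1024  # partitions per bucket, as noted later in the article


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a partition id with a deterministic hash.

    Simplified assumption: plain CRC32 modulo the partition count.
    """
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets: int = NUM_VBUCKETS):
    """Assign every partition to a node (hypothetical round-robin placement)."""
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def node_for_key(key: str, vbucket_map) -> str:
    """A client holding the map can pick the owning node without a router."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


nodes = ["node-a", "node-b", "node-c"]
vbucket_map = build_vbucket_map(nodes)
owner = node_for_key("customer::1001", vbucket_map)
print(owner)
```

In this sketch, adding a node changes only the partition-to-node map, never the hash of a key, which is the intuition behind not having to choose a shard key.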

Inside the cluster deployment.

MongoDB’s cluster components and deployment are explained here; I assume that as prior knowledge and will avoid repeating it.

Couchbase deployment starts with the key-value data service. This is the (consistent) hash distributed key-value data store. It also has intra-cluster replication built in, eliminating any need for separate replica servers or config servers. The query service orchestrates the execution of N1QL queries and uses GSI (Global Secondary Indexing) and FTS (Full-Text Search) indexes as needed. FTS manages the full-text index and can be queried directly or via the N1QL query service. The Eventing service enables you to automatically trigger an action (by executing a JavaScript function) upon data mutation. The Couchbase Analytics engine is an MPP data and query engine. It makes a copy of the data, redistributes it across its nodes, and executes queries in parallel for the best performance possible. All of these can be used seamlessly through the rich set of APIs in our SDKs, available in all the popular languages.

DATABASE OBJECTS

MongoDB has collections and databases as the logical objects users work with. Couchbase traditionally had just Buckets. A Bucket served as the unit of resource management (e.g., amount of memory used) and security, as well as the data container. In 6.5, we introduced the notion of collection and scope as a developer preview. This bucket:scope:collection hierarchy is analogous to an RDBMS’s database:schema:table. This makes the database more secure and better suited to multi-tenancy. In 6.5, without the developer preview, each bucket uses a default scope and collection, making the transition seamless.

| RDBMS | MongoDB | Couchbase |
|---|---|---|
| Database | Database | Bucket |
| Table | Collection | Bucket (Future: Collection) |
| Row | Document (BSON) | Document (standard JSON) |
| Column | Field/Attribute | Field/Attribute |
| Partition (Table/collection/bucket) | Not partitioned by default. Hash & range partitioning (sharding) is supported manually. | Partition (hash automatic) |

Notes to Developers

In MongoDB, you start with your instance (deployment) and create databases, collections and indexes.

In Couchbase, you start with your instance and create your buckets and indexes. Each bucket can hold multiple types of documents, so each document should have an application-designated field for recognizing its type, for example {"type": "parts"}. Since each bucket can hold any number of document types, you should avoid creating too many buckets. This also means that when you create an index, you’ll usually want one for each type: customer, parts, orders, etc. So, the index creation will include a WHERE clause for the document type.

CREATE INDEX ix_customer_zip ON customer(zip) WHERE type = "customer";

SELECT * FROM customer WHERE zip = 94040 AND type = "customer";
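The idea behind an index with a WHERE clause on the document type can be sketched in a few lines of Python. This is only an in-memory illustration of what a partial index covers; the helper `build_partial_index` and the sample documents are hypothetical and say nothing about how Couchbase stores its indexes.

```python
from collections import defaultdict


def build_partial_index(docs: dict, field: str, doc_type: str) -> dict:
    """Index `field` only for documents whose "type" equals `doc_type`."""
    index = defaultdict(list)
    for key, doc in docs.items():
        # Documents of other types are skipped, so the index stays small.
        if doc.get("type") == doc_type and field in doc:
            index[doc[field]].append(key)
    return dict(index)


# One bucket holding several document types (sample data for illustration).
bucket = {
    "customer::1": {"type": "customer", "zip": 94040},
    "customer::2": {"type": "customer", "zip": 10001},
    "parts::1": {"type": "parts", "zip": 94040},
}

ix_customer_zip = build_partial_index(bucket, "zip", "customer")
matches = ix_customer_zip.get(94040, [])
print(matches)  # ['customer::1']
```

The parts document shares the same zip value but never enters the index, which is why the query also repeats the type predicate.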

Each MongoDB document contains an explicitly provided or implicitly generated document id field _id.

In Couchbase, the users should generate and insert an immutable document key for each document.  When inserting via N1QL, you can use the UUID() function to generate one for you.  But, it’s a good practice to have a regular structure for the document key.
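As an illustration of a regular structure for document keys, the Python sketch below builds keys from the document type and a business identifier, and falls back to a random UUID when no natural identifier exists. The `type::id` layout and the function name `make_document_key` are assumptions for this example; Couchbase does not mandate any particular key format.

```python
import uuid
from typing import Optional


def make_document_key(doc_type: str, natural_id: Optional[str] = None) -> str:
    """Build a document key such as 'customer::1001'.

    The 'type::id' convention is hypothetical; any unique string can be a key.
    """
    if not doc_type:
        raise ValueError("doc_type must be a non-empty string")
    # Fall back to a random UUID when the application has no natural id.
    suffix = natural_id if natural_id is not None else str(uuid.uuid4())
    return f"{doc_type}::{suffix}"


customer = {"type": "customer", "name": "Joe", "zip": 94040}
key = make_document_key(customer["type"], "1001")
print(key)  # customer::1001
```

A predictable key lets the application fetch a document directly by key, without running a query first.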

DATA TYPES

MongoDB’s data model is BSON and Couchbase’s data model is JSON. The proprietary BSON format has some types that are not in JSON. JSON has string, numeric, boolean (true/false), null, array, and object types. BSON has string, numeric, boolean, array, object, binary, UTC DateTime, timestamp, and many other custom proprietary extensions. The most common difference is the DateTime and timestamp. In Couchbase, all time-related data is stored as a string in ISO 8601 format. Couchbase N1QL has a plethora of functions to extract, convert, and calculate on the time. Full function details are available in this article.
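Because dates are plain ISO 8601 strings, any client can parse and compute on them with its standard library. The Python sketch below parses the sample soldate value used in this section and does simple extraction, conversion, and arithmetic. It illustrates the string format only and does not call any Couchbase function.

```python
from datetime import datetime, timedelta, timezone

# ISO 8601 string with milliseconds and a UTC offset, as stored in the document.
sold = datetime.fromisoformat("2017-10-12T13:47:41.068-07:00")

# Extract: individual parts of the timestamp.
print(sold.year, sold.month, sold.day)  # 2017 10 12

# Convert: the same instant expressed in UTC.
sold_utc = sold.astimezone(timezone.utc)
print(sold_utc.isoformat(timespec="milliseconds"))  # 2017-10-12T20:47:41.068+00:00

# Arithmetic: a date 30 days after the sale, serialized back to a string.
due = sold + timedelta(days=30)
print(due.isoformat(timespec="milliseconds"))  # 2017-11-11T13:47:41.068-07:00
```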

| Data Type | MongoDB | Couchbase | JSON example |
|---|---|---|---|
| Numbers | BSON Number | JSON Number | { "id": 5, "balance": 2942.59 } |
| String | BSON String | JSON String | { "name": "Joe", "city": "Morrisville" } |
| boolean | BSON Boolean | JSON Boolean | { "premium": true, "pending": false } |
| datetime | Custom Data format | JSON ISO 8601 String with extract, convert and arithmetic functions | { "soldate": "2017-10-12T13:47:41.068-07:00" }; MongoDB: { "soldate": ISODate("2012-12-19T06:01:17.171Z") } |
| spatial data | GeoJSON | Supports nearest neighbor and spatial distance. | "geometry": {"type": "Point", "coordinates": [-104.99404, 39.75621]} |
| MISSING | Unsupported | MISSING | |
| NULL | JSON Null | JSON null | { "last_address": null } |
| Objects | Flexible JSON Objects | Flexible JSON Objects | { "address": {"street": "1, Main street", "city": "Morrisville", "zip": "94824"} } |
| Arrays | Flexible JSON Arrays | Flexible JSON Arrays | { "hobbies": ["tennis", "skiing", "lego"] } |

ALL ABOUT MISSING

MISSING is the value of a field absent in the JSON document or literal.

For example, in {"name": "joe"}, everything but the field "name" is missing from the document. You can also set the value of a field to MISSING to make the field disappear. Traditional relational databases use three-valued logic with true, false, and NULL. With the addition of MISSING, N1QL uses four-valued logic.

You have the following expressions with MISSING.  

| Expression | Meaning | Example |
|---|---|---|
| IS MISSING | Returns true if the document does not have a status field | SELECT * FROM CUSTOMER WHERE status IS MISSING; |
| IS NOT MISSING | Returns true if the document has a status field | SELECT * FROM CUSTOMER WHERE status IS NOT MISSING; |
| MISSING and NULL | MISSING is a known missing quantity. null is a known UNKNOWN. You can check for a null value, similar to MISSING, with the IS NULL or IS NOT NULL expression. | Valid JSON: {"status": null} |
| MISSING value | Simply make a field of any type disappear by setting it to MISSING | UPDATE CUSTOMER SET status = MISSING WHERE cxid = "xyz232" |
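The difference between MISSING and null can be mimicked in application code. The Python sketch below uses a sentinel object to separate "field absent" from "field present with a null value". The names `MISSING`, `field_value`, `is_missing`, `is_null`, and `unset_field` are assumptions for this example and only approximate the N1QL semantics described above.

```python
class _Missing:
    """Sentinel for a field that is absent from a document."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()


def field_value(doc: dict, name: str):
    """Return the field's value, or MISSING when the field is absent."""
    return doc.get(name, MISSING)


def is_missing(doc: dict, name: str) -> bool:
    return field_value(doc, name) is MISSING


def is_null(doc: dict, name: str) -> bool:
    # JSON null is represented as None in Python.
    return field_value(doc, name) is None


def unset_field(doc: dict, name: str) -> dict:
    """Rough analogue of SET field = MISSING: return a copy without the field."""
    return {k: v for k, v in doc.items() if k != name}


customers = [
    {"cxid": "xyz232", "status": "gold"},
    {"cxid": "abc111", "status": None},   # known UNKNOWN
    {"cxid": "def222"},                   # status is MISSING
]

missing_ids = [c["cxid"] for c in customers if is_missing(c, "status")]
null_ids = [c["cxid"] for c in customers if is_null(c, "status")]
print(missing_ids, null_ids)  # ['def222'] ['abc111']
```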

DATA MODELING

| Relationship | MongoDB | Couchbase |
|---|---|---|
| 1:1 | Embedded Object (implicit); Document Key Reference | Embedded Object (implicit); Document Key Reference |
| 1:N | Embedded Array of Objects; Document Key Reference; Query with $lookup operator | Embedded Array of Objects; Document Key Reference; Query with INNER, LEFT OUTER, RIGHT OUTER, NEST, UNNEST joins |
| N:M | Embedded Array of Objects; Arrays of objects with references; Difficult to query with $lookup operator | Embedded Array of Objects; Arrays of objects with references; Query with INNER, LEFT OUTER, RIGHT OUTER, NEST, UNNEST joins |
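To make the 1:N options concrete, this Python sketch shows the same customer and orders modeled two ways: as an embedded array of objects, and as separate documents linked by a document key reference and combined with an application-side inner join. The document shapes, the key names, and the `join_orders` helper are hypothetical; in practice the join would be expressed with $lookup in MongoDB or a N1QL JOIN in Couchbase.

```python
# Option 1: embedded array of objects inside the customer document.
customer_embedded = {
    "type": "customer",
    "name": "Joe",
    "orders": [
        {"order_id": "o1", "total": 25.0},
        {"order_id": "o2", "total": 40.5},
    ],
}

# Option 2: separate documents; each order refers to its customer by key.
documents = {
    "customer::1001": {"type": "customer", "name": "Joe"},
    "order::o1": {"type": "order", "customer_key": "customer::1001", "total": 25.0},
    "order::o2": {"type": "order", "customer_key": "customer::1001", "total": 40.5},
    "order::o3": {"type": "order", "customer_key": "customer::9999", "total": 10.0},
}


def join_orders(docs: dict) -> list:
    """Inner join: keep only orders whose referenced customer exists."""
    rows = []
    for key, doc in docs.items():
        if doc.get("type") != "order":
            continue
        customer = docs.get(doc["customer_key"])
        if customer is not None:
            rows.append(
                {"order_key": key, "customer": customer["name"], "total": doc["total"]}
            )
    return rows


joined = join_orders(documents)
embedded_total = sum(order["total"] for order in customer_embedded["orders"])
referenced_total = sum(row["total"] for row in joined)
print(embedded_total, referenced_total)  # 65.5 65.5
```

The embedded form answers the question with a single document read, while the referenced form keeps orders independently updatable at the cost of a join.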

PHYSICAL SPACE MANAGEMENT

| Aspect | MongoDB | Couchbase |
|---|---|---|
| Table Storage | File system directory | File system directory |
| Index Storage | File system directory | File system directory |
| Partitioning (data) | Range and hash sharding are supported. | Hash partitioning. Stored in 1024 vbuckets. |
| Partitioning (index) | Tied to the collection sharding strategy since all (sub) indexes are local to each mongod node. | Always detached from the Bucket. Global Index (can use a different strategy than the bucket/collection). Supports hash partitioning of the indexes. Range partitioning and partial indexing are manual, via partial indexes. |

SDKs

My personal knowledge of both SDKs is limited.  There should be equivalent APIs, drivers, and connectors with the two products.  If not, please let us know.

| SDK | MongoDB | Couchbase |
|---|---|---|
| Java | MongoDB Java driver | Couchbase Java SDK; Simba & CData JDBC |
| C | MongoDB C driver; ODBC driver | Couchbase C SDK; Simba & CData ODBC |
| .NET, LINQ | MongoDB .NET provider | Couchbase .NET provider; LINQ provider |
| PHP, Python, Perl, Node.js | MongoDB SDK on all these languages | Couchbase SDK on all these languages |
| golang | MongoDB Go SDK | Couchbase Go SDK |

QUERY LANGUAGE

SELECT: Mongo has multiple APIs for selecting documents. Both find() and aggregate() can do the job of simple SELECT statements. We’ll look at aggregate() later in the section.

INSERT

In MongoDB, providing _id is optional.  If you don’t provide its value, Mongo will generate the field value and save it.  Providing document KEY is mandatory in Couchbase.

UPDATE

DELETE

MERGE

A MERGE operation on a set of JSON documents is often required as part of your ETL process or daily updates. A MERGE statement can involve complex data sources with complex business-rule-based predicates. Couchbase provides the standard MERGE operation with the same semantics. In MongoDB, you have to write a long program to do this, and some of the set operation rules (e.g., each document should ONLY be updated once) are difficult to enforce from an application. In Couchbase, you can simply use the MERGE statement, just like in an RDBMS.
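The rule that each document should only be updated once is easier to see in code. Here is a minimal Python sketch of MERGE-like semantics over in-memory dictionaries: matched targets are updated, unmatched sources are inserted, and a second source row hitting the same target raises an error. The function name `merge_documents`, the sample data, and the choice to raise an error are assumptions for illustration, not the Couchbase implementation.

```python
def merge_documents(target: dict, source: list, key_field: str) -> dict:
    """Apply source rows to target: update when matched, insert when not.

    Raises ValueError if two source rows touch the same target key, so that
    every target document is modified at most once per merge.
    """
    result = {key: dict(doc) for key, doc in target.items()}  # leave input intact
    touched = set()
    for row in source:
        key = row[key_field]
        if key in touched:
            raise ValueError(f"target {key!r} matched more than once")
        touched.add(key)
        fields = {k: v for k, v in row.items() if k != key_field}
        if key in result:
            result[key].update(fields)   # WHEN MATCHED THEN UPDATE
        else:
            result[key] = fields         # WHEN NOT MATCHED THEN INSERT
    return result


inventory = {"part::1": {"qty": 10}, "part::2": {"qty": 0}}
daily_feed = [{"key": "part::1", "qty": 7}, {"key": "part::3", "qty": 4}]

merged = merge_documents(inventory, daily_feed, "key")
print(merged)
```

Enforcing the once-only rule in one place is what a declarative MERGE statement gives you for free; in application code it has to be tracked by hand, as the `touched` set does here.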

DESCRIBE:

JSON data is self-describing and flexible. MongoDB Schema helper is available via Compass visualization in the Enterprise Edition only.

Couchbase has INFER to analyze and understand the schema. Both the query service and the Analytics service can infer schema.

    1. Query service INFER command
    2. Analytics Service has array_infer_schema() function.

Here’s the INFER output example.