You can’t judge a book by its cover, but you can judge the architecture of a distributed system by its topology.

If two distributed systems are equally effective, is the one with the simpler topology the one with the better architecture? This article compares the architecture of two document databases and two wide column stores by looking at their topologies.

Wide Column Store

Topology #1

wcs_one

 

Wow. There is a lot going on here. There are four node types and multiple components per node.

Topology #2

 

wcs_two

Nice. Simple. There is one node type.

Which wide column store would you choose?

  • Which one is going to be easier to deploy?
  • Which one is going to be easier to maintain?
  • Which one is going to be easier to scale?
  • Which one is going to be more resilient?

I believe the fewer moving parts, the better.

Apache HBase

 wcs_hbase

Apache HBase sits on top of Apache Hadoop, so there are a lot of node types and components. Apache Hadoop requires name nodes and data nodes for HDFS. It requires job trackers and task trackers for MapReduce. Apache HBase requires master servers, region servers, and a ZooKeeper cluster. The Apache HBase, HDFS, and MapReduce components can be co-located. However, they don’t have to be.

The master server and the name node may be single points of failure. However, multiple name nodes can be deployed, as can multiple master servers. That being said, there will be problems if the name nodes, the master servers, or the ZooKeeper cluster are unavailable.
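
To make the dependency concern concrete, here is a minimal Python sketch of a toy availability check. It is not how Apache HBase detects failures. The role names mirror the components listed above, while the `blocked_roles` function and the instance counts are hypothetical and chosen only for illustration. It also treats a role as available when at least one instance is reachable, which is a simplification.

```python
# Toy model: the roles this kind of deployment depends on.
REQUIRED_ROLES = ("name node", "master server", "zookeeper")


def blocked_roles(reachable):
    """Return the required roles that have no reachable instance.

    `reachable` maps a role name to the number of instances of that
    role that can currently be contacted (hypothetical numbers).
    """
    return [role for role in REQUIRED_ROLES if reachable.get(role, 0) == 0]


# Every role still has at least one reachable instance.
healthy = {"name node": 2, "master server": 1, "zookeeper": 3}
# The only name node is gone, so that role blocks the deployment.
degraded = {"name node": 0, "master server": 1, "zookeeper": 3}

print(blocked_roles(healthy))   # []
print(blocked_roles(degraded))  # ['name node']
```

The more roles a topology requires, the more entries there are in `REQUIRED_ROLES`, and the more ways the list returned by `blocked_roles` can be non-empty.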

Apache Cassandra

 

wcs_cass

There is one node type. That’s it. Clients communicate directly with the nodes. There are no single points of failure. There are no dependencies on independent nodes or separate clusters.
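
As a rough illustration of what "clients communicate directly with the nodes" can look like, the Python sketch below has the client work out which node owns a key and contact it without an intermediary. This is a toy scheme based on simple modulo hashing, not Apache Cassandra's actual partitioning logic. The node names and the `owner` function are hypothetical.

```python
import hashlib

# Hypothetical node names; every node plays the same role.
NODES = ["node-a", "node-b", "node-c"]


def owner(key, nodes):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# The client computes the owner itself and would contact it directly,
# with no router or master tier in between.
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", owner(key, NODES))
```

Because every node is the same type, the only thing the client needs is the list of nodes.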

Document Databases

Topology #1

 

doc_db_one

Wow. There is a lot going on here. There are four node types and two layers of logical groupings.

Topology #2

 

doc_db_two

Nice. Simple. There is one node type.

Which document database would you choose?

  • Which one is going to be easier to deploy?
  • Which one is going to be easier to maintain?
  • Which one is going to be easier to scale?
  • Which one is going to be more resilient?

I believe the fewer moving parts, the better.

MongoDB

 

doc_db_mongo

The MongoDB topology is similar to the Apache HBase topology. The difference is that clients do not connect directly to the nodes. The client requests are proxied by the router nodes. The router nodes retrieve shard information from the config nodes. A shard consists of a replica set. A replica set consists of multiple nodes and, optionally, an arbiter.

Like Apache HBase, the router node and the config node may be single points of failure. However, like Apache HBase, multiple router nodes and multiple config nodes can be deployed. That being said, there will be problems if the router nodes and/or the config nodes are unavailable.
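
The request path described above (client to router, router to config node, router to shard) can be sketched in Python as follows. This is a toy model under stated assumptions, not MongoDB's implementation: the `ConfigNode` and `Router` classes, the key ranges, and the shard names are all hypothetical, and each replica set is stood in for by a plain dictionary.

```python
class ConfigNode:
    """Holds the key-range-to-shard mapping (hypothetical ranges)."""

    def __init__(self, ranges):
        # ranges: sorted list of (exclusive_upper_bound, shard_name).
        self.ranges = ranges

    def shard_for(self, key):
        for upper, shard in self.ranges:
            if key < upper:
                return shard
        raise KeyError(key)


class Router:
    """Proxies client requests to the shard that owns the key."""

    def __init__(self, config, shards):
        self.config = config
        self.shards = shards  # shard name -> dict standing in for a replica set

    def put(self, key, doc):
        self.shards[self.config.shard_for(key)][key] = doc

    def get(self, key):
        return self.shards[self.config.shard_for(key)].get(key)


config = ConfigNode([("m", "shard-1"), ("\uffff", "shard-2")])
shards = {"shard-1": {}, "shard-2": {}}
router = Router(config, shards)

# The client only ever talks to the router.
router.put("alice", {"age": 30})
router.put("zoe", {"age": 25})
print(sorted(shards["shard-1"]))  # ['alice']
print(sorted(shards["shard-2"]))  # ['zoe']
```

Every `put` and `get` passes through both the router and the config lookup, which is why the availability of those two tiers matters to every request.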

Couchbase Server

 

doc_db_cbs

There is one node type. That’s it. Clients communicate directly with the nodes. There are no single points of failure. There are no dependencies on independent nodes or separate clusters.

Summary

A great architecture balances flexibility and simplicity. There is value in a modular architecture. There is value in a simple architecture. However, modularity does not have to be reflected in the topology of a distributed system. Couchbase Server is a modular, distributed system. A single instance is composed of multiple components and multiple services. However, the modularity is not forced on administrators. It is an aspect of the distributed system itself, not its deployment.


Author

Posted by Shane Johnson, Director, Product Marketing, Couchbase

Shane K Johnson was the Director of Product Marketing at Couchbase. Prior to Couchbase, he held various roles in development and evangelism, with a background in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.

