Introduction to remote links

Couchbase is excited to announce its new Remote Links Analytics Service feature in the latest Couchbase Server 6.6 release. Remote links enable real-time operational analytics to obtain and analyze data from multiple Couchbase data clusters and datacenters in a separate cluster dedicated to the Analytics Service.

Customer use case

Prior to the 6.6 release, the Analytics Service was available within one cluster, but the service and its analyses were tied to that cluster. Several of our retail, lifestyle, and travel customers were performing analytics for their business lines (e.g., e-commerce, marketing, and supply chain) in separate Couchbase clusters. They expressed a desire to unify data from various operational applications into a centralized analytics cluster. This motivated our engineering and product teams to address this customer need. You can read more about other Analytics use cases here.

How do remote links work?

Remote links allow data to be ingested from the Data Service of a remote Couchbase cluster into an Analytics cluster. This is achieved in three simple steps:

  1. Set up a remote link by using a REST API call or the command-line interface (CLI)
  2. Create a dataset in the Analytics cluster on the remote link configured above
  3. Query the dataset using SQL++ (or your favorite BI tool)

Let’s walk through a simple example. iWorks, an e-commerce company, sells iPhone accessories online. The order data is stored in one Couchbase cluster in a bucket called “ecommerce” with docType “order”. The customer data is stored in a second Couchbase cluster in a bucket called “customer360” with docType “customer”. iWorks would like to use the Analytics Service to combine and analyze order data along with customer data to determine the top 3 customers by sales. The illustration directly below shows the setup before remote links are configured:

Sample customer data:

Sample order data:

Let’s follow the three steps from above with sample setup code along with a SQL++ query.

Step 1: Set up remote links

We’ll create two remote links on a new Analytics cluster using a REST API call. (Alternatively, you can use the CLI to create remote links.) Let’s first set up the “order” remote link. We will need to provide:

    • Analytics cluster hostname
    • Analytics user credentials
    • Remote link name (in this case remoteOrders)
    • Dataverse name (if different from default)
    • Link type as couchbase
    • Order cluster hostname
    • Order user credentials
    • Encryption type (in this case none)
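
As a rough sketch, the parameters above could be assembled into a REST call from Python as shown here. The endpoint path (`/analytics/link` on port 8095) and the form field names reflect our understanding of the 6.6 Analytics REST API and should be verified against the documentation. The hostnames and credentials are placeholders invented for illustration. The request is only built here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint on a hypothetical Analytics host; verify against the 6.6 docs.
ANALYTICS_LINK_URL = "http://analytics.example.com:8095/analytics/link"


def build_remote_orders_request() -> Request:
    """Build (but do not send) a POST that would create the remoteOrders link."""
    form = {
        "dataverse": "Default",            # only needed if not the default dataverse
        "name": "remoteOrders",            # remote link name
        "type": "couchbase",               # link type
        "hostname": "orders.example.com",  # hypothetical order cluster host
        "username": "orders.user",         # hypothetical order cluster credentials
        "password": "orders.password",
        "encryption": "none",
    }
    request = Request(
        ANALYTICS_LINK_URL,
        data=urlencode(form).encode("utf-8"),
        method="POST",
    )
    # Hypothetical Analytics user credentials, sent as HTTP basic auth.
    token = base64.b64encode(b"analytics.user:analytics.password").decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would submit it; an equivalent `curl` call with one `-d` flag per field should work just as well.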

Let’s now set up the “customer” remote link on the Analytics cluster. This step is similar to the one listed above, except we have to provide a new remote link name (in this case remoteCustomers) along with customer cluster host details and credentials. In this case we choose “full” as the encryption type (for illustration purposes) and we include the required certificate parameter.

The certificate in targetClusterRootCert.pem can be retrieved from the web console of the target cluster: first navigate to the Security tab on the left-hand navigation bar, then to the Root Certificate tab in the horizontal control bar.
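
To illustrate how the certificate fits in, here is a minimal sketch of the form body for the remoteCustomers link with full encryption. The field names (`encryption`, `certificate`, and so on) are our assumption about the 6.6 REST API, and the host and credentials are made up. The function takes the PEM text as an argument so that the caller decides how the file is read.

```python
from urllib.parse import urlencode


def customer_link_form(certificate_pem: str) -> bytes:
    """Return the URL-encoded form body for the remoteCustomers link."""
    form = {
        "dataverse": "Default",
        "name": "remoteCustomers",
        "type": "couchbase",
        "hostname": "customers.example.com",  # hypothetical customer cluster host
        "username": "customers.user",         # hypothetical credentials
        "password": "customers.password",
        "encryption": "full",
        # Root certificate of the target cluster, passed as PEM text.
        "certificate": certificate_pem,
    }
    return urlencode(form).encode("utf-8")
```

In practice the argument would be the contents of targetClusterRootCert.pem, for example `pathlib.Path("targetClusterRootCert.pem").read_text()`, and the resulting body would be POSTed to the Analytics cluster along with the Analytics user credentials.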

The illustration below is after both remote links are set up:

Step 2: Create datasets and connect remote links

Using the Analytics workbench, we’ll now create two datasets named “orders” and “customers” on the two remote links we created above:
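
The two statements might look like the output of the small helper below. This is a sketch that assumes the `CREATE DATASET ... ON ... AT ... WHERE` form of the 6.6 Analytics DDL; the helper name `create_dataset_statement` is ours, while the bucket, link, and docType values come from the example.

```python
def create_dataset_statement(dataset: str, bucket: str, link: str, doc_type: str) -> str:
    """Return a CREATE DATASET statement that filters a remote bucket by docType."""
    return (
        f"CREATE DATASET {dataset} ON `{bucket}` AT {link} "
        f'WHERE `docType` = "{doc_type}";'
    )


# One dataset per remote link, filtered to the relevant document type.
STATEMENTS = [
    create_dataset_statement("orders", "ecommerce", "remoteOrders", "order"),
    create_dataset_statement("customers", "customer360", "remoteCustomers", "customer"),
]

for statement in STATEMENTS:
    print(statement)
```

Each printed statement can be pasted into the Analytics workbench and run as-is.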

Next, we’ll connect both the remoteOrders and remoteCustomers links so that data ingestion can take place from the Orders and Customers data clusters to the Analytics cluster. This demonstrates the powerful NoETL feature of JSON analytics. To be clear, no ETL is needed to move our NoSQL JSON data from one system to another before being able to analyze it. This saves time and processing power, enabling us to analyze the data right away and in its natural (application) form on the Analytics cluster.
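
Connecting the links comes down to two `CONNECT LINK` statements. The sketch below wraps each one in a request to the Analytics query endpoint (`/analytics/service`, taking a `statement` form field) for anyone who would rather script this than use the workbench. The host is a placeholder, and authentication is left out for brevity.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Analytics host; the query endpoint accepts a "statement" field.
ANALYTICS_SERVICE_URL = "http://analytics.example.com:8095/analytics/service"

CONNECT_STATEMENTS = [
    "CONNECT LINK remoteOrders;",
    "CONNECT LINK remoteCustomers;",
]


def statement_request(statement: str) -> Request:
    """Build (but do not send) a POST that would run one Analytics statement."""
    body = urlencode({"statement": statement}).encode("utf-8")
    return Request(ANALYTICS_SERVICE_URL, data=body, method="POST")


# One request per link; add credentials before sending them for real.
requests_to_send = [statement_request(s) for s in CONNECT_STATEMENTS]
```

Once the links are connected, ingestion starts and the datasets begin to fill without any further action.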

Step 3: Query using SQL++

As the last step, we can now run the SQL++ query listed below (looks exactly like SQL :)) to join orders and customers to get the top 3 customers with the highest sales.
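
A query along these lines could do the job. The field names `customerId`, `orderTotal`, and `name` are hypothetical, since the actual document schema may differ, so treat this as a sketch rather than the exact query. The small Python function next to it mirrors what the join, grouping, and ordering compute, which can be handy for sanity-checking results on a few documents.

```python
from collections import defaultdict

# Hypothetical field names: customerId, orderTotal, name.
TOP_CUSTOMERS_QUERY = """
SELECT c.name AS customerName, SUM(o.orderTotal) AS totalSales
FROM orders o
JOIN customers c ON o.customerId = c.customerId
GROUP BY c.name
ORDER BY SUM(o.orderTotal) DESC
LIMIT 3;
"""


def top_customers(orders, customers, n=3):
    """In-memory equivalent of the query: join, sum per customer, take top n."""
    names = {c["customerId"]: c["name"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        # Inner join: skip orders whose customer is unknown.
        if order["customerId"] in names:
            totals[names[order["customerId"]]] += order["orderTotal"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"customerName": k, "totalSales": v} for k, v in ranked[:n]]
```

The query text is what you would run in the workbench; the function is only a local reference for what it computes.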

Here are the JSON query results:

Woohoo! Remote links worked and we are now able to combine and analyze customer and order data together. Users can now develop a variety of complex ad hoc queries for further data exploration, answer new business questions, and bring in additional Couchbase data sources.

Benefits

Here are key benefits that come from using remote links:

  • Extend Analytics’ reach. Ingesting data from multiple clusters enables more data to be consolidated. Use cases include combining and correlating data from multiple locations or multiple applications, as we have just seen.
  • Lower Analytics’ total cost of ownership. The possibility of an independent Analytics cluster can reduce or eliminate the need for Analytics nodes to be included in each individual cluster, again as we have seen in the example above.
  • Enable even faster time to insight. Customers can gain more insight immediately by performing correlations across different datasets without requiring the data of interest to first be published to a data warehouse. Notice how few steps were needed to enable us to analyze our data; no ETL was involved and the data was immediately available.

Summary

Remote links help lower TCO, improve resource utilization, and enable hybrid transactional/analytical processing (HTAP) for NoSQL solution development and deployments, as is often needed in modern applications. Remote links allow users to bring more data together in a single place, which enables organizations to gather more insights and do more correlation-style analyses across different datasets drawn from different clusters.

You can learn more about remote links here. Register here for our upcoming “What’s new in release 6.6” webinar.

Explore Couchbase Server 6.6 resources

 

Blogs

  • What’s New in Couchbase Server 6.6
  • Eventing Improvements (Timers, Handlers, and Statistics)
  • Remote Links – Analyze Your Enterprise With Couchbase Analytics
  • External Datasets – Extend Your Reach With Couchbase Analytics
  • Announcing Flex Index With Couchbase
  • Introducing Backing Up to Object Store (S3)
  • Import Documents With the Web Admin Console

Docs and Tutorials

  • What’s New in Couchbase Server 6.6?
  • Couchbase Server 6.6 Release Notes
  • Try the Couchbase Index Advisor Service
  • Set Up Analytics Remote and S3 Links Using REST API
  • Create External Datasets Using Data Definition Language (DDL)
  • Set Up Analytics Remote and S3 Links Using CLI

Webpages and Webinars

  • New Features in Couchbase Server 6.6: Analytics, Backup, Query, and More
  • Couchbase Analytics Service
  • What’s New in Couchbase Server (Product Page)
  • Compare Editions

Co-author

Idris Motiwala, Principal Product Manager

Idris is a Principal Product Manager, Analytics at Couchbase with 20+ years of experience in the design, development, and execution of software products at both Fortune 500 companies and startups, leading teams in digital transformation, cloud, and analytics. Idris holds an MS in Technology Management and certifications in product management.

Posted by Till Westmann

Till Westmann is a Senior Director Engineering at Couchbase working on the Analytics Service. Before joining Couchbase, Till built data management software at Oracle, 28msec, SAP, BEA Systems, XQRL, and Xyleme. He is a member of the Apache Software Foundation and the Vice President of the Apache AsterixDB project. Till holds a PhD from the University of Mannheim in Germany.
