This technical article showcases the life of a Full Text Search query through a distributed Couchbase system as it originates from a client to until a response is received.

Before diving into the server world, a little about the client – here’s where or what it could be …

  • on a system where the user has an application/program that uses one of Couchbase’s SDKs or simply a curl/HTTP (REST) request typed up to connect to a distributed Couchbase cluster
  • the Couchbase search UI on one of the servers in the cluster
  • the Couchbase query workbench on one of the servers in the cluster
  • the Couchbase query command line interface on one of the servers in the cluster

Think this might be the right time to know (if you don’t already) that Couchbase hosts a distributed Full Text Search service that requires configuration. This means that if you set up a Full Text Search index over one such cluster to ingest data from a Couchbase bucket, you will be allowed to partition the index among multiple servers.

If you’re unfamiliar with how to set up a Full Text Search index on Couchbase, now may be a great time to check out our documentation here. For details on the various kinds of queries that the search service supports, click here.

An advanced user would likely split their index into enough partitions so the indexed data is evenly (more or less) distributed across the cluster of servers hosting the “search” service. They will also NOT need to worry about searching across the distributed data when they shoot a query to the system – the search services will communicate within the cluster to put together a complete response for the user.

Before we go into the details, here’s a GIF illustrating a user that receives a response after sending a request (query) from a client. Their Couchbase cluster has a Full Text Search index that’s partitioned into six parts over three servers hosting the search service.

A Full Text Search query once built at the client can be targeted to any server in the Couchbase cluster hosting the search service. Here are the stages it goes through …

  1. The server that the client targets the search request to assumes the role of the orchestrator or the coordinating node once it receives the external request.
  1. The coordinating node first looks up the index (making sure it exists).
  1. The coordinating node obtains the “plan” that the index was deployed with. The plan contains details on how many partitions the index was split into and all the servers’ information where any of these partitions reside.
  1. The coordinating node sets up a unique list of servers that it needs to dispatch an “internal” request to. A server in the Couchbase cluster is eligible if and only if it hosts a partition belonging to the index under consideration.
  1. Once the internal requests have been dispatched by the coordinating node to each of the servers, it’ll wait to hear back from them. Simultaneously, if any of the index’s partitions are resident on the coordinating node – search requests are dispatched to each of those partitions as well (disk-bound).
  1. Those servers in the cluster that receive the “internal” request from the coordinating node will forward it to each of the index partitions they host (disk-bound).
  1. Separate search requests that are dispatched concurrently to all index partitions resident within a server and the server waits to hear back from them.
  1. Once the server hears back from all the partitions it hosts, it merges the results obtained from each of the partitions before packaging them into a response and shipping it back to the coordinating node.
  1. So to sum it up, the coordinating node waits for responses from …
    • each of the index partitions resident within the node
    • each of the servers in the cluster that it dispatched the internal request to
  1. Once all the results from the local index partitions and the remote index partitions are obtained, the coordinating node merges all of them, packages them into a response,and ships them back to the client where the request originated.

This sequence of steps is what we, here at Couchbase refer to as the Full Text Search service’s scatter-gather operation.

The search request that the user needs to  put together at the client encapsulates the search query and optionally – a number of filters for pagination, sorting, scoring, etc. Click here for documentation on search request settings and available options. Here’s a sample search request on the index travels using curl …

Like the search request, the search response is also in JSON format. Here’s a sample response showing the top three hits aggregated from six index partitions for the above request …

As for the contents of the search response …

  • The “status” field contains …
    • the “total” number of partitions for the index that the results were obtained and merged from
    • the number of those partitions that were “successful” in responding to the request
    • the number of those partitions that “failed” to respond to the request for whatever reason
  • The “request” field holds the search request (query) that was received from the client.
  • hits” is an array of objects, where each object is essentially a result or a document hit.
    • The “index” field within each object is the index partition where the document resides
    • The “id” is the document key as inserted into the couchbase bucket
    • The “score” is the relevance of the document for the search parameters, calculated using the tf-idf algorithm.
    • The “sort” indicates what field was used to order the hits obtained.
  • total_hits” is the total number of results available for the search request (query).
  • max_score” is the max tf-idf (relevance) score for any of the document hits.
  • took” is the amount of time in nanoseconds taken by the coordinating node to process the request and ship the aggregated response.
  • facets” contains relevant content if the search query requests for any facet/aggregated information.

Additionally, not included in this search response …

  • If any of the index partitions failed to respond to the request due to any error, a separate “errors” sub field will be included indicating the nature of the error(s). Here’s a sample …
  • The above example is a situation where the search service responds to the user with a “partial” response of results aggregated from only those index partitions that were successful. The user should investigate the health of a cluster and what’s caused some of the index partitions to fail or become unresponsive.

Hope you find this article of some help. Get the latest release of Couchbase Server from the Downloads page!

Author

Posted by Abhinav Dangeti, Software Engineering, Couchbase inc.

Work on Couchbase's Distributed Full Text Search

Leave a reply