How many times have you written a SQL query that used a LIKE operator and some wildcards to find text in a string? What kind of performance did you get when you ran that against millions of records? It was probably not good, right?
While you can use wildcards in SQL, they probably aren’t the best approach for most scenarios, and the same goes for N1QL queries. Instead, meet Full Text Search (FTS), a technology that has been around for a while but was only recently included in Couchbase Server.
The Full Text Search found in Couchbase is based on Bleve, a search and indexing library written in Golang.
Fuzzy Searching with Full Text Search in Couchbase
When comparing FTS against a wildcard N1QL or SQL query, there is more to compare than just performance. Take, for example, its ability to match in a fuzzy fashion. Let’s assume we had the following query:
SELECT * FROM `default` WHERE message LIKE '%bananas%';
What happens if the database has a record containing Bananas with a capital B? In this circumstance it would not be included in the results. What happens if you’re like me and can’t spell, leaving results like bannanas in the database? It will not be included in the results either.
With Full Text Search, the following query can be done with a fuzziness factor:
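As a rough sketch of what such a query can look like, the snippet below builds a match query body in the JSON form that FTS accepts. The field name message is taken from the SQL example; the helper name buildFuzzyMatchQuery is made up for illustration, and the exact request shape should be checked against the Couchbase documentation for your version.

```javascript
// Hypothetical helper (not part of any SDK): builds the JSON body
// for a match query with a fuzziness value.
function buildFuzzyMatchQuery(term, field, fuzziness) {
  return {
    query: {
      match: term,          // the text to search for
      field: field,         // the document property to search in
      fuzziness: fuzziness  // maximum edit distance allowed for a term
    }
  };
}

const body = buildFuzzyMatchQuery("bananas", "message", 2);
console.log(JSON.stringify(body, null, 2));
```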
Ignoring that SQL and FTS have a different query syntax, the above would offer a fuzziness of two. This means that up to two single-character edits (an insertion, a deletion, or a substitution) may separate the search term from a matching term. Our data could include any of the following:
It isn’t limited to just the four above, but you get the idea.
However, what if our database contains bandana, which in this case isn’t a typing mistake? Do we really want other legitimate terms appearing in our results?
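To make the idea of fuzziness concrete, here is a small sketch that computes the edit distance between a search term and a few candidate words. This is only an illustration of the concept and not how Couchbase evaluates a query internally; the function names are made up, and lowercasing both sides is an assumption that stands in for what a typical text analyzer does.

```javascript
// Classic dynamic-programming Levenshtein (edit) distance.
function editDistance(a, b) {
  // prev[j] is the distance between the processed prefix of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: compare case-insensitively, roughly as an analyzer would.
function withinFuzziness(term, candidate, fuzziness) {
  return editDistance(term.toLowerCase(), candidate.toLowerCase()) <= fuzziness;
}

for (const word of ["Bananas", "bannanas", "bandana", "apples"]) {
  console.log(word, withinFuzziness("bananas", word, 2));
}
```

Bananas and bannanas are within two edits of bananas, but so is bandana (insert a d, drop the s), which is exactly the kind of unintended match described above.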
When executing a Full Text Search query for a term or set of terms, each result is scored based on how relevant it is to the initial search query. You can wrap your business logic around the scored results.
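One way to wrap business logic around scores is to drop weak hits before showing results. The sketch below assumes each hit carries an id and a score; the sample hits, the function name relevantHits, and the threshold of 0.5 are all invented for illustration. Scores are relative to a given query, so any cutoff would need tuning against real data.

```javascript
// Hypothetical search hits: a document id plus a relevance score.
const hits = [
  { id: "doc-1", score: 1.42 },
  { id: "doc-2", score: 0.35 },
  { id: "doc-3", score: 0.91 }
];

// Keep only hits at or above a minimum score, best match first.
function relevantHits(allHits, minScore) {
  return allHits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

console.log(relevantHits(hits, 0.5).map((hit) => hit.id));
```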
The Different Types of Queries in Couchbase’s Full Text Search
There are many different types of FTS queries that can be performed in Couchbase. The type of query I’ve been mentioning so far is best classified as a Match Query, where the search term is used to match against the index with and without fuzziness.
The available query types include, but are not limited to:
- Match, Phrase, Fuzzy, Prefix, Regexp, Wildcard, Boolean Field
- Conjunction, Disjunction, Boolean, Doc ID
- Date Range, Numeric Range
- Query String
Each type of query is designed for a different task. More about what they do can be found in the official documentation regarding types of queries.
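Several of these query types can be combined. As a hedged sketch, the snippet below builds a conjunction query body in which every sub-query must match. The helper name conjunction and the quantity field are made up for illustration; check the documentation for the exact options each query type supports.

```javascript
// Hypothetical helper: all of the given sub-queries must match.
function conjunction(...queries) {
  return { conjuncts: queries };
}

const compound = conjunction(
  // Match query with a little fuzziness on the message field
  { match: "bananas", field: "message", fuzziness: 1 },
  // Numeric range query on a made-up quantity field
  { min: 1, max: 10, field: "quantity" }
);

console.log(JSON.stringify({ query: compound }, null, 2));
```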
Creating a Full Text Search Index in Couchbase
Before searching can happen, an index must be created. This should not be confused with N1QL indexes as these are two very different things.
Within the Couchbase Administrative Dashboard, you’ll have the opportunity to create Full Text Search indexes.
To create an index, you’ll need to name it and assign it a bucket. A mapping must happen, so you must specify which document property represents the type of JSON document it is. For example, a type property can say what kind of document a particular document is.
With the JSON type mapped to the document type, further mappings must occur. For example, let’s say that this particular index is for person documents. A new type mapping must be created and named to match the value that exists in the document’s type property.
Now depending on how you want to index your fields, you can either choose to index everything, or index only certain properties. If you want to index certain properties, you would add a new child field underneath the type mapping that we had just created.
Indexing certain properties means that when a search is executed, only those properties will be searched, not every field in the document.
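The dashboard ultimately stores these choices as a JSON index definition. The sketch below shows roughly what a definition with one type mapping and a few child fields might contain. Treat the key names as my best understanding of that format rather than a reference, and compare them with the definition your own dashboard shows; the helper name and the firstname and lastname properties are hypothetical.

```javascript
// Hypothetical helper: sketches an index definition that maps one
// document type and indexes only the listed properties.
function buildIndexDefinition(indexName, bucketName, typeName, propertyNames) {
  const properties = {};
  for (const name of propertyNames) {
    // One child field per property we want searchable
    properties[name] = { fields: [{ name: name, type: "text", index: true }] };
  }
  return {
    type: "fulltext-index",
    name: indexName,
    sourceType: "couchbase",
    sourceName: bucketName,
    params: {
      // Which document property holds the document type
      doc_config: { mode: "type_field", type_field: "type" },
      mapping: {
        // Skip documents that match no named type mapping
        default_mapping: { enabled: false },
        types: {
          // dynamic: false limits indexing to the child fields above
          [typeName]: { enabled: true, dynamic: false, properties: properties }
        }
      }
    }
  };
}

const definition = buildIndexDefinition("people-index", "default", "person", ["firstname", "lastname"]);
console.log(JSON.stringify(definition, null, 2));
```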
Including FTS in Your Applications
With an index created, we probably want to include functionality in an application using one of the various Couchbase Server SDKs.
Let’s say we wanted to do a Match Query with Node.js. The code might look something like the following:
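var Couchbase = require("couchbase");
var SearchQuery = Couchbase.SearchQuery;
// Build a match query against the named FTS index
var query = SearchQuery.new("INDEX-NAME-HERE", SearchQuery.match("SEARCH-QUERY-HERE"));
// List the fields to return with each hit
query.fields(["FIELDS", "TO", "RETURN", "WITH", "PATH", "DEFINED"]);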
In the above example, a new query is created against some specified index. The search term is passed into this query. The query can be further customized to include certain fields in the response, and to wrap each search hit in HTML markup that highlights it.
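For a sense of what those customizations amount to, here is a sketch of a search request body with both returned fields and HTML highlighting set. It builds plain JSON rather than calling the SDK, the helper name buildSearchRequest is made up, and the exact options should be confirmed against the documentation for your server version.

```javascript
// Hypothetical helper: a search request that returns selected fields
// and asks for HTML highlighting on those same fields.
function buildSearchRequest(term, returnFields) {
  return {
    query: { match: term },
    fields: returnFields,                               // fields to include with each hit
    highlight: { style: "html", fields: returnFields }  // mark up the matched terms
  };
}

const request = buildSearchRequest("bananas", ["message"]);
console.log(JSON.stringify(request, null, 2));
```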
The same thing can be accomplished in Java with the following code:
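import com.couchbase.client.java.search.SearchQuery;
import com.couchbase.client.java.search.queries.MatchQuery;
// Build a match query and attach it to the named FTS index
MatchQuery fts = SearchQuery.match("SEARCH-QUERY-HERE");
SearchQuery query = new SearchQuery("INDEX-NAME-HERE", fts);
query.fields("FIELDS", "TO", "RETURN");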
Notice the similarities between the two very different languages? The APIs across the SDKs are meant to be similar, and they can easily be extended. For example, a fuzziness value can be added to the match query.
Full Text Search (FTS) is an amazing feature in Couchbase Server 5.0 and above. It allows you to search far more efficiently and naturally than adding a bunch of wildcard characters to a N1QL query.
For more information on FTS, check out the Couchbase Developer Portal.