One of my favorite new features being previewed in Couchbase Server 4.5 is Full Text Search, or FTS. Last year it was released as a standalone Developer Preview, and now you can try a new Developer Preview that's fully integrated with Couchbase 4.5. Cécile already wrote a blog post about using FTS in a Couchbase-backed content management application; here I want to give you an introduction to the feature set.
What is fulltext and why should I care?
Fulltext search lets you find what you're looking for even without an exact match. Just like the LIKE keyword in SQL? Not really, it's something else. LIKE allows the use of wildcards, which is quite different. Take the following example:
'Couchbase is Awesome'
If your query contains “field LIKE '%is Awesome%'”, it will match. If it contains “LIKE '%is awesome%'”, it won't, because LIKE is case sensitive. Some SQL dialects support the case-insensitive ILIKE keyword. N1QL doesn't support it currently, but you can get the same effect by lowercasing both sides with the LOWER() function. It also won't match if you make a typo, like “LIKE '%is awsome%'”. So LIKE only lets you match an exact part of a field.
Fulltext does so much more. It will match 'couchbase awesom'. This means it's case insensitive, it can ignore unimportant words like 'is' (the technical term is stop words), and it is tolerant of mistakes like typos.
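The three behaviors above (lowercasing, dropping stop words, tolerating typos) can be sketched in a few lines of Go. This is a conceptual model only, not how Bleve's analyzers or fuzzy queries are implemented: the stop-word list is a tiny hypothetical one, and typo tolerance is modeled as a Levenshtein edit distance of at most maxEdits.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny, hypothetical list; real analyzers ship much larger ones.
var stopWords = map[string]bool{"is": true, "a": true, "the": true, "and": true}

// analyze lowercases the text, splits it on whitespace and drops stop words.
func analyze(text string) []string {
	var terms []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if !stopWords[w] {
			terms = append(terms, w)
		}
	}
	return terms
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, where each
// insertion, deletion or substitution costs 1.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether every analyzed query term is within
// maxEdits of some analyzed document term.
func fuzzyMatch(query, doc string, maxEdits int) bool {
	docTerms := analyze(doc)
	for _, q := range analyze(query) {
		found := false
		for _, d := range docTerms {
			if levenshtein(q, d) <= maxEdits {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(analyze("Couchbase is Awesome"))                           // [couchbase awesome]
	fmt.Println(fuzzyMatch("couchbase awesom", "Couchbase is Awesome", 1)) // true
}
```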
Is that all? No, let me throw out the mandatory new release feature list:
- Text Analysis with several prebuilt analyzers: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Thai, Turkish
- Different query types:
  - Term, Phrase, Match, Match Phrase, Prefix
  - Conjunction, Disjunction, Boolean
  - Numeric and Date Ranges
  - Query String (a convenient query syntax for calling most of the types listed above, like “ale -hoppy +sweet”)
- Scoring (tf-idf)
- Stored Fields, Result Highlighting, and Results Snippets
- Faceting: Terms Facet, Numeric Range Facet, Date Range Facet
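The query string example “ale -hoppy +sweet” combines required (+), excluded (-) and unprefixed terms. Below is a simplified Go model of that idea, not Bleve's parser or exact semantics. In particular, the rule used here (unprefixed terms are optional and only raise the score when a + term is present; otherwise a document needs at least one of them) is an assumption of this sketch, and parseQueryString and evaluate are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedQuery holds required (+), excluded (-) and optional terms.
type parsedQuery struct {
	must, mustNot, should []string
}

// parseQueryString splits a query on whitespace and sorts terms by prefix.
func parseQueryString(q string) parsedQuery {
	var pq parsedQuery
	for _, tok := range strings.Fields(strings.ToLower(q)) {
		switch {
		case strings.HasPrefix(tok, "+") && len(tok) > 1:
			pq.must = append(pq.must, tok[1:])
		case strings.HasPrefix(tok, "-") && len(tok) > 1:
			pq.mustNot = append(pq.mustNot, tok[1:])
		default:
			pq.should = append(pq.should, tok)
		}
	}
	return pq
}

// evaluate reports whether doc matches and how many optional terms it contains.
func evaluate(pq parsedQuery, doc string) (bool, int) {
	terms := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		terms[t] = true
	}
	for _, t := range pq.must {
		if !terms[t] {
			return false, 0
		}
	}
	for _, t := range pq.mustNot {
		if terms[t] {
			return false, 0
		}
	}
	score := 0
	for _, t := range pq.should {
		if terms[t] {
			score++
		}
	}
	if len(pq.must) == 0 && score == 0 {
		return false, 0 // with no required terms, need at least one optional hit
	}
	return true, score
}

func main() {
	pq := parseQueryString("ale -hoppy +sweet")
	for _, d := range []string{"A sweet and malty ale", "A sweet but hoppy ale", "Sweet stout", "Pale ale"} {
		ok, score := evaluate(pq, d)
		fmt.Printf("%q match=%v score=%d\n", d, ok, score)
	}
}
```

In this model the sweet, malty ale ranks above the sweet stout, the hoppy one is excluded, and the pale ale fails the required “sweet”.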
All these features are backed by Bleve, an open source full-text search and indexing project written in Go and started by Couchbase engineers. Couchbase FTS is the integration of Bleve into a Couchbase cluster.
What can you do with Fulltext search?
You know that magnifying glass you see on every website, the one you put on every website wireframe without thinking twice about it? It's often fulltext search. Google has set users' expectations: people now expect fulltext-quality search everywhere. You want to be able to run faceted search or show a search result in context. And when you think about it, the same goes for every highly successful website, like Spotify, Netflix or other digital-economy champions.
I thought I could use Elasticsearch or SOLR for fulltext…
Short answer: yes, you can. There are Couchbase Server connectors for both. Right now Couchbase FTS is still in developer preview and no GA date has been announced, so you shouldn't be replacing anything just yet. In the long run, those products might have features you need that FTS doesn't. If fulltext search is the core of your business, Couchbase FTS won't be a silver bullet. For a lot of search use cases, though, Couchbase FTS will be enough that you don't have to deploy another tool on another box, keep it in sync, monitor it, make sure the connector is working right, manage it and so forth.
Couchbase FTS is the integration of Bleve with Couchbase Server. It allows us to create a fulltext index based on the contents of the JSON documents stored in Couchbase. A regular (forward) index is a list of documents and the words each one contains (technically, terms are stored, not words, but we're among friends here). A fulltext index is a type of inverted index: for a given word, it gives you the list of all the documents where that word appears.
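The inverted-index idea fits in a few lines of Go. This is a minimal sketch assuming naive whitespace tokenization and lowercasing; Bleve's real analyzers and storage are far more sophisticated, and the document IDs below are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildInvertedIndex maps each lowercased term to the sorted IDs of the
// documents that contain it (the "inverse" of a document -> words list).
func buildInvertedIndex(docs map[string]string) map[string][]string {
	index := map[string][]string{}
	for id, text := range docs {
		seen := map[string]bool{} // record each document once per term
		for _, term := range strings.Fields(strings.ToLower(text)) {
			if !seen[term] {
				seen[term] = true
				index[term] = append(index[term], id)
			}
		}
	}
	for term := range index {
		sort.Strings(index[term]) // map iteration order is random; make output stable
	}
	return index
}

func main() {
	// Hypothetical documents, keyed by document ID.
	docs := map[string]string{
		"beer-1": "Pale Ale",
		"beer-2": "Light Ale",
		"beer-3": "Light Lager",
	}
	index := buildInvertedIndex(docs)
	fmt.Println(index["ale"])   // [beer-1 beer-2]
	fmt.Println(index["light"]) // [beer-2 beer-3]
}
```

Answering "which documents contain ale?" becomes a single map lookup instead of a scan over every document.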
Here's how you can use FTS to create your own full text indexes.
Starting in Couchbase 4.5, all indexes are grouped under the Index tab (conveniently placed next to the Query tab, which takes you to our cool new query workbench). In the Index tab you will find Global Indexes used by N1QL, Views and Full Text.
The screen is split into three parts. The first lets you run fulltext queries, the second lets you define fulltext indexes, and the third lets you define aliases to fulltext indexes (or to other index aliases). Why would you need an alias for your index? It adds a level of indirection that is useful for your app. When working with fulltext indexes, you usually spend a lot of time tweaking them to get just the results you want. Maybe you create an index, use it for a while, see how it performs, try another index with slightly different settings, eliminate some junk documents, and so on. If you use an alias, you don't have to change the index name in your app every time. Instead, you just change which index the alias points to. Both indexes keep working the whole time, and you simply switch from one to the other.
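The indirection an alias provides can be sketched in Go. This is not the Couchbase API: aliasRegistry, the alias "beer-search" and the index "beer-ft-v2" are hypothetical, and the sketch resolves only one level, whereas Couchbase aliases can also point to other aliases.

```go
package main

import (
	"fmt"
	"sync"
)

// aliasRegistry is a hypothetical sketch of index-alias indirection: the
// application only knows the alias name, and the alias can be repointed.
type aliasRegistry struct {
	mu      sync.RWMutex
	targets map[string]string // alias -> index name
}

func newAliasRegistry() *aliasRegistry {
	return &aliasRegistry{targets: map[string]string{}}
}

// Point makes alias refer to indexName, replacing any previous target.
func (r *aliasRegistry) Point(alias, indexName string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.targets[alias] = indexName
}

// Resolve returns the index the alias currently points to.
func (r *aliasRegistry) Resolve(alias string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	name, ok := r.targets[alias]
	return name, ok
}

func main() {
	aliases := newAliasRegistry()
	aliases.Point("beer-search", "beer-ft")     // the app queries "beer-search"
	fmt.Println(aliases.Resolve("beer-search")) // beer-ft true
	aliases.Point("beer-search", "beer-ft-v2")  // switch to a tuned index
	fmt.Println(aliases.Resolve("beer-search")) // beer-ft-v2 true
}
```

The application code never changes: only the registry entry does, which is exactly why both indexes can keep running while you compare them.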
To start, you need to create an index. I am going to create a simple one called “beer-ft.”
Once the index is created, go back to the Full Text tab. Now you can select the index and run a fulltext query such as 'ale'. This returns all documents that contain the word 'ale' in any of their fields. You can also restrict the search to a specific field. For example, if you want to know which documents have 'ale' in the name field, you can write 'name:ale'.
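Here is a minimal Go sketch of the difference between 'ale' and 'name:ale'. It assumes documents are flat maps of field name to text (real JSON documents can be nested) and uses naive whitespace tokenization; matchQuery and the sample beers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// containsTerm reports whether text contains term as a whole word.
func containsTerm(text, term string) bool {
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if w == term {
			return true
		}
	}
	return false
}

// matchQuery returns the sorted IDs of documents containing the term.
// A query of the form "field:term" searches only that field.
func matchQuery(docs map[string]map[string]string, query string) []string {
	field, term := "", query
	if i := strings.Index(query, ":"); i >= 0 {
		field, term = query[:i], query[i+1:]
	}
	term = strings.ToLower(term)
	var hits []string
	for id, doc := range docs {
		for f, text := range doc {
			if field != "" && f != field {
				continue // field-scoped query: skip other fields
			}
			if containsTerm(text, term) {
				hits = append(hits, id)
				break
			}
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	docs := map[string]map[string]string{
		"beer-1": {"name": "Pale Ale", "description": "Crisp and hoppy"},
		"beer-2": {"name": "Stout", "description": "Darker than an ale"},
	}
	fmt.Println(matchQuery(docs, "ale"))      // [beer-1 beer-2]
	fmt.Println(matchQuery(docs, "name:ale")) // [beer-1]
}
```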
If you want to know which ales are lighter, you might be tempted to use 'light ale' as the query string. This actually searches for 'light' and 'ale' as separate words, wherever they appear in a document, not as a phrase. If you put the phrase in double quotes, '"light ale"', you get the list of documents that contain the exact phrase “light ale”.
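Phrase matching needs word positions, not just word presence. The Go sketch below records each word's positions and checks that the phrase's words occur consecutively and in order. For simplicity it models the unquoted query as "both words appear somewhere"; real query-string scoring is more nuanced, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// positions maps each lowercased word to the positions where it occurs.
func positions(text string) map[string][]int {
	pos := map[string][]int{}
	for i, w := range strings.Fields(strings.ToLower(text)) {
		pos[w] = append(pos[w], i)
	}
	return pos
}

// hasAllTerms: every query word appears somewhere, in any order.
func hasAllTerms(text, query string) bool {
	pos := positions(text)
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if len(pos[t]) == 0 {
			return false
		}
	}
	return true
}

func containsInt(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// hasPhrase: the phrase's words appear consecutively and in order.
func hasPhrase(text, phrase string) bool {
	pos := positions(text)
	terms := strings.Fields(strings.ToLower(phrase))
	if len(terms) == 0 {
		return false
	}
	for _, start := range pos[terms[0]] {
		ok := true
		for k := 1; k < len(terms); k++ {
			if !containsInt(pos[terms[k]], start+k) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	a := "A light and refreshing ale"
	b := "Our light ale of the summer"
	fmt.Println(hasAllTerms(a, "light ale"), hasPhrase(a, "light ale")) // true false
	fmt.Println(hasAllTerms(b, "light ale"), hasPhrase(b, "light ale")) // true true
}
```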
There are other ways to tune your query; take a look at Bleve's documentation. Indexes also have more advanced features like type filtering, custom field mapping, child mapping for embedded/nested structures, custom analyzers and more.
Will FTS be GA with Couchbase Server 4.5?
No. FTS will ship integrated with Couchbase Server 4.5 GA, but FTS will not be supported for production use.
Will there be differences between the Developer Preview and 4.5 GA?
There will be some differences between the two releases. We're working hard on performance and seamless multi-node support. Right now Couchbase FTS can only run on one node in a cluster. Obviously, it's important that FTS is resilient and able to handle topology changes, failovers and rebalance events. Also, APIs will almost certainly change in some small ways.
What's the best way to try out Couchbase FTS?
The beer-sample dataset that ships with Couchbase Server works well. The general rule that Couchbase developer previews are not performance tuned applies here: this is a chance to try out the functionality and the API, but if you run it on large amounts of data you might encounter issues. If you decide to use your own dataset, we recommend keeping it modest, say 10,000 documents or fewer.
Are you still supporting the Elasticsearch and SOLR connectors?
Yes, absolutely. FTS won't cover everything Elasticsearch or SOLR can bring you, so these connectors will continue to be maintained.
Are there any plans for N1QL integration?
Yes, although we need to get to GA quality first.
What about a Kibana integration?
When people hear about Fulltext these days they think of Elastic and especially their mighty dashboard Kibana. That said, we don't have any plans to build an integration. Then again, it's open source software, so maybe someone will surprise us…