Couchbase FTS is our new full-text indexing and query engine. We introduced it last year; I invite you to read this post and the reference documentation for a better understanding of what it can do for you.

If you are still here, or maybe you are back, here’s what’s new for FTS in this Couchbase Developer Release.

Type Mapping

In the past there was only one way to declare a type mapping: telling FTS which field (by default the field “type”) you want to use to distinguish document types. Using a common field as a type selector for all your documents is useful and a common practice, but it’s not the only one. So we have added two other ways to distinguish document types:

  • DOC ID up to separator: The type identifier is the prefix of the document key, up to but not including the given character.
  • DOC ID with regex: For advanced users, you can specify a regular expression that matches the type identifier.

How does this translate to our Travel Sample? All the document keys follow the same pattern. Here are some examples:

airline_10, airline_10748, hotel_6445, hotel_9905, landmark_10019, landmark_9838, route_10009, route_14273, route_9807

As you can see, they all start with a type, then an underscore, then a number. So we can use the DOC ID up to separator option with the separator set to ‘_’: everything before the underscore becomes the type. In the Travel Sample this is equivalent to saying that the field containing the document type is ‘type’, which is exactly what we used to do.
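To make the separator rule concrete, here is a small Java sketch that extracts the type from the sample keys the same way. It is only an illustration, not FTS code (FTS itself is written in Go). The class and method names are hypothetical, and splitting at the first separator and returning null when there is no separator are assumptions of this sketch, not documented FTS behavior.

```java
import java.util.List;

public class DocIdPrefix {

    /**
     * Returns the part of the key before the first occurrence of the separator,
     * or null when the separator does not appear (an assumption of this sketch).
     */
    static String typeUpToSeparator(String docId, char separator) {
        int idx = docId.indexOf(separator);
        return idx < 0 ? null : docId.substring(0, idx);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("airline_10", "hotel_6445", "landmark_10019", "route_9807");
        for (String key : keys) {
            // e.g. "hotel_6445 -> hotel"
            System.out.println(key + " -> " + typeUpToSeparator(key, '_'));
        }
    }
}
```

Each prefix ("airline", "hotel", "landmark", "route") is the same value the documents carry in their “type” field, which is why both mappings end up equivalent here.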

Now we can identify another pattern here. All DOC IDs start with lowercase letters up to the underscore character, and that prefix is between 5 and 8 characters long (from ‘hotel’ and ‘route’ up to ‘landmark’). Then come the numbers. We can translate that observation into the following regular expression: ^[a-z]{5,8}

This expression matches, at the beginning of the string (thanks to ^), characters between ‘a’ and ‘z’ (thanks to [a-z]), from 5 to 8 of them (thanks to {5,8}).
So you can use the DOC ID with regex option by setting it to ^[a-z]{5,8}. Regular expressions can be complicated, but they allow more advanced filtering.
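Here is the same idea for the regex option, as a Java sketch using java.util.regex. FTS evaluates the pattern with Go’s regular expression engine rather than Java’s, but this simple anchored character-class pattern behaves the same in both. The class and method names are hypothetical, and treating the matched text as the type is this sketch’s reading of the option.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocIdRegex {

    // The pattern from the text: 5 to 8 lowercase letters at the start of the key.
    private static final Pattern TYPE_PATTERN = Pattern.compile("^[a-z]{5,8}");

    /** Returns the matched type prefix, or null when the key does not match. */
    static String typeFromRegex(String docId) {
        Matcher m = TYPE_PATTERN.matcher(docId);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("airline_10748", "hotel_9905", "landmark_9838", "route_14273", "ab_1");
        for (String key : keys) {
            // "ab_1" has only two leading letters, so it does not match
            System.out.println(key + " -> " + typeFromRegex(key));
        }
    }
}
```

Note that {5,8} is an upper bound, not a check that an underscore follows: a key starting with nine or more letters would still match its first eight, so the pattern relies on the keys being as regular as they are in the Travel Sample.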

Sort

Another new feature in this release is Sort. You might say that sorting was already available in the previous release, and you’d be right: every document matching the search has a relevance score, and results have always been sorted by descending score. However, you can now specify your very own sort order. We added a sort field to the FTS query that works as follows:

"sort" : [ "country", "state", "city", "-_score" ]

Here results are sorted by country first, then by state when countries are equal, then by city when both country and state are equal. If all three are the same, results are sorted by score, descending. Prefix any sort field with the ‘-’ character to make that sort descending. You also need to make sure that all the fields in the sort array are stored in the index. This is what it looks like in a full FTS query:
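These ordering rules behave like a chained comparator, where each entry only breaks ties left by the entries before it. The following Java sketch sorts in-memory results with the same kind of sort array; it illustrates the semantics only and is not how FTS implements sorting. Hits are modelled as plain maps (a hypothetical stand-in for real search results), the relevance score is stored under the key "_score", the special name FTS uses for the score in sort arrays, and all sorted values are assumed to be non-null and Comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortSpecDemo {

    /** Hypothetical stand-in for a search hit: stored fields plus its score. */
    static Map<String, Object> hit(String country, String state, String city, double score) {
        return Map.of("country", country, "state", state, "city", city, "_score", score);
    }

    /**
     * Builds a comparator from sort strings such as "country" or "-_score".
     * Each entry breaks ties left by the previous ones; a leading '-' means descending.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Map<String, Object>> fromSortSpec(List<String> spec) {
        Comparator<Map<String, Object>> result = (a, b) -> 0; // every hit starts out "equal"
        for (String entry : spec) {
            boolean descending = entry.startsWith("-");
            String field = descending ? entry.substring(1) : entry;
            // Assumes the values under this key are non-null and mutually Comparable.
            Comparator<Map<String, Object>> byField =
                (a, b) -> ((Comparable) a.get(field)).compareTo(b.get(field));
            result = result.thenComparing(descending ? byField.reversed() : byField);
        }
        return result;
    }

    /** Sorts a few sample hits by country, state, city, then score descending. */
    static List<Map<String, Object>> demo() {
        List<Map<String, Object>> hits = new ArrayList<>(List.of(
            hit("France", "Ile-de-France", "Paris", 0.42),
            hit("France", "Ile-de-France", "Paris", 0.91),
            hit("France", "Bretagne", "Rennes", 0.10),
            hit("Belgium", "Brussels", "Brussels", 0.30)));
        hits.sort(fromSortSpec(List.of("country", "state", "city", "-_score")));
        return hits;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The Belgian hit comes first, then Rennes (Bretagne sorts before Ile-de-France), and only the two Paris hits, which tie on all three text fields, are ordered by score, highest first.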

{ "explain": false, "fields": [ "title" ], "highlight": {}, "sort": ["country", "state", "city", "-_score", "-_id"], "query": { "query": "beautiful pool" } }
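Outside the SDKs, a query like this one is sent as the JSON body of an HTTP POST to the FTS REST endpoint of an index, /api/index/{indexName}/query on port 8094. The following Java 11 sketch only builds that request with java.net.http and does not send it. The host, the index name "hotels" and the credentials are placeholder assumptions, and the score sort entry is written "-_score", the special field name FTS uses for the relevance score.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FtsQueryRequest {

    // The full query from the text, as a JSON string.
    static final String BODY = "{ \"explain\": false, \"fields\": [ \"title\" ], \"highlight\": {}, "
        + "\"sort\": [\"country\", \"state\", \"city\", \"-_score\", \"-_id\"], "
        + "\"query\": { \"query\": \"beautiful pool\" } }";

    /** Builds (but does not send) a POST to the FTS query endpoint of the given index. */
    static HttpRequest build(String host, String indexName, String user, String password) {
        String auth = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
            .uri(URI.create("http://" + host + ":8094/api/index/" + indexName + "/query"))
            .header("Content-Type", "application/json")
            .header("Authorization", "Basic " + auth) // HTTP basic auth with placeholder credentials
            .POST(HttpRequest.BodyPublishers.ofString(BODY))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("localhost", "hotels", "Administrator", "password");
        System.out.println(request.method() + " " + request.uri());
        // Sending it would look like:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```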

If you are using Java, in the travel-sample app for instance, adding a custom sort is easy. Open the Hotel service and go to the findHotels method.

The query to be executed is essentially this line of code:

SearchQuery query = new SearchQuery("hotels", fts).limit(100);

All you have to do is append a call to the sort method:

SearchQuery query = new SearchQuery("hotels", fts).limit(100).sort("country", "state", "city", "-_score");

More advanced sort options are also available; see the reference documentation for details.

Backend

Something that has changed but won’t be visible to end users is the FTS index backend: we went from ForestDB to moss. The moss library is a simple, fast, ordered, persistable key-value storage library for Go; its name stands for “memory-oriented sorted segments”. Here’s the list of features taken from its README:

  • ordered key-val collection API
  • 100% go implementation
  • key range iterators
  • snapshots provide for isolated reads
  • atomic mutations via a batch API
  • merge operations allow for read-compute-write optimizations for write-heavy use cases (e.g., updating counters)
  • concurrent readers and writers don’t block each other
  • optional, advanced API’s to avoid extra memory copying
  • optional lower-level storage implementation, called “mossStore”, that uses an append-only design for writes and mmap() for reads, with configurable compaction policy; see: OpenStoreCollection()
  • mossStore supports navigating back through previous commit points in read-only fashion, and supports reverting to previous commit points.
  • optional persistence hooks to allow write-back caching to a lower-level storage implementation that advanced users may wish to provide (e.g., you can hook moss up to leveldb, sqlite, etc)
  • event callbacks allow the monitoring of asynchronous tasks
  • unit tests
  • fuzz tests via go-fuzz & smat (github.com/mschoch/smat); see README-smat.md
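To give an intuition for the name “memory-oriented sorted segments”, here is a toy Java illustration: each batch becomes an immutable sorted segment pushed onto a stack, reads check segments from newest to oldest, and a snapshot is simply the stack as it stood at that moment. It is a simplified sketch of the concept only; it does not use or reproduce moss’s Go API, it leaves out persistence, segment merging and iterators, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedSegmentsDemo {

    /** A batch of pending mutations, kept sorted by key; a null value records a delete. */
    static final class Batch {
        private final TreeMap<String, String> ops = new TreeMap<>();

        Batch set(String key, String value) { ops.put(key, value); return this; }

        Batch del(String key) { ops.put(key, null); return this; }
    }

    /** A read-only, point-in-time view of the segment stack. */
    static final class Snapshot {
        private final List<Map<String, String>> segments; // oldest segment first

        Snapshot(List<Map<String, String>> segments) { this.segments = segments; }

        /** Returns the newest value for key, or null if it is absent or deleted. */
        String get(String key) {
            for (int i = segments.size() - 1; i >= 0; i--) {
                Map<String, String> segment = segments.get(i);
                if (segment.containsKey(key)) {
                    return segment.get(key); // a null here is a delete that hides older values
                }
            }
            return null;
        }
    }

    // The current stack of immutable segments; replaced as a whole, never modified in place.
    private List<Map<String, String>> stack = List.of();

    /** Turns the batch into a new sorted segment and publishes it in one step. */
    synchronized void executeBatch(Batch batch) {
        List<Map<String, String>> next = new ArrayList<>(stack);
        next.add(Collections.unmodifiableSortedMap(new TreeMap<>(batch.ops)));
        stack = Collections.unmodifiableList(next);
    }

    /** Captures the current stack; later batches do not affect it. */
    synchronized Snapshot snapshot() {
        return new Snapshot(stack);
    }

    public static void main(String[] args) {
        SortedSegmentsDemo store = new SortedSegmentsDemo();
        store.executeBatch(new Batch().set("hotel_10", "v1").set("airline_1", "a1"));
        Snapshot before = store.snapshot();
        store.executeBatch(new Batch().set("hotel_10", "v2").del("airline_1"));
        Snapshot after = store.snapshot();
        System.out.println(before.get("hotel_10") + " " + after.get("hotel_10"));   // v1 v2
        System.out.println(before.get("airline_1") + " " + after.get("airline_1")); // a1 null
    }
}
```

Because a batch is published by swapping in a new stack, a snapshot sees either all of a batch or none of it, which is the isolation idea behind “snapshots provide for isolated reads” and “atomic mutations via a batch API” in the list above.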

I won’t go into details as this is quite low-level. However, we would like to know whether you want more information about low-level architecture like this. Would you like to see a more in-depth presentation of moss or any other component of Couchbase Server? If so, please let us know on Twitter or in the comments below.

Author

Posted by Laurent Doguin, Developer Advocate, Couchbase

Laurent is a Paris based Developer Advocate where he focuses on helping Java developers and the French community. He writes code in Java and blog posts in Markdown. Prior to joining Couchbase he was Nuxeo’s community liaison where he devoted his time and expertise to helping the entire Nuxeo Community become more active and efficient.
