Couchbase Functions is being introduced in the Couchbase Server 5.5 release as part of the Couchbase Eventing Service. Couchbase Functions lets you move data-driven business logic closer to your data: user-defined business logic is triggered in real time on the server when data changes as a result of interactions on web and edge applications. When compute resides close to the data, it is important to understand how that compute behaves when the data it is listening to changes. In this post, we look at how the Eventing Service reacts to the ordering of mutations.

Let’s get started by examining this behaviour with a simple example. Let’s create a test Function with the following code and assume the default settings (i.e., 3 workers) for the Function.
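A minimal handler of this kind, sketched here rather than copied from the original post, logs one entry per creation and one per deletion, in the same shape as the log entries discussed below ("Created Doc ID:" "2"). In Couchbase Eventing, `OnUpdate(doc, meta)` is called when a document is created or updated, `OnDelete(meta)` when it is deleted, and `log()` is supplied by the runtime. The `log` stub and the `appLog` array at the top are assumptions added only so the handlers can be exercised locally under Node.js; paste only the two handlers into the Function editor.

```javascript
// Local stub only (assumption): inside Couchbase Eventing, log() is a built-in
// that writes to the Function's application log. This stub roughly
// approximates that output by JSON-encoding each argument.
const appLog = [];
function log(...args) {
  appLog.push(args.map((a) => JSON.stringify(a)).join(" "));
}

// Called for every creation or update of a document in the source bucket.
function OnUpdate(doc, meta) {
  log("Created Doc ID:", meta.id);
}

// Called when a document in the source bucket is deleted.
function OnDelete(meta) {
  log("Deleted Doc ID:", meta.id);
}

// Local usage: simulate creating documents "1".."3", then deleting "2".
for (let i = 1; i <= 3; i++) {
  OnUpdate({ value: i }, { id: String(i) });
}
OnDelete({ id: "2" });
console.log(appLog.join("\n"));
```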

Note: When deploying the Function in each of the steps below, choose ‘Everything’ for the Feed Boundary.

In the Source Bucket that this Function listens to, let’s insert around 10 documents with increasing numerical doc IDs. In the application log file for this Function, you will see entries similar to the following.

Now try undeploying and redeploying the Function: the order in which the IDs (changes) appear in the log is not the same. You may repeat this step a few times to reinforce this observation.
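One way to picture why the order differs from run to run is a toy model, which is an assumption for illustration and not a description of Eventing internals: each document is assigned to one of the 3 workers, each worker handles its own documents in order, and the workers run concurrently, so their log lines interleave differently on each run. The `workerFor` hash and the random scheduler in `simulate` are hypothetical stand-ins.

```javascript
// Toy model (hypothetical): documents are spread across worker queues, each
// worker processes its own queue in order, and a random scheduler decides
// which worker emits next. This is not the actual Eventing implementation.

// Hypothetical assignment of a document ID to a worker: a simple string hash.
function workerFor(id, numWorkers) {
  let h = 0;
  for (const ch of id) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numWorkers;
}

// Returns the global processing order plus the worker chosen for each ID.
function simulate(ids, numWorkers, rng = Math.random) {
  const queues = Array.from({ length: numWorkers }, () => []);
  const assignment = {};
  for (const id of ids) {
    const w = workerFor(id, numWorkers);
    assignment[id] = w;
    queues[w].push(id);
  }
  const order = [];
  // Repeatedly pick a random non-empty queue and take its next document.
  while (queues.some((q) => q.length > 0)) {
    const ready = queues.filter((q) => q.length > 0);
    const q = ready[Math.floor(rng() * ready.length)];
    order.push(q.shift());
  }
  return { order, assignment };
}

const ids = Array.from({ length: 10 }, (_, i) => String(i + 1));
console.log(simulate(ids, 3).order.join(" "));
console.log(simulate(ids, 3).order.join(" ")); // usually a different order
```

In this model no document is lost and each worker preserves its own order, yet the combined log order depends on scheduling, which matches the observation above.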

Take-Away #1: A Function does not necessarily process documents in the order in which they were inserted.

Next, let us delete one of the inserted documents (in this example, I deleted the document with ID 2). The following entry appears in the log immediately, as expected:

Now, let us undeploy the Function and deploy it again. This time we observe the following ordering:

We observe that:

  • “Created Doc ID:” “2” is missing.
  • “Deleted Doc ID:” “2” appears earlier in the processing order, not at the end.

Take-Away #2: De-duplication

When successive operations (changes, or mutations) on a document occur in rapid succession, Couchbase Server coalesces (de-duplicates) them to minimize disk and memory overhead. As a result, Couchbase Server sends only the latest version of a document in the DCP stream.

In the example above, this is why the OnUpdate handler is not triggered for document 2: its deletion was the latest operation on the document. When one or more updates to a document are followed by its deletion, the updates are coalesced into the final event, the deletion, so Couchbase Functions sees only the Delete event.
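The coalescing described above can be modelled with a small sketch. It assumes, for illustration only, a history of mutations tagged with increasing sequence numbers, and keeps only the newest mutation per document ID before replaying the survivors in sequence-number order. The record shape (`seq`, `id`, `type`) and the `coalesce` function are hypothetical and are not the DCP wire format or any Couchbase API. The model shows which events survive (no creation event for document 2, only its deletion), not the order in which the workers later process them.

```javascript
// Toy model of coalescing (hypothetical record shape, not the DCP format):
// each mutation carries a sequence number, a document ID and a type.
function coalesce(mutations) {
  const latest = new Map(); // document ID -> newest mutation for that ID
  for (const m of mutations) {
    const prev = latest.get(m.id);
    if (!prev || m.seq > prev.seq) latest.set(m.id, m);
  }
  // Survivors are replayed in the order of their (latest) sequence numbers.
  return [...latest.values()].sort((a, b) => a.seq - b.seq);
}

// Insert documents "1".."10", then delete document "2".
const history = [];
for (let i = 1; i <= 10; i++) {
  history.push({ seq: i, id: String(i), type: "mutation" });
}
history.push({ seq: 11, id: "2", type: "deletion" });

for (const m of coalesce(history)) {
  console.log(m.type === "deletion" ? "Deleted Doc ID:" : "Created Doc ID:", m.id);
}
```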

Similar behaviour is seen if a document undergoes multiple updates within a short window before a Function consumes the changes: the Function sees only the latest change, and the intermediate changes are not delivered. This mainly happens when a new Function is deployed on an existing bucket that has already undergone many changes.

If the Function is already deployed when changes happen to a document, each change is normally handled by the Function. However, if a document changes very frequently within a short interval, Couchbase Server may still de-duplicate some of those changes, so not every change is guaranteed to trigger the Function.
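For a Function that is already deployed, a rough way to picture this remaining de-duplication is a pending buffer between the server and the Function: while a change to a document is still waiting to be picked up, a newer change to the same document replaces it. The `PendingChanges` class and its `put`/`drain` methods are hypothetical and only illustrate the effect; they are not part of any Couchbase API, and the real server-side mechanism is more involved.

```javascript
// Hypothetical pending-change buffer: at most one pending change per document.
class PendingChanges {
  constructor() {
    this.pending = new Map(); // document ID -> latest pending value
  }
  // A newer change to a document replaces one that has not been consumed yet.
  put(id, value) {
    this.pending.delete(id); // re-insert so iteration follows latest-change order
    this.pending.set(id, value);
  }
  // The consumer (the Function, in this analogy) takes everything pending.
  drain() {
    const batch = [...this.pending.entries()];
    this.pending.clear();
    return batch;
  }
}

// Changes arrive slowly: the consumer drains after every update and sees all.
const slow = new PendingChanges();
const seenSlow = [];
for (let v = 1; v <= 5; v++) {
  slow.put("doc", v);
  seenSlow.push(...slow.drain());
}

// Burst of changes: five updates land before the consumer drains once.
const burst = new PendingChanges();
for (let v = 1; v <= 5; v++) burst.put("doc", v);
const seenBurst = burst.drain();

console.log(seenSlow.length, seenBurst.length); // 5 1
```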

Ordering is not guaranteed either: if 10 documents are inserted and one of them is then deleted, a Function deployed afterwards is not guaranteed to process that Delete as its last event.

Take-Away #3: A Function does not necessarily process mutations in the order in which they occurred.

So, what is going on here? In Part 2 of this blog series, we will dive under the hood of the Couchbase Eventing Service and look at how Eventing workers process mutations.

Author

Posted by Venkat Subramanian, Product Manager

Venkat dabbles in product development and product management and has been developing data/analytics platforms & products. A significant chunk of his experience has been with Oracle, where he transitioned from being an Engineer in Oracle’s Enterprise Manager team to Product Manager for Oracle's BI/Analytics suite of products. He has worked in startups in the past, helping develop machine-learning/NLP products and distributed decisioning systems. He lurks around at @venkasub.
