One of the more impressive aspects of using N1QL is the ability to use prepared statements. What are prepared statements? Why use them? What's the big deal? Consider the following example from the travel-sample bucket and dataset that are bundled with every installation of couchbase: Consider users who create accounts, log in, and search for flights. We might want a quick way to identify users who've used the system, but have never booked a flight. This is a common ad targeting use case, and creates an opportunity for a touchpoint. A simplified query might look like:
SELECT name FROM
travel-sample WHERE _type='User' AND ARRAY_LENGTH(flights)=0;
Lets go over the steps required to run this query from our application. We'll use nodejs for our example, and all of the SDKs using the 2.0 API will function in an identical manner.
- First, we have to create the N1QL query from a string.
- Second, we serialize the request and send it to CBQ-ENGINE on the couchbase server node using HTTP REST.
- Third, CBQ-ENGINE must parse the query.
- Fourth, CBQ-ENGINE's Optimizer must determine an execution plan.
- Fifth, CBQ-ENGINE must create the execution plan.
- Sixth, CBQ-ENGINE executes the plan.
This is a substantial amount of work to issue a simple query, especially for queries that are repeated multiple times. There are instances in your application where the same query can be executed thousands of times in an hour. How do we eliminate that overhead of parsing/analyzing/creating a plan?
Prepare to speed up!
We can eliminate the overhead by preparing the statement for execution. We use one additional parameter in our N1QL Query, adhoc=false.
What does that mean, exactly? It means that the SDK will pass a query to CBQ-ENGINE, telling the engine the first time the query is issued that “I want you to generate an execution plan, that I can re-use, and return it to me in a compressed format.” After that first time, whenever the application calls that particular query, the SDK will pass the compressed plan to CBQ-ENGINE, eliminating all of the parsing/analyzing/creating steps and just going straight to execution
So how does it work? When a query is passed in and the adhoc flag is set to false, the SDK will store the local compressed query plan in the cache for re-use in the future; when the query is issued again, the SDK will use the local compressed plan from the cache. The SDK makes use of an LRU (least recently used) cache to conserve memory: the SDK will begin removing the oldest entries when the cache reaches 5000 queries in size.
All of this is managed internally within the SDK. As a user, the only thing you need to do is set the adhoc property to false.
An Ounce of Prevention
Some best practices to consider when using prepared statements
- Currently, prepared statements are SDK instance specific. The local LRU cache of prepared statements is specific to each instance of the SDK the application is using. There is currently no shared cache of statements between clients.
- If your application is extremely adhoc in nature, prepared statements are a bad choice for these types of queries. The overhead of preparing negates any performance gain realized by being able to repeatedly use the same execution plan.
- Prepared statements are execution plans frozen in time. Topology changes that move, alter, or delete indexes will cause the current execution plan to invalidate. The SDKs have built in retry logic to handle these corner cases
- If your intend on issuing the same query repeatedly with different parameters, use parameterized queries rather than embedding the parameter values within the query string. This allows you to use the same plan with different parameters.