The N1QL Query Language Now Has a Per-Request Memory Quota

Unlike other services, the SQL++ query language so far hasn’t had the option to adjust its memory footprint. Until now.

With the release of Couchbase Server 7.0, the Query Service now includes a per-request memory quota.

Background

The principal reason for this disparity between SQL++ (formerly N1QL) and other Couchbase services boils down to one simple fact: while the bulk of the memory consumption for services like Data or Index is caches – and whenever new documents come in – there is always something that can be evicted should space be in high demand. The bulk of query service operations rely on transient values (either fetched documents or computed values) which start their life in one stage of an individual request and then expire before the request ends.

In SQL++ there’s nothing to evict and replace. There’s no balancing act to keep the clock ticking. If resources are not available, the only option is failure.

Furthermore, parts of the Eventing, Index and Full-Text Search code run inside the query service. They use SQL++ memory resources, but SQL++ has no control over them.

While SQL++ memory consumption is not an issue in general – requests quickly load and discard documents and the world is a happy place – from time to time, an odd greedy request comes along and spoils the game for everyone. This issue needed to be fixed for Couchbase users.

But let’s disregard (for the moment) the components that SQL++ has no control over, and let’s consider if a node-wide transient value pool would even be desirable.

The operation of such a value pool might work something like this: Whenever a request needs a value, it allocates the corresponding size from the global pool, and as soon as it has finished with it, it returns it to the pool. When memory runs out, all allocations fail until enough memory is freed.

But what happens when a greedy request shows up? It grabs as much as it can and doesn’t let go. And all the other frugal requests? With no evictions possible, the only option is failure. The other requests end one by one – in error and with an error – until the culprit finally fails.

This behaviour is akin to the teacher sending the whole class to the principal’s office after being hit with some chalk, rather than investigating and sending just the culprit.

The query service has now grown eyes on the back of its head and can see who threw the chalk.

Enter the Per-Request Memory Quota

When the per-request memory quota is turned on, each query request gets its own pool. Memory tracking operates as usual, but now when the pool is exhausted, it’s only the culprit that fails.

“But a node-wide setting would be much more practical!” I hear you say. And you’re right.

We’ve implemented one in stealth: the query service allows a fixed number of requests running at any one time. This option is controlled by the servicers setting, and it defaults to four times the number of cores on the query node. The overall node memory quota amounts to the number of servicers times the per-request quota.

The two quotas are intimately intertwined: the per-request quota is explicit because we wanted to be clear that it’s individual requests that are being tracked, not the node in its entirety.

How Do I Use It?

There’s two settings for the per-request memory quota in SQL++:

- The /admin/settings node REST parameter memory-quota
- The /query/service request REST parameter memory_quota

These settings express in megabytes the maximum amount of memory a request can use at any one time.

The default memory-quota is zero, i.e., the memory quota is turned off. The memory_quota setting always overrides the node-wide setting as long as the requested value does not exceed it.

Let’s look at a few examples. This command below sets your memory quota to 10MB for the whole node and replicates the setting to all your other nodes:

curl http://localhost:8093/admin/settings -u Administrator:password -H "content-type: application/json" -d '{"memory-quota": 10, "distribute": true}'

1	curl http://localhost:8093/admin/settings -u Administrator:password -H "content-type: application/json" -d '{"memory-quota": 10, "distribute": true}'

And this command sets your memory quota to 10MB for the single request, as you can see below:

curl http://localhost:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=true'

1	curl http://localhost:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=true'

Finally, this one below sets your memory quota to 10MB for the duration of the cbq session:

cbq&gt; \set -memory_quota 10;

1	cbq> \set -memory_quota 10;

Responses and System Keyspaces

If your memory quota is set (by whatever means), then several SQL++ features may contain additional information, including:

- Metrics
- Controls
- System keyspaces

Let’s take a closer look at each of these responses.

Responses: Metrics

The metrics section of the response contains a usedMemory field showing the amount of document memory used to execute the request.

   "metrics": {
       "elapsedTime": "1.651329ms",
       "executionTime": "1.439388ms",
       "resultCount": 1,
       "resultSize": 153,
       "serviceLoad": 2,
       "usedMemory": 63
   }

"metrics": {

"elapsedTime": "1.651329ms",

"executionTime": "1.439388ms",

"resultCount": 1,

"resultSize": 153,

"serviceLoad": 2,

"usedMemory": 63

}

If no document memory is used, this metric is omitted. The same omission occurs with mutations or errorCount as well.

Responses: Controls

The controls section of the response also reports the memory quota according to your settings. Here’s what it looks like:

   "controls": {
       "scan_consistency": "unbounded",
       "use_cbo": "true",
       "memoryQuota": "10",
       "stmtType": "SELECT"
   }

"controls": {

"scan_consistency": "unbounded",

"use_cbo": "true",

"memoryQuota": "10",

"stmtType": "SELECT"

}

System Keyspaces

System keyspaces – both system:active_requests and system:completed_requests – also contain memory quota information. The usedMemory and memoryQuota information shows up here as well. Take a look at the example below:

       "active_requests": {
           "elapsedTime": "2.604037201s",
           "executionTime": "2.603863968s",
           "memoryQuota": 10,
           "node": "127.0.0.1:8091",
           "phaseCounts": {
               "fetch": 15936,
                "primaryScan": 16362
            },
           "phaseOperators": {
               "authorize": 1,
               "fetch": 1,
               "primaryScan": 1
           },
           "phaseTimes": {
               "authorize": "721.012µs",
               "fetch": "588.220088ms",
               "instantiate": "11.547µs",
               "parse": "1.317113ms",
               "plan": "167.599824ms",
               "primaryScan": "29.589176ms"
           },
           "remoteAddr": "127.0.0.1:55084",
           "requestId": "da0d5ed0-a3d9-4ad2-86af-ae483fdadbce",
           "requestTime": "2020-11-19T15:45:27.368Z",
           "scanConsistency": "unbounded",
           "state": "running",
           "statement": "select * from `travel-sample`",
           "useCBO": true,
           "usedMemory": 3187392,
           "userAgent": "curl/7.58.0",
           "users": "Administrator"
       }

"active_requests": {

"elapsedTime": "2.604037201s",

"executionTime": "2.603863968s",

"memoryQuota": 10,

"node": "127.0.0.1:8091",

"phaseCounts": {

"fetch": 15936,

"primaryScan": 16362

"phaseOperators": {

"authorize": 1,

"fetch": 1,

"primaryScan": 1

"phaseTimes": {

"authorize": "721.012µs",

"fetch": "588.220088ms",

"instantiate": "11.547µs",

"parse": "1.317113ms",

"plan": "167.599824ms",

"primaryScan": "29.589176ms"

"remoteAddr": "127.0.0.1:55084",

"requestId": "da0d5ed0-a3d9-4ad2-86af-ae483fdadbce",

"requestTime": "2020-11-19T15:45:27.368Z",

"scanConsistency": "unbounded",

"state": "running",

"statement": "select * from `travel-sample`",

"useCBO": true,

"usedMemory": 3187392,

"userAgent": "curl/7.58.0",

"users": "Administrator"

}

How Is Memory Used, Anyway?

Before we delve into some of the mechanics of the memory quota operation, we should probably learn a little bit about how a request uses memory.

As you have probably already guessed, the usedMemory metrics field has been introduced to gauge the memory requirements of an individual statement before it’s allowed to run. Let’s do a couple of experiments and see how it behaves, starting with this one:

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=true'
...
    "status": "success",
    "metrics": {
        "elapsedTime": "6.353013176s",
        "executionTime": "6.352882377s",
        "resultCount": 31591,
        "resultSize": 95381286,
        "serviceLoad": 2,
        "usedMemory": 3264905
    }
}

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=true'

...

"status": "success",

"metrics": {

"elapsedTime": "6.353013176s",

"executionTime": "6.352882377s",

"resultCount": 31591,

"resultSize": 95381286,

"serviceLoad": 2,

"usedMemory": 3264905

}

As you can see above, the used memory is not the size of the result set.

Let’s try again, but this time without formatting. That way, the size of the result set is as close as possible to the size of the data in storage:

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=false'
...
"status": "success",
"metrics": {"elapsedTime": "1.559014343s","executionTime": "1.558937894s","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 588514}
}

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=false'

...

"status": "success",

"metrics": {"elapsedTime": "1.559014343s","executionTime": "1.558937894s","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 588514}

}

Well, it’s not the size of the data fetched either.

Let’s try to remove the cost of displaying the results to the screen:

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&amp;memory_quota=10&amp;pretty=false'&gt;/tmp/res 2&gt;&amp;-; tail /tmp/res
...
"status": "success",
"metrics": {"elapsedTime": "854.674127ms","executionTime": "854.575802ms","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 188626}
}

$ curl http://127.0.0.1:8093/query/service -u Administrator:password -d 'statement=select * from `travel-sample`&memory_quota=10&pretty=false'>/tmp/res 2>&-; tail /tmp/res

...

"status": "success",

"metrics": {"elapsedTime": "854.674127ms","executionTime": "854.575802ms","resultCount": 31591,"resultSize": 36754134,"serviceLoad": 2,"usedMemory": 188626}

}

Same query, same format, but different storage, and yet a different amount of memory used.

The takeaway: For some types of statements, the memory consumption is more a function of the circumstances of that particular run than the statement itself.

The Request Execution Phase Operation

The execution phase of a request employs a pipeline of operators that execute in parallel. Each operator receives values from the previous stage, processes those values, and then sends them onto the next.

The operator value exchange infrastructure includes a values queue so that each operator is not blocked by the previous one or the next one. (In fact, the execution engine is more complicated. Some operators are inlined into others and some only exist to carry out orchestration work, so value queues are not always involved, but still.).

For example, a simple query like this:

SELECT * FROM bucket WHERE field = constant

1	SELECT * FROM bucket WHERE field = constant

uses an Index scan to produce keys, which are sent to a Fetch to retrieve documents from the key-value cache, which are sent to a Filter to exclude documents that do not apply, and those that do are sent to a Projection to extract fields and marshal them into JSON (if necessary) and finally passed to a Stream which writes them back to the client.

Values that complete the course are eventually disposed of during garbage collection.

For the example above, if there were available cores to execute all these operators in parallel – and all operators executed at exactly the same speed – there would never be more than five documents traversing the pipeline at any one time, even though the request might process any number of documents.

Of course, a Scan might produce keys much faster that a Fetch could gather documents and marshalling could be expensive. Sending results over the wire back to the client might be slow, so even if there are cores available, the queues described above will be used as buffers for values waiting to be processed along the line. In turn, this temporarily increases the amount of memory a request needs to process the sequence of incoming values.

This pattern explains why both making the Projection more efficient (pretty=false), or Stream (sending to a file rather than the terminal) has a beneficial effect on memory consumption: faster operators mean fewer values stuck in the value exchange queues.

With request load increasing, the SQL++ kernel has more operators to schedule, meaning that while they are not run, the value queue for the previous operator increases in size, meaning even more memory is required to process individual requests. All told, loaded nodes use more memory than those with little activity.

The Memory Quota Operation

For the purpose of the previous discussion, I have ignored all those cases in which memory grows without values being exchanged: hash JOINs, ORDER BYs and GROUP BYs being a few examples that spring to mind.

Those particular cases are handled by the first mode of operation of Memory Quota: sort, aggregate or hash buffer grows beyond a specific threshold, memory quota throws and error, and the request fails.

However, as we have seen, there are a number of circumstances which cause memory consumption grow without fault on the request part.

In these cases, the Memory Quota feature employs techniques to try to control memory usage and help requests complete without needing excessive resources.

The Consumer Heartbeat Feature

A pipeline works well if both producers and consumers proceed at the same pace.

Should the producer not execute, the request just stalls. However, if the consumer doesn’t execute, not only the request stalls, but the producer’s value queue also increases in size.

To counter this possibility, the consumer operator is equipped with a heartbeat, which is monitored by the producer. When the consumer is waiting but does not attempt to receive values after a set number of successful send operations on the producer’s part, the producer will yield until the consumer manages to execute.

This approach is not an exact science, as unluckily the language used to develop SQL++ does not permit it to yield to specific operators. But it works as a cooperative effort: if enough producers yield, all consumers will have a fair shot at having kernel time, which means that memory usage should naturally decrease.

The Per-Operator Quota

Since yielding is not an exact science, individual operators might accrue a substantial memory usage even when consumers manage to run from time to time, because individual consumers still get less kernel time than their producers.

To address this disparity, SQL++ also has a per-producer memory pool. When this pool is exhausted, a producer yields (and does not fail), resuming operations when the consumer receives a value.

This yielding causes prior producers to exhaust their own pool and yield, thereby allowing the whole request to progress without consuming the entire request pool, possibly (but by no means necessarily) at the expense of throughput.

Miscellaneous Tricks

Up to this point, queries have relied on the garbage collector to return value memory to the heap, and the memory manager allocates value structures.

As part of the memory tracking effort, we have introduced techniques to mark the memory as unused before the garbage collector itself gets CPU time and manages to process all the pending unused values.

There are also small, ad hoc pools to store some unused value structures, already allocated and available for reuse, so that the garbage collector doesn’t have to be exercised over and over again for specific types of dynamic memory allocation and is instead free to process memory that matters.

Conclusion

Prior to the Couchbase Server 7.0 release, the Query Service had a history of being a bit laissez faire with per-request memory usage. Now it has a clear set of carrots and sticks for keeping memory usage under control. I hope it serves you well.

Don’t just read about it; try it out for yourself:<br/ >Download Couchbase Server 7

Marco Greco, Software Architect, Couchbase

Products

See How Capella Stacks Up

See How Capella Stacks Up

By Industry

By Need

Why NoSQL

What is NoSQL and why choose it?

Popular Docs

By Developer Role

Developer Playground

Start a Capella session

Resource Center

Education

Certification Exams 2023

Get Couchbase certified

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

The N1QL Query Language Now Has a Per-Request Memory Quota

Background

Enter the Per-Request Memory Quota

How Do I Use It?

Responses and System Keyspaces

Responses: Metrics

Responses: Controls

System Keyspaces

How Is Memory Used, Anyway?

The Request Execution Phase Operation

The Memory Quota Operation

The Consumer Heartbeat Feature

The Per-Operator Quota

Miscellaneous Tricks

Conclusion

Author

Posted by Marco Greco, Software Architect, Couchbase

Leave a reply Cancel reply