Let’s do a little thought experiment.

Yeah, I know, thinking. Who wants to do that?

Wait!

Before you tune me out and browse on to the next post on sexy indexes…

At least give me a couple of minutes.

Let’s say you have a website.

Well, not just any website…

That’s a little too generic.

OK, let’s say you’ve got a travel website.

Someplace where people come to make airline reservations.

I think we’ve all used one of those at some time.

So, your users come to your site, and want to see what flights are available.

What’s the first thing they do?

Do they write a question out in long-hand?

Not these days.

They probably start by selecting where they want to leave from.

Their local airport.

And then they probably want to select where they want to go.

So, two choices, both airports.

You could make them guess what airports are around.

I mean, I usually just enter the town that I need to go to and let the website figure out what airport I need to fly to.

But let’s assume that your users already know the airports at both ends of their trip.

Makes it easy for our little thought experiment.

So, you need to present the user with a list of airports to choose from.

Yeah, it’ll be a long list.

Seems there are a lot of airports scattered around this big ol’ world.

Just loading up our travel-sample bucket gives us almost 2,000 of them.

Man, that’s a lot of airports!

Someone did a lot of data-entry…

But the good thing about airports is that they don’t change that often.

I mean, yeah, there are new ones being built…

And old ones being left for ruin…

But that all happens over time.

When an airport is abandoned, it’s usually because a newer, shinier one got built.

And it takes a long time to build a new airport.

It’s not like they’re throwing them up every day.

So, back to our list of airports…

Long or not, you’ll have to provide some form of list of airports for the user to choose from.

And if your website is very busy, there could be a lot of users.

And we all want our websites to be busy.

So let’s just go ahead and assume that our website is not just busy…

It’s very busy.

Millions of users every day.

Thousands of users every minute.

That’s a lot of times that you’re having to serve up that list of airports!

So, let’s start by assuming that your airport documents in your Couchbase bucket are structured like the ones in our travel-sample bucket.

Hey, it comes with our Couchbase Server product, may as well use it!

Makes things easy…

So, let’s start by just listing the airports with a simple N1QL query, which gives us every airport document in the bucket.

Hmm, it’s not going to be easy to find what our users need in this. Maybe if we sort it by the FAA airport code, and then filter out the airports where the code is null…

That’s better, but it’s more data than we need to be providing to the website.

So, let’s reduce what we’re returning to the FAA code, airport name, city, and country.
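Here is a rough sketch of the final query. It assumes the travel-sample airport documents’ `type`, `faa`, `airportname`, `city`, and `country` fields, and the alias `a` is my own choice. The Python function next to it performs the same filter, sort, and projection in memory on a few made-up documents, just to make each step explicit.

```python
from typing import Any

# Assumed N1QL for the trimmed-down airport list (a sketch, not the post's exact query).
AIRPORT_LIST_QUERY = """
SELECT a.faa, a.airportname, a.city, a.country
FROM `travel-sample` AS a
WHERE a.type = "airport" AND a.faa IS NOT NULL
ORDER BY a.faa
"""


def airport_list(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """In-memory emulation of AIRPORT_LIST_QUERY over a list of documents."""
    # WHERE: airports only, and drop those with a null (or missing) FAA code
    rows = [d for d in docs if d.get("type") == "airport" and d.get("faa") is not None]
    rows.sort(key=lambda d: d["faa"])  # ORDER BY faa
    fields = ("faa", "airportname", "city", "country")
    return [{f: d.get(f) for f in fields} for d in rows]  # SELECT projection


# Hypothetical sample documents, shaped like travel-sample airports.
sample_docs = [
    {"type": "airport", "faa": "SFO", "airportname": "San Francisco Intl",
     "city": "San Francisco", "country": "United States", "tz": "America/Los_Angeles"},
    {"type": "airport", "faa": None, "airportname": "Some Airfield",
     "city": "Nowhere", "country": "France"},
    {"type": "airport", "faa": "LAX", "airportname": "Los Angeles Intl",
     "city": "Los Angeles", "country": "United States"},
    {"type": "airline", "name": "Some Airline"},
]

for row in airport_list(sample_docs):
    print(row)
```

Extra fields like `tz` never leave the query, which is exactly the point of trimming the SELECT list.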

OK, now we’re getting down to what we’re looking for.

So, if we run this query, we’re getting, oh, let’s say about a 50-60 ms response time.

Not bad.

But with thousands of requests for this list every minute…

Hmm, maybe we can speed things up a bit.

Let’s make it a covered query by adding our own index that includes everything we need.

And now we re-run the query and get a response time of around 17.5 ms.
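Here is one way such a covering index could look, again only as a sketch. The index name `idx_airport_list` is hypothetical, and making it a partial index on `type = "airport"` is my assumption. The Python helper is a deliberately simplified model of what "covered" means: every field the query touches can be answered from the index keys or the index’s WHERE condition, so no document fetch is needed. It is not how the Couchbase query planner actually makes that decision.

```python
# Hypothetical covering index for the airport-list query (name is illustrative).
CREATE_AIRPORT_INDEX = """
CREATE INDEX idx_airport_list
ON `travel-sample`(faa, airportname, city, country)
WHERE type = "airport"
"""


def is_covered(query_fields: set[str], index_keys: list[str],
               index_condition_fields: set[str]) -> bool:
    """Simplified coverage check: True when every field the query references is
    available from the index keys or fixed by the partial index's WHERE clause."""
    available = set(index_keys) | index_condition_fields
    return query_fields <= available


airport_query_fields = {"type", "faa", "airportname", "city", "country"}
index_keys = ["faa", "airportname", "city", "country"]

print(is_covered(airport_query_fields, index_keys, {"type"}))
# Asking for one more field (e.g. the time zone) would force a document fetch.
print(is_covered(airport_query_fields | {"tz"}, index_keys, {"type"}))
```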

Much better.

But is it possible to do even better than this?

I mean, this list will be requested thousands of times every minute.

Those milliseconds will add up.

So, what if we took the results of this query, and saved it as a single document?

Let’s call it “airport_list”.

So now, if we run a query that selects the whole document with the “USE KEYS” clause, we get a response time of around 14.5 ms.

Hmm, saved another 3 whole milliseconds!

And we might save another half-millisecond or two if we use key-value access to get the document by its ID directly from the data service.
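Here is a minimal sketch of the pattern. A plain Python dict stands in for the bucket’s key-value store, which is an assumption for illustration only; a real application would use a Couchbase SDK. The expensive filtering and sorting happen once, in the background, and the request path does nothing but look up the known key `airport_list`. The `USE KEYS` string shows the N1QL form of that same lookup.

```python
import json
from typing import Any

# N1QL form of fetching the precomputed document by key (RAW drops the keyspace wrapper).
USE_KEYS_QUERY = 'SELECT RAW t FROM `travel-sample` AS t USE KEYS "airport_list"'

AIRPORT_LIST_KEY = "airport_list"


def build_airport_list_doc(docs: list[dict[str, Any]]) -> dict[str, Any]:
    """Background step: do the comparatively slow filter/sort/project once."""
    airports = sorted(
        (d for d in docs if d.get("type") == "airport" and d.get("faa") is not None),
        key=lambda d: d["faa"],
    )
    fields = ("faa", "airportname", "city", "country")
    return {"airports": [{f: d.get(f) for f in fields} for d in airports]}


def get_airport_list(kv_store: dict[str, str]) -> dict[str, Any]:
    """Request path: a single lookup by key, no filtering or sorting."""
    return json.loads(kv_store[AIRPORT_LIST_KEY])


# Stand-in for the bucket: key -> JSON document body.
kv_store: dict[str, str] = {}
source_docs = [
    {"type": "airport", "faa": "SFO", "airportname": "San Francisco Intl",
     "city": "San Francisco", "country": "United States"},
    {"type": "airport", "faa": "CDG", "airportname": "Charles De Gaulle",
     "city": "Paris", "country": "France"},
]
kv_store[AIRPORT_LIST_KEY] = json.dumps(build_airport_list_doc(source_docs))

print([a["faa"] for a in get_airport_list(kv_store)["airports"]])
```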

For a document that needs to be served thousands of times a minute.

Millions of times a day.

Those milliseconds will add up.
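To put rough numbers on how those milliseconds add up, assume a hypothetical one million requests a day (the post only says "millions") and use the response times quoted above. The arithmetic is simple and serial and ignores concurrency, so read it as total query work saved per day, not as wall-clock time.

```python
REQUESTS_PER_DAY = 1_000_000  # hypothetical; the post only says "millions"

# Approximate response times quoted above, in milliseconds.
timings_ms = {
    "plain query": 55.0,  # midpoint of the quoted 50-60 ms
    "covered query": 17.5,
    "USE KEYS on one document": 14.5,
}


def daily_seconds(ms_per_request: float, requests: int = REQUESTS_PER_DAY) -> float:
    """Total time spent per day serving this request, in seconds."""
    return ms_per_request * requests / 1000.0


baseline = daily_seconds(timings_ms["plain query"])
for name, ms in timings_ms.items():
    spent = daily_seconds(ms)
    saved = baseline - spent
    print(f"{name:>26}: {spent / 3600:5.2f} h/day, saves {saved / 3600:5.2f} h/day")
```

The step from the covered query to the single document alone works out to 3 ms × 1,000,000 = 3,000 seconds, or 50 minutes of query time, for every million requests.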

Yeah, I know. Airports change from time to time.

Yeah, but they don’t change very often.

Yes, this one document will need to be replaced every so often.

But that’s a background operation, not one that’s serving a high-activity website.

So who cares how slow (comparatively) that process may be?
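The refresh can be as unglamorous as a scheduled job. Here is a minimal sketch, again using a dict as a stand-in for the bucket, with a hypothetical rebuild function that produces the new document. It writes only when the content has actually changed, so readers keep getting the same cached document in between.

```python
import json
from typing import Any, Callable


def refresh_document(kv_store: dict[str, str], key: str,
                     rebuild: Callable[[], dict[str, Any]]) -> bool:
    """Rebuild the precomputed document and replace it only if it changed.

    Returns True when the stored document was replaced. This runs off the
    request path, so it can afford to be slow."""
    new_body = json.dumps(rebuild(), sort_keys=True)
    if kv_store.get(key) == new_body:
        return False  # unchanged: leave the cached document alone
    kv_store[key] = new_body  # one whole-document replace
    return True


# Usage with a dict standing in for the bucket and a hypothetical rebuild step.
store: dict[str, str] = {}
airports = [{"faa": "SFO"}]


def rebuild_airport_list() -> dict[str, Any]:
    return {"airports": sorted(airports, key=lambda a: a["faa"])}


print(refresh_document(store, "airport_list", rebuild_airport_list))  # first write
print(refresh_document(store, "airport_list", rebuild_airport_list))  # unchanged
airports.append({"faa": "ABC"})  # a hypothetical new airport opens
print(refresh_document(store, "airport_list", rebuild_airport_list))  # replaced
```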

Plus, I no longer need my covering index!

I can save a little bit of space on my index server!

Woo-hoo! Bonus!

Yeah, I know. I get excited about some odd things…

OK, so that was an exercise in shaving milliseconds off our response time. What about a query that takes a bit longer and does more?

Let’s say you run a call-center, and it’s important to keep track of how quickly your team is picking up incoming calls…

OK, let’s get a little more specific.

Let’s say you want to have a dashboard showing how many calls have been answered within five seconds, ten seconds, and the total number of calls that have come in today.

Something like…

So, you start with an index on the startTime and callType properties, limiting it to documents of type “cdr”, only to find it takes about a second to run this query.
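Here is a hedged sketch of what that dashboard query might compute. It assumes each call-detail record has `type = "cdr"`, a `callType` (the value `"inbound"` is hypothetical), a `startTime`, and a hypothetical `answerTime`, with both times as epoch seconds. The bucket name `calls` is also made up. The N1QL string uses conditional `SUM(CASE ...)` aggregation with named parameters, and the Python function does the same counting in memory.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumed N1QL for the three dashboard numbers (bucket and field names are hypothetical).
CALL_STATS_QUERY = """
SELECT SUM(CASE WHEN c.answerTime - c.startTime <= 5 THEN 1 ELSE 0 END) AS within5s,
       SUM(CASE WHEN c.answerTime - c.startTime <= 10 THEN 1 ELSE 0 END) AS within10s,
       COUNT(*) AS total
FROM `calls` AS c
WHERE c.type = "cdr" AND c.callType = "inbound"
  AND c.startTime >= $dayStart AND c.startTime < $dayEnd
"""


@dataclass
class CallStats:
    within_5s: int
    within_10s: int
    total: int


def call_stats(cdrs: list[dict[str, Any]], day_start: int, day_end: int) -> CallStats:
    """Count today's inbound calls and how many were answered within 5 s and 10 s."""
    within_5 = within_10 = total = 0
    for c in cdrs:
        if c.get("type") != "cdr" or c.get("callType") != "inbound":
            continue
        if not (day_start <= c["startTime"] < day_end):
            continue  # not today
        total += 1
        answer: Optional[int] = c.get("answerTime")  # missing/None = unanswered
        if answer is None:
            continue
        wait = answer - c["startTime"]
        if wait <= 5:
            within_5 += 1
        if wait <= 10:
            within_10 += 1
    return CallStats(within_5, within_10, total)


DAY_START = 1_700_000_000  # hypothetical midnight, epoch seconds
DAY_END = DAY_START + 86_400
sample_cdrs = [
    {"type": "cdr", "callType": "inbound", "startTime": DAY_START + 100, "answerTime": DAY_START + 103},
    {"type": "cdr", "callType": "inbound", "startTime": DAY_START + 200, "answerTime": DAY_START + 208},
    {"type": "cdr", "callType": "inbound", "startTime": DAY_START + 300, "answerTime": DAY_START + 330},
    {"type": "cdr", "callType": "inbound", "startTime": DAY_START + 400},  # never answered
    {"type": "cdr", "callType": "inbound", "startTime": DAY_START - 50, "answerTime": DAY_START - 48},
    {"type": "cdr", "callType": "outbound", "startTime": DAY_START + 500, "answerTime": DAY_START + 501},
]
print(call_stats(sample_cdrs, DAY_START, DAY_END))
```

Every time the dashboard refreshes, this has to scan and aggregate the whole day’s records. That is exactly the cost the rest of this scenario tries to move out of the request path.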

And this isn’t the only query you want to use to populate your dashboard…

Ugh, this is going to be as slow as molasses!

OK, so let’s build a new index with all of the properties in it, making this a covered query, only to find that, while it’s improved, it still takes around 100 milliseconds.

Hey, that’s a 10X improvement! That’s great, isn’t it?

Only your dashboard still refreshes like it’s running in molasses.

Thin, watery molasses, but still…

Hmm, what can we do to improve this?

What if, instead of using this query to feed the dashboard, we take the output and use it to create a new document with just the results?

Something with a known name, like call_stats_<some date>…

And we can run this query on a timer, using a cron job, or trigger it using the Couchbase Eventing service.
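Here is a sketch of that background step. Only the `call_stats_<some date>` naming idea comes from the post; the ISO date format, the stats fields, the `updatedAt` timestamp, and the dict standing in for the bucket are all assumptions. The scheduled job does the expensive aggregation and upserts one small document, and the dashboard then reads that document by key.

```python
import json
from datetime import date, datetime, timezone
from typing import Any, Callable


def stats_key(day: date) -> str:
    """Known, predictable key for a given day's results, e.g. call_stats_2024-05-01."""
    return f"call_stats_{day.isoformat()}"


def run_stats_job(kv_store: dict[str, str], day: date,
                  compute: Callable[[date], dict[str, int]]) -> str:
    """Background job (cron or Eventing-triggered): aggregate, then store one document."""
    doc: dict[str, Any] = dict(compute(day))
    doc["updatedAt"] = datetime.now(timezone.utc).isoformat()
    key = stats_key(day)
    kv_store[key] = json.dumps(doc)  # upsert the whole result document
    return key


def dashboard_read(kv_store: dict[str, str], day: date) -> dict[str, Any]:
    """Dashboard path: one key lookup, no aggregation at request time."""
    return json.loads(kv_store[stats_key(day)])


# Usage: the lambda stands in for the slow aggregation query.
store: dict[str, str] = {}
today = date(2024, 5, 1)  # hypothetical date
run_stats_job(store, today, lambda d: {"within5s": 42, "within10s": 57, "total": 60})
print(dashboard_read(store, today)["total"])
```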

However, if we trigger it from the Eventing service, we’ll probably want to run it with a scan consistency of at least at_plus, so that the query includes the document update that triggered it.
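To see why at_plus matters, here is a deliberately simplified toy model. It is not Couchbase’s implementation, and the class and method names are hypothetical. Documents are written immediately, but the index applies mutations asynchronously. A query that doesn’t wait (not_bounded) can therefore miss the update that triggered it, while an at_plus-style query first lets the index catch up to that mutation’s sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndexedStore:
    """Toy model: documents are written immediately, the index lags behind."""
    docs: dict[str, int] = field(default_factory=dict)
    log: list[tuple[int, str, int]] = field(default_factory=list)  # (seqno, key, value)
    indexed_seqno: int = 0
    index: dict[str, int] = field(default_factory=dict)

    def upsert(self, key: str, value: int) -> int:
        """Write a document and return its mutation sequence number (a 'token')."""
        seqno = len(self.log) + 1
        self.docs[key] = value
        self.log.append((seqno, key, value))
        return seqno

    def _catch_up(self, target: int) -> None:
        """Apply logged mutations to the index up to and including seqno `target`."""
        for _seqno, key, value in self.log[self.indexed_seqno:target]:
            self.index[key] = value
        self.indexed_seqno = max(self.indexed_seqno, target)

    def query_total(self, consistency: str = "not_bounded", token: int = 0) -> int:
        """Sum values via the index; 'at_plus' first waits for the index to reach `token`."""
        if consistency == "at_plus" and self.indexed_seqno < token:
            self._catch_up(token)  # in a real system this is a wait, not a direct apply
        return sum(self.index.values())


store = ToyIndexedStore()
token = store.upsert("call::1", 1)  # the mutation that triggers the stats query
print(store.query_total("not_bounded"))     # index hasn't seen the new call yet
print(store.query_total("at_plus", token))  # waits until the index includes it
```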

But now, when we retrieve the result document, we’re achieving response times in the low single-digit milliseconds, up to roughly a 1000X improvement over the original one-second query!

And now we’ve got a responsive dashboard!

WOO-HOO!!!

Now we’re talking turbo-booster speed!

So, what is the lesson from these two scenarios?

Well, by moving any processing we needed to do on the data into background tasks, so that our interactive data requests involve no processing at all, we’ve made things very speedy…

We’re talking faster than a speeding bullet fast!

Excuse us Superman, we’re coming through…

So, was that thought experiment really that painful?

Now on to those sexy indexes…

Couchbase, empowering data nerds everywhere…

(Hey Peter, I think I’ve got our new slogan here!)

Author

Posted by Davis Chapman

Davis Chapman calls himself a Solution Architect, claims to be employed by Couchbase, and is supposedly part of our Professional Services team. He says that he’s been in the industry for decades, and has been involved in application development for most of that time. Hmm, we'll have to check on that...
