SQL++ queries can access data stored in your Couchbase cluster in a variety of ways. There are situations where having the business logic as part of your data queries can also be beneficial. SQL++ supports this with User Defined Functions (UDFs) that have been available since Couchbase 7.0

In this blog post, we create a UDF in JavaScript that queries points from a user’s location, dynamically, using SQL++. Additionally, we perform the same distance calculations with a UDF in Python within the Analytics service.

Our geospatial query use case

Our application will generate geographic points of interest from our database that are near a user’s GPS location, similar to services like Google Maps, shown in the screenshot below. For this example, we will use the travel-sample dataset that is available in a sample bucket provided by Couchbase.

In particular, we are interested in seeing the landmarks and airports around the user’s current location. This cannot be achieved using a SQL++ query directly as the distance calculation is based on the real-time geographic location of the user. SQL++ supports defining UDFs in JavaScript to perform custom logic in the queries.

Map shwoing searhc for local points of interest

Calculating distances from GPS coordinates

There are many ways to calculate the distance between two sets of GPS coordinates. In this example, we will calculate the distance using the Haversine Formula. It gives the approximate distance between two GPS coordinates by considering the path to be a sphere rather than a straight line distance.

JavaScript code to calculate geographic distances is shown in this sample:

We define two JavaScript functions–one that performs the conversion between degrees and radians and another that calculates the distance in kilometers between the GPS coordinates from the other function.

Importing UDFs into Couchbase

These JavaScript functions can now be imported into Couchbase using the REST API, as shown below with the curl command:

After this step, the UDF can be defined in the web console’s Query Editor:

Here, the math reference is to the JavaScript library that we created to evaluate the UDF.

The UDF can now be tested with sample GPS coordinates in SQL++ using the Execute Function as shown below

We can observe that the function works as intended when we provide the GPS coordinates manually to the function.

Connecting the UDF to Couchbase data

In the travel-sample dataset, we have GPS coordinates of the landmarks and airports along with other places of interest like hotels.

We can integrate them into our queries like the one below:

This query returns a list of the ten closest airports to the user’s location (51.509865, -0.118092). We provide the latitude (a.geo.lat) and longitude (a.geo.lon) fields that are embedded in the documents using the power of SQL++.

Similarly, we can calculate the points of interest around the user using the landmark collection in increasing distance from the user:

Results of the query showing nearby landmarks:

If we run these queries for multiple users at the same time, we might run into performance issues as we are using the compute resources that are part of the Couchbase cluster. 

In such scenarios, Couchbase Analytics could reduce the impact on your cluster. Couchbase Analytics is designed to efficiently run complex queries over many records. Complex queries can include large ad hoc join, set, aggregation, and grouping operations–any of which may result in long-running queries, high CPU usage, high memory consumption, or excessive network latency due to data fetching and cross node coordination.

User Defined Functions with Couchbase Analytics

Couchbase Analytics allows us to define and use User Defined Functions in Python but at the time of writing requires you to enable Developer Preview mode in Couchbase Server. This can be done with the following command:

The next step is to create a Python package, locally, on the development environment with the Python UDF. In this case, the UDF is a method to calculate the distance between two GPS coordinates.

Here, we calculate the geodesic distance (the shortest distance between points along a curved path) between the two GPS coordinates with the help of a library, geopy.

Packaging the UDF

To package the library, we use a package shiv that can package the code along with its requirements for any platform. Here, we are using Linux as Couchbase Server is running on a Linux environment within Docker.

In order to upload this binary package with the UDF to Couchbase, we need to use the REST API for the Analytics service. 

This uploads the packaged UDF into the pylib library in the default scope (formerly dataverse) of the Analytics environment. We can now define the UDF in the Analytics Workbench.

This definition statement indicates that we are defining a UDF named distance_in_km that can be called from the Python function calculate_distance defined in class distance_calculation within the Python module distance.

Now we can use the UDF in our Analytics queries the same way we used the UDF in our SQL++ queries.

Mapping Data from Data Service in Analytics Service

In order to query the data in the Data Service from the Analytics Service, we need to map the travel-sample data collections in Analytics that creates a real-time shadow copy of the data in the Data Service in Analytics. For this example, we need to map the collections with geographical data, namely the landmark, airport and hotel collections from the inventory scope in the travel-sample bucket.

Running the Analytics UDF against Couchbase Data

Now, we can run the same queries that we were running before in SQL++ but with the Analytics Service. Only the UDF name has changed. The rest of the interface is similar to what we had with the SQL++ queries. The results will also be similar to the earlier results.

This approach is suitable when we want to run these queries without impacting the Data Service which is commonly used for our transactional data. The data is synced between the Data and the Analytics Service internally in real-time in Couchbase.

Summary

In this blog post, you’ve learned to create a User Defined Function (UDF) in JavaScript that  calculates distances between two GPS coordinates. You saw how to import the UDF into Couchbase and then integrate it into a SQL++ query to power applications that could provide points of interest around a user. We also showcased how you can perform the same distance calculation in a Python-based UDF using the Analytics service to reduce impact on your transactional Couchbase cluster.

For further reading and reference, see the following resources:

Author

Posted by Nithish Raghunandanan

Nithish is an engineer who loves to build products that solve real-world problems in short spans of time. He has experienced different areas of the industry having worked in diverse companies in Germany and India. Apart from work, he likes to travel and interact and engage with the tech community through Meetups & Hackathons. In his free time, he likes to try stuff out by hacking things together.

Leave a reply