
Daniel Ancuta is a software engineer with several years of experience using different technologies. He’s a big fan of “The Zen of Python,” which he tries to apply not only in his code but in his private life as well. You can find him on Twitter: @daniel_ancuta

Geospatial queries: Using Python to search cities

Geolocation information is used every day in almost every aspect of our interaction with computers. It might be a website that wants to send us personalized notifications based on location, maps that show us the shortest possible route, or just tasks running in the background that check the places we’ve visited.

Today, I’d like to introduce you to geospatial queries that are used in Couchbase. Geospatial queries allow you to search documents based on their geographical location.

Together, we will write a tool in Python that uses geospatial queries with the Couchbase REST API and Couchbase Full Text Search to search a database of cities.

Prerequisites

Dependencies

In this article I used Couchbase Enterprise Edition 5.1.0 build 5552 and Python 3.6.4.

To run the snippets from this article, you should install the Couchbase Python SDK 2.3 (I am using 2.3.4) via pip.

Couchbase

  1. Create a cities bucket
  2. Create a cities search index with a geo field of type geopoint. You can read about it in the Inserting a Child Field part of the documentation.

It should look like the image below:

[Image 1: the cities search index with the geo child field of type geopoint]

Populating Couchbase with data

First of all, we need to have data for our exercise. For that, we will use a database of cities from geonames.org.

GeoNames provides two main databases: a list of cities and a list of postal codes.

Both are grouped by country and include information such as name, coordinates, population, time zone, and country code. Both are in a CSV-style format (tab-separated text).

For the purpose of this exercise, we will use the list of cities. I’ve used PL.zip but feel free to choose whichever you prefer from the list of cities.

Data model

The City class will be our representation of a single city, and we will use it across the whole application. By encapsulating the data in a model, we unify the API and don’t need to rely on third-party data sources (e.g., a CSV file) whose layout might change.

Most of our snippets are located (unless stated otherwise) in the core.py file. So remember to update it (especially when adding new imports) rather than overwrite its whole content.
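A minimal sketch of such a model could look like the following. The attribute names, the from_csv_row and to_document helpers, and the document shape are assumptions made for illustration. The column positions follow the GeoNames readme for the country files.

```python
class City:
    """A single city, independent of where the data came from."""

    def __init__(self, geonameid, name, lat, lon, population=0, country_code=''):
        self.geonameid = int(geonameid)
        self.name = name
        self.lat = float(lat)
        self.lon = float(lon)
        self.population = int(population or 0)
        self.country_code = country_code

    @classmethod
    def from_csv_row(cls, row):
        # Column positions as documented in the GeoNames readme:
        # 0 geonameid, 1 name, 4 latitude, 5 longitude,
        # 8 country code, 14 population.
        return cls(
            geonameid=row[0],
            name=row[1],
            lat=row[4],
            lon=row[5],
            population=row[14],
            country_code=row[8],
        )

    def to_document(self):
        # 'geo' is the field indexed as a geopoint in the search index.
        # The overall document shape is an assumption of this sketch.
        return {
            'geonameid': self.geonameid,
            'name': self.name,
            'geo': {'lat': self.lat, 'lon': self.lon},
            'population': self.population,
            'country_code': self.country_code,
        }

    def __repr__(self):
        return 'City({!r}, lat={}, lon={})'.format(self.name, self.lat, self.lon)
```

Keeping the CSV column positions inside from_csv_row means the rest of the application only ever sees City attributes, so a change in the source layout touches a single method.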


CSV iterator to process cities

Now that we have a model class, it’s time to prepare an iterator that will help us read the cities from the CSV file.
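One way to write such an iterator is sketched below. The function name iter_city_rows and the feature_class filter are assumptions of this sketch. It relies on the GeoNames files being tab-separated and unquoted, with the feature class in column 6 ('P' marks populated places such as cities and villages).

```python
import csv


def iter_city_rows(path, feature_class='P'):
    """Yield raw rows for populated places from a GeoNames country file."""
    with open(path, encoding='utf-8', newline='') as handle:
        # GeoNames files are tab-separated and do not use quoting, so
        # QUOTE_NONE keeps names containing quote characters intact.
        reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip rivers, mountains, administrative areas, and so on.
            if len(row) > 6 and row[6] == feature_class:
                yield row
```

Because it is a generator, the file is read lazily, and each yielded row can be turned into a City instance without holding the whole file in memory.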

Insert cities into the Couchbase bucket

We have unified the way we represent a city, and we have an iterator that reads cities from the CSV file.

It’s time to put this data into our main data source, Couchbase.
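A hedged sketch of the storing side follows. The save method, the 'city::' key prefix, the open_cities_bucket helper, and the credentials are assumptions for illustration. The connection calls are the ones the 2.x Python SDK provides, imported lazily so the class itself can be exercised without a running cluster.

```python
class CitiesService:
    """Thin wrapper around a Couchbase bucket that stores city documents."""

    def __init__(self, bucket):
        # Any object with an upsert(key, value) method works here, which
        # keeps the service easy to test with a stand-in bucket.
        self._bucket = bucket

    def save(self, document):
        # Using the GeoNames id as the key means that reloading the same
        # file overwrites documents instead of duplicating them.
        key = 'city::{}'.format(document['geonameid'])
        self._bucket.upsert(key, document)
        return key


def open_cities_bucket(host='couchbase://localhost',
                       username='Administrator', password='password'):
    # Hypothetical helper; host and credentials are placeholders.
    # The import path and calls below are those of the 2.x Python SDK.
    from couchbase.cluster import Cluster, PasswordAuthenticator

    cluster = Cluster(host)
    cluster.authenticate(PasswordAuthenticator(username, password))
    return cluster.open_bucket('cities')
```

With a live cluster, the service would be created as CitiesService(open_cities_bucket()) and fed one document per city.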


To check if everything we wrote so far is working, let’s load CSV content into Couchbase.


At this point you should have cities loaded into your Couchbase bucket. The time it takes depends on the country you have chosen.

Search cities

Our bucket is now populated with data, so it’s time to come back to CitiesService and prepare a few methods that will help us search for cities.

But before we start, we need to modify the City class slightly, by adding the following method:
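One plausible shape for such a method, sketched under assumptions, is a classmethod that rebuilds a City from a document stored in the bucket. The name from_document, the trimmed constructor, and the field names (including the nested geo object) are illustrative and not taken from the original code.

```python
class City:
    # Trimmed to the attributes this example needs.
    def __init__(self, geonameid, name, lat, lon):
        self.geonameid = int(geonameid)
        self.name = name
        self.lat = float(lat)
        self.lon = float(lon)

    @classmethod
    def from_document(cls, document):
        # Assumes the document keeps coordinates under a nested 'geo'
        # object, matching the geopoint field of the search index.
        geo = document['geo']
        return cls(
            geonameid=document['geonameid'],
            name=document['name'],
            lat=geo['lat'],
            lon=geo['lon'],
        )
```

This lets the search methods return City objects instead of raw dictionaries, so callers never depend on the storage format.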

Here is the list of methods we will implement in CitiesService:

  • get_by_name(name, limit=10), returns cities matching the given name
  • get_by_coordinates(lat, lon), returns the city at the given coordinates
  • get_nearest_to_city(city, distance='10', unit='km', limit=10), returns the cities nearest to the given city

get_by_name
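As a rough sketch, a search by name can be expressed as a Full Text Search match query on the name field and sent to the FTS REST endpoint. The function names, the index name cities, the host, and the credentials below are assumptions. The request body and the /api/index/{name}/query endpoint on port 8094 follow the documented FTS REST API.

```python
import base64
import json
from urllib import request


def build_name_query(name, limit=10):
    """Build an FTS request body that matches cities by name."""
    return {
        'query': {'match': name, 'field': 'name'},
        'size': limit,
    }


def run_search(body, index='cities', host='http://localhost:8094',
               username='Administrator', password='password'):
    """POST the body to the FTS REST endpoint and return the list of hits."""
    url = '{}/api/index/{}/query'.format(host, index)
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    req = request.Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Basic ' + token,
        },
        method='POST',
    )
    with request.urlopen(req) as response:
        return json.loads(response.read().decode('utf-8'))['hits']
```

Each hit carries the document id, so run_search(build_name_query('Warsaw')) gives the keys needed to fetch the full city documents from the bucket.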


get_by_coordinates
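One way to look up a city by coordinates, sketched here as an assumption rather than the original implementation, is a geo distance query with a small radius that keeps only the closest hit. The function name and the default radius are hypothetical. The query and sort objects use the documented FTS JSON format.

```python
def build_coordinates_query(lat, lon, radius='1km'):
    """Build an FTS request body for the single city closest to a point."""
    location = {'lat': lat, 'lon': lon}
    return {
        # Match documents whose 'geo' field lies within the radius.
        'query': {'location': location, 'distance': radius, 'field': 'geo'},
        # Order by distance from the same point, closest first.
        'sort': [{
            'by': 'geo_distance',
            'field': 'geo',
            'unit': 'km',
            'location': location,
        }],
        'size': 1,
    }
```

A small radius keeps the search cheap while still tolerating coordinates that differ slightly from the stored ones.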


get_nearest_to_city

As you might notice in this example, we used the RawQuery and SortRaw classes. Sadly, the couchbase-python-client API does not work correctly with geo searches on the newest Couchbase (5.1.0 at the time of writing).
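The raw payload that such a query carries can be sketched as plain dictionaries in the documented FTS JSON format. The function names and the choice to request one extra hit (because the origin city matches its own location) are assumptions of this sketch. The same dictionaries can be POSTed to the FTS REST endpoint.

```python
def build_nearest_query(lat, lon, distance='10', unit='km', limit=10):
    """Build an FTS request body for cities within a distance of a point."""
    location = {'lat': lat, 'lon': lon}
    return {
        'query': {
            'location': location,
            # FTS expects the distance and unit as one string, e.g. '10km'.
            'distance': '{}{}'.format(distance, unit),
            'field': 'geo',
        },
        # Closest hits first.
        'sort': [{
            'by': 'geo_distance',
            'field': 'geo',
            'unit': unit,
            'location': location,
        }],
        # One extra hit, because the origin city is at distance zero.
        'size': limit + 1,
    }


def drop_origin(hits, origin_id, limit=10):
    """Remove the origin city from the hits and trim to the limit."""
    return [hit for hit in hits if hit.get('id') != origin_id][:limit]
```

Sorting by geo_distance is what turns "everything within 10 km" into "the nearest cities". Without it, hits come back ordered by relevance score.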

Call methods

Now that we have all the methods ready, we can call them!

Where to go from here?

I believe this introduction will enable you to work on something more advanced.

There are a few things that you could do:

  • Maybe use a CLI tool or REST API to serve this data?
  • Improve the way we load data, because it might not be super performant if we want to load ALL cities from ALL countries.

You can find the whole code of core.py in a GitHub gist.

If you have any questions, don’t hesitate to tweet me @daniel_ancuta.


This post is part of the Community Writing program

Posted by Laura Czajkowski, Developer Community Manager, Couchbase

Laura Czajkowski is the Snr. Developer Community Manager at Couchbase overseeing the community, our incentive programs, Experts and Champions group, meetups, and defining our presence at developer events. She’s also responsible for our monthly developer newsletter and engaging with our community in various forms. Laura has been active in Open Source communities since 2000 and has been involved in various activities, including leading and organising conferences on software testing, documentation, and advocacy. Laura is an Open Source advocate and regular conference speaker who is passionate about getting people, from primary school students to technology professionals, involved in Open Source communities both on IRC and in face-to-face discussions. She is easily found online as @czajkowski on Twitter and on freenode.
