This brief intro to several geospatial data concepts covers spatial databases, spatial indexing, and using GeoJSON data in NoSQL.

What is Spatial Data?

Spatial data are data types (files, databases, web services) that encode geographic information for use in location-aware applications.

When I wrote a book on web-based mapping 15 years ago, my readers were forced to learn a stack of mostly new technology. Geographers had to learn the tech and developers had to learn the domain. That included web servers, SQL applications, and a bit of PHP. But the big “new thing” was using spatial data. Although spatial data is more prevalent today, application developers still need to understand how to work with this domain-specific data type.

What is GIS?

Cartography is as old as civilization itself. From crossing oceans to planning cities, information was collected and transcribed onto paper.*

Just as we digitized text and numbers, cartographers started putting point, line, and polygon data into digital systems.

This created the Geographic Information Systems (GIS) domain. Spatial data could be drawn as a graphical layer on a map but also analyzed for geographic characteristics and location relationships. Forest management, land use planning, transportation, surveying, and more all took advantage of these systems.

Geospatial Blueprint view of the lands of the Leland Stanford Jr. University, City of Palo Alto and Estate of Timothy Hopkins.

1900 (est.) Map of Stanford University by C. O. Taylor, from DavidRumsey.com item 1345400.

Points, lines, and polygons are vector data types, because they are built from points of data (think x, y, z coordinates) and a direction that connects them. String several points together to make a line, or close the loop to make a polygon area.
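To make that concrete, here is a minimal Python sketch of the three vector types as plain coordinate sequences. The coordinates and the close_ring helper are invented for illustration and do not come from any particular GIS library.

```python
# Vector geometries as plain coordinate sequences (illustrative values only).
Point = tuple[float, float]

# A point is a single x, y location.
point: Point = (-122.17, 37.43)

# A line is several points strung together in order.
line: list[Point] = [(-122.17, 37.43), (-122.16, 37.44), (-122.15, 37.44)]


def close_ring(coords: list[Point]) -> list[Point]:
    """Return a polygon ring: the same points with the first one repeated at the end."""
    if len(coords) < 3:
        raise ValueError("a polygon needs at least three points")
    return coords if coords[0] == coords[-1] else coords + [coords[0]]


# Closing the loop turns the line into a polygon area.
polygon = close_ring(line)
print(len(line), len(polygon))
```

The only structural difference between the line and the polygon is that the polygon's last coordinate repeats its first.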

Raster data, on the other hand, is made of images or grids of digital values in the form of pixels – e.g., satellite imagery, elevation models, etc. This type of data is out of scope for this article.

* Bay Area folks are encouraged to check out the ancient maps in the David Rumsey Map Collection, which is available online and has its physical assets housed at Stanford.

Spatial was Special

Relational database management systems had spatial object support, but those features were largely used by spatial analysts. These special types of vector data and databases were somewhat esoteric for application and web developers.

You had to have special knowledge to use special data from special GIS software. The data had to then be loaded into a spatial database for further analysis and application connectivity.

Of course, once Google Maps came out, people started to become more aware of spatial data and the trend moved toward third-party map data integration, but that’s another story.

Geographers typically used ESRI’s desktop mapping software to create (static) map images and PDFs. (More recently the free and open source QGIS.org project has taken the world by storm.) They also could export that data to be used in an entirely different database. Web developers could import that data into a spatial database and then query it with a location-aware web-mapping application layer to generate images for their users.

Map of vector data showing a normalized population distribution in QGIS

Map of virus and population distribution by QGIS author Kurt Menke.

What is a Spatial Database?

A spatial database, also known as a “geospatial database”, is built to capture and store the points, lines, and areas of cartographic information that we refer to as spatial data. Often these databases were pretty ordinary but had extensions to handle geometries stored as binary objects (BLOBs) in one of their fields. SQL Server, Oracle, PostgreSQL, Ingres, and SQLite all have spatial add-ons.

Issues grew more complex as data volumes grew and workflows morphed. Instead of just producing and sharing data, developers were being pressured to produce more information and knowledge from it. There was also a drive for interoperability and efficiency.

This presented new opportunities to create standards for web-optimized geospatial services, whether for sharing images of maps or raw tabular data to be used in another location-aware application. While ESRI’s file-based formats (shapefiles, geodatabase) were popular for desktop users, many web developers wanted data in an enterprise database like PostgreSQL, which (with its PostGIS extension) has been the leading open source spatial database for years.

Once the data was in a database, developers were able to take more control over the workflow. Likewise, an increasing number of spatial analysts started to use the spatial functionality of the database systems.

What is GeoJSON?

As even more technical standards evolved for spatial data on the web, developers started using the JSON (JavaScript Object Notation) format with geographic encoding conventions built into it. GeoJSON was born, and it could hold fields of attribute data alongside the coordinate pairs that define vector points, lines, and polygons. Because JSON allows multiple embedded objects, lists, and key/value mappings, it was well suited for geographic data.

Here is a sample of a basic line with two fields of attribute data:
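As an illustration, a feature like this can be built and serialized in a few lines of Python. The coordinates and the two attribute fields (name and lanes) are invented for this sketch. Only the overall layout (a Feature with a geometry and a properties object, and coordinates in longitude, latitude order) follows the GeoJSON format.

```python
import json

# A GeoJSON Feature: a LineString geometry plus two attribute fields.
# Coordinates are [longitude, latitude]; all values here are invented.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-122.1697, 37.4275], [-122.1430, 37.4419]],
    },
    "properties": {"name": "Sample Road", "lanes": 2},
}

# Serialize to the text form that would be stored or sent to an application.
print(json.dumps(feature, indent=2))
```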

Developers know that JSON is one way of getting data out of a flat file or database from their geographer colleagues. When requested, a spatial DB converts from a tabular format into JSON before sending it over to the consuming application.

NoSQL Alignment

The nice thing about NoSQL databases is that they were built to use JSON directly. With Couchbase, which I am most familiar with, the flexible schema is also a bonus: an application does not have to validate documents against a strict tabular structure. This makes it easier to adjust as needed, especially during prototyping phases where data types and schemas may be changing.

Not all spatial data is in a tabular form, nor would it make sense to be, so why force it? For example, a user profile for an application may want to have hierarchical data where an address is an optional object in the NoSQL data model. Or perhaps there might be an option to have several phone numbers, but not everyone may have one. In these cases, you don’t want to maintain empty columns in a table just in case.
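A small sketch of what that looks like in practice follows. Both documents, their field names, and the city_of helper are hypothetical, so treat this as one possible document shape rather than a recommended schema.

```python
import json
from typing import Optional

# Two hypothetical user-profile documents with different shapes.
user_a = {
    "type": "user",
    "name": "Ana",
    "phones": ["555-0100", "555-0101"],
    "address": {
        "city": "Palo Alto",
        # The location is just another nested object (a GeoJSON point).
        "geo": {"type": "Point", "coordinates": [-122.1430, 37.4419]},
    },
}
# No phones and no address: the fields are simply absent, not empty columns.
user_b = {"type": "user", "name": "Ben"}


def city_of(doc: dict) -> Optional[str]:
    """Read a nested optional field without assuming it exists."""
    return doc.get("address", {}).get("city")


for doc in (user_a, user_b):
    print(doc["name"], city_of(doc), len(doc.get("phones", [])))
```

The application code decides how to treat a missing address, so the data model does not need placeholder values.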

When you start thinking about spatial data as just another object with a list of coordinates, you’ll see how well it aligns with NoSQL databases.

The distributed nature of NoSQL databases is a further advantage for geospatial applications because they are built to handle tough workloads. With cluster-based computing, spatial data can grow over time and more query resources can be readily added as needed.

These are just a few reasons why enterprises choose NoSQL data platforms.

Geospatial Database Functions

Drawing Maps

The most common function for developers accessing spatial data is to make a map, whether online, in a mobile app, or on a desktop. This can take a few forms but is mainly a request for a specific set of records or documents that the application then renders for the user.

Rendering maps is a domain in its own right, but most web mapping APIs and libraries that help do this (e.g., Mapbox, Leaflet) have some basic defaults that get users up and running quickly. To be honest, most online maps just show some point markers on top of an image. That is one use case for map data, but it is not really what GIS users would choose when performing analysis.

Beyond just requesting a set of documents, a spatial database needs to allow filtering for specific features/documents based on a query. Naturally, SQL-based languages that provide a WHERE clause help make this possible. There are also more advanced geographic ways of filtering data, e.g., for a specific region or area, and we’ll cover that as well.

Spatial Indexing

To do more advanced spatial filtering when drawing maps, or to do what are called spatial joins (joining records based on their proximate location, like a store within a region), you need to have the geographic data indexed. Just as you would index other fields in a database, there are special indexing methods for geospatial field types.

Spatial databases typically compute a rectangle around each of the features in a dataset and use that as a rough index for queries. This rectangle is known as a minimum bounding rectangle (MBR), and the rectangles are usually organized in a type of R-tree index.


Spatial index bounding boxes shown in QGIS

Queries then use these indexes to find the spatial features that overlap or are within a distance of another location. The database uses the MBRs to see how close features may be to one another so that it can ignore the ones that are not close enough to matter.
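The idea can be sketched in plain Python. This is a simplified illustration: it scans a flat dictionary of rectangles, where a real R-tree would nest them into a hierarchy so that whole groups can be skipped at once. The feature names, coordinates, and query window are made up.

```python
Point = tuple[float, float]
BBox = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr(coords: list[Point]) -> BBox:
    """Minimum bounding rectangle of a list of x, y coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True when two rectangles share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical features in planar coordinates.
features = {
    "park": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "lake": [(10, 10), (12, 10), (11, 13)],
    "road": [(3, 2), (11, 11)],
}

# Build the rough index once: one rectangle per feature.
index = {name: mbr(coords) for name, coords in features.items()}

# At query time, keep only features whose rectangle overlaps the window.
query_window: BBox = (2, 1, 5, 5)
candidates = [name for name, box in index.items() if boxes_overlap(box, query_window)]
print(candidates)  # ['park', 'road']
```

The rectangle test is only a first pass. Candidates that survive it still need an exact geometry check, because a bounding box can overlap the window even when the feature itself does not.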

Couchbase builds spatial indexes using its full-text search service. By specifying which field contains geographic data, the index will know what kind of index to build. Subsequent search requests with a geospatial component will use those indexes, though the user never needs to see them or interact with them directly.

Configuring a geopoint type in the Couchbase search web GUI. The ‘geo’ field contains GeoJSON point data.

Spatial Query Language

Mapping applications are great at hiding the more complex spatial queries being used behind the scenes. For example, when a user turns a layer on or off, the application must change which queries are being run, ignoring any data it does not need. Likewise, when a user zooms into a map, the map window changes and the underlying queries then request data only for that new map window. If an application requested all data, all the time, it would be unusable.

The queries themselves use functions that take a few different kinds of variables:

  • bounding box or area for the query
  • layer or table names to include
  • distance from a given point to query
  • coordinates for a polygon that defines the query area

Relational databases do this through the underlying SQL queries. NoSQL systems vary in their approaches. Couchbase, for example, can handle the document request using N1QL, its SQL-based language for NoSQL queries.

Downstream applications can then use the geospatial objects directly. A related blog post, for example, shows how the R programming language easily requests data and uses the Leaflet mapping package to draw results. No advanced processing was needed, just a basic N1QL/SQL request.

But when you want more geospatial-related queries, Couchbase provides a set of geospatial search requests (through the full-text search service) that match documents within a radial distance of a point, within a rectangular area, or within a provided polygon shape. This allows extraction of just the documents/records of interest, with no additional spatial libraries or tools needed.
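Here is a rough Python sketch of what the three request shapes might look like as JSON bodies. The key names (location, distance, top_left, bottom_right, polygon_points) reflect my reading of the Couchbase Search documentation and should be verified against the version you run. The helper functions, the geo field name, and the coordinates are hypothetical. Nothing here contacts a server; it only builds the request bodies.

```python
import json

# Request bodies for three geospatial search shapes. Key names follow my
# reading of the Couchbase Search docs and should be checked before use.


def radius_query(field, lon, lat, distance):
    """Match documents whose point lies within `distance` (e.g. "5km") of a location."""
    return {"query": {"field": field,
                      "location": {"lon": lon, "lat": lat},
                      "distance": distance}}


def bounding_box_query(field, top_left, bottom_right):
    """Match documents inside a rectangle given by two (lon, lat) corners."""
    return {"query": {"field": field,
                      "top_left": {"lon": top_left[0], "lat": top_left[1]},
                      "bottom_right": {"lon": bottom_right[0], "lat": bottom_right[1]}}}


def polygon_query(field, lon_lat_points):
    """Match documents inside a polygon; points are sent as "lat,lon" strings."""
    return {"query": {"field": field,
                      "polygon_points": [f"{lat},{lon}" for lon, lat in lon_lat_points]}}


print(json.dumps(radius_query("geo", -122.143, 37.4419, "5km"), indent=2))
```

Keeping the three shapes behind small helper functions means a map application can swap one for another as the user pans, zooms, or draws an area of interest.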

Advanced Spatial Operations

But queries are only half the battle! Full-fledged GIS applications and spatial databases have many more functions for generating data. These include calculating the center point (centroid) of polygons, boolean operations between multiple overlapping shapes (intersect, difference, union), and conversion to different serialization formats (binary, text, XML, JSON).
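As one small example of this kind of operation, the centroid of a simple polygon can be computed with the standard shoelace-based formula. This sketch assumes planar x, y coordinates and a ring that does not cross itself. A spatial database would also handle holes, multi-part shapes, and geographic coordinate systems, which this does not.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) points."""
    # Drop a repeated closing point so each edge is visited exactly once.
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * area2.
    return (cx / (3 * area2), cy / (3 * area2))


print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```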

The Open Geospatial Consortium (OGC) maintains the specifications for all of these features under the title of Simple Features for SQL. There are many different spatial feature types and functions in that specification. Here is a sampling.

OGC SF SQL geospatial types and functions

OGC Simple Geospatial Features list of types and functions compared to other standards. From https://www.ogc.org/standards/sfs

Spatial joins are also very popular for connecting, say, a set of points to the polygons they fall within, or for connecting one layer of polygons to another layer of features. The fields from both datasets can then be used in maps or reports.
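A point-in-polygon join can be sketched with the classic ray-casting test. The stores and regions below are invented, coordinates are assumed to be planar, and points lying exactly on a boundary are not treated specially. A production system would first narrow the candidates with its spatial index rather than test every pair.

```python
def point_in_polygon(pt, ring):
    """Ray casting: count how many polygon edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical layers: region polygons and store points.
regions = {
    "north": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "south": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
stores = {"store_a": (2, 7), "store_b": (8, 1), "store_c": (20, 20)}

# The join: attach to each store the name of the region it falls within, if any.
joined = {
    name: next((r for r, ring in regions.items() if point_in_polygon(pt, ring)), None)
    for name, pt in stores.items()
}
print(joined)  # {'store_a': 'north', 'store_b': 'south', 'store_c': None}
```

Once each store carries its region name, attributes from both layers can be combined in a report or used to style a map.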

All of these are very powerful and useful for analysts. However, the spatial indexes only help in parts of the queries. The more complex part is doing the computational geometry to compute new features. There is no easy way to do that, so resource management (and a good coffee maker) is really important.

Conclusion

This has been a super brief intro to many different concepts. I hope it gives you a few touchpoints for learning more about the spatial database domain. Here are some more geospatial-centric blogs and resources from Couchbase.

Download the Couchbase NoSQL database for free. Develop on a laptop, easily deploy to the cloud when ready to scale. Install and then follow along with these tutorials:

 

Author


Posted by Tyler Mitchell

A guest blogger, advisor, consultant, and writer on database topics with a focus on product marketing ideas and a specialty in geospatial topics. Tyler worked as a Couchbase Product Manager (SDK, Full-Text Search) and as a Product Marketing Manager. See LinkedIn for more details on books he's written and other roles in the database ecosystem.
