PostGIS generalization


I am working on a project that needs to display all the neighborhood polygons in Baltimore City at one time. The file is relatively detailed -

It has water removed:water removed

...and the edges of neighborhoods can get pretty complex:
detailed edges

Which mean that tons of unecessary polygon nodes are being sent from the backend, when, at the zoom level and the level of detail the map users need, the high level of detail is a total waste.

Getting a performance baseline

I am storing the data in a postgis database, and using nodejs to run the query and convert the response to geojson and send it to the web map.

When I zoom all the way out and turn on neighborhoods, the backend performs great, but I run in to problems trying to transfer the large dataset over the internet.

large file size

Damn! My network is pretty good, so it was able to send that 4.6mb geojson file from New York (where the server is) to my laptop (in Baltimore) in just under 2 seconds. That is the time it took starting with clicking the 'show' button on my computer, to the backend receiving the request, running the postgis query, formatting as geojson, and my laptop receiving the file. When you think about all that magic, 2 seconds is pretty freaking amazing, but to the end-user, it just looks like really shitty performance.

Here is what the query looks like to draw that big ass file:

'SELECT label, id, ST_AsGeoJSON(geom) AS geom '+
'FROM neighborhoodwgs84 '+
'WHERE ST_Intersects(
  geom, ST_GeometryFromText (\'POLYGON((${swY} ${swX},${neY} ${swX},${neY} ${neX},${swY} ${neX},${swY} ${swX}))\', 4326 ));'

that query, translated to human, I am selecting the label, id, and geometry fields. The actual dataset has several other fields but I don't care about them. I am converting the geom field, which is stored as a blob, to a geojson object. If I wanted to just pull all of the features in the neighborhoodswgs84 table, I wouldn't need the WHERE clause at the bottom. The WHERE clause says, basically, 'Only return those neighborhoods where the neighborhood's geom intersects this bounding box that I am passing the coordinates from on the front end as variables'. the ${variablename} syntax is just an es2015 (new javascript) way to concatenate.

Still - 4.6 megabytes is bad. It would be extra terrible on a mobile connection, too...

Generalizing on the Backend

The geo-vudu masters who make postgis came to the rescue with the ST_SimplifyPreserveTopology function. Those bad-asses allow us to simplify our geometry BEFORE converting it to geojson and sending it to the front end. One-upping themselves, they even preserve topology so the polygons we are simplifying will still fit together.

We just have to slip it inside of the SELECT clause and specify a simplification tolerance. Tolerance basically means that within the defined distance, simplify all nodes that are in that tolerance distance to a single node. The tolerance is in whatever the layers linear units are. My neighborhoods are unprojected, in the WGS84 datum, so that means the tolerance is in degrees. Weird.

Slightly modifying the SELECT clause from the above query, I set the simplification to be .1 to start and left everything else the same:

SELECT label, id, ST_AsGeoJSON( ST_SimplifyPreserveTopology( geom, .1 )) AS geom

The whole neighborhood dataset is just over 82kb! That's down from 4.6mb and took just over half a second round trip! It looks dumb as hell, but it sure is fast:
so bad

Let's try .001 tolerance...
That puts us at 116kb! That's doable. It looks significantly less awful, too.

When you zoom in it looks really dumb again...

When the user is zoomed in, after some trial and error, a tolerance of .00005 looks pretty good, and has a pretty small file size!
zoom zoom

So I guess that solves my problem for now. At specific zoom levels I will change the tolerance with javascript...

In the code - I am now passing a zoomLev parameter from the front end map - and depending on the zoom level, I set the tolerance variable:

var tolerance = '.001';
  if(zoomLev > 12){
  tolerance = '.00005';

var myQuery = `SELECT label, id, ST_AsGeoJSON(ST_SimplifyPreserveTopology(geom, ${tolerance})) AS geom `+
`FROM neighborhoodwgs84 `+
`WHERE ST_Intersects( geom, ST_GeometryFromText ( 'POLYGON((${swY} ${swX},${neY} ${swX},${neY} ${neX},${swY} ${neX},${swY} ${swX}))', 4326 ));`

Other Options

I am not really going to go there in this post, but some other excellent options would be:

  • Transfer topojson instead of geojson from the backend. That would require creating another table and pre-calculating the topojson object. Basically, topojson has one big object with all the edges of the polygons or lines in one big file. Individual features reference what they need from that single feature. It cuts way down on the amount of data that needs to get sent over the internet/file size because if two features share an edge, that edge is only stored once and referenced by both unique features instead of storing the exact same edge in two features and wasting space.
  • Finding a neighborhood that doesn't have that damn water removed. Seriously, those piers and shit greatly increase the level of complexity of those polygons and take a bunch of extra space.

To see the code from this post, check out the BaltimoreTreemap project on github. The above examples can be found in: treemap/server/app/controllers/neighborhood.controller.js

comments powered by Disqus