One of the biggest challenges when working with large datasets is finding the least costly workflow that still yields accurate answers.
Let’s say you have a huge dataset composed of all sorts of geometry features (points, lines, areas, etc.) and you want to do a bit of cleaning – because messy and redundant information is no fun!
So you might be thinking: “Hmmm… which areas have an unnecessarily high density of points?”
The same issue arises when working with OpenStreetMap data, and it can be solved easily using PostGIS and a command-line tool that we created.
Note: the following steps require a Linux environment, PostgreSQL 9.x, PostGIS 2.x, Osmosis 0.43+, and QGIS 2.12.2+.
Getting the data
Download a *.osm.pbf file using the command line:
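For example, you can fetch a regional extract from the Geofabrik download server with wget (the region below is only an illustration; substitute the area you are interested in):

```shell
# Download an OSM extract in PBF format (example region; adjust the URL as needed)
wget https://download.geofabrik.de/europe/romania-latest.osm.pbf
```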
In the same folder, download SCOPE – databaSe Creator Osmosis Postgis loadEr.
Make sure to set the file as executable by running:
chmod +x scope.sh
Load the data
Run SCOPE and follow the on-screen instructions to load the *.osm.pbf file into a database.
SCOPE automatically creates the database with hstore and PostGIS extensions and the pgsnapshot schema.
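For reference, the steps SCOPE automates are roughly the following; this is only a sketch, and the database name, credentials, and the path to the Osmosis schema script are illustrative:

```shell
# Create the database and enable the extensions SCOPE sets up (names are examples)
createdb osm
psql -d osm -c "CREATE EXTENSION hstore; CREATE EXTENSION postgis;"

# Load the pgsnapshot schema shipped with Osmosis (path varies by installation)
psql -d osm -f /usr/share/osmosis/script/pgsnapshot_schema_0.6.sql

# Import the extract into the pgsnapshot tables
osmosis --read-pbf file=romania-latest.osm.pbf \
        --write-pgsql database=osm user=osm password=osm
```

Running SCOPE itself is preferable, since it prompts you for these values interactively.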
Play with the data
For example, using the find_duplicate_nodes query, we can see that one building (@20.805088941495338, -104.92877615339032) appears in the same spot 23 times!
The one next to it (@20.8054225, -104.9278152) appears 22 times!
The node density in this area (@20.4411867, -97.3172739) is far too high – 168 nodes!
Also, 171 nodes for a small fence segment (@46.7487683, 23.559687)!
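The find_duplicate_nodes query in the repository is the authoritative version; a minimal sketch of the same idea against the pgsnapshot nodes table might look like this (database name is an example):

```shell
psql -d osm <<'SQL'
-- Count nodes that share exactly the same location
-- (the pgsnapshot schema stores node geometry in nodes.geom)
SELECT ST_AsText(geom) AS location, count(*) AS occurrences
FROM nodes
GROUP BY geom
HAVING count(*) > 1
ORDER BY occurrences DESC
LIMIT 10;
SQL
```

The HAVING clause keeps only locations with more than one node, so the top of the result set points you straight at the worst duplicates.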
Feel free to fork the GitHub repository and modify the code to suit your needs! Also, if you feel inspired, you can suggest a better and shorter name or acronym for SCOPE!