One of the biggest challenges when working with large sets of data is to find the least costly workflow that you have to follow in order to get the most accurate answers.
Let’s say you have a huge dataset composed of all sorts of geometry features (points, lines, areas, etc.) and you want to do a bit of cleaning – because messy and redundant information is no fun!
So you might be thinking “Hmmm… which are the areas that have an unnecessary high density of points?”
The same issue can arise when working with OpenStreetMap data. This can be easily solved using PostGIS and a command-line tool that we’ve created and used.
Note: The following steps require a Linux environment, Postgresql 9.x, PostGIS 2.x, Osmosis 0.43+, QGIS 2.12.2+
Getting the data
Download a *.osm.pbf file using the command line:
wget https://s3.amazonaws.com/metro-extracts.mapzen.com/san-francisco_california.osm.pbf
This is the metro extract for San Francisco, provided by Mapzen. Geofabrik is also a very good resource for OSM data extracts.
In the same folder, download SCOPE – databaSe Creator Osmosis Postgis loadEr.
wget https://github.com/baditaflorin/osm-postgis-scripts/blob/master/scope.sh
Make sure to set the file to be executable by using
chmod +x scope.sh
Load the data
Using SCOPE and following the instructions on the screen, load the *.osm.pbf into a database.
SCOPE automatically creates the database with hstore and PostGIS extensions and the pgsnapshot schema.
Play with the data
Now that you have the data set up, you can easily query it using the DB Manager from QGIS and some PostGIS scripts.
Interesting examples
For example, using the find_duplicate_nodes query, we can see that this building (@20.805088941495338, -104.92877615339032), appears on the same spot 23 times!
The one next to it (@20.8054225, -104.9278152) appears 22 times!
The node density for these areas (@20.4411867, -97.3172739) is too high – 168 nodes!
Also, 171 nodes for a small fence segment (@46.7487683, 23.559687)!
Feel free to fork the GitHub repository and modify the code to suit your needs! Also, if you feel inspired, you can suggest a better and shorter name or acronym for SCOPE!

