This repository has been archived by the owner on Oct 4, 2022. It is now read-only.
dekstop/hot-history-import
Scripts to import an aggregated projection of HOT TM2 edit histories into a Postgres database. By Martin Dittus (@dekstop) in 2014-2017, used as a starting point for my research projects.

DISCLAIMER

Please only run these after careful study of what they're doing. Parts of the process require significant disk/CPU resources. No need to create more problems for overworked server admins.

--

SOFTWARE REQUIREMENTS

Misc ETL tools/scripts:
  https://github.com/dekstop/hot-tm2-scraper
  https://github.com/dekstop/osm-history-parser
  https://github.com/dekstop/osm-changeset-parser

These require:
- Bash, curl.
- cmake, a C++ compiler, Boost.
- Osmium 2.x libraries -- which in turn require OSMPBF, GDAL, and likely more.
- A Python 2.x environment with GDAL and lxml.
- PostgreSQL 9.4 with PostGIS 2.1 (approx).

It can take some time to set these up... prepare yourself for lots of version conflicts and badly documented release issues. Geo processing software is still a pain to use in 2017.

---

SYSTEM REQUIREMENTS

The bandwidth to download ~70GB in OSM history files.

Lots of disk space -- in late 2015 it took about 150GB in temp files (which can be deleted after import), and 80GB for the database (which can be reduced after import, depending on your needs). By early 2017 this had grown to 250GB of temp storage, and 180GB for the DB.

The full import then takes about a day. The key performance bottleneck is disk scans/seeks, although CPU can be a bottleneck during the initial parsing stages.

---

CREATING A DATABASE

Run as the privileged postgres user:

$ DB=hotosm_history_20170306
$ createuser osm --pwprompt
$ createdb $DB
$ psql -d $DB -c "CREATE EXTENSION postgis;"
$ psql -d $DB -c "GRANT CREATE ON DATABASE \"${DB}\" TO osm;"

Edit import.sh to reflect the chosen database name, and then:

$ ./import.sh

The importer script assumes that the "osm" user can run psql commands from the shell without a password prompt. There are various ways of setting this up without leaving the database exposed. The simplest option is a ~/.pgpass file (a minimal example is sketched at the end of this file).

---

TODOs

TODO: unescape unicode html entities in the scraper/parser, e.g. "Savai'i Island"

TODO: start a makefile version. Makefile examples:
  https://github.com/stamen/toner-carto/blob/master/Makefile
  http://mojodna.net/2015/01/07/make-for-data-using-make.html
  http://bitaesthetics.com/posts/make-for-data-scientists.html
Can we organise this as a collection of makefiles? Have one shared stub, then add project-specific modules?
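---

APPENDIX: ~/.pgpass SKETCH

A minimal sketch of the passwordless-psql setup mentioned under CREATING A DATABASE. The hostname (localhost), port (5432), and password (CHANGE_ME) below are placeholder assumptions; substitute your own values and database name.

# ~/.pgpass format: hostname:port:database:username:password
$ echo "localhost:5432:hotosm_history_20170306:osm:CHANGE_ME" >> ~/.pgpass
$ chmod 600 ~/.pgpass
# psql ignores ~/.pgpass unless group/world access is blocked, hence the chmod.

# Quick check -- this should connect without prompting for a password:
$ psql -h localhost -U osm -d hotosm_history_20170306 -c "SELECT postgis_full_version();"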