diff --git a/documentation/readme.md b/documentation/readme.md index 467e9994..b594bd24 100644 --- a/documentation/readme.md +++ b/documentation/readme.md @@ -1,7 +1,7 @@ # Understanding the Github Repository: The Github Repository (henceforth the repo) is named global-indicators, and the master branch is managed by Geoff Boeing. This section describes, in summarized form, what information can be found in each part of the repo. For more detailed instructions on how to run different parts of the code, please look within the folders that contain the code. If you are unfamiliar with Github, we recommend that you read the Github Guides, which can be found at: https://guides.github.com/. -There are two work folders and a documentation folder in the repo. The process folder holds the code and results of the main analysis for this project. The validation folder holds the codes, results, and analysis for Phase II validation of the project. In this readme, you will find a summary of what occurs in aspect of the repo. +There are three work folders and a documentation folder in the repo. The process folder holds the code and results of the main analysis for this project. The validation folder holds the code, results, and analysis for Phase II validation of the project. The analysis folder holds output indicator visualization and analysis. In this readme, you will find a summary of what occurs in each aspect of the repo. ## Main Directory ### Readme @@ -11,7 +11,7 @@ The repo's readme gives a brief overview of the project and the indicators that There are various documents that are accessible from the main repo. These include - .gitignore: A list of files for the repo to ignore.
This keeps irrelevant files away from the main folders of the repo - LICENSE: Legal information concerning the repo and its contents -- Win-docker-bash.bat: A file to smooth out the process of running Docker on a windows device +- Win-docker-bash.bat: A file to smooth out the process of running Docker on a Windows device ### Docker Folder The docker folder gives you the relevant information to pull the docker image onto your machine and run bash in this container. @@ -26,8 +26,14 @@ The documentation folder contains this readme. The purpose of the documentation ## Process Folder The process folder runs through the process of loading in the data and calculating the indicators. The readme goes step-by-step through the code to run. The configuration folder has the specific configuration json file for each study city. The data folder is empty before any code is run. The process folder also has five python scripts (henceforth scripts). This section will explain what each script and notebook does. This serves as a basic understanding of what exists in the Process folder. To understand what steps to follow to run the process, please read the Process Folder’s readme. +### Preprocess Folder +The preprocess folder runs through the process of preparing input datasets. Currently, it contains a configuration file (_project_configuration.xlsx) that defines both the project- and region-specific parameters for the study regions, and a series of pre-processing scripts. The pre-processing procedure creates the geopackage and graphml files that are required for the subsequent steps of analysis. This work is coordinated by Carl. Please see the pre_process folder for more detail. + +### Collaborator_report folder +This folder contains scripts to create a PDF validation report that was distributed to collaborators for feedback.
Preprocessing will then be revised as required by the collaborators' feedback in an iterative process, to ensure that the data corroborates the expectations of local experts. This is part of the effort for Phase I validation. + ### Configuration Folder -The configuration folder contain configuration json files for each of the 25 analyzed cities. The configuration files make it easier to organize and analyze the different study cities by providing file paths for the input and output of each city. This configuration of file paths allows you to simply write the city name and allow the code to pull in all the city-specific data itself. For example, each city has a different geopackage that is labled with 'geopackagePath' in the configuration file. The process code is able to extract the correct geopackage by using the configuration file. In Adelaide's case, 'adelaide_au_2019_1600m_buffer.gpkg' will be called whenever the code retreives 'geopackagePath' for Adelaide. The configuration files allow the project to be more flexible by creating an easy way to add, delete, or alter study city data. +The configuration folder contains configuration files for each of the 25 analyzed cities. The configuration files make it easier to organize and analyze the different study cities by providing file paths for the input and output of each city. This configuration of file paths allows you to simply write the city name and let the code pull in all the city-specific data itself. For example, each city has a different geopackage that is labelled with 'geopackagePath' in the configuration file. The process code is able to extract the correct geopackage by using the configuration file. In Adelaide's case, 'adelaide_au_2019_1600m_buffer.gpkg' will be called whenever the code retrieves 'geopackagePath' for Adelaide. The configuration files allow the project to be more flexible by creating an easy way to add, delete, or alter study city data.
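The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation (the helper name is hypothetical), assuming a per-city JSON configuration file containing a 'geopackagePath' key:

```python
import json

def load_geopackage_path(config_file):
    """Return the input geopackage path recorded in a city's configuration file.

    Hypothetical helper: reads e.g. Adelaide.json and looks up the
    'geopackagePath' key, as described above.
    """
    with open(config_file) as f:
        config = json.load(f)
    return config["geopackagePath"]
```

For Adelaide, this would return 'adelaide_au_2019_1600m_buffer.gpkg', which the process code can then open directly.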
### Data Folder On the repo, the data folder is empty. You can download the data for the process and place it in this folder. Instructions for obtaining the data are below. @@ -46,9 +52,13 @@ Run this script second. After projecting the data into the applicable CRS, this 1. Finally, a z-score for the variables is calculated. This script must be run for each sample city before running the aggregation script. +### process_region.sh +This is a shell script wrapper that runs the sample point estimates script (sp.py) for all study regions in sequence. It can be run using ```bash process_region.sh``` followed by a list of region names. + ### aggr.py Run this script third. This is the last script that needs to be run. This script converts the data from sample points into hex data, which allows for within-city analysis. It also concatenates each city so that the indicators are calculated for between-city comparisons. The concatenation is why the sample points script must be run for every city before running this script. After running the script, two indicator geopackages will be created in the data/output folder. + ## Validation Folder The project’s validation phase aims to verify the accuracy of the indicators processed from the data used in the process folder, i.e. the Global Human Settlement Layer and OSM data (henceforth the global dataset). In order to do this, we have three phases of validation. @@ -64,7 +74,7 @@ As of Summer 2020, the validation folder is dedicated to Phase II validation. The Validation Folder’s readme explains how to run the official datasets for both street networks (edges) and destinations. ### Configuration Folder -The validation configuration folder serves a simmilar purpose to the configuration folder in the process folder. The configureation files exsit for each city for which the project has official data. Note, some cities have only edge data, only destination data, or edge and destination data.
+The validation configuration folder serves a similar purpose to the configuration folder in the process folder. The configuration files exist for each city for which the project has official data. Note that some cities have only edge data, only destination data, or both edge and destination data. ### Data Folder On the repo, the data folder is empty. You can download the data for validation and place it in this folder. Instructions for obtaining the data are below. @@ -73,7 +83,7 @@ On the repo, the data folder is empty. You are able to download the data for val Both the edge folder and the destination folder start with a readme file and a python script. The readme file explains the results of the validation work. Run the python script to conduct Phase II validation. After running the python script, each folder will populate with a csv file containing relevant indicators and a fig folder for the created figures. ### Edge -The edge folder compares the OSM derived street network with the offical street network. +The edge folder compares the OSM-derived street network with the official street network. ### Destination The destination folder compares fresh food destinations between the OSM-derived data and the official data. This includes supermarkets, markets, and shops like bakeries. diff --git a/documentation/workflow.md b/documentation/workflow.md deleted file mode 100644 index 185340b0..00000000 --- a/documentation/workflow.md +++ /dev/null @@ -1,121 +0,0 @@ -# Workflow for calculating the indicators -The following section discusses what the code in the process does in order to calculate the indicators. - -A summarized description: -1. First, import the city’s street network, pedestrian network, open space configuration, and destinations. In this section, sample points are created along every 30 meters of the pedestrian network. These sample points will serve as the basis for the next section’s analysis. -1.
Create local walkable neighborhoods (henceforth neighborhoods) within each city for analysis. Neighborhoods are created by - 1. First, take a 1600 meter radius from each sample point - 1. Second, buffer the edges within this 1600 meter radius by 50 meters -1. Calculate different statistics for each neighborhood within the study region. This includes average population and intersection density. It also includes access to destinations and public open space. Finally, a walkability score is calculated from these statistics. -1. Convert data to the hex-level. - 1. Within-city: Average the statistics from step three into the hexagon level - 1. Relative to all cities: Use z-sores to translate walkscore of hexes so that it can be understood relative to all cities -1. Finally, adjust for population. This allows to understand the indicators in terms of what the average person in the city experiences. This section also creates two indicators that represent the overall city-level walkability summaries, which are intended to show how walkable a city and its areas are for its population on average, compared with all other cities. 
- - -## Prepare study region input data sources, and city-specific config file -To get started, we need to prepare input datasource in geopackage format (**studyregion_country_yyyy_1600m_buffer.gpkg**, The first entry of yyyy indicates the year the datasource is targetting) for each study region, these include: -| Input data | Geometry | Description | Open data source | -| --- | --- | --- | --- | -| aos_nodes_30m_line | point | Public open space pseudo entry points (points on boundary of park every 20m within 30m of road) | OpenStreetMap | -| clean_intersections_12m | point | Clean intersections (not required; counts are associated with pop_ghs_2015) | OpenStreetMap | -| dest_type | NA (non-spatial) | Summary of destinations and counts | OpenStreetMap | -| destinations | point | OSM destinations retrieved using specified definitions (only require: supermarkets, convenience, pt_any --- use dest_name_full to determine, will need to combine convenience destinations) | OpenStreetMap | -| pop_ghs_2015 | polygon | 250m hex grid, associated with area_sqkm (fixed), population estimate (2015), population per sq km, intersection count, intersections per sq km | Global Human Settlement ([GHSL](https://ghsl.jrc.ec.europa.eu/download.php?ds=pop) | -| urban_sample_points | point | Sample points in urban region (every 30m along pedestrian network) | OpenStreetMap | -| urban_study_region | polygon | Urban study region (intersection of city boundary and GHS 2015 urban centre layer) | [GHSL](https://ghsl.jrc.ec.europa.eu/download.php?ds=pop) | - - -And study region pedestrian network graph: -**studyregion_country_yyyy_10000m_pedestrian_osm_yyyymmdd.graphml** -- A derived 'pedestrian' network based on the OSM excerpt for the buffered study region (a 10km buffer around study region administrative boundary), processed using OSMnx with a custom walk-cycle tag filter (eg. excludes freeways and private roads / paths). 
-- The first entry of yyyy indicates the year the network is targetting; the date entry of yyyymmdd represents the retrieval date from OpenStreetMap. -- The graphml can be loaded using OSMnx, and other packages that read the graphml format (networkx, gephi, etc). - -Urban sample points are created every 30m along pedestrian network to use as original points (to destination) for spatial network analysis. We adopted this sampling approach as residential address locations are not available to us in most cases. - -Population estimation (pop_ghs_2015) is retrieved from Global Human Settlement website ([GHSL](https://ghsl.jrc.ec.europa.eu/download.php?ds=pop)), and re-aggregated to 250m hex grid. - -Urban study region or boundary is created based on the intersection of official city boundary and GHS urban center layer. This adjustment is to restrain study region within urban areas, in order to better justify the use of OSM resources. - -Daily living destinations typically contain supermarkets, convenience stores, public transportation, and public open spaces. Destination points are retrieved from OSM's Points of Interests database. - -Other input datasource including walkable street network and intersections are retrieved from OSM using OSMnx. - -We rely on OpenStreetMap database to conduct essential spatial analysis, with the idea that once the process are developed, they can be upscaled to other cities. However, modifications will be required to be made to each study region implementation to work in a global context. - -Please see `process/configuration` folder for examples in terms of how to prepare the config file for each study region. And See scripts: [setup_config.py](https://github.com/gboeing/global-indicators/blob/master/process/setup_config.py) for detailed project parameters in the process folder on how cities configuration json files are prepared. 
- -## Prepare neighborhood level stats -For each sample point, 50m buffer is created along the OSM pedestrian street network within 1600m walking distance radius of each sample point (correspond with 20min walk). Each network buffer could be considered as a "local walkable neighborhood". - -Next, we calculate average population and intersection density for each local walkable neighborhood within study region. -Detailed steps are as follows: -  1. load 250m hex grid from input gpkg with population and intersection density data -  2. intersect local walkable neighborhood (1600m) with 250m hex grid -  3. then calculate population and intersection density within each local walkable neighborhood (1600m) by averaging the hex level pop and intersection density data; final result is urban sample point dataframe with osmid, pop density, and intersection density. - -Then, we calculate sample point accessibility to daily living destinations (supermarket, convenience, & public transport) and public open space, and sample point walkability score. -Detailed steps as follow: -  1. using pandana package to calculate distance to access from sample points to destinations (daily living destinations, public open space) -  2. calculate accessibility score per sample point: transform accessibility distance to binary measure: 1 if access <= 500m, 0 otherwise -  3. calculate daily living score per sample point by summing the binary accessibility scores to all daily living destinations -  4. calculate walkability score per sample point: get z-scores for daily living accessibility, population density and intersection; sum these three z-scores to get the walkability score - -The sample point stats outputs are saved to city-specific output gpkg (**studyregion_country_yyyy_1600m_buffer_outputyyyymmdd.gpkg**, the date entry of yyyymmdd represents the data prepared date). A new layer *samplePointsData* will be created in each city's gpkg. 
-See scripts: [sp.py](https://github.com/gboeing/global-indicators/blob/master/process/sp.py) in the process folder for details. - -## Generate within-city indicators at the 250m hex grid level -We rely on sample points stats that generated for each city to calculate the within-city indicators for each study region. This process take sample point stats within each study region as input and aggregate them up to hex-level indicators. - -First, we calculate within-city indicators at hex level by taking the average of sample point stats within each hexagon. These sample point stats include pop and intersection density, daily living score, walkability score, and accessibility scores to destinations (supermarket, convenience, public transport and public open space). - -Next, we calculate walkability indicators at hex level relative to all cities. We first take the z-scores (relative to all cities) of pop and intersection density, and daily living generated at the hex level. Then, we sum these three z-scores to get the walkability index relative to all cities. - -These within-city indicators are saved to one output gpkg, named *global_indicators_hex_250m.gpkg*. Layers with hex-level indicators will be created for all study regions. -See scripts: [aggr.py](https://github.com/gboeing/global-indicators/blob/master/process/aggr.py) in the process folder for details. - -Output *global_indicators_hex_250m.gpkg*: - -|indicators | data type | description | -|---- | --- | --- | -| urban_sample_point_count | int | Count of urban sample points associated with each hexagon (judge using intersect); this must be positive. 
Zero sample count hexagons are not of relevance for output | -| pct_access_500m_supermarkets | float | Percentage of sample points with pedestrian network access to supermarkets within (up to and including) 500 metres | -| pct_access_500m_convenience | float | Percentage of sample points with pedestrian network access to convenience within (up to and including) 500 metres | -| pct_access_500m_pt_any | float | Percentage of sample points with pedestrian network access to public transport within (up to and including) 500 metres | -| pct_access_500m_public_open_space | float | Percentage of sample points with pedestrian network access to public open space within (up to and including) 500 metres | -| local_nh_population_density | float | Average local neighbourhood population density | -| local_nh_intersection_density | float | Average local neighbourhood intersection density | -| local_daily_living | float | Average local neighbourhood daily living score | -| local_walkability | float | Average local neighbourhood walkability score | -| all_cities_z_nh_population_density | float | Z-score of local neighbourhood population density relative to all cities | -| all_cities_z_nh_intersection_density | float | Z-score of local neighbourhood intersection density relative to all cities | -| all_cities_z_daily_living | float | Z-score of daily living score relative to all cities | -| all_cities_walkability | float | Walkability index relative to all cities | - - -## Generate across-city indicators at the city level -We calculate population-weighted city-level indicators relative to all cities. We rely on the hex-level indicators that generated for each city (in *global_indicators_hex_250m.gpkg*) and population estimates (in study region specific gpkg.) to calculate city-level indicators for across-cities comparison. This process take hex-level indicators (i.e. 
accessibility, pop density, street connectivity, within and across-city daily living and walkability) and population estimates within each study region as input and aggregate them up to city-level indicators using population weight. - - -Output *global_indicators_city.gpkg*: - -|indicators | data type | description | -|---- | --- | --- | -| pop_pct_access_500m_supermarkets | float | Percentage of population with pedestrian network access to supermarkets within (up to and including) 500 metres| -| pop_pct_access_500m_convenience | float | Percentage of population with pedestrian network access to convenience within (up to and including) 500 metres | -| pop_pct_access_500m_pt_any | float | Percentage of population with pedestrian network access to public transport within (up to and including) 500 metres | -| pop_pct_access_500m_public_open_space | float | Percentage of population with pedestrian network access to public open space within (up to and including) 500 metres | -| pop_nh_pop_density | float | Average local neighbourhood population density | -| pop_nh_intersection_density | float | Average local neighbourhood intersection density | -| pop_daily_living | float | Average daily living score for population (within city) | -| pop_walkability | float | Average walkability index for population (within city) | -| all_cities_pop_z_daily_living | float | Average z-score of daily living score for population relative to all cities | -| all_cities_walkability | float | Average walkability index for population relative to all cities| - -The pop_* indicators represent the average experience of population within each study region in terms of overall city-level accessibility, population density, street connectivity, daily living and walkability. - -The all_cities_* indicators represent the overall city-level walkability summaries, which are intended to show how walkable a city and its areas are for its population on average, compared with all other cities. 
- -The across-city indicators are saved to a output gpkg, *global_indicators_city.gpkg*. A layer with city-level indicators will be created for each study region. -See scripts: [aggr.py](https://github.com/gboeing/global-indicators/blob/master/process/aggr.py) in the process folder for details. diff --git a/process/data/readme.md b/process/data/readme.md index ff12960b..b660e152 100644 --- a/process/data/readme.md +++ b/process/data/readme.md @@ -1,10 +1,19 @@ -# Input Folder +# Input Folder -Data for processing goes in the `input` subfolder. For example, to run Odense, you need `odense_dk_2019_1600m_buffer.gpkg` and `odense_dk_2019_10000m_pedestrian_osm_20190902.graphml` in the `input` folder. +Data for processing goes in the `input` subfolder. For example, to run Odense, you need `odense_dk_2020_1600m_buffer.gpkg` and `odense_dk_2020_10000m_pedestrian_osm_20200813.graphml` in the `input` folder. -# Output Folder +# Output Folder -This subfolder contains hexagon level statistics and city level statistics, generated by the City & Intercity Aggregations script. +This subfolder contains: +City-specific sample point level statistics, generated by the Sample Point Estimates script: +- studyregion_country_yyyy_1600m_buffer_outputyyyymmdd.gpkg, for example, `odense_dk_2020_1600m_buffer_output20200820.gpkg` +Hexagon level statistics and city level statistics, generated by the City & Intercity Aggregations script: - global_indicators_hex_250m.gpkg (hexagon level) - global_indicators_city.gpkg (city level) + +# GTFS Folder + +This folder contains GTFS public transport configuration and data: +- gtfs_config.py +- gtfs_frequent_transit_headway_yyyy-mm-dd_python.gpkg diff --git a/process/readme.md b/process/readme.md index baa872bf..bb0b87a8 100644 --- a/process/readme.md +++ b/process/readme.md @@ -38,19 +38,15 @@ Please follow the instructions below to run the process. 1. ```python setup_config.py``` 1. 
Make sure to check the list of cities within ``setup_config.py``; it should include the cities that you plan to analyze 1. You can either delete each city that you do NOT want to include in your analysis, or comment it out by adding a pound sign (#) before it - 1. ```python sp.py [SPECIFIC CITY NAME].json true``` - 1. Use the file name that can be found under the **process/configuration** folder for each city. Example: For Adelaide, type ```python sp.py Adelaide.json true``` - 1. Only type true if using multiprocessing. On machines with lower capacity, we recommend not including ‘true’ in the command. So, for Adelaide, type ```python sp.py Adelaide.json``` + 1. ```python sp.py [SPECIFIC CITY NAME]``` + 1. Use the city name of the configuration file found under the **process/configuration** folder for each city. Example: For Adelaide, type ```python sp.py Adelaide``` + 1. Alternatively, a shell script wrapper **process_region.sh** exists to run all study regions at once in sequence, and can be run using ```bash process_region.sh``` followed by a list of region names. For example, ```bash process_region.sh Adelaide Auckland Baltimore``` 1. Make sure to run this line of code for each and every city before running the ``aggr.py`` script - 1. ```python aggr.py cities.json``` + 1. ```python aggr.py``` 1. Notice that you will get the final indicator geopackage in **global-indicators/process/data/output** only after you run this ``aggr.py`` script Note that it will take several hours to several days to run these scripts, depending on the size of the study city. -Alternatively, if you would like to run only specific cities to produce the indicators, please do the following before running the aggregation script aggr.py. -1. Go into the **configuration** folder and open the ``cities.json`` file - 1. In ``cities.json``, under the key "gpkgNames", delete the cities if any that are not to be included in your analysis. Save file -1.
Run ```python aggr.py cities.json``` ## Run the Jupyter Notebooks (TODO) diff --git a/readme.md b/readme.md index e85abfd8..7873367f 100644 --- a/readme.md +++ b/readme.md @@ -1,20 +1,20 @@ -# Global liveability indicators project - -## Background -RMIT University, in collaboration with researchers from other universities worldwide, is undertaking a project, the Global Indicators Project, to calculate health-related spatial built environment indicators for 25 cities globally; The project aims to make use of open data sources, such as OpenStreetMap (OSM), the Global Human Settlement Layer (GHSL), and GTFS feeds (where available) as input to the indicator processing. After indicators have been derived for a city, members of the team and study region collaborators who have local knowledge of that city will validate these indicators. - -This (proposed) repository contains documentation and process scripts used for calculating the global liveability indicators in the ['Lancet series'](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)01284-2/fulltext) project, 2019. - -The processes are developed to create indicators for our selected global cities (with the potential that these processes could be applied to other study region globally). These indicators are: -1. Population per square kilometre -2. Street connectivity per square kilometre -3. Access to supermarkets within 500 metres -4. Access to convenience stores within 500 metres -5. Access to a public transport stop (any mode) within 500 metres -6. Access to public open space (e.g. parks) within 500 metres -7. Access to a frequent public transport stop (any mode) within 500 metres -8. Daily living score within 500 metres (within and across cities) -9. 
Walkability scores (within and across cities) +# global-indicators + +An open-source tool in Python to compute pedestrian accessibility indicators for cities worldwide using open data, such as OpenStreetMap (OSM), the Global Human Settlement Layer (GHSL), and GTFS feeds (where available). + +This tool presents a generalized method to measure pedestrian accessibility indicators within and between cities, at both the city scale and a high-resolution grid level. The methodology and the open data approach developed in this research can be extended to many cities worldwide to support local policymaking towards healthy and sustainable living environments. + +The process scripts enable computation of the following indicators of pedestrian accessibility: +1. Urban area in square kilometers +2. Population size and population density +3. Street connectivity: intersections per square kilometer +4. Access to destinations: population access within 500 meters walking distance to: + - a supermarket + - a convenience store + - any public open space (e.g. parks) + - any public transport stop (any mode) +5. Daily living score (within and across cities) +6. Walkability index (within and across cities) ## Documentation Please refer to the documentation folder readme for more information about this repository.
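The accessibility indicators listed above rest on a simple scheme described in this repository's process documentation: network distance from a sample point to each destination type is converted to a binary access measure (1 if within 500 m), the daily living score sums those binaries across daily living destinations, and the walkability index sums z-scores of daily living, population density, and intersection density. The following is a minimal illustrative sketch, not the project's actual implementation (function names are hypothetical):

```python
from statistics import mean, stdev

def access_score(distance_m, threshold_m=500):
    # Binary access: 1 if the destination is within the threshold distance
    return 1 if distance_m <= threshold_m else 0

def daily_living_score(distances):
    # Sum of binary access scores across daily living destination types
    return sum(access_score(d) for d in distances.values())

def z_score(value, values):
    # Standard score of one observation relative to a set of observations
    return (value - mean(values)) / stdev(values)

def walkability(z_daily_living, z_pop_density, z_intersection_density):
    # Walkability index: the sum of the three z-scores
    return z_daily_living + z_pop_density + z_intersection_density

# Example: one sample point's network distances (metres) to destinations
distances = {"supermarket": 320.0, "convenience": 610.0, "pt_any": 150.0}
print(daily_living_score(distances))  # supermarket and pt_any within 500 m -> 2
```

In the actual pipeline these quantities are computed per sample point with pandana and osmnx and then aggregated to hexagons and cities; the sketch above only shows the arithmetic of the scoring scheme.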