As part of the requirements for the Master of Disaster Risk & Resilience programme at the University of Canterbury, this research project explored the potential for machine learning models to make free Digital Surface Models (such as the widely used SRTM) more suitable for flood modelling, by stripping away vertical biases relating to vegetation & built-up areas to recover a "bare earth" Digital Terrain Model.
The image below visualises the performance of one of these models (a fully-convolutional neural network) in one of the three test zones considered (i.e. data unseen during model training & validation, used to assess the model's ability to generalise to new locations). A more detailed description is provided in the associated open-access journal article: Meadows & Wilson 2021.
All Python code fragments used during this research are shared here (covering preparing input data, building & training three different ML models, and visualising the results), in the hope that they'll be useful for others doing related work or extending/improving this approach. Please note this code includes lots of exploratory steps & some dead ends, and is not a refined step-by-step template for applying this approach in a new location.
Scripts are stored in folders relating to the virtual environments within which they were run, along with a text file summarising all packages loaded in each environment:
- geo: geospatial processing & mapping
- sklearn: development of Random Forest model
- tf2: development of neural network models
- osm: downloading OpenStreetMap data
The data processed for use in this project comprised the feature data (free, global datasets relevant to the vertical bias in DSMs, to be used as inputs to the machine learning models), target data (the reference "bare earth" DTM from which the models learn to predict vertical bias), and some supplementary datasets (not essential to the modelling but used to explore/understand the results).
A guiding principle for the project was that all feature (input) data should be available for free and with global (or near-global) coverage, so as to maximise applicability in low-income countries/contexts. While these datasets were too big to store here, all can be downloaded for free and relatively easily (some require signing up to the provider platform) based on the notes below.
Digital Surface Models (DSMs)
- SRTM: Downloaded from EarthExplorer under Digital Elevation > SRTM > SRTM 1 Arc-Second Global
- ASTER: Downloaded from EarthData Search ("ASTER Global Digital Elevation Model V003")
- AW3D30: Downloaded from Earth Observation Research Centre (Version 2.2, the latest available at the time)
Multi-spectral imagery
- Landsat-7: Downloaded from EarthExplorer under Landsat > Landsat Collection 1 Level-2 (On-Demand) > Landsat 7 ETM+ C1 Level-2 (surface reflectance bands) and Landsat > Landsat Collection 1 Level-1 > Landsat 7 ETM+ C1 Level-1 (thermal & panchromatic bands), limited to the Tier 1 collection and a 6-month period centred on the SRTM data collection period (11-22 Feb 2000)
- Landsat-8: Downloaded from EarthExplorer under Landsat > Landsat Collection 1 Level-2 (On-Demand) > Landsat 8 OLI/TIRS C1 Level-2 (surface reflectance bands) and Landsat > Landsat Collection 1 Level-1 > Landsat 8 OLI/TIRS C1 Level-1 (thermal & panchromatic bands), limited to the Tier 1 collection and 6-month periods centred on each of the LiDAR survey dates (in 2016, 2017 & 2018)
Night-time light
- DMSP-OLS Nighttime Lights Time Series (annual composites): Downloaded from the NOAA Earth Observation Group
- VIIRS Day/Night Band Nighttime Lights (monthly composites): Downloaded from the Colorado School of Mines (CSM) Earth Observation Group
Others
- Global forest canopy height: Developed by Simard et al. 2011 and available for download here
- Global forest cover: Developed by Hansen et al. 2013 and available for download here
- Global surface water: Developed by Pekel et al. 2016 and available for download here
- OpenStreetMap layers: Downloaded using the OSMnx Python module developed by Boeing 2017
In order to learn how to predict (and then correct) the vertical biases present in DSMs, the models need reference data - "bare earth" DTMs assumed to be the "ground truth" that we're aiming for. For this project, we used three of the high-resolution LiDAR-derived DTMs published online by the New Zealand Government, accessible to all via the Land Information New Zealand (LINZ) Data Service. The specific LiDAR surveys used are summarised below, from the Marlborough & Tasman Districts (in the north of Aotearoa New Zealand's South Island):
- Marlborough (May-Sep 2018): DTM and corresponding index tiles
- Tasman - Golden Bay (Nov-Dec 2017): DTM and corresponding index tiles
- Tasman - Abel Tasman & Golden Bay (Dec 2016): DTM and corresponding index tiles
To find similar target/reference DTM data in other parts of the world, the OpenTopography initiative maintains a catalogue of freely available sources.
A few other datasets are referred to in the code, not as inputs to the machine learning models but as reference layers to help interpret the results.
- MERIT DSM: Improved DSM developed by Yamazaki et al. 2017, with a request form for the data available here
- New Zealand Land Cover Database (LCDB v5.0): Developed by Manaaki Whenua - Landcare Research, available from the LRIS portal
The broad approach taken is summarised below as succinctly as possible, with further details provided as comments in the relevant scripts.
- For each available LiDAR survey zone, process the DSMs and DTM in tandem: clipping each DSM (SRTM, ASTER and AW3D30) to the extent covered by the LiDAR survey, and resampling the DTM to the same resolution & grid alignment as each DSM. Various DSM derivatives (such as slope, aspect & topographical index products) are also prepared here.
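As a minimal illustration of the resampling idea, the sketch below block-averages a fine-resolution "DTM" grid down to a coarser "DSM" grid using plain numpy. The `resample_to_coarser` helper and the 2x factor are hypothetical; the actual scripts use geospatial tooling that also handles projections, nodata and grid alignment.

```python
import numpy as np

def resample_to_coarser(fine, factor):
    """Resample a fine grid (e.g. 1 m LiDAR DTM) to a coarser grid
    (e.g. 30 m SRTM) by block-averaging. Assumes the grids are already
    aligned and the fine shape is an exact multiple of `factor`."""
    h, w = fine.shape
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Toy example: a 4x4 "DTM" averaged down to 2x2
dtm_fine = np.arange(16, dtype=float).reshape(4, 4)
dtm_coarse = resample_to_coarser(dtm_fine, 2)
```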
- Based on a comparison of differences between each DSM and the DTM (resampled to match that particular DSM), the SRTM DSM was selected as the "base" for all further processing (script).
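The DSM-vs-DTM comparison in this step comes down to summary statistics of the per-pixel elevation differences. A hedged sketch (the `error_stats` helper is illustrative, not taken from the project scripts):

```python
import numpy as np

def error_stats(dsm, dtm):
    """Summarise vertical error between a DSM and a matching reference DTM.
    Positive bias means the DSM sits above the bare-earth surface, as
    expected over vegetation & built-up areas."""
    diff = dsm - dtm
    diff = diff[np.isfinite(diff)]            # ignore nodata cells
    return {
        "bias": float(diff.mean()),           # mean error
        "mae": float(np.abs(diff).mean()),    # mean absolute error
        "rmse": float(np.sqrt((diff ** 2).mean())),
    }

dsm = np.array([10.0, 12.0, 15.0, np.nan])
dtm = np.array([9.0, 12.0, 11.0, 10.0])
stats = error_stats(dsm, dtm)
```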
- Process all other input datasets: resampling to match the SRTM resolution & grid alignment, masking out clouds in the multi-spectral imagery, and applying bounds where appropriate (e.g. for percentage variables).
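For the bounds-applying part of this step, a minimal numpy sketch of cleaning a percentage raster (the `clean_percentage_layer` helper and the -9999 nodata value are assumptions for illustration):

```python
import numpy as np

def clean_percentage_layer(arr, nodata=-9999):
    """Apply simple sanity bounds to a percentage raster (e.g. forest
    cover): replace the nodata value with NaN and clip to [0, 100]."""
    arr = np.where(arr == nodata, np.nan, arr.astype(float))
    return np.clip(arr, 0.0, 100.0)

layer = np.array([-9999, -3.0, 47.5, 104.0])
cleaned = clean_percentage_layer(layer)
```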
- Divide all available data into training (90%), validation (5%) and testing (5%) subsets, and prepare it for input to the pixel-based approaches (random forest & standard neural network) and the patch-based approach (convolutional neural network) (script).
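A generic 90/5/5 split can be sketched as below. Note this is a simple random split for illustration only: in the project the test subset comprises three spatially distinct zones held out entirely, and the real scripts also handle patch extraction for the convolutional model.

```python
import numpy as np

def split_indices(n, val_frac=0.05, test_frac=0.05, seed=42):
    """Shuffle sample indices and split into train/val/test
    (90/5/5 by default). `seed` keeps the split reproducible."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

train, val, test = split_indices(1000)
```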
- Use sequential floating forward selection (SFFS) with a random forest estimator to select relevant features, based on the training & validation datasets (script)
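The idea can be sketched with scikit-learn's `SequentialFeatureSelector`, though note this performs plain forward selection; the "floating" variant (SFFS), which can also discard previously added features, is available in e.g. the mlxtend package via `floating=True`. The synthetic data below (three informative features plus three noise features) is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

# Synthetic stand-in data: features 0-2 drive the target, 3-5 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2]

# Forward selection with a random forest estimator, scored by cross-validation
selector = SequentialFeatureSelector(
    RandomForestRegressor(n_estimators=20, random_state=0),
    n_features_to_select=3,
    direction="forward",
    cv=3,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())  # indices of chosen features
```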
- Train the random forest model, tuning hyperparameters with reference to the validation data subset (script)
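Tuning against a fixed validation subset (rather than k-fold cross-validation) can be sketched as below; the synthetic data, the single tuned hyperparameter (`max_depth`) and its grid are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the pixel-based feature table & vertical-bias target
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)
X_train, y_train = X[:240], y[:240]
X_val, y_val = X[240:], y[240:]

# Try a small grid of tree depths, keeping whichever model does best
# on the held-out validation subset
best_rmse, best_model = np.inf, None
for max_depth in (2, 5, 10):
    model = RandomForestRegressor(n_estimators=50, max_depth=max_depth,
                                  random_state=0)
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
    if rmse < best_rmse:
        best_rmse, best_model = rmse, model
```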
- Train the densely-connected neural network model, tuning hyperparameters with reference to the validation data subset (script)
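A minimal Keras sketch of the pixel-based network: each sample is a vector of co-located feature values and the target is the DSM's vertical error. The layer sizes, epoch count and synthetic data are assumptions, not the project's tuned configuration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the pixel-based feature table
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)).astype("float32")
y = (X[:, 0] - 0.5 * X[:, 1]).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),              # predicted vertical bias (m)
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(X, y, validation_split=0.1, epochs=5,
                    batch_size=32, verbose=0)
```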
- Train the fully-convolutional neural network model, tuning hyperparameters with reference to the validation data subset (script)
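A sketch of the patch-based idea: because every layer is convolutional, the network predicts a correction for every pixel of a patch, and a model trained on small patches can be run on larger tiles. The architecture below (two 3x3 conv layers plus a 1x1 output layer) and the random training patches are illustrative assumptions only.

```python
import numpy as np
import tensorflow as tf

# Multi-channel input patches (e.g. elevation + imagery bands); the
# spatial dimensions are left as None so larger tiles can be fed in later
inputs = tf.keras.Input(shape=(None, None, 8))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
outputs = tf.keras.layers.Conv2D(1, 1, padding="same")(x)  # per-pixel bias
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# Random stand-in patches: 16 patches of 32x32 pixels, 8 channels
X = np.random.default_rng(0).normal(size=(16, 32, 32, 8)).astype("float32")
y = X[..., :1]
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
pred = model.predict(X[:2], verbose=0)     # one correction value per pixel
```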
- Visualise results for the three zones of the testing data subset (unseen during model development) (script)
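The kind of before/after error map shown in the image above can be sketched with matplotlib; the synthetic error surfaces and the "predicted correction" below are fabricated purely to demonstrate the plotting pattern.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, write straight to file
import matplotlib.pyplot as plt

# Fabricated error surfaces for illustration: DSM-minus-DTM error before
# correction, and the residual after subtracting a model-predicted bias
rng = np.random.default_rng(0)
error_before = rng.normal(loc=2.0, scale=1.0, size=(50, 50))
predicted = error_before + rng.normal(scale=0.3, size=(50, 50))
error_after = error_before - predicted

fig, axes = plt.subplots(1, 2, figsize=(8, 4), constrained_layout=True)
for ax, data, title in [(axes[0], error_before, "Before correction"),
                        (axes[1], error_after, "After correction")]:
    im = ax.imshow(data, cmap="RdBu_r", vmin=-4, vmax=4)
    ax.set_title(title)
fig.colorbar(im, ax=axes, label="Elevation error (m)")
fig.savefig("error_maps.png", dpi=100)
```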