The tool downloads OpenStreetMap QA Tile information and satellite imagery tiles and saves them as an `.npz` file for use in machine learning training.
*Satellite imagery from Mapbox and Digital Globe*
```
pip install label_maker
```
Note that running this library requires tippecanoe as a "peer dependency": the `tippecanoe` command should be available from your command line before running label-maker.
Before running any commands, it is necessary to create a `config.json` file to specify inputs to the data preparation process:
```json
{
  "country": "togo",
  "bounding_box": [1.09725, 6.05520, 1.34582, 6.30915],
  "zoom": 12,
  "classes": [
    { "name": "Roads", "filter": ["has", "highway"] },
    { "name": "Buildings", "filter": ["has", "building"] }
  ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=ACCESS_TOKEN",
  "background_ratio": 1,
  "ml_type": "classification"
}
```
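If you prefer to generate this file programmatically, for example to keep an access token out of version control, a minimal sketch using only the standard library might look like the following. The `MAPBOX_ACCESS_TOKEN` environment variable is an assumption of this sketch, not something label-maker reads itself:

```python
import json
import os

# Build the same configuration as above, pulling the access token from an
# environment variable (hypothetical convention) instead of hard-coding it.
token = os.environ["MAPBOX_ACCESS_TOKEN"]
config = {
    "country": "togo",
    "bounding_box": [1.09725, 6.05520, 1.34582, 6.30915],
    "zoom": 12,
    "classes": [
        {"name": "Roads", "filter": ["has", "highway"]},
        {"name": "Buildings", "filter": ["has", "building"]},
    ],
    "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg"
               "?access_token=" + token,
    "background_ratio": 1,
    "ml_type": "classification",
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```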
**country**: The OSM QA Tile extract to download. The value should be a country string matching a value found in `label_maker/countries.txt`

**bounding_box**: The bounding box to create images from. This should be given in the form `[xmin, ymin, xmax, ymax]` as longitude and latitude values between `[-180, 180]` and `[-90, 90]`, respectively. Values should use the WGS84 datum, with longitude and latitude units of decimal degrees.

**zoom**: The zoom level to create images at. This functions as a rough proxy for resolution. Values should be given as integers.

**classes**: An array of classes for machine learning training. Each class is defined as an object with two required properties:
- **name**: The class name
- **filter**: A Mapbox GL filter to define any vector features matching this class. Filters are applied with the standalone featureFilter from Mapbox GL JS.
- **buffer**: The number of pixels to buffer the geometry by. This is an optional parameter to buffer the label for `object-detection` and `segmentation` tasks. Accepts any number (positive or negative). It uses the Shapely `object.buffer` method to calculate the final geometry (see the sketch after this parameter list). You can verify that your buffer options create the desired labels by inspecting the files created in `data/labels/` after running the `labels` command.
**imagery**: One of:
- A template string for a tiled imagery service. Note that you will generally need an API key to obtain images and there may be associated costs. The above example requires a Mapbox access token.
- A GeoTIFF file location. Works with both local and remote files. Ex: `'http://oin-hotosm.s3.amazonaws.com/593ede5ee407d70011386139/0/3041615b-2bdb-40c5-b834-36f580baca29.tif'`
- A WMS endpoint `GetMap` request. Fill out all necessary parameters except `bbox`, which should be set as `{bbox}`. Ex: `'https://basemap.nationalmap.gov/arcgis/services/USGSImageryOnly/MapServer/WMSServer?SERVICE=WMS&REQUEST=GetMap&VERSION=1.1.1&LAYERS=0&STYLES=&FORMAT=image%2Fjpeg&TRANSPARENT=false&HEIGHT=256&WIDTH=256&SRS=EPSG%3A3857&BBOX={bbox}'`
**background_ratio**: For single-class classification problems, we need to download images with no matching class. We will download `background_ratio` times the number of images matching the one class.

**ml_type**: One of `"classification"`, `"object-detection"`, or `"segmentation"`. For the final label numpy arrays (`y_train` and `y_test`), we will produce a different label depending upon the type:
- `"classification"`: An array of the same length as `classes`. Each array value will be either `1` or `0` based on whether it matches the class at the same index.
- `"object-detection"`: An array of bounding boxes of the form `[xmin, ymin, width, height, class_index]`. In this case, the values are not latitude and longitude values but pixel values measured from the upper left-hand corner. Each feature is tested against each class, so if a feature matches two or more classes, it will have the corresponding number of bounding boxes created.
- `"segmentation"`: An array of shape `(256, 256)` with values matching the `class_index` label at that position. The classes are applied sequentially according to `config.json`, so later classes will be written over earlier class labels.
**imagery_offset**: An optional list of integers representing the number of pixels to offset imagery. For example, `[15, -5]` will move the images 15 pixels right and 5 pixels up relative to the requested tile bounds.
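To get a feel for what the `buffer` option does to a geometry, here is a standalone sketch that calls Shapely's `object.buffer` directly on toy geometries. It is only an illustration of the buffering operation, not label-maker's internal code path:

```python
from shapely.geometry import LineString, Polygon

# A road-like feature: buffering a line by 4 units turns it into a thin
# polygon, which is what makes line features usable as segmentation labels.
road = LineString([(10, 10), (200, 10)])
print(road.buffer(4).area)        # > 0: the line now has width

# A building-like footprint: a positive buffer grows it, a negative one
# shrinks it (and can remove small features entirely).
building = Polygon([(50, 50), (100, 50), (100, 100), (50, 100)])
print(building.buffer(5).bounds)
print(building.buffer(-5).bounds)
```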
`label-maker` is most easily used as a command-line tool. There are five commands, documented below. All commands accept two flags:
- `-d` or `--dest`: string directory for storing output files (default: `'data'`)
- `-c` or `--config`: string location of `config.json` file (default: `'config.json'`)
Example:
```
$ label-maker download --dest flood-monitoring-project --config flood.json
```
Downloads and unzips OSM QA tiles.
```
$ label-maker download
Saving QA tiles to data/ghana.mbtiles
100% 18.6 MiB 1.8 MiB/s 0:00:00 ETA
```
Retiles the OSM data to the desired zoom level, creates label data (`labels.npz`), calculates class statistics, and creates visual label files (either GeoJSON or PNG files, depending upon `ml_type`). Requires the OSM QA tiles from the previous step. Accepts an additional flag:
- `-s` or `--sparse`: boolean; if this flag is present, only save labels for up to `n` background tiles, where `n` is equal to `background_ratio` times the number of tiles with a class label.
```
$ label-maker labels
Determining labels for each tile
---
Residential: 638 tiles
Total tiles: 1189
Write out labels to data/labels.npz
```
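If you want to sanity-check the labels before downloading any imagery, `labels.npz` can be opened with `numpy.load` just like the final package. The sketch below only lists the stored arrays and prints one of them; the key names and array shapes depend on your `ml_type` and may differ between versions, so treat it as an exploratory snippet rather than a documented format:

```python
import numpy as np

# Open the labels produced by `label-maker labels` and look around.
labels = np.load('data/labels.npz')

print(len(labels.files), 'label arrays stored')
print(labels.files[:5])                    # a few of the stored keys

first = labels[labels.files[0]]
print(first.shape, first.dtype)            # shape depends on ml_type
```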
Downloads example satellite images for each class. Requires the `labels.npz` file from the previous step. Accepts an additional flag:
- `-n` or `--number`: integer number of example images to create per class (default: `5`)
```
$ label-maker preview -n 10
Writing example images to data/examples
Downloading 10 tiles for class Residential
```
Downloads all imagery tiles needed for training. Requires the `labels.npz` file from the `labels` step.
```
$ label-maker images
Downloading 1189 tiles to data/tiles
```
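Once the tiles are on disk, it can be worth spot-checking a few before packaging. This is a minimal sketch assuming Pillow is installed and that the imagery service returned standard 256x256 tiles into `data/tiles`; neither assumption comes from label-maker itself:

```python
import os
from PIL import Image

tile_dir = 'data/tiles'
for name in sorted(os.listdir(tile_dir))[:3]:
    with Image.open(os.path.join(tile_dir, name)) as img:
        # Most tile services return 256x256 images; anything else is worth
        # investigating before running `label-maker package`.
        print(name, img.size, img.mode)
```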
Bundles the satellite images and labels to create a final `data.npz` file. Requires the `labels.npz` file from the `labels` step and the downloaded image tiles from the `images` step.
```
$ label-maker package
Saving packaged file to data/data.npz
```
Once you have a packaged `data.npz` file, you can use `numpy.load` to load it. As an example, here is how you can supply the created data to a Keras model:
```python
import numpy as np
from keras.models import Sequential

# the data, shuffled and split between train and test sets
npz = np.load('data.npz')
x_train = npz['x_train']
y_train = npz['y_train']
x_test = npz['x_test']
y_test = npz['y_test']

# define your model here, example usage in Keras
model = Sequential()
# ...
model.compile(...)

# train
model.fit(x_train, y_train, batch_size=16, epochs=50)
model.evaluate(x_test, y_test, batch_size=16)
```
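For a more concrete starting point, here is one possible minimal model for the `"classification"` case. It is a sketch rather than a recommendation: it assumes the packaged tiles are 8-bit RGB images, reads the number of outputs from `y_train`, and treats the labels as independent binary indicators (hence sigmoid outputs and binary cross-entropy):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

npz = np.load('data.npz')
x_train, y_train = npz['x_train'], npz['y_train']
x_test, y_test = npz['x_test'], npz['y_test']

# assumption: tiles are stored as 8-bit pixel values; scale them to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

model = Sequential([
    # a tiny convolutional stack; the input shape is taken from the data itself
    Conv2D(16, (3, 3), activation='relu', input_shape=x_train.shape[1:]),
    MaxPooling2D((4, 4)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((4, 4)),
    Flatten(),
    Dense(64, activation='relu'),
    # one sigmoid output per class indicator in y_train
    Dense(y_train.shape[1], activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=16, epochs=50)
model.evaluate(x_test, y_test, batch_size=16)
```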
For more detailed walkthroughs, check out the examples page.
Install in development mode using:

```
pip install -e .
```
Tests are run using `unittest`. Unit tests are at `test/unit` and integration tests are at `test/integration`.
You can test a single file like:

```
python -m unittest test/unit/test_validate.py
```
or a folder with:

```
python -m unittest discover -v -s test/unit
```
Full options are listed in the `unittest` documentation.
This library builds on the concepts of skynet-data. It wouldn't be possible without the excellent data from OpenStreetMap and Mapbox under the following licenses:
- OSM QA tile data copyright OpenStreetMap contributors and licensed under ODbL
- Mapbox Satellite data can be traced for noncommercial purposes.