The
A large number of national and global space-time datasets are free and accessible, including data from numerous satellite platforms as well as weather, terrain, soil, and landscape data. Currently, a researcher must search in several places for these resources: publication search engines, specialist aggregators or repositories, R/Python libraries, statistical packages, GitHub, the wider web, and personal contacts. Many data layers require a number of post-processing steps before meaning can be extracted, e.g., spatial alignment, temporal means, or aggregation in time. The datasets must then be selected, extracted in the desired format, and stored either on a local desktop or on a virtual desktop with access to a high-compute workspace. All of the above is a non-trivial task. The ideal experience for researchers would be to find and extract key foundational datasets (such as climate, landscape, soil, and remote sensing data) in one step, given the required spatial area and temporal range for their analysis.
+The need for a
To get started, some example workflows and tutorials are provided as:
The main goal of
- Retrieve: given a set of locations, automatically access and download data from a diverse range of geospatial and soil data sources (APIs)
- Process: spatial and temporal processing, conversion to DataFrames and custom raster files
- Output: ready-made dataset for machine learning (training set and prediction mapping)
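The retrieve, process, and output steps above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the package's API: the in-memory `GRID` stands in for a downloaded raster layer, and `BBOX`, `sample_cell`, and `build_table` are hypothetical names introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a retrieved raster layer: a 3x3 grid of values
# covering the bounding box below (row 0 is the northern edge).
GRID = [
    [10.0, 11.0, 12.0],
    [20.0, 21.0, 22.0],
    [30.0, 31.0, 32.0],
]
BBOX = (150.0, -35.0, 153.0, -32.0)  # (lon_min, lat_min, lon_max, lat_max)


def sample_cell(grid: List[List[float]], bbox: Tuple[float, float, float, float],
                lon: float, lat: float) -> float:
    """Process step: return the value of the grid cell that contains (lon, lat)."""
    lon_min, lat_min, lon_max, lat_max = bbox
    if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
        raise ValueError(f"point ({lon}, {lat}) lies outside the bounding box")
    n_rows, n_cols = len(grid), len(grid[0])
    cell_w = (lon_max - lon_min) / n_cols
    cell_h = (lat_max - lat_min) / n_rows
    # Clamp so that points exactly on the eastern/southern edge fall in the last cell.
    col = min(int((lon - lon_min) / cell_w), n_cols - 1)
    row = min(int((lat_max - lat) / cell_h), n_rows - 1)
    return grid[row][col]


def build_table(points: List[Tuple[float, float]],
                layers: Dict[str, List[List[float]]]) -> List[Dict[str, float]]:
    """Output step: one row per location, one column per layer."""
    rows = []
    for lon, lat in points:
        row = {"lon": lon, "lat": lat}
        for name, grid in layers.items():
            row[name] = sample_cell(grid, BBOX, lon, lat)
        rows.append(row)
    return rows


# Two hypothetical sample locations inside the bounding box.
points = [(150.5, -32.5), (152.9, -34.9)]
table = build_table(points, {"layer_a": GRID})
for row in table:
    print(row)
```

The resulting list of rows has the same shape as a training table: coordinates plus one column per data layer, which could be passed on to a DataFrame constructor.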
Below is a list of the main features:
- reproducible workflows via YAML settings files
- automatic data retrieval from geodata APIs for given locations and dates
- automatic download and spatial-temporal processing of geospatial maps for a user-specified bounding box, resolution, and time-scale
- support for multiple temporal aggregation options and spatial-temporal buffers
- automatic extraction of retrieved data into ready-made DataFrames for ML training
- automatic generation of ready-made aligned maps and data for ML prediction models
- visualisation of downloaded and aligned maps
- support for saving and loading settings via interactive widgets
- connectivity to the Google Earth Engine API for petabyte-scale operations, including temporal cloud/shadow masking and automatic calculation of spectral indices
- easy installation via conda-forge or the PyPI package index
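To make the temporal aggregation and temporal buffer options above concrete, here is a minimal sketch in plain Python. It does not use the package's own functions: `aggregate_window`, the `buffer_days` parameter, and the sample rainfall values are hypothetical and introduced only for illustration.

```python
from datetime import date, timedelta
from statistics import mean, median
from typing import Dict, Optional

# Aggregation options selectable by name (an assumed set, for illustration).
AGGREGATORS = {"mean": mean, "median": median, "min": min, "max": max, "sum": sum}


def aggregate_window(series: Dict[date, float], target: date,
                     buffer_days: int, how: str = "mean") -> Optional[float]:
    """Aggregate all values within +/- buffer_days of the target date.

    Returns None when no observation falls inside the window.
    """
    if how not in AGGREGATORS:
        raise ValueError(f"unknown aggregation option: {how!r}")
    window = timedelta(days=buffer_days)
    values = [v for d, v in series.items() if abs(d - target) <= window]
    if not values:
        return None
    return AGGREGATORS[how](values)


# Hypothetical daily rainfall (mm) for 1 to 5 January 2021.
rain = {
    date(2021, 1, 1) + timedelta(days=i): value
    for i, value in enumerate([0.0, 2.0, 4.0, 6.0, 8.0])
}

centre = date(2021, 1, 3)
for option in ("mean", "max", "sum"):
    # With buffer_days=1 the window covers 2, 3, and 4 January.
    print(option, aggregate_window(rain, centre, buffer_days=1, how=option))
```

Widening `buffer_days` trades temporal precision for robustness against gaps, which is the usual reason to combine a buffer with an aggregation option.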
The following data sources are currently implemented:
- Soil and Landscape Grid of Australia (SLGA)
- SILO Climate Database, Australia
- National Digital Elevation Model (DEM) 1 Second Hydrologically Enforced, Australia
- Digital Earth Australia (DEA) Geoscience Earth Observations, Australia
- GSKY Data Server for DEA Geoscience Earth Observations, Australia
- Radiometric Data, Australia
- Google Earth Engine Data (GEE account needed)
A detailed list of all available layers and their descriptions can be found in
This software was developed by the Sydney Informatics Hub, a core research facility of the University of Sydney, as part of the Geodata Harvester project for the Agricultural Research Federation (AgReFed). If you make use of this software for your research project, please cite this paper or include the following acknowledgment:
“This research was supported by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney, and the Agricultural Research Federation (AgReFed).”
AgReFed is supported by the Australian Research Data Commons (ARDC) and the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
https://corteva.github.io/rioxarray/
https://rasterio.readthedocs.io
https://pystac.readthedocs.io
https://intake.readthedocs.io