diff --git a/joss.05205/10.21105.joss.05205.crossref.xml b/joss.05205/10.21105.joss.05205.crossref.xml new file mode 100644 index 0000000000..793b00b4ef --- /dev/null +++ b/joss.05205/10.21105.joss.05205.crossref.xml @@ -0,0 +1,196 @@ + + + + 20230915T183543-56c080647fe3e306816c867fd761fb7ad8825894 + 20230915183543 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 09 + 2023 + + + 8 + + 89 + + + + Geodata-Harvester: A Python package to jumpstart +geospatial data extraction and analysis + + + + Sebastian + Haan + https://orcid.org/0000-0002-5994-5637 + + + Januar + Harianto + https://orcid.org/0000-0002-4803-108X + + + Nathaniel + Butterworth + https://orcid.org/0000-0002-1212-8816 + + + Thomas + Bishop + https://orcid.org/0000-0002-6723-7323 + + + + 09 + 15 + 2023 + + + 5205 + + + 10.21105/joss.05205 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.8339817 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/5205 + + + + 10.21105/joss.05205 + https://joss.theoj.org/papers/10.21105/joss.05205 + + + https://joss.theoj.org/papers/10.21105/joss.05205.pdf + + + + + + Google Earth Engine: Planetary-scale +geospatial analysis for everyone + Gorelick + Remote sensing of Environment + 202 + 10.1016/j.rse.2017.06.031 + 2017 + Gorelick, N., Hancher, M., Dixon, M., +Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: +Planetary-scale geospatial analysis for everyone. Remote Sensing of +Environment, 202, 18–27. +https://doi.org/10.1016/j.rse.2017.06.031 + + + Using spatial interpolation to construct a +comprehensive archive of Australian climate data + Jeffrey + Environmental Modelling & +Software + 4 + 16 + 10.1016/S1364-8152(01)00008-1 + 2001 + Jeffrey, S. J., Carter, J. 
O., +Moodie, K. B., & Beswick, A. R. (2001). Using spatial interpolation +to construct a comprehensive archive of Australian climate data. +Environmental Modelling & Software, 16(4), 309–330. +https://doi.org/10.1016/S1364-8152(01)00008-1 + + + Digital Earth Australia notebooks and tools +repository + Krause + Geoscience Australia + 2021 + Krause, C., Dunn, B., Bishop-Taylor, +R., Adams, C., Burton, C., Alger, M., Chua, S., Phillips, C., Newey, V., +Kouzoubov, K., & others. (2021). Digital Earth Australia notebooks +and tools repository. Geoscience Australia. + + + Eemont: A Python package that extends Google +Earth Engine + Montero + Journal of Open Source +Software + 62 + 6 + 10.21105/joss.03168 + 2021 + Montero, D. (2021). Eemont: A Python +package that extends Google Earth Engine. Journal of Open Source +Software, 6(62), 3168. +https://doi.org/10.21105/joss.03168 + + + Spectral: Awesome Spectral Indices deployed +via the Google Earth Engine JavaScript API + David Montero + The International Archives of the +Photogrammetry, Remote Sensing and Spatial Information +Sciences + XLVIII-4/W1-2022 + 10.5194/isprs-archives-XLVIII-4-W1-2022-301-2022 + 2022 + David Montero, M. D. M., Cesar Aybar. +(2022). Spectral: Awesome Spectral Indices deployed via the Google Earth +Engine JavaScript API. The International Archives of the Photogrammetry, +Remote Sensing and Spatial Information Sciences, XLVIII-4/W1-2022, +301–306. +https://doi.org/10.5194/isprs-archives-XLVIII-4-W1-2022-301-2022 + + + Geemap: A Python package for interactive +mapping with Google Earth Engine + Wu + Journal of Open Source +Software + 51 + 5 + 10.21105/joss.02305 + 2020 + Wu, Q. (2020). Geemap: A Python +package for interactive mapping with Google Earth Engine. Journal of +Open Source Software, 5(51), 2305. 
+https://doi.org/10.21105/joss.02305 + + + + + + diff --git a/joss.05205/10.21105.joss.05205.jats b/joss.05205/10.21105.joss.05205.jats new file mode 100644 index 0000000000..af67998420 --- /dev/null +++ b/joss.05205/10.21105.joss.05205.jats @@ -0,0 +1,449 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +5205 +10.21105/joss.05205 + +Geodata-Harvester: A Python package to jumpstart +geospatial data extraction and analysis + + + +https://orcid.org/0000-0002-5994-5637 + +Haan +Sebastian + + +* + + +https://orcid.org/0000-0002-4803-108X + +Harianto +Januar + + + + +https://orcid.org/0000-0002-1212-8816 + +Butterworth +Nathaniel + + + + +https://orcid.org/0000-0002-6723-7323 + +Bishop +Thomas + + + + + +Sydney Informatics Hub, The University of Sydney, +Australia + + + + +* E-mail: + + +21 +6 +2023 + +8 +89 +5205 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Python +Remote Sensing +Environmental science +Geoscience + + + + + + Summary +

Geodata-Harvester is a user-friendly Python + package that provides researchers with reusable workflows and software + tools for automatic extraction, processing, and analysis of + geospatial and environmental data. User-provided data is + automatically complemented with a suitable set of spatially and temporally aligned + covariates to form a ready-made dataset for machine learning models. All + data layer maps are automatically extracted and aligned for a specific + region and time period.

+

The Geodata-Harvester is designed to be + modular and extensible, offering multiple front-end notebooks and use + case scenarios to encourage interaction and experimentation with the + pipeline. By connecting to the Google Earth Engine + (GEE) API + (Gorelick + et al., 2017) and integrating the latest GEE add-ons + (Montero, + 2021; + Montero et al., 2022; + Wu, + 2020), the software also enables users to perform + petabyte-scale operations, including temporal cloud/shadow masking and + automatic calculation of spectral indices.
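To illustrate what temporal cloud/shadow masking followed by a spectral index calculation involves, the following minimal sketch computes NDVI = (NIR - Red) / (NIR + Red) for one pixel over a short time series and averages only the cloud-free observations. It is plain Python on toy numbers and does not use the Geodata-Harvester or GEE API; the function names and the data layout are assumptions made for this example.

```python
from statistics import mean


def ndvi(nir, red):
    """Normalised Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        return None  # index is undefined when both bands are zero
    return (nir - red) / total


def masked_temporal_mean_ndvi(observations):
    """Mean NDVI over time, skipping observations flagged as cloud/shadow.

    `observations` is a list of (nir, red, is_cloudy) tuples for one pixel
    (a hypothetical layout chosen for this sketch).
    """
    values = []
    for nir, red, is_cloudy in observations:
        if is_cloudy:
            continue  # masked observation: excluded from the average
        value = ndvi(nir, red)
        if value is not None:
            values.append(value)
    return mean(values) if values else None


pixel_series = [
    (0.6, 0.2, False),  # clear observation
    (0.9, 0.9, True),   # cloudy observation, ignored
    (0.5, 0.3, False),  # clear observation
]
result = masked_temporal_mean_ndvi(pixel_series)
print(result)
```

In a GEE-backed workflow the same two steps would run server-side over whole image collections; the sketch only shows the per-pixel logic.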

+ + Statement of Need +

An enormous number of national and global space-time + datasets are free and accessible, including data from numerous satellite + platforms as well as weather, terrain, soil, and landscape data. Currently, a + researcher must search through several places for these resources, + including publication search engines, specialist aggregators or + repositories, R/Python libraries, statistical packages, GitHub, + the wider web, and personal contacts. Many data layers require a + number of post-processing steps that a user must undertake to extract + meaning, e.g., spatial alignment, temporal means, or aggregation in + time. Only then can the datasets be selected, extracted in the + desired format, and stored on either a local desktop or a virtual + desktop with access to a high-performance computing workspace. All of the above is + a non-trivial task. The ideal experience for researchers would be + to find and extract key foundational datasets (such as + climate, landscape, soil, and remote sensing data) at once, given the + required spatial extent and temporal range for their analysis.

+

The need for a Geodata-Harvester emerges + from the increasing demand for an extendable, automated, and + reusable system for geospatial and environmental data extraction + and machine learning model preparation. The + Geodata-Harvester software allows researchers + to jumpstart their analysis with a ready-made set of + spatially and temporally aligned raster maps and dataframes. Unlike + geodata-handler packages such as osgeo + libraries, + rasterio1, + rioxarray2, + pystac3, and + intake + plugins4, the Geodata-Harvester + builds a cohesive workflow on top of these resources for automatic + data extraction from multiple geospatial sources at once. Its unique + features include reproducible workflows via YAML settings files, + connectivity to a wide range of geodata APIs, automatic data + retrieval and processing, and high-level integration of Google Earth + Engine capabilities. The aim of this ongoing project is to offer a + flexible all-in-one solution, enabling efficient geospatial research + and machine learning applications.
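The idea of a settings-driven, reproducible workflow can be sketched as follows. The dictionary stands in for the content of a YAML settings file after loading; all key names are hypothetical and chosen for this example only, not taken from the package's documented schema. The sketch validates the settings and derives a stable fingerprint that could be used to tag the outputs of a run.

```python
import hashlib
import json
from datetime import date

# Settings as they might look after loading a YAML file into a dict.
# All key names are hypothetical, chosen for this example only.
settings = {
    "target_bbox": [149.0, -30.5, 150.0, -29.5],  # lon_min, lat_min, lon_max, lat_max
    "target_res_arcsec": 6.0,
    "date_min": "2021-01-01",
    "date_max": "2021-12-31",
    "sources": {"climate": ["daily_rain"], "terrain": ["elevation"]},
}


def validate_settings(cfg):
    """Raise ValueError if the settings are internally inconsistent."""
    lon_min, lat_min, lon_max, lat_max = cfg["target_bbox"]
    if not (lon_min < lon_max and lat_min < lat_max):
        raise ValueError("bounding box must be [lon_min, lat_min, lon_max, lat_max]")
    if cfg["target_res_arcsec"] <= 0:
        raise ValueError("resolution must be positive")
    if date.fromisoformat(cfg["date_min"]) > date.fromisoformat(cfg["date_max"]):
        raise ValueError("date_min must not be after date_max")
    if not cfg["sources"]:
        raise ValueError("at least one data source is required")
    return cfg


def settings_fingerprint(cfg):
    """Stable short hash of the settings, independent of key order."""
    canonical = json.dumps(cfg, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


validate_settings(settings)
run_id = settings_fingerprint(settings)
print(run_id)
```

Because the fingerprint depends only on the settings content, two runs with identical settings files receive the same identifier, which makes it easy to tell whether two outputs were produced from the same configuration.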

+ + Tutorials and Workshops +

To get started, some example workflows and tutorials are + provided as:

+ + +

Jupyter + notebooks

+
+ +

Geodata-Harvester + workshop material.

+
+ +

Geodata-Harvester + documentation

+
+ +

Settings_Overview

+
+ +

GEE + harvester project: eeharvest

+
+ +

R-package + wrapper: dataharvesteR

+
+
+ +

Geodata-Harvester + overview

+ +
+
+
+ + Functionality and Key Features +

The main goal of Geodata-Harvester is to + provide researchers with reusable workflows for automatic data + extraction and processing:

+ + +

Retrieve: given a set of locations, automatically access and + download data from a diverse range of + geospatial and soil data sources (APIs)

+
+ +

Process: spatial and temporal processing, conversion to + DataFrames and custom raster files

+
+ +

Output: ready-made dataset for machine learning (training set + and prediction mapping)

+
+
+
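The three steps above can be sketched on toy data as follows. This is a minimal illustration in plain Python, not the package's implementation: the two layers are small hand-written grids that are assumed to be already aligned, and `sample_grid` and `build_table` are hypothetical helper names.

```python
import math


def sample_grid(grid, origin, cell_size, lon, lat):
    """Return the value of the grid cell containing (lon, lat).

    `grid` is a list of rows ordered north to south, `origin` is the
    (lon, lat) of the upper-left corner, `cell_size` is in degrees.
    """
    col = math.floor((lon - origin[0]) / cell_size)
    row = math.floor((origin[1] - lat) / cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None  # point falls outside the layer
    return grid[row][col]


# Retrieve: two toy layers standing in for downloaded data sources.
# Process: both are assumed to share the same 2 x 2 grid (aligned).
ORIGIN = (150.0, -30.0)
CELL_SIZE = 0.5
layers = {
    "elevation_m": [[310.0, 295.0], [280.0, 260.0]],
    "rain_mm": [[640.0, 655.0], [700.0, 720.0]],
}
points = [
    {"id": "site_a", "lon": 150.2, "lat": -30.1},
    {"id": "site_b", "lon": 150.7, "lat": -30.8},
]


def build_table(points, layers):
    """Output: one row per location with one column per data layer."""
    rows = []
    for point in points:
        row = dict(point)
        for name, grid in layers.items():
            row[name] = sample_grid(grid, ORIGIN, CELL_SIZE, point["lon"], point["lat"])
        rows.append(row)
    return rows


table = build_table(points, layers)
for row in table:
    print(row)
```

The resulting list of rows has the shape of a training table: user-provided locations in the first columns, followed by one covariate column per extracted layer.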

Below is a list of the main features available for the + Geodata-Harvester package. Please check the + project GitHub webpage and notebooks for examples, data selection, + and other settings.

+ + +

reproducible workflows via YAML settings files

+
+ +

automatic data retrieval from geodata APIs for given + locations and dates

+
+ +

automatic download and spatial-temporal processing of + geo-spatial maps for user-specified bounding box, resolution, + and time-scale

+
+ +

support for multiple temporal aggregation options and + spatial-temporal buffer

+
+ +

automatic extraction of retrieved data into ready-made + DataFrames for ML training

+
+ +

automatic generation of ready-made aligned maps and data for + ML prediction models

+
+ +

visualisation of downloaded and aligned maps

+
+ +

support for saving and loading settings via interactive + widgets

+
+ +

petabyte-scale operations through connectivity to the Google Earth Engine API, + including temporal + cloud/shadow masking and automatic calculation of spectral + indices

+
+ +

easy installation via conda-forge or the PyPI package index

+
+
+
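As an illustration of the temporal aggregation options listed above, the following sketch groups a daily time series by calendar month and applies a user-selected aggregation function. The function name, the option names, and the rainfall values are assumptions made for this example and are not taken from the package.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Aggregation options offered by this sketch (hypothetical names).
AGGREGATORS = {"mean": mean, "median": median, "sum": sum, "min": min, "max": max}


def aggregate_by_month(daily, how="mean"):
    """Aggregate {date: value} into {(year, month): aggregated value}."""
    if how not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {how}")
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    func = AGGREGATORS[how]
    return {period: func(values) for period, values in sorted(buckets.items())}


daily_rain = {
    date(2021, 1, 1): 0.0,
    date(2021, 1, 2): 12.0,
    date(2021, 1, 3): 3.0,
    date(2021, 2, 1): 5.0,
    date(2021, 2, 2): 7.0,
}
monthly_total = aggregate_by_month(daily_rain, how="sum")
monthly_mean = aggregate_by_month(daily_rain, how="mean")
print(monthly_total)
print(monthly_mean)
```

Which aggregation is appropriate depends on the variable: totals are natural for rainfall, whereas means or medians are more common for temperature or reflectance.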
+ + Data Sources +

The following data sources are currently implemented:

+ + +

Soil and Landscape Grid of Australia (SLGA)

+
+ +

SILO Climate Database, Australia + (Jeffrey + et al., 2001)

+
+ +

National Digital Elevation Model (DEM) 1 Second + Hydrologically Enforced, Australia

+
+ +

Digital Earth Australia (DEA) Geoscience Earth Observations, + Australia + (Krause + et al., 2021)

+
+ +

GSKY Data Server for DEA Geoscience Earth Observations, + Australia

+
+ +

Radiometric Data, Australia

+
+ +

Google Earth Engine Data (GEE account needed)

+
+
+

A detailed list of all available layers and their descriptions can + be found in + Data + Overview. The Geodata-Harvester is + designed to be extendable, and new data source modules can be added + (see + adding + new data source guidelines).
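One common way to keep a harvesting pipeline extendable is a registry of data source modules, sketched below. This is a generic pattern shown under stated assumptions, not the mechanism described in the package's guidelines: `SOURCE_REGISTRY`, `register_source`, `collect`, and the `demo_terrain` source are all hypothetical names, and the demo source returns placeholder records instead of downloading anything.

```python
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # lon_min, lat_min, lon_max, lat_max

# Maps a source name to the function that retrieves its layers (hypothetical).
SOURCE_REGISTRY: Dict[str, Callable[[BBox, List[str]], dict]] = {}


def register_source(name):
    """Decorator that adds a retrieval function to the registry."""
    def decorator(func):
        if name in SOURCE_REGISTRY:
            raise ValueError(f"source '{name}' is already registered")
        SOURCE_REGISTRY[name] = func
        return func
    return decorator


@register_source("demo_terrain")
def get_demo_terrain(bbox, layers):
    """Stand-in for a real download function; returns placeholder records."""
    return {layer: {"bbox": bbox, "status": "not downloaded (demo)"} for layer in layers}


def collect(requested, bbox):
    """Call every requested source module for the given bounding box."""
    results = {}
    for name, layers in requested.items():
        if name not in SOURCE_REGISTRY:
            raise KeyError(f"no module registered for source '{name}'")
        results[name] = SOURCE_REGISTRY[name](bbox, layers)
    return results


bbox = (149.0, -30.5, 150.0, -29.5)
out = collect({"demo_terrain": ["elevation"]}, bbox)
print(out)
```

With this design, adding a data source means writing one function and registering it; the calling code in `collect` does not change.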

+
+
+ + Acknowledgements +

This software was developed by the Sydney Informatics Hub, a core + research facility of the University of Sydney, as part of the Geodata + Harvester project for the Agricultural Research Federation (AgReFed). + If you make use of this software for your research project, please + cite this paper or include the following acknowledgement:

+

“This research was supported by the Sydney Informatics Hub, a Core + Research Facility of the University of Sydney, and the Agricultural + Research Federation (AgReFed).”

+

AgReFed is supported by the Australian Research Data Commons (ARDC) + and the Australian Government through the National Collaborative + Research Infrastructure Strategy (NCRIS).

+
+ + + + + + + GorelickNoel + HancherMatt + DixonMike + IlyushchenkoSimon + ThauDavid + MooreRebecca + + Google Earth Engine: Planetary-scale geospatial analysis for everyone + Remote sensing of Environment + Elsevier + 2017 + 202 + 10.1016/j.rse.2017.06.031 + 18 + 27 + + + + + + JeffreyStephen J + CarterJohn O + MoodieKeith B + BeswickAlan R + + Using spatial interpolation to construct a comprehensive archive of Australian climate data + Environmental Modelling & Software + Elsevier + 2001 + 16 + 4 + 10.1016/S1364-8152(01)00008-1 + 309 + 330 + + + + + + KrauseC + DunnB + Bishop-TaylorR + AdamsC + BurtonC + AlgerM + ChuaS + PhillipsC + NeweyV + KouzoubovK + others + + Digital Earth Australia notebooks and tools repository + Geoscience Australia + 2021 + + + + + + MonteroDavid + + Eemont: A Python package that extends Google Earth Engine + Journal of Open Source Software + The Open Journal + 2021 + 6 + 62 + https://doi.org/10.21105/joss.03168 + 10.21105/joss.03168 + 3168 + + + + + + + David MonteroMiguel D. MahechaCesar Aybar + + Spectral: Awesome Spectral Indices deployed via the Google Earth Engine JavaScript API + The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences + 2022 + XLVIII-4/W1-2022 + https://doi.org/10.5194/isprs-archives-XLVIII-4-W1-2022-301-2022 + 10.5194/isprs-archives-XLVIII-4-W1-2022-301-2022 + 301 + 306 + + + + + + WuQiusheng + + Geemap: A Python package for interactive mapping with Google Earth Engine + Journal of Open Source Software + The Open Journal + 2020 + 5 + 51 + https://doi.org/10.21105/joss.02305 + 10.21105/joss.02305 + 2305 + + + + + + +

https://corteva.github.io/rioxarray/

+
+ +

https://rasterio.readthedocs.io

+
+ +

https://pystac.readthedocs.io

+
+ +

https://intake.readthedocs.io

+
+
+
+
diff --git a/joss.05205/10.21105.joss.05205.pdf b/joss.05205/10.21105.joss.05205.pdf new file mode 100644 index 0000000000..755e1101f6 Binary files /dev/null and b/joss.05205/10.21105.joss.05205.pdf differ diff --git a/joss.05205/media/geodata_harvester.jpg b/joss.05205/media/geodata_harvester.jpg new file mode 100644 index 0000000000..19aa240177 Binary files /dev/null and b/joss.05205/media/geodata_harvester.jpg differ