This repository contains the companion code and the associated Shiny app, taxharmonizexplorer, of Grenié et al. (2022, MEE). It includes the figures of the paper, the code that generates the figures produced programmatically, and the code to run the taxonomic harmonization workflows used in the paper.
This repository accompanies the following article:
Grenié, M., Berti, E., Carvajal-Quintero, J., Dädlow, G.M.L., Sagouis, A. and Winter, M. (2022), Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices. Methods in Ecology and Evolution. Accepted Author Manuscript. https://doi.org/10.1111/2041-210X.13802
Versions of the code available in this repository have been archived on Zenodo.
Refer to the Zenodo DOI to get the latest version, or use the following citation:
Matthias Grenié, Alban Sagouis, & Emilio Berti. (2021, July 22). Rekyt/taxo_harmonization: Submitted version (Version submitted). Zenodo. https://doi.org/10.5281/zenodo.5380285.
The shiny app is available at https://mgrenie.shinyapps.io/taxharmonizexplorer/
The development version of the shiny app can be run with the following command in RStudio:
shiny::runGitHub("Rekyt/taxo_harmonization", subdir = "taxharmonizexplorer")
If you have a local copy of the repository you can run the app with:
shiny::runApp("taxharmonizexplorer")
The scripts used to build the networks of tools are available in the scripts folder. They are numbered from 01 to 03 and are meant to be run sequentially. They all use the raw data from the Excel sheet data/data_raw/Table comparing taxonomic tools.xlsx.
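As a minimal sketch of running the numbered scripts in order, the helpers below list and source them one after another. The function names are hypothetical, and the sketch assumes that each script's file name starts with its two-digit number and ends in .R; check the actual file names in the scripts folder before relying on it.

```r
# Hypothetical helper (not part of the repository): list scripts whose file
# name starts with a two-digit prefix (e.g. "01", "02", "03"), sorted by name.
# Assumption: the numbered scripts end in ".R".
list_numbered_scripts <- function(script_dir = "scripts") {
  script_files <- list.files(
    script_dir,
    pattern = "^[0-9]{2}.*\\.R$",
    full.names = TRUE
  )
  script_files[order(basename(script_files))]
}

# Hypothetical helper: source each numbered script in turn, each in its own
# environment so that objects from one script do not leak into the next.
run_numbered_scripts <- function(script_dir = "scripts") {
  for (script in list_numbered_scripts(script_dir)) {
    message("Running ", script)
    source(script, local = new.env())
  }
  invisible(NULL)
}
```

Calling run_numbered_scripts() from the root of the repository would then run the scripts from 01 to 03. If the scripts rely on objects created by earlier scripts, drop the local argument so that they share the global environment.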
This repository also contains scripts to harmonize the taxonomy of the BioTIME database. We present four different workflows applied to the raw taxonomy from BioTIME (available at data/data_raw/biotime.txt):
- Workflow 1 (Torino): working with higher taxonomic groups (standardize species names through higher taxonomic group assignment).
- Workflow 2 (Bogota): matching the entire list of species against different taxon-specific databases.
- Workflow 3 (GBIF only with pre-processing): pre-process the species names and then match the obtained list against GBIF.
- Workflow 4 (GBIF only no pre-processing): match the raw species list against GBIF.
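To give an idea of what matching a raw species list against GBIF can look like, here is a minimal sketch in the spirit of Workflow 4. It is not the code used in the paper: match_raw_names is a hypothetical helper, and the choice of rgbif::name_backbone_checklist() as the default matcher is an assumption. The matcher is passed as an argument so that another matching function can be swapped in.

```r
# Hypothetical helper (not from the repository): trim and de-duplicate raw
# names, then send them to a name-matching function.
# Assumption: the default matcher is rgbif::name_backbone_checklist(), which
# queries the GBIF backbone and therefore needs the rgbif package and
# internet access. The default is only evaluated when no matcher is supplied.
match_raw_names <- function(raw_names,
                            matcher = rgbif::name_backbone_checklist) {
  cleaned <- trimws(raw_names)
  cleaned <- unique(cleaned[!is.na(cleaned) & nzchar(cleaned)])
  if (length(cleaned) == 0) {
    return(NULL)
  }
  matcher(cleaned)
}
```

With rgbif installed, match_raw_names(c("Quercus robur", "Parus major")) would return one row per unique input name. A pre-processing step such as the one in Workflow 3 would take place before this call.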
Results from cleaning are saved as CSV files in the folder data/data_cleaned/biotime_results/. Each CSV file has two columns: parsed is the input name obtained by running rgnparser on the BioTIME species names; the second column is the name matched in a specific database.
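The result files can be inspected with a few lines of base R. The sketch below reads every CSV file in the results folder and computes the share of parsed names that received a match. Both function names are hypothetical, and the sketch assumes that unmatched names are stored as NA or as empty strings in the second column, which should be checked against the actual files.

```r
# Hypothetical helper (not part of the repository): read all CSV files of a
# results folder into a named list of data frames.
read_harmonization_results <- function(
    results_dir = "data/data_cleaned/biotime_results") {
  csv_files <- list.files(results_dir, pattern = "\\.csv$", full.names = TRUE)
  results <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
  names(results) <- basename(csv_files)
  results
}

# Hypothetical helper: share of rows with a match in the second column.
# Assumption: unmatched names are NA or empty strings.
match_rate <- function(result_table) {
  matched <- result_table[[2]]
  mean(!is.na(matched) & nzchar(as.character(matched)))
}
```

For example, sapply(read_harmonization_results(), match_rate) would give one match rate per result file. The rate is only meaningful for files that follow the two-column layout described above.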
biotime_common.csv contains the original BioTIME names with the parsed version from rgnparser, together with the class and phylum to which they belong, obtained using rgbif. The last column, common, gives the common name of the higher taxonomic group (e.g. "vascular plants" for "Tracheophyta").
You can run the entire workflow with the harmonize.R script available in the scripts folder.
To do so, run the following code from the root of the repository:
source("scripts/harmonize.R")