See instructions to open notebooks on Binder/Colab here
This repository contains materials and instructions for participants of the Biodiversity Hackathon. Here we outline the various tools, demos, and resources that can be used to access and use the biological and biogeographical data in OBIS.
There are several ways to download data from OBIS, including:
- R package robis
- Python package pyobis
- Full data exports
- OBIS homepage search or advanced dataset search
- OBIS Mapper
- speciesgrids
The robis R package connects to the OBIS API from R. The package can be installed from CRAN or from GitHub (latest development version).
# install from CRAN
install.packages("robis")
# latest development version (requires the remotes package)
remotes::install_github("iobis/robis")
You can use the package to obtain a list of datasets, a taxon checklist, or raw occurrence data by supplying e.g. a taxon name or WoRMS AphiaID. You can also specify whether to include absence records when obtaining occurrence data. To download this data, simply export R objects with the write.csv function. If we wanted to obtain Mollusc data from OBIS, some options would be:
library(robis)
# obtain occurrence data
moll <- occurrence("Mollusca")
moll_abs <- occurrence("Mollusca", absence = "include") # include absence records
write.csv(moll, "mollusca-obis.csv") # save the data to csv
# obtain a list of datasets for a taxon
molldata <- dataset(scientificname = "Mollusca")
# obtain a checklist of Mollusc species in a certain area
mollcheck <- checklist(scientificname = "Mollusca", geometry = "POLYGON ((2.3 51.8, 2.3 51.6, 2.6 51.6, 2.6 51.8, 2.3 51.8))")
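The geometry argument above is a Well-Known Text (WKT) polygon. As a minimal sketch, assuming you start from a longitude/latitude bounding box, the string can be assembled with a small helper. The function name `bbox_to_wkt` is hypothetical and is not part of robis or pyobis; it only formats text and does not contact OBIS.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a WKT POLYGON string for a lon/lat bounding box.

    Hypothetical helper for illustration; coordinates are written as
    "longitude latitude" pairs, as in the checklist example above.
    """
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min values must be smaller than max values")
    corners = [
        (min_lon, max_lat),  # top-left
        (min_lon, min_lat),  # bottom-left
        (max_lon, min_lat),  # bottom-right
        (max_lon, max_lat),  # top-right
        (min_lon, max_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


wkt = bbox_to_wkt(2.3, 51.6, 2.6, 51.8)
print(wkt)
```

For the bounding box shown, this reproduces the polygon string used in the checklist call. For irregular areas, drawing the shape in the OBIS MapTool is likely the easier route.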
You can use robis to obtain all datasets and then filter them based on keywords in the title and/or abstract. In the example below, we filter to find datasets related to seamounts. Multiple keywords can be provided by separating them with |, e.g. "seamount|deepsea|benthos".
search_terms <- "seamount" # define your search terms
datasets <- robis::dataset() # obtain datasets from OBIS
seamount_datasets <- datasets[
  grepl(paste(search_terms, collapse = "|"), datasets$title, ignore.case = TRUE) |
    grepl(paste(search_terms, collapse = "|"), datasets$abstract, ignore.case = TRUE),
]
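The same keyword filter can be sketched in Python for those working in the Python notebooks. This is an illustration only: the records below are made up, the field names `title` and `abstract` simply mirror the columns used in the R example, and the helper `filter_datasets` is hypothetical rather than part of pyobis. Unlike the R version, search terms are escaped and matched as literal text, not as regular expressions.

```python
import re


def filter_datasets(datasets, search_terms):
    """Keep records whose title or abstract contains any search term.

    Matching is case-insensitive. Terms are treated as literal text.
    """
    if not search_terms:
        return []
    pattern = re.compile(
        "|".join(re.escape(term) for term in search_terms), re.IGNORECASE
    )
    return [
        d for d in datasets
        # "or ''" guards against records with a missing title or abstract
        if pattern.search(d.get("title") or "")
        or pattern.search(d.get("abstract") or "")
    ]


# Made-up records for illustration; real metadata would come from OBIS.
example = [
    {"title": "Seamount fauna survey", "abstract": "Trawl samples."},
    {"title": "Coastal birds", "abstract": "Counts near a SEAMOUNT chain."},
    {"title": "Plankton tows", "abstract": None},
]

matches = filter_datasets(example, ["seamount", "benthos"])
print([d["title"] for d in matches])
```

The first two records match (one in the title, one in the abstract), while the third is skipped because neither field contains a search term.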
A full data export of OBIS data is available for download as a Parquet file, here. Note the following:
- These exports do not include measurement data, dropped records, or absence records
- The exported file will be a single, flattened Occurrence table
- The table includes all provided Event and Occurrence data, as well as 68 fields added by the OBIS Quality Control Pipeline, including taxonomic information obtained from WoRMS
From the OBIS homepage, you can search for data using the search bar in the middle of the page. You can search by taxonomic group, common name, dataset name, OBIS node, institute name, area (e.g., Exclusive Economic Zone (EEZ)), or the data provider’s country. See here for more details.
The OBIS Mapper lets you visualize and filter OBIS data by taxonomy, location, time, and data quality, with options to combine layers and download them as CSV. For more details, see the OBIS manual.
speciesgrids is a Python package for building WoRMS-aligned species distribution datasets that combine OBIS and GBIF data. The resulting dataset is available in a few resolutions on AWS S3. It can be downloaded locally for best performance, or queried directly from the S3 bucket. For more details about downloading and using the dataset, see the speciesgrids README or the notebook.
We have prepared several JupyterHub Notebooks that can be used for reference; see https://github.com/iobis/hackathon/tree/master/notebooks. The notebooks cover several topics, including OBIS data access, data cleaning, environmental information extraction, and data visualization.
You can also access the notebooks through the Binder link.
Binder already has the requirements installed and comes with RStudio, but it is slower. On Colab, you need to install the required packages yourself, but it is faster and has other useful features. The easiest way to install the requirements is to add a code cell to the notebook and run the following:
For Python
!pip install -r https://raw.githubusercontent.com/iobis/hackathon/refs/heads/master/requirements.txt
For R
source('https://raw.githubusercontent.com/iobis/hackathon/refs/heads/master/requirements-r-colab.txt')
Here is a list of other OBIS-relevant resources:
- Darwin Core term Quick Reference Guide: provides definitions of the DwC terms in datasets obtained from OBIS
- OBIS MapTool: used for generating WKT strings, georeferencing, etc.
- How to use the OBIS MapTool YouTube tutorial
- Well-Known Text (WKT) visualization tool: used to visualize WKT strings
- YouTube video tutorial on accessing data with OBIS Mapper
- YouTube video tutorial on accessing data with robis