This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

datastore

Kai Blumberg edited this page Feb 15, 2018 · 146 revisions

My working datastore of AWI data, compiled to simulate a PANGAEA SPARQL endpoint populated with relevant FRAM data.

For now the datasets are located in the directory: /kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/temp_other_datasets

conventions

All data files will be named as simply as possible from the PANGAEA name, preferably using its first two words in lowercase joined by underscores. For example, "Inorganic nutrients measured on water bottle samples at AWI HAUSGARTEN during POLARSTERN cruise MSM29" will be named: inorganic_nutrients
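The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name dataset_name and its n_words parameter are hypothetical and not part of the datastore scripts.

```python
import re


def dataset_name(pangaea_title, n_words=2):
    """Return the first n_words of a title, lowercased and joined by underscores."""
    # Keep only alphanumeric runs so punctuation never ends up in a file name.
    words = re.findall(r"[A-Za-z0-9]+", pangaea_title)
    return "_".join(word.lower() for word in words[:n_words])


title = (
    "Inorganic nutrients measured on water bottle samples at AWI "
    "HAUSGARTEN during POLARSTERN cruise MSM29"
)
print(dataset_name(title))
```

Since the convention only says "preferably", names such as global_chlorophyll_a or influence_snow_depth would still have to be chosen by hand.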

CSV files will be converted to triples in .nt (N-Triples) format.

Supplemental files with annotations will have the same name but be in .ttl format, to differentiate them from the .nt files, which are the triple version of the CSV data. Within the supplemental file, triples will be added about the CSV files to state that they are a data matrix, expressed as the triple: file a obo:OBCS_0000120 where a is rdf:type and data matrix is the OBCS class OBCS_0000120, which was recommended here. Within the supplemental file another axiom, is about, will link the CSV file to an Ontobee term, for example: file obo:IAO_0000136 obo:ENVO_00001999 . which states that the file is about a marine water body.
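As a minimal sketch of what such a supplemental file could contain, the Python below writes the two triples described above as Turtle text. The function name annotation_ttl and the example.org file IRI are assumptions made for illustration; the real files may identify the CSV differently.

```python
OBO = "http://purl.obolibrary.org/obo/"


def annotation_ttl(file_iri, about_term):
    """Build Turtle stating that a file is a data matrix and what it is about."""
    return (
        f"@prefix obo: <{OBO}> .\n"
        "\n"
        f"<{file_iri}>\n"
        "    a obo:OBCS_0000120 ;              # data matrix\n"
        f"    obo:IAO_0000136 obo:{about_term} .  # is about\n"
    )


# Hypothetical IRI for the CSV file; ENVO_00001999 is 'marine water body'.
ttl_text = annotation_ttl(
    "http://example.org/datastore/inorganic_nutrients.csv", "ENVO_00001999"
)
print(ttl_text)
```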

All .nt and .ttl files will be merged into a single datastore.ttl file to query against.
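One simple way to do the merge, sketched here under assumptions, is plain concatenation: N-Triples is a subset of Turtle, and Turtle allows @prefix directives anywhere in a document, so the concatenated file should still parse. The function name merge_datastore is hypothetical, and blank node labels reused across files would be merged by this approach.

```python
from pathlib import Path


def merge_datastore(directory, output_name="datastore.ttl"):
    """Concatenate every .nt and .ttl file in directory into one Turtle file."""
    folder = Path(directory)
    output = folder / output_name
    # Collect the inputs first and skip the output so reruns do not nest it.
    sources = sorted(
        path
        for path in folder.iterdir()
        if path.suffix in {".nt", ".ttl"} and path.name != output_name
    )
    with output.open("w", encoding="utf-8") as merged:
        for path in sources:
            merged.write(f"# source: {path.name}\n")
            merged.write(path.read_text(encoding="utf-8").rstrip("\n") + "\n\n")
    return output
```

Calling merge_datastore on the datasets directory would write datastore.ttl next to the input files.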

NOTE: I will probably need to post-compose the annotations for many of the data columns. For example, Water Depth should be something like: subclass of depth and inheres in a water body.

Table of Contents:

inorganic_nutrients

physical_oceanography

chlorophyll_a

global_chlorophyll_a

biogenic_particle_flux

snow_height

influence_snow_depth

ice_algal_chlorophyll

genomic_data

Example in datastore as inorganic_nutrients.csv

any23 rover -t -p -f ntriples -o inorganic_nutrients.nt inorganic_nutrients.csv
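The same any23 invocation is repeated for each CSV on this page, so the commands could be generated rather than typed. The sketch below only builds the command lines using the flags shown above; the helper name any23_command and the DATASETS list are assumptions, and nothing is executed.

```python
import shlex

# Dataset names for which a conversion command appears on this page.
DATASETS = [
    "inorganic_nutrients",
    "physical_oceanography",
    "global_chlorophyll_a",
    "biogenic_particle_flux",
    "snow_height",
    "influence_snow_depth",
]


def any23_command(name):
    """Return the argument list converting name.csv to name.nt."""
    return [
        "any23", "rover", "-t", "-p", "-f", "ntriples",
        "-o", f"{name}.nt", f"{name}.csv",
    ]


for name in DATASETS:
    print(shlex.join(any23_command(name)))
```

Each returned list could be passed to subprocess.run if any23 is available on the PATH.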

The data is about a: 'marine water body'

The Date Time column's values were changed to match those in physical_oceanography, for the example in /kblumberg_masters_thesis/testing/test_inorganic_chem

Parameters:

column | pending term | post compositional annotation
Event | | 'centrally registered identifier' and 'is about' some ('observing process' or 'specimen collection process')
Date Time | | temporal instant
Latitude | | latitude coordinate measurement datum
Longitude | | longitude coordinate measurement datum
Elevation | | (elevation or depth) and ('inheres in' some 'marine water body')
Water Depth | | (elevation or depth) and ('inheres in' some 'marine water body')
Nitrate | concentration of nitrate in seawater | 'concentration of' and ('inheres in' some (nitrate and ('part of' some 'sea water')))
Nitrite | concentration of nitrite in seawater | 'concentration of' and ('inheres in' some (nitrite and ('part of' some 'sea water')))
Silicate | concentration of silicate in seawater | 'concentration of' and ('inheres in' some ('silicate(4-)' and ('part of' some 'sea water')))
Phosphate | concentration of phosphate in seawater | 'concentration of' and ('inheres in' some (phosphate and ('part of' some 'sea water')))
Ammonium | concentration of ammonium in seawater | 'concentration of' and ('inheres in' some (ammonium and ('part of' some 'sea water')))
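The nutrient rows above all follow one post-composition pattern, which can be written as a small template. This is a sketch only: the helper names are hypothetical, and the quoting rule (quote any label that is not purely alphanumeric) is an assumption inferred from the rows above.

```python
def quote_label(label):
    """Quote a class label unless it is a single alphanumeric word."""
    return label if label.isalnum() else f"'{label}'"


def concentration_in_seawater(solute):
    """Return the class expression for the concentration of a solute in sea water."""
    return (
        "'concentration of' and ('inheres in' some "
        f"({quote_label(solute)} and ('part of' some 'sea water')))"
    )


for solute in ["nitrate", "nitrite", "silicate(4-)", "phosphate", "ammonium"]:
    print(concentration_in_seawater(solute))
```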

Example in datastore as physical_oceanography.csv

any23 rover -t -p -f ntriples -o physical_oceanography.nt physical_oceanography.csv

The data is about a 'marine current flow' (part of the phytoplankton PR), pending the release of that class.

Until then I will use: is about a: 'marine current'

Latitude and longitude metadata were changed to match certain values from inorganic_nutrients

column | single term annotation | need to create class | post compositional annotation
Date Time | temporal instant | |
Gear Identification Number | | | ('centrally registered identifier' and 'manufactured product')
Water Depth | | | (elevation or depth) and ('inheres in' some 'marine water body')
Pressure | | | pressure and ('inheres in' some 'sea water')
Temperature | | | temperature and ('inheres in' some 'sea water')
Salinity | salinity | yes part of my PR | osmolarity and ('inheres in' some ('salt' and ('part of' some 'sea water')))
Horizontal Current Velocity | | could instead be a new class for horizontal velocity | velocity and ('inheres in' some 'marine current' and 'has quality' some 'horizontal')
Current Direction | | submitted cardinal direction issue on PATO tracker | direction
East-west Current Velocity | | could instead be a new class for velocity in eastern direction | velocity and ('inheres in' some 'marine current')
North-south Current Velocity | | could instead be a new class for velocity in northern direction | velocity and ('inheres in' some 'marine current')
Oxygen | concentration of oxygen in seawater | | 'concentration of' and ('inheres in' some (dioxygen and ('part of' some 'sea water')))

The data set is about: 'chlorophyll a' and ('part of' some 'marine water body')

column post compositional annotation:
Event 'centrally registered identifier' and 'is about' some ('observing process' or 'specimen collection process')
Date Time temporal instant
Latitude latitude coordinate measurement datum
Longitude longitude coordinate measurement datum
Elevation (elevation or depth) and ('inheres in' some 'marine water body')
Water Depth (elevation or depth) and ('inheres in' some 'marine water body')
Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water')))

global_chlorophyll_a.csv

any23 rover -t -p -f ntriples -o global_chlorophyll_a.nt global_chlorophyll_a.csv

See log from 24.11.17.

From the paper: Synergistic Exploitation of Hyper- and Multi-Spectral Precursor Sentinel Measurements to Determine Phytoplankton Functional Types (SynSenPFT)

The data is about a marine water body

I could also go with: is about: 'chlorophyll a' and ('part of' some 'marine water body')

column post compositional annotation:
Ordinal Number 'categorical label' and 'is about' some specimen
Date Time temporal instant
Latitude latitude coordinate measurement datum
Longitude longitude coordinate measurement datum
Water Depth (elevation or depth) and ('inheres in' some 'marine water body')
Total Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water')))
Diatom Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacillariophyta)))
Haptophyte Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Coccolithales)))
Prokaryote Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacteria)))
Database Cross Reference hasDbXref

The paper mentions that the cyanobacterial chlorophyll a fraction is actually all prokaryotes. Thus the column Prok Chl A is not 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Cyanobacteria)))

any23 rover -t -p -f ntriples -o biogenic_particle_flux.nt biogenic_particle_flux.csv

need to add lat and long to annotation file

data is about 'marine particle sinking process' (part of the plankton ecology PR)

For now I will go with is about: 'material transport process' and ('has input' some 'marine snow')

column post compositional annotation:
Water Depth (elevation or depth) and ('inheres in' some 'marine water body')
Date Time 'zero-dimensional temporal region'
Date Time End 'zero-dimensional temporal region'
Duration 'one-dimensional temporal region'
Sample Label 'categorical label' and 'is about' some specimen
Seston Flux flux and ('inheres in' some 'marine snow')
Calcium Carbonate Flux flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('calcium carbonate' and 'part of' some 'organic molecular entity')))
Particulate Organic Carbon Flux flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('carbon atom' and 'part of' some 'organic molecular entity')))
Particulate Organic Nitrogen Flux flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('nitrogen atom' and 'part of' some 'organic molecular entity')))
Particulate Silicon Flux flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('silicon atom' and 'part of' some 'organic molecular entity')))
any23 rover -t -p -f ntriples -o snow_height.nt snow_height.csv

link to awi epic for the data

data is about: 'first year ice'

column | single term annotation | need to create class | post compositional annotation
Date Time | 'temporal instant' | |
Latitude | 'latitude coordinate measurement datum' | |
Longitude | 'longitude coordinate measurement datum' | |
Snow Height Sensor 1 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow')
Snow Height Sensor 2 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow')
Snow Height Sensor 3 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow')
Snow Height Sensor 4 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow')
Snow Height Mean | snow thickness | Part of MIxS PR | 'expected value' and 'is about' min 2 ('data item' and 'is about' some (thickness and 'inheres in' some 'snow'))
Atmospheric Pressure | atmospheric pressure | Part of MIxS PR | pressure and ('inheres in' some atmosphere)
Air Temperature | temperature of air | |
Ice Temperature | | | temperature and ('inheres in' some 'sea ice')

From data collection: Influence of snow depth and surface flooding on light transmission through Antarctic pack ice, supplementary data and paper

any23 rover -t -p -f ntriples -o influence_snow_depth.nt influence_snow_depth.csv
column | single term annotation | need to create class | post compositional annotation
Date Time | temporal instant | |
Latitude | latitude coordinate measurement datum | |
Longitude | longitude coordinate measurement datum | |
Relative Distance X | | | distance and ('inheres in' some 'sea ice')
Relative Distance Y | | | distance and ('inheres in' some 'sea ice')
Sea Ice Thickness | sea ice thickness | yes part of cryoMIxS PR | thickness and ('inheres in' some 'sea ice')
Signal Strength | | | 'degree of illumination' and ('inheres in' some 'sea ice')
Database Cross Reference | hasDbXref | |

The Signal Strength column is about under-ice optical measurements, also described as under-ice solar radiation: downwelling planar under-ice spectral irradiance (320–950 nm). So I'm pretty sure this column is about PATO 'degree of illumination'.

We could try to make a class referring to a sub-sea ice environment, which would probably link to the class environment determined by seawater beneath sea ice which is under consideration in the cryoMIxS PR.

The data is about Antarctic pack ice

The wiki on the Arctic ice pack says: "The Arctic ice pack is the ice cover of the Arctic Ocean and its vicinity." So I can probably use the class with the synonym sea ice cover, which is under consideration in the cryoMIxS PR.

Or this data could be about an ice sheet, a class whose definition could use some cleaning up.

Pending the release of the cryoMIxS PR I can use the following post-composition for now:

'physical quality' and ('inheres in' some ('marine water body' and ('adjacent to' some 'sea ice')))

ice_algal_chlorophyll

Comparing Springtime Ice-Algal Chlorophyll a and Physical Properties of Multi-Year and First-Year Sea Ice from the Lincoln Sea, Lange et al. 2015. Collection of datasets here; link to paper here.

I'll take two datasets out of this collection, one about multi-year ice and one about first-year ice, so I can enhance my competency question to cover data about subclasses of sea ice.

Old idea: could also make classes for the sea ice texture classes mentioned in Table 2 of the paper. Probably won't get to this.

For the column Areal Chlorophyll A, we intended to use the term water-based planetary surface; however, it isn't on Ontobee, so I guess it hasn't been released yet. I will instead use 'liquid planetary surface'.

Data is about: ('chlorophyll a' and 'multiyear ice')

column post compositional annotation:
Identification 'categorical label' and 'is about' some specimen
Site site
Sea Ice Type 'categorical label' and 'is about' some 'multiyear ice'
Ice Or Snow Depth depth and ('inheres in' some ('multiyear ice' or 'snow'))
Minimum Ice Or Snow Depth depth and ('inheres in' some ('multiyear ice' or 'snow'))
Maximum Ice Or Snow Depth depth and ('inheres in' some ('multiyear ice' or 'snow'))
Chlorophyll A Concentration 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water')))
Areal Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some ('sea water' and 'part of' some 'liquid planetary surface'))))
Salinity osmolarity and ('inheres in' some ('salt' and ('part of' some meltwater)))
Ice Or Snow Temperature temperature and ('inheres in' some ('multiyear ice' or 'snow'))
Brine Volume 'volume' and ('inheres in' some 'brine' )
Texture 'morphology' and ('inheres in' some 'multiyear ice')
Sea Ice Type Portion 'fiat object part' and 'part of' some 'multiyear ice'

Data is about: ('chlorophyll a' and 'first year ice')

column post compositional annotation:
Identification 'categorical label' and 'is about' some specimen
Site site
Sea Ice Type 'categorical label' and 'is about' some 'first year ice'
Ice Or Snow Depth depth and ('inheres in' some ('first year ice' or 'snow'))
Minimum Ice Or Snow Depth depth and ('inheres in' some ('first year ice' or 'snow'))
Maximum Ice Or Snow Depth depth and ('inheres in' some ('first year ice' or 'snow'))
Chlorophyll A Concentration 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water')))
Areal Chlorophyll A 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some ('sea water' and 'part of' some 'liquid planetary surface'))))
Salinity osmolarity and ('inheres in' some ('salt' and ('part of' some meltwater)))
Texture 'morphology' and ('inheres in' some 'first year ice')
Sea Ice Type Portion 'fiat object part' and 'part of' some 'first year ice'

genomic_data

Abyssal and bathyal metagenomic data provided by Jose Rapp. Run using the metagenomicNGS assembly and annotation pipeline, available from: here. I don't have access to view it, but I have the script saved in /kblumberg_masters_thesis/working/genomic_workflow

Samples 1 and 2 collected from depths of 1244m and 2403m which best correspond to 'marine bathyal zone biome'

Samples 3 and 4 collected from depths of 3531m and 5525m which best correspond to 'marine abyssal zone biome'

Neritic transcriptomic data provided by Dr. David Probandt, from sandy sediment of a marine neritic benthic zone biome at about 8 m depth. Use the first 4 samples, labeled X1, X2, X3, X4.

molecular_function_bathyal

data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))

column post compositional annotation:
Molecular Function 'categorical label' and 'is about' some 'molecular_function'
Sample 1 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))
Sample 2 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))

molecular_function_abyssal

data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))

column post compositional annotation:
Molecular Function 'categorical label' and 'is about' some 'molecular_function'
Sample 1 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))
Sample 2 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))

molecular_function_neritic

data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))

column post compositional annotation:
Molecular Function 'categorical label' and 'is about' some 'molecular_function'
Sample 1 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))
Sample 2 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))
Sample 3 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))
Sample 4 Molecular Function Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function')))

cellular_components_bathyal

data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))

column post compositional annotation:
Cellular Components 'categorical label' and 'is about' some 'cellular_component'
Sample 1 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))
Sample 2 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))

cellular_components_abyssal

data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))

column post compositional annotation:
Cellular Components 'categorical label' and 'is about' some 'cellular_component'
Sample 1 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))
Sample 2 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))

cellular_components_neritic

data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))

column post compositional annotation:
Cellular Components 'categorical label' and 'is about' some 'cellular_component'
Sample 1 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))
Sample 2 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))
Sample 3 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))
Sample 4 Cellular Components Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component')))

biological_process_bathyal

data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))

column post compositional annotation:
Biological Process 'categorical label' and 'is about' some 'biological_process'
Sample 1 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
Sample 2 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))

biological_process_abyssal

data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))

column post compositional annotation:
Biological Process 'categorical label' and 'is about' some 'biological_process'
Sample 1 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
Sample 2 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))

biological_process_neritic

data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))

column post compositional annotation:
Biological Process 'categorical label' and 'is about' some 'biological_process'
Sample 1 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
Sample 2 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
Sample 3 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
Sample 4 Biological Process Count 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process')))
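The molecular function, cellular component and biological process tables above differ only in the Gene Ontology root, the sediment and biome, and the number of samples, so their rows can be generated from a template. The sketch below is illustrative; the function names about_expression and go_annotation_rows are hypothetical.

```python
def about_expression(go_root, sediment, biome):
    """Return the 'data is about' expression for one GO table."""
    return (
        f"'{go_root}' and ('part of' some ('microbial community' and "
        f"('part of' some ('{sediment}' and ('part of' some '{biome}')))))"
    )


def go_annotation_rows(column_label, go_root, n_samples):
    """Return (column, annotation) pairs for a GO label column and its sample counts."""
    label_expr = f"'categorical label' and 'is about' some '{go_root}'"
    count_expr = (
        "'discrete random variable' and ('is about' some "
        f"('categorical label' and ('is about' some '{go_root}')))"
    )
    rows = [(column_label, label_expr)]
    for i in range(1, n_samples + 1):
        rows.append((f"Sample {i} {column_label} Count", count_expr))
    return rows


print(about_expression("molecular_function", "deep marine sediment", "marine bathyal zone biome"))
for column, annotation in go_annotation_rows("Molecular Function", "molecular_function", 2):
    print(column, annotation)
```

For the neritic tables, the same call with 'sandy sediment', 'marine neritic benthic zone biome' and n_samples=4 would give the corresponding rows.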

microbial_taxonomy_bathyal

*In the annotation, the files should be typed as an 'operational taxonomic unit matrix' (or part thereof) instead of a 'data matrix'. Make sure to add query cases for these in the query script.

data is about some: 'community species diversity' and ('inheres in' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))

column post compositional annotation:
Sample 1 Operational Taxonomic Unit Count 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community')))
Sample 2 Operational Taxonomic Unit Count 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community')))
Domain (Bacteria or Archaea or Eukaryota)
Phylum phylum
Class class
Order order
Family family
Genus genus

microbial_taxonomy_abyssal

*In the annotation, the files should be typed as an 'operational taxonomic unit matrix' (or part thereof) instead of a 'data matrix'. Make sure to add query cases for these in the query script.

data is about some: 'community species diversity' and ('inheres in' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))

column post compositional annotation:
Sample 1 Operational Taxonomic Unit Count 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community')))
Sample 2 Operational Taxonomic Unit Count 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community')))
Domain (Bacteria or Archaea or Eukaryota)
Phylum phylum
Class class
Order order
Family family
Genus genus