datastore
My working datastore of AWI data, compiled to simulate a PANGAEA SPARQL endpoint populated with relevant FRAM data.
For now, the datasets are located in the directory: /kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/temp_other_datasets
conventions
All data files will be named as simply as possible, based on the PANGAEA dataset title: preferably its first two words, lowercased and joined with an underscore. For example, "Inorganic nutrients measured on water bottle samples at AWI HAUSGARTEN during POLARSTERN cruise MSM29"
will be named: inorganic_nutrients
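The naming rule above is mechanical enough to script. The following is a minimal Python sketch; the helper name `simple_name` is hypothetical, and because the convention only says "preferably", some stems in this datastore (for example global_chlorophyll_a) were chosen by hand and would not come out of this function.

```python
import re


def simple_name(pangaea_title: str, n_words: int = 2) -> str:
    """Derive a short file stem from a PANGAEA dataset title.

    Takes the first `n_words` alphanumeric words, lowercases them and joins
    them with underscores.
    """
    # Split on anything that is not a letter or digit so that commas,
    # hyphens and parentheses never end up in a file name.
    words = re.findall(r"[A-Za-z0-9]+", pangaea_title)
    return "_".join(word.lower() for word in words[:n_words])


if __name__ == "__main__":
    title = ("Inorganic nutrients measured on water bottle samples at AWI "
             "HAUSGARTEN during POLARSTERN cruise MSM29")
    stem = simple_name(title)
    # The same stem is reused for the CSV, its N-Triples and its annotations.
    print(stem + ".csv", stem + ".nt", stem + ".ttl")
```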
CSV files will be converted to triples in .nt (N-Triples) format.
Supplemental annotation files will have the same name but use the .ttl extension, to distinguish them from the .nt files, which hold the triple version of the CSV data. Within a supplemental file, triples are added about the CSV file to state that it is a data matrix, expressed as the triple: file a obo:OBCS_0000120
where a is rdf:type and data matrix is the OBCS class OBCS_0000120, which was recommended here. The supplemental file also contains a second axiom, is about, which links the CSV file to an Ontobee term. For example, file obo:IAO_0000136 obo:ENVO_00001999 . states that the file is about a marine water body.
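A minimal Python sketch of writing one such supplemental .ttl file, assuming the standard OBO PURL namespace for the obo: prefix. The file IRI passed in is an assumption: it has to be the same IRI that the CSV-derived .nt triples use for the file, otherwise the annotations and the data will not join when queried. The function names and the example.org IRI are hypothetical.

```python
from pathlib import Path

OBO = "http://purl.obolibrary.org/obo/"


def supplemental_ttl(file_iri: str, about: str) -> str:
    """Return Turtle stating that `file_iri` is a data matrix about `about`.

    `about` is a CURIE in the obo: namespace, e.g. obo:ENVO_00001999.
    """
    return (
        f"@prefix obo: <{OBO}> .\n"
        "\n"
        # 'a' is rdf:type; OBCS_0000120 is the OBCS 'data matrix' class
        f"<{file_iri}> a obo:OBCS_0000120 ;\n"
        # IAO_0000136 is 'is about'
        f"    obo:IAO_0000136 {about} .\n"
    )


def write_supplemental(stem: str, file_iri: str, about: str) -> Path:
    """Write <stem>.ttl, following the naming convention for annotation files."""
    path = Path(f"{stem}.ttl")
    path.write_text(supplemental_ttl(file_iri, about), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(supplemental_ttl(
        "http://example.org/datastore/inorganic_nutrients.csv",  # hypothetical IRI
        "obo:ENVO_00001999",  # marine water body
    ))
```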
All .nt and .ttl files will be merged into a single datastore.ttl file, which is the file that gets queried.
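Because N-Triples is a subset of Turtle, the merge can be a plain concatenation. A minimal Python sketch under that assumption (the function name is hypothetical). Two caveats: blank-node labels are scoped to a single file, so a label such as _:b0 appearing in two inputs would silently become one node after concatenation; and a @prefix or @base declared in one .ttl file stays in effect for every file appended after it. If either matters, loading each file with an RDF library and serialising the union is the safer route.

```python
from pathlib import Path


def merge_datastore(directory: Path, out_name: str = "datastore.ttl") -> Path:
    """Concatenate every .nt and .ttl file in `directory` into `out_name`.

    N-Triples is a subset of Turtle, so the result is valid Turtle as long as
    blank-node labels do not collide between input files.
    """
    out_path = directory / out_name
    # Skip the output file itself so re-running does not nest old merges.
    sources = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix in {".nt", ".ttl"} and p.name != out_name
    )
    with out_path.open("w", encoding="utf-8") as out:
        for src in sources:
            out.write(f"# --- {src.name} ---\n")  # Turtle comment marking the origin
            text = src.read_text(encoding="utf-8")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the next file's first triple on its own line
    return out_path
```

For example, `merge_datastore(Path("temp_other_datasets"))` would write temp_other_datasets/datastore.ttl.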
NOTE: I will probably need to post-compose the annotations for many of the data columns. For example, Water Depth should be something like: subclass of depth and inheres in a water body.
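One way to write such a post-composed annotation into a supplemental .ttl file is as an anonymous OWL class expression: an owl:intersectionOf a quality and an existential restriction. A minimal Python sketch; the subject IRI is hypothetical, obo:RO_0000052 is the RO relation 'inheres in', obo:ENVO_00001999 is the marine water body term used above, and obo:PATO_0001595 is assumed to be PATO 'depth' (check the ID on Ontobee before relying on it).

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix obo: <http://purl.obolibrary.org/obo/> .\n"
)


def post_composed_subclass(subject_iri: str, quality: str,
                           relation: str, bearer: str) -> str:
    """Turtle for: subject rdfs:subClassOf (quality and (relation some bearer))."""
    return PREFIXES + "\n" + (
        f"<{subject_iri}> rdfs:subClassOf [\n"
        "    a owl:Class ;\n"
        "    owl:intersectionOf (\n"
        f"        {quality}\n"
        # existential restriction: relation some bearer
        "        [ a owl:Restriction ;\n"
        f"          owl:onProperty {relation} ;\n"
        f"          owl:someValuesFrom {bearer} ]\n"
        "    )\n"
        "] .\n"
    )


if __name__ == "__main__":
    print(post_composed_subclass(
        "http://example.org/datastore/water_depth",  # hypothetical column IRI
        "obo:PATO_0001595",   # assumed: PATO 'depth'
        "obo:RO_0000052",     # 'inheres in'
        "obo:ENVO_00001999",  # 'marine water body'
    ))
```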
Example in datastore as inorganic_nutrients.csv
any23 rover -t -p -f ntriples -o inorganic_nutrients.nt inorganic_nutrients.csv
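Every dataset in this README is converted with the same any23 invocation, differing only in the file stem. A small Python wrapper (hypothetical, and assuming the any23 CLI is installed and on PATH) keeps the flags identical to the ones used here; it adds no options beyond those already shown.

```python
import shlex
import subprocess


def any23_command(stem: str) -> list[str]:
    """The any23 rover call used in this README: <stem>.csv -> <stem>.nt."""
    return ["any23", "rover", "-t", "-p", "-f", "ntriples",
            "-o", f"{stem}.nt", f"{stem}.csv"]


def convert(stem: str) -> None:
    """Run the conversion; raises CalledProcessError if any23 exits non-zero."""
    cmd = any23_command(stem)
    print(shlex.join(cmd))  # echo the command in copy-pasteable form
    subprocess.run(cmd, check=True)
```

For example, `for stem in ("inorganic_nutrients", "physical_oceanography"): convert(stem)` reproduces the two commands given for those datasets.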
The data is about a 'marine water body'.
The Date Time column's values were changed to match those in physical_oceanography, for the example in /kblumberg_masters_thesis/testing/test_inorganic_chem
Parameters:
Example in datastore as physical_oceanography.csv
any23 rover -t -p -f ntriples -o physical_oceanography.nt physical_oceanography.csv
The data is about a [marine current flow](part of phytoplankton PR), pending the release of that class.
For now I will use: is about a 'marine current'
Latitude and longitude metadata were changed to match certain values from inorganic_nutrients.
column | single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | temporal instant | ||
Gear Identification Number | ('centrally registered identifier' and 'manufactured product') | ||
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') | ||
Pressure | pressure and ('inheres in' some 'sea water') | ||
Temperature | temperature and ('inheres in' some 'sea water') | ||
Salinity | [salinity] | yes part of my PR | osmolarity and ('inheres in' some ('salt' and ('part of' some 'sea water'))) |
Horizontal Current Velocity | | could instead be a new class for horizontal velocity | velocity and ('inheres in' some 'marine current' and 'has quality' some 'horizontal') |
Current Direction | direction | submitted a cardinal direction issue on the PATO tracker | |
East-west Current Velocity | | could instead be a new class for velocity in the eastern direction | velocity and ('inheres in' some 'marine current') |
North-south Current Velocity | | could instead be a new class for velocity in the northern direction | velocity and ('inheres in' some 'marine current') |
Oxygen | concentration of oxygen in seawater | | 'concentration of' and ('inheres in' some (dioxygen and ('part of' some 'sea water'))) |
The data set is about: 'chlorophyll a' and ('part of' some 'marine water body')
column | post compositional annotation: |
---|---|
Event | 'centrally registered identifier' and 'is about' some ('observing process' or 'specimen collection process') |
Date Time | temporal instant |
Latitude | latitude coordinate measurement datum |
Longitude | longitude coordinate measurement datum |
Elevation | (elevation or depth) and ('inheres in' some 'marine water body') |
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') |
Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water'))) |
global_chlorophyll_a.csv
any23 rover -t -p -f ntriples -o global_chlorophyll_a.nt global_chlorophyll_a.csv
See the log from 24.11.17.
The data is about a 'marine water body'.
I could also go with: is about 'chlorophyll a' and ('part of' some 'marine water body')
column | post compositional annotation: |
---|---|
Ordinal Number | 'categorical label' and 'is about' some specimen |
Date Time | temporal instant |
Latitude | latitude coordinate measurement datum |
Longitude | longitude coordinate measurement datum |
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') |
Total Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water'))) |
Diatom Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacillariophyta))) |
Haptophyte Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Coccolithales))) |
Prokaryote Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacteria))) |
Database Cross Reference | hasDbXref |
The paper mentions that the cyanobacterial chlorophyll a fraction actually covers all prokaryotes. Thus the Prok Chl A column is not 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Cyanobacteria))).
any23 rover -t -p -f ntriples -o biogenic_particle_flux.nt biogenic_particle_flux.csv
Need to add latitude and longitude to the annotation file.
The data is about a 'marine particle sinking process' (part of the plankton ecology PR).
For now I will go with: is about 'material transport process' and ('has input' some 'marine snow')
any23 rover -t -p -f ntriples -o snow_height.nt snow_height.csv
link to awi epic for the data
data is about: 'first year ice'
column | single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | 'temporal instant' | ||
Latitude | 'latitude coordinate measurement datum' | ||
Longitude | 'longitude coordinate measurement datum' | ||
Snow Height Sensor 1 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 2 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 3 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 4 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Mean | snow thickness | Part of MIxS PR | 'expected value' and 'is about' min 2 ('data item' and 'is about' some (thickness and 'inheres in' some 'snow')) |
Atmospheric Pressure | atmospheric pressure | Part of MIxS PR | pressure and ('inheres in' some atmosphere) |
Air Temperature | temperature of air | ||
Ice Temperature | | | temperature and ('inheres in' some 'sea ice') |
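The 'min 2' cardinality in the Snow Height Mean expression says the mean is about at least two data items. A consistency check that mirrors this could recompute the mean from the four sensor columns and refuse to produce one from fewer than two readings. This Python sketch assumes the reported mean is the plain arithmetic mean of the non-missing sensor values, which the source does not state; the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Optional, Sequence


def sensor_mean(readings: Sequence[Optional[float]],
                min_items: int = 2) -> Optional[float]:
    """Arithmetic mean of the non-missing readings.

    Returns None when fewer than `min_items` readings are present, mirroring
    the 'is about min 2' restriction in the annotation.
    """
    present = [r for r in readings if r is not None]
    if len(present) < min_items:
        return None
    return fmean(present)


def mean_matches(row_mean: float, readings: Sequence[Optional[float]],
                 tol: float = 1e-3) -> bool:
    """True if a reported mean agrees with the recomputed one within `tol`."""
    expected = sensor_mean(readings)
    return expected is not None and math.isclose(row_mean, expected, abs_tol=tol)
```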
From data collection: Influence of snow depth and surface flooding on light transmission through Antarctic pack ice, supplementary data and paper
any23 rover -t -p -f ntriples -o influence_snow_depth.nt influence_snow_depth.csv
column | Single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | temporal instant | ||
Latitude | latitude coordinate measurement datum | ||
Longitude | longitude coordinate measurement datum | ||
Relative Distance X | distance and ('inheres in' some 'sea ice') | ||
Relative Distance Y | distance and ('inheres in' some 'sea ice') | ||
Sea Ice Thickness | sea ice thickness | yes part of cryoMIxS PR | thickness and ('inheres in' some 'sea ice') |
Signal Strength | 'degree of illumination' and ('inheres in' some 'sea ice') | ||
Database Cross Reference | hasDbXref | | |
The Signal Strength column is about under-ice optical measurements, also described as under-ice solar radiation: downwelling planar under-ice spectral irradiance (320–950 nm). So I'm fairly confident this column is about PATO 'degree of illumination'.
We could try to make a class for a sub-sea-ice environment, which would probably link to the class 'environment determined by seawater beneath sea ice', currently under consideration in the cryoMIxS PR.
The data is about Antarctic pack ice.
The Wikipedia article on the Arctic ice pack says: "The Arctic ice pack is the ice cover of the Arctic Ocean and its vicinity."
So I can probably use the class with the synonym 'sea ice cover', which is under consideration in the cryoMIxS PR.
Alternatively, this data could be about an ice sheet, a class whose definition could use some cleaning up.
Pending the release of the cryoMIxS PR I can use the following post-composition for now:
'physical quality' and ('inheres in' some ('marine water body' and ('adjacent to' some 'sea ice')))
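The nested post-compositions in these notes are written by hand, and the parentheses are easy to get wrong. A few string helpers (hypothetical, plain Python, no OWL library) can assemble the same Manchester-syntax text; for example, they reproduce the interim expression above exactly. They assume labels contain no single quotes.

```python
def q(label: str) -> str:
    """Quote an ontology label for Manchester syntax (assumes no ' in label)."""
    return f"'{label}'"


def group(expr: str) -> str:
    """Parenthesise a sub-expression."""
    return f"({expr})"


def some(prop_label: str, filler: str) -> str:
    """Existential restriction: ('prop' some filler)."""
    return group(f"{q(prop_label)} some {filler}")


def and_(*parts: str) -> str:
    """Intersection of the given class expressions."""
    return " and ".join(parts)


# Interim 'is about' expression for the Antarctic pack ice data.
interim_about = and_(
    q("physical quality"),
    some("inheres in", group(and_(q("marine water body"),
                                  some("adjacent to", q("sea ice"))))),
)
print(interim_about)
```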
Comparing Springtime Ice-Algal Chlorophyll a and Physical Properties of Multi-Year and First-Year Sea Ice from the Lincoln Sea, Lange et al. 2015: collection of datasets here, link to paper here
I'll take two datasets out of this collection, one about multi-year ice and one about first-year ice, so I can extend my competency question to data about subclasses of sea ice.
Old idea: I could also make classes for the sea ice texture classes mentioned in Table 2 of the paper. I probably won't get to this.
For the column Areal Chlorophyll A, we intended to use the term 'water-based planetary surface'; however, it isn't on Ontobee, so I assume it hasn't been released yet. I will instead use 'liquid planetary surface'.
Data is about: ('chlorophyll a' and 'multiyear ice')
Data is about: ('chlorophyll a' and 'first year ice')
column | post compositional annotation: |
---|---|
Identification | 'categorical label' and 'is about' some specimen |
Site | site |
Sea Ice Type | 'categorical label' and 'is about' some 'first year ice' |
Ice Or Snow Depth | depth and ('inheres in' some ('first year ice' or 'snow')) |
Minimum Ice Or Snow Depth | depth and ('inheres in' some ('first year ice' or 'snow')) |
Maximum Ice Or Snow Depth | depth and ('inheres in' some ('first year ice' or 'snow')) |
Chlorophyll A Concentration | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water'))) |
Areal Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some ('sea water' and 'part of' some 'liquid planetary surface')))) |
Salinity | osmolarity and ('inheres in' some ('salt' and ('part of' some meltwater))) |
Texture | 'morphology' and ('inheres in' some 'first year ice') |
Sea Ice Type Portion | 'fiat object part' and 'part of' some 'first year ice' |
Abyssal and bathyal metagenomic data provided by Jose Rapp, processed with the metagenomicNGS assembly and annotation pipeline available from here. I don't have access to view it, but I have the script saved in /kblumberg_masters_thesis/working/genomic_workflow
Samples 1 and 2 collected from depths of 1244m and 2403m which best correspond to 'marine bathyal zone biome'
Samples 3 and 4 collected from depths of 3531m and 5525m which best correspond to 'marine abyssal zone biome'
Neritic transcriptomic data provided by Dr. David Probandt, from sandy sediment of a marine neritic benthic zone biome at about 8 m depth. Use the first 4 samples, labeled X1, X2, X3 and X4.
data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))
column | post compositional annotation: |
---|---|
Molecular Function | 'categorical label' and 'is about' some 'molecular_function' |
Sample 1 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
Sample 2 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
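The nine 'data is about' expressions for the metagenomic and transcriptomic GO datasets all follow one nested pattern: a GO aspect, part of a microbial community, part of a sediment, part of a biome. A hypothetical Python helper that generates them from the three variable parts is sketched below; it does not cover the OTU datasets, which use 'community species diversity' with 'inheres in' instead.

```python
from itertools import product

ASPECTS = ["molecular_function", "cellular_component", "biological_process"]
SITES = [
    ("deep marine sediment", "marine bathyal zone biome"),    # samples 1 and 2
    ("deep marine sediment", "marine abyssal zone biome"),    # samples 3 and 4
    ("sandy sediment", "marine neritic benthic zone biome"),  # X1 to X4
]


def community_about(aspect: str, sediment: str, biome: str) -> str:
    """Manchester-syntax 'is about' expression for one GO count dataset."""
    return (
        f"'{aspect}' and ('part of' some ('microbial community' and "
        f"('part of' some ('{sediment}' and ('part of' some '{biome}')))))"
    )


if __name__ == "__main__":
    for aspect, (sediment, biome) in product(ASPECTS, SITES):
        print("data is about some:", community_about(aspect, sediment, biome))
```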
data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))
column | post compositional annotation: |
---|---|
Molecular Function | 'categorical label' and 'is about' some 'molecular_function' |
Sample 1 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
Sample 2 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
data is about some: 'molecular_function' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))
column | post compositional annotation: |
---|---|
Molecular Function | 'categorical label' and 'is about' some 'molecular_function' |
Sample 1 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
Sample 2 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
Sample 3 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
Sample 4 Molecular Function Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'molecular_function'))) |
data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))
column | post compositional annotation: |
---|---|
Cellular Components | 'categorical label' and 'is about' some 'cellular_component' |
Sample 1 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
Sample 2 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))
column | post compositional annotation: |
---|---|
Cellular Components | 'categorical label' and 'is about' some 'cellular_component' |
Sample 1 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
Sample 2 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
data is about some: 'cellular_component' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))
column | post compositional annotation: |
---|---|
Cellular Components | 'categorical label' and 'is about' some 'cellular_component' |
Sample 1 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
Sample 2 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
Sample 3 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
Sample 4 Cellular Components Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'cellular_component'))) |
data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))
column | post compositional annotation: |
---|---|
Biological Process | 'categorical label' and 'is about' some 'biological_process' |
Sample 1 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
Sample 2 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))
column | post compositional annotation: |
---|---|
Biological Process | 'categorical label' and 'is about' some 'biological_process' |
Sample 1 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
Sample 2 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
data is about some: 'biological_process' and ('part of' some ('microbial community' and ('part of' some ('sandy sediment' and ('part of' some 'marine neritic benthic zone biome')))))
column | post compositional annotation: |
---|---|
Biological Process | 'categorical label' and 'is about' some 'biological_process' |
Sample 1 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
Sample 2 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
Sample 3 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
Sample 4 Biological Process Count | 'discrete random variable' and ('is about' some ('categorical label' and ('is about' some 'biological_process'))) |
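The count tables above differ only in the GO aspect, its column label and the number of samples (two for the bathyal and abyssal data, four for the neritic data). A hypothetical Python sketch that rebuilds any of them in the same pipe-table layout used in this README:

```python
def count_table(aspect: str, label: str, n_samples: int) -> str:
    """Pipe table of column -> post-compositional annotation for a GO count file."""
    label_expr = f"'categorical label' and 'is about' some '{aspect}'"
    count_expr = ("'discrete random variable' and ('is about' some "
                  f"('categorical label' and ('is about' some '{aspect}')))")
    rows = ["column | post compositional annotation: |", "---|---|",
            f"{label} | {label_expr} |"]
    # One count column per sample, numbered from 1.
    for i in range(1, n_samples + 1):
        rows.append(f"Sample {i} {label} Count | {count_expr} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(count_table("biological_process", "Biological Process", 4))
```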
*in the annotation have classes be an 'operational taxonomic unit matrix' (or part thereof) instead of a 'data matrix'. Make sure to add query cases for these in the query script.
data is about some: 'community species diversity' and ('inheres in' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine bathyal zone biome')))))
column | post compositional annotation: |
---|---|
Sample 1 Operational Taxonomic Unit Count | 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community'))) |
Sample 2 Operational Taxonomic Unit Count | 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community'))) |
Domain | (Bacteria or Archaea or Eukaryota) |
Phylum | phylum |
Class | class |
Order | order |
Family | family |
Genus | genus |
*in the annotation have classes be an 'operational taxonomic unit matrix' (or part thereof) instead of a 'data matrix'. Make sure to add query cases for these in the query script.
data is about some: 'community species diversity' and ('inheres in' some ('microbial community' and ('part of' some ('deep marine sediment' and ('part of' some 'marine abyssal zone biome')))))
column | post compositional annotation: |
---|---|
Sample 1 Operational Taxonomic Unit Count | 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community'))) |
Sample 2 Operational Taxonomic Unit Count | 'discrete random variable' and ('is about' some ('community species diversity' and ('inheres in' some 'microbial community'))) |
Domain | (Bacteria or Archaea or Eukaryota) |
Phylum | phylum |
Class | class |
Order | order |
Family | family |
Genus | genus |
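The Domain column is annotated with the union (Bacteria or Archaea or Eukaryota), so any other value in the data (if, say, some OTUs carried a placeholder for an unclassified domain) would fall outside the annotation. A hypothetical Python check over the CSV, using only the standard library; the column name Domain comes from the table above, while the file name in the usage note is an assumption.

```python
import csv
import io
from typing import Iterable, Mapping

ALLOWED_DOMAINS = {"Bacteria", "Archaea", "Eukaryota"}


def unexpected_domains(rows: Iterable[Mapping[str, str]],
                       column: str = "Domain") -> set[str]:
    """Values of `column` not covered by (Bacteria or Archaea or Eukaryota)."""
    return {row[column].strip() for row in rows
            if row[column].strip() not in ALLOWED_DOMAINS}


if __name__ == "__main__":
    sample = io.StringIO("Domain,Phylum\nBacteria,Proteobacteria\n"
                         "Archaea,Thaumarchaeota\n")
    print(unexpected_domains(csv.DictReader(sample)) or "all domains covered")
```

On a real file this would be `unexpected_domains(csv.DictReader(open("otu_counts.csv", newline="")))`, where otu_counts.csv is a placeholder name.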