datastore
My working datastore of AWI data, compiled to simulate a PANGAEA SPARQL endpoint populated with relevant FRAM data.
For now, the datasets are located in the directory: `/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/temp_other_datasets`
conventions
All data files will be named as simply as possible, based on the PANGAEA dataset name: preferably its first two words, lowercased and joined with an underscore. For example, Inorganic nutrients measured on water bottle samples at AWI HAUSGARTEN during POLARSTERN cruise MSM29
will be named: inorganic_nutrients
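The naming rule can be sketched as a small helper. This is a hypothetical illustration, not a script from this repo; stripping punctuation before splitting is an assumption the convention does not state. Since the rule is only a preference, some files below were named by hand instead (e.g. influence_snow_depth rather than influence_of).

```python
import re


def dataset_stem(pangaea_title: str, n_words: int = 2) -> str:
    """Derive a short file stem from a PANGAEA dataset title (hypothetical helper).

    Takes the first `n_words` alphanumeric words, lowercases them and joins
    them with underscores. Dropping punctuation is an assumption.
    """
    words = re.findall(r"[A-Za-z0-9]+", pangaea_title)
    return "_".join(w.lower() for w in words[:n_words])


title = ("Inorganic nutrients measured on water bottle samples at AWI "
         "HAUSGARTEN during POLARSTERN cruise MSM29")
print(dataset_stem(title))  # inorganic_nutrients
```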
CSV files will be converted to triples in N-Triples (.nt) format.
Supplemental files with annotations will have the same name but use the .ttl extension, to differentiate them from the .nt files, which are the triple version of the CSV data. Within the supplemental file, triples will be added about the CSV files stating that they are a data matrix, expressed as the triple: `file a obo:OBCS_0000120`
Here `a` is `rdf:type`, and data matrix is the OBCS class OBCS_0000120, which was recommended here. Within the supplemental file, a second axiom, 'is about', will link the CSV file to an Ontobee term, for example: `file obo:IAO_0000136 obo:ENVO_00001999 .`, i.e. the file is about a marine water body.
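A minimal sketch of generating such a supplemental file's content. The `obo:` prefix is the standard OBO PURL base; the file IRI used here (`http://example.org/datastore/...`) is a hypothetical placeholder, since the real IRI depends on how the CSV is published.

```python
OBO = "http://purl.obolibrary.org/obo/"


def supplemental_ttl(file_iri: str, about_term: str) -> str:
    """Return Turtle stating that a CSV file is a data matrix about a term.

    OBCS_0000120 is the data matrix class and IAO_0000136 is 'is about'.
    """
    return (
        f"@prefix obo: <{OBO}> .\n\n"
        f"<{file_iri}> a obo:OBCS_0000120 ;\n"
        f"    obo:IAO_0000136 obo:{about_term} .\n"
    )


# Hypothetical IRI for inorganic_nutrients.csv; ENVO_00001999 = marine water body.
print(supplemental_ttl("http://example.org/datastore/inorganic_nutrients.csv",
                       "ENVO_00001999"))
```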
All .nt and .ttl files will be merged into a single datastore.ttl file, which is the file that will be queried.
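One way the merge could be done, sketched with the standard library only (a hypothetical helper, not the author's script). Plain concatenation works because N-Triples is a subset of Turtle and Turtle allows `@prefix` directives anywhere in a document. One caveat: identical blank-node labels (e.g. `_:b0`) in different files would collapse into a single node.

```python
import tempfile
from pathlib import Path


def merge_datastore(directory: str, out_name: str = "datastore.ttl") -> Path:
    """Concatenate every .nt and .ttl file in `directory` into one Turtle file."""
    src = Path(directory)
    out = src / out_name
    # Collect inputs before opening the output, and skip any previous merge result.
    parts = sorted(p for p in src.iterdir()
                   if p.suffix in {".nt", ".ttl"} and p.name != out_name)
    with out.open("w", encoding="utf-8") as fh:
        for p in parts:
            fh.write(f"# --- {p.name} ---\n")  # comment marking each source file
            text = p.read_text(encoding="utf-8")
            fh.write(text if text.endswith("\n") else text + "\n")
    return out


if __name__ == "__main__":
    # Tiny demo with two hypothetical one-triple files.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.nt").write_text('<http://example.org/s> <http://example.org/p> "1" .\n')
        Path(d, "a.ttl").write_text(
            "@prefix obo: <http://purl.obolibrary.org/obo/> .\n"
            "<http://example.org/a.csv> a obo:OBCS_0000120 .\n")
        print(merge_datastore(d).read_text())
```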
NOTE: I will probably need to post-compose the annotations for many of the data columns. For example, Water Depth should be something like: subclass of depth and inheres in a water body.
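Most post-compositions in the tables below follow the pattern `<quality> and ('inheres in' some <bearer>)`. A small hypothetical helper that builds these Manchester-syntax strings is sketched here; leaving single-word labels unquoted is an assumption (the tables sometimes quote them, and both forms should parse).

```python
def quote(label: str) -> str:
    """Quote a label for Manchester syntax.

    Parenthesised sub-expressions pass through unchanged, single plain
    words stay bare, and anything else gets single quotes.
    """
    if label.startswith("("):
        return label
    return label if label.isidentifier() else f"'{label}'"


def inheres_in(quality: str, bearer: str) -> str:
    """Build: <quality> and ('inheres in' some <bearer>)."""
    return f"{quote(quality)} and ('inheres in' some {quote(bearer)})"


print(inheres_in("depth", "water body"))
# depth and ('inheres in' some 'water body')
print(inheres_in("(elevation or depth)", "marine water body"))
# (elevation or depth) and ('inheres in' some 'marine water body')
```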
Example in datastore as inorganic_nutrients.csv
```
any23 rover -t -p -f ntriples -o inorganic_nutrients.nt inorganic_nutrients.csv
```
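Every dataset in this README is converted with the same any23 invocation, differing only in the file stem. A hypothetical helper that builds that exact command for each stem:

```python
import shlex


def any23_command(stem: str) -> list[str]:
    """Build the any23 rover call used here: <stem>.csv -> <stem>.nt."""
    return ["any23", "rover", "-t", "-p", "-f", "ntriples",
            "-o", f"{stem}.nt", f"{stem}.csv"]


for stem in ["inorganic_nutrients", "physical_oceanography"]:
    print(shlex.join(any23_command(stem)))
```

With any23 on the PATH, each list could be passed directly to `subprocess.run(cmd, check=True)` to run the conversions in batch.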
The data is about a 'marine water body'.
The Date Time column's values were changed to match those in physical_oceanography, for the example in `/kblumberg_masters_thesis/testing/test_inorganic_chem`.
Parameters:
Example in datastore as physical_oceanography.csv
```
any23 rover -t -p -f ntriples -o physical_oceanography.nt physical_oceanography.csv
```
The data is about a 'marine current flow' (part of the phytoplankton PR), but that class has not been released yet.
For now I will use: is about a 'marine current'.
Latitude and longitude metadata were changed to match certain values from inorganic_nutrients.
column | single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | temporal instant | ||
Gear Identification Number | ('centrally registered identifier' and 'manufactured product') | ||
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') | ||
Pressure | pressure and ('inheres in' some 'sea water') | ||
Temperature | temperature and ('inheres in' some 'sea water') | ||
Salinity | salinity | yes, part of my PR | osmolarity and ('inheres in' some ('salt' and ('part of' some 'sea water'))) |
Horizontal Current Velocity | | could instead be a new class for horizontal velocity | velocity and ('inheres in' some 'marine current' and 'has quality' some 'horizontal') |
Current Direction | direction | submitted cardinal direction issue on the PATO tracker | |
East-west Current Velocity | | could instead be a new class for velocity in eastern direction | velocity and ('inheres in' some 'marine current') |
North-south Current Velocity | | could instead be a new class for velocity in northern direction | velocity and ('inheres in' some 'marine current') |
Oxygen | concentration of oxygen in seawater | | 'concentration of' and ('inheres in' some (dioxygen and ('part of' some 'sea water'))) |
```
any23 rover -t -p -f ntriples -o chlorophyll_a.nt chlorophyll_a.csv
```
The data is about a marine water body.
I could also go with: is about: 'chlorophyll a' and ('part of' some 'marine water body')
column | post compositional annotation: |
---|---|
Event | 'centrally registered identifier' and 'is about' some ('observing process' or 'specimen collection process') |
Date Time | temporal instant |
Latitude | latitude coordinate measurement datum |
Longitude | longitude coordinate measurement datum |
Elevation | (elevation or depth) and ('inheres in' some 'marine water body') |
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') |
Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water'))) |
global_chlorophyll_a.csv
```
any23 rover -t -p -f ntriples -o global_chlorophyll_a.nt global_chlorophyll_a.csv
```
See the log from 24.11.17.
The data is about a marine water body
I could also go with: is about: 'chlorophyll a' and ('part of' some 'marine water body')
column | post compositional annotation: |
---|---|
Ordinal Number | 'categorical label' and 'is about' some specimen |
Date Time | temporal instant |
Latitude | latitude coordinate measurement datum |
Longitude | longitude coordinate measurement datum |
Water Depth | (elevation or depth) and ('inheres in' some 'marine water body') |
Total Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some 'sea water'))) |
Diatom Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacillariophyta))) |
Haptophyte Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Coccolithales))) |
Prokaryote Chlorophyll A | 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Bacteria))) |
The paper mentions that the cyanobacterial chlorophyll a fraction actually covers all prokaryotes. Thus the column Prok Chl A is annotated with Bacteria and is not
'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some Cyanobacteria)))
```
any23 rover -t -p -f ntriples -o biogenic_particle_flux.nt biogenic_particle_flux.csv
```
Need to add latitude and longitude to the annotation file.
The data is about 'marine particle sinking process' (part of the plankton ecology PR).
For now I will go with: is about 'material transport process' and ('has input' some 'marine snow')
```
any23 rover -t -p -f ntriples -o snow_height.nt snow_height.csv
```
Link to AWI EPIC for the data.
The data is about 'first year ice'.
column | single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | 'temporal instant' | ||
Latitude | 'latitude coordinate measurement datum' | ||
Longitude | 'longitude coordinate measurement datum' | ||
Snow Height Sensor 1 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 2 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 3 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Sensor 4 | snow thickness | Part of MIxS PR | thickness and ('inheres in' some 'snow') |
Snow Height Mean | snow thickness | Part of MIxS PR | 'expected value' and 'is about' min 2 ('data item' and 'is about' some (thickness and 'inheres in' some 'snow')) |
Atmospheric Pressure | atmospheric pressure | Part of MIxS PR | pressure and ('inheres in' some atmosphere) |
Air Temperature | temperature of air | ||
Ice Temperature | temperature and ('inheres in' some 'sea ice') | ||
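The Snow Height Mean annotation says the value is an 'expected value' about at least two snow-thickness data items. A hypothetical sketch of that relationship, assuming the column is the arithmetic mean of whichever of the four sensors reported a value (the dataset's actual averaging method is not stated here):

```python
from statistics import mean
from typing import Optional, Sequence


def snow_height_mean(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the available snow height sensor readings (assumed method).

    Mirrors the 'min 2' restriction in the annotation: returns None when
    fewer than two sensors reported a value.
    """
    present = [r for r in readings if r is not None]
    if len(present) < 2:
        return None
    return mean(present)


# Hypothetical readings from Snow Height Sensors 1-4 (sensor 3 missing).
print(snow_height_mean([0.1, 0.2, None, 0.3]))
```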
From the data collection: Influence of snow depth and surface flooding on light transmission through Antarctic pack ice (supplementary data and paper)
```
any23 rover -t -p -f ntriples -o influence_snow_depth.nt influence_snow_depth.csv
```
column | Single term annotation | need to create class | post compositional annotation: |
---|---|---|---|
Date Time | temporal instant | ||
Latitude | latitude coordinate measurement datum | ||
Longitude | longitude coordinate measurement datum | ||
Relative Distance X | distance and ('inheres in' some 'sea ice') | ||
Relative Distance Y | distance and ('inheres in' some 'sea ice') | ||
Sea Ice Thickness | sea ice thickness | yes part of cryoMIxS PR | thickness and ('inheres in' some 'sea ice') |
Signal Strength | 'degree of illumination' and ('inheres in' some 'sea ice') | ||
The signal strength refers to under-ice optical measurements, also described as under-ice solar radiation: downwelling planar under-ice spectral irradiance (320–950 nm). So I'm fairly sure this column is about PATO 'degree of illumination'.
We could try to make a class for a sub-sea-ice environment, which would probably link to the class 'environment determined by seawater beneath sea ice', currently under consideration in the cryoMIxS PR.
The data is about Antarctic pack ice.
The Wikipedia article on the Arctic ice pack says: "The Arctic ice pack is the ice cover of the Arctic Ocean and its vicinity."
So I can probably use the class with the synonym 'sea ice cover', which is under consideration in the cryoMIxS PR.
Or this data could be about an ice sheet, a class whose definition could use some cleaning up.
Pending the release of the cryoMIxS PR I can use the following post-composition for now:
'physical quality' and ('inheres in' some ('marine water body' and ('adjacent to' some 'sea ice')))
Comparing Springtime Ice-Algal Chlorophyll a and Physical Properties of Multi-Year and First-Year Sea Ice from the Lincoln Sea (Lange et al. 2015): collection of datasets here; link to paper here.
```
any23 rover -t -p -f ntriples -o ice_algal_chlorophyll.nt ice_algal_chlorophyll.csv
```
The data is about: ('chlorophyll a' and 'sea ice'). I will have to add a case for this to the query script.
The column Depth ice/snow [m] is probably about snow depth or ice depth. I'll have to figure out how to post-compose an annotation using or, maybe with an equivalence class; see `/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/working/envo_rdf.owl`
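One possible shape for the "or" post-composition is to put the union inside the filler of 'inheres in'. In OWL, `Q and (R some (A or B))` is equivalent to `(Q and (R some A)) or (Q and (R some B))`, so the compact form loses nothing; a named class declared equivalent to it would give the equivalence-class option. A hypothetical string builder, assuming 'depth' as the quality:

```python
def q(label: str) -> str:
    """Single-quote multi-word labels, leave single plain words bare."""
    return label if label.isidentifier() else f"'{label}'"


def quality_of_any(quality: str, bearers: list[str]) -> str:
    """Build: <quality> and ('inheres in' some (<b1> or <b2> ...))."""
    union = " or ".join(q(b) for b in bearers)
    return f"{q(quality)} and ('inheres in' some ({union}))"


print(quality_of_any("depth", ["snow", "sea ice"]))
# depth and ('inheres in' some (snow or 'sea ice'))
```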
Could also make classes for the sea ice texture classes mentioned in Table 2 of the paper.
For the column Areal Chlorophyll A, I will have to write another clause in the or statement of my query function.
Data provided by Jose Rapp. Run using the metagenomicNGS assembly and annotation pipeline available from here. I don't have access to view it, but I have the script saved in `/kblumberg_masters_thesis/working/genomic_workflow`.
column | post compositional annotation: |
---|---|
Molecular Function | 'data about an ontology part' and ('is about' some 'molecular_function') |
Molecular Function Count | 'discrete random variable' and ('is about' some ('data about an ontology part' and ('is about' some 'molecular_function'))) |
Cellular Components | |
Cellular Components Count | |
Biological Process | |
Biological Process Count | |
Data provided by Jose Rapp. Run using the metagenomicNGS assembly and annotation pipeline available from here. I don't have access to view it, but I have the script saved in `/kblumberg_masters_thesis/working/genomic_workflow`.
column | post compositional annotation: |
---|---|