This repository has been archived by the owner on Apr 22, 2023. It is now read-only.
Kai Blumberg edited this page Jan 25, 2018 · 462 revisions

Table of contents

week 1: 19.09.17, 20.09.17, 21.09.17, 22.09.17

week 2: 25.09.17, 26.09.17, 27.09.17, 28.09.17

week 3: 03.10.17, 04.10.17, 05.10.17, 06.10.17

week 4: 09.10.17, 10.10.17, 11.10.17, 12.10.17, 13.10.17, 14.10.17

week 5: 16.10.17, 17.10.17, 18.10.17, 19.10.17, 20.10.17, 21.10.17

week 6: 23.10.17, 24.10.17, 25.10.17, 26.10.17, 27.10.17,

week 7: 30.10.17, 31.10.17, 02.11.17, 03.11.17,

week 8: 06.11.17, 07.11.17, 08.11.17, 09.11.17, 10.11.17,

week 9: 13.11.17, 14.11.17, 15.11.17, 16.11.17, 17.11.17,

week 10: 20.11.17, 21.11.17, 22.11.17, 23.11.17, 24.11.17,

week 11: 27.11.17, 28.11.17, 30.11.17, 01.12.17,

week 12: 04.12.17, 05.12.17, 06.12.17, 07.12.17, 08.12.17, 09.12.17, 10.12.17,

week 13: 11.12.17, 12.12.17, 13.12.17, 14.12.17,

week 14: 18.12.17, 19.12.17, 20.12.17, 21.12.17,

Holiday break

19.09.17

Notes from 18.09.17 meeting with Pier:

I need to create a citable statements wiki page (done here).

CMECS: the Coastal and Marine Ecological Classification Standard (CMECS) document, a NOAA project. This will involve collaborating with Miriam on linking marine protected areas to SDGIO and ENVO.

Ruth Duerr: Research Scholar at the Ronin Institute who works with NSIDC; she can coordinate with us on sea ice ontology work.

YAMZ: a metadictionary of terms and definitions which is currently being upgraded. We should try to get AWI people to submit basic definitions and citations so that they can more easily be harvested into ENVO.

PACES 2: an AWI research framework. It is a 5-year initiative to observe, analyse, model and predict impacts on the Earth system, to be used as a basis for addressing the social and economic impacts of climate change. I want this to be a major theme of my research, so I need to narrow it down. I could do something along the lines of centering a test semantic web around one or more specific ENVO processes, and see whether any data sampled during such a process can be linked to. Perhaps, though, there isn't enough FAIR-style data concerning such Arctic processes uploaded to a repository to find an interesting test case with a semantic knowledge graph.

This ties into another of my goals, which is to interweave such a project with UN Sustainable Development Goal 14 concerning ocean sustainability.

Working to come up with my thesis hypothesis

We need to establish some competency questions such as the following:

What metabolic functions can be associated with environment X?

What data do we have that can assess the natural capital of sea ice?

What ecosystem services can be performed by sea ice?

(albedo, biodiversity, etc.)

What can be bioprospected for in sea ice?

What ecosystem services are provided by the deep sea?

Can we demonstrate a test case for the use of semantic research (implemented around UN Sustainable Development Goal 14), using AWI data on a specific type of environment to assess the natural capital of that environment as well as answer unasked questions concerning it?

Thus the goal is to find some specific examples of data and semantics which could serve as a test case, either with real data (perhaps from Habitat group members Josie or Eddie) or with dummy data. This could potentially be any ecosystem being researched by the Habitat group or AWI, such as sea ice, marine plastic-polluted environments or deep-sea environments, whichever serves as the better test case. Perhaps on the cruise Josie's data comes from there were also other types of scientific data collected at the same time, for example by sea ice physicists.

Investigating Josephine Rapp's research projects: she has been on the RV Polarstern cruises PS99.2, PS93.2 and PS85. Following the 2016 cruise on AWI ePIC leads me to the PS99 Expedition Programme.

2016 cruise

From the PS99 report, there are many potential options. Josie, Eddie, Pierre Offre and Morten Iversen were all aboard, with Josie and Eddie participating in section 9: Assessment of archaeal and bacterial life and matter cycling in the Arctic water column and deep-sea surface sediments.

Sections 4 and 10 are about marine litter, which could possibly tie into some of the marine plastics entries recently added to ENVO. Section 4's data should be uploaded to PANGAEA with unrestricted access, so if some genomic material coincides with stations where plastic was sampled, it could be a possibility.

Section 5, HAUSGARTEN: IMPACT OF CLIMATE CHANGE ON ARCTIC MARINE ECOSYSTEMS, has a lot of cool and relevant linkable data, but accessing it sounds semi-political, so it may not be very FAIR data.

Section 6, pairing satellite ice data with the under-ice observational data of the Autonomous Underwater Vehicle (AUV) “PAUL”, could be cool, but the data may not be accessible and these links are perhaps already under investigation.

Section 7, PLANKTON ECOLOGY AND BIOGEOCHEMISTRY IN THE CHANGING ARCTIC OCEAN (PEBCAO GROUP), has a lot of biogeochemical data which may correlate with section 9, but it may not be available on PANGAEA until they publish.

2015

PS93 report: I don't see any of the MPI-associated people, and the research themes don't seem as relevant, so perhaps I will narrow down to either 2014 or 2016.

2014 cruise

Using older data from PS85, the 2014 cruise, may also be wise, as people will have had more time to upload their data to PANGAEA, possibly facilitating the exercise. According to the record she was on the SP84 cruise, section 1.3 BENTHIC MICROBIOLOGY. Section 1.3 assesses links between the pelagic and benthic environments in terms of bacterial diversity; perhaps try pairing this with section 1.2 IMPACT OF CLIMATE CHANGE ON ARCTIC MARINE ECOSYSTEMS, whose measurements quantify the export of organic matter from the sea surface to the deep sea and trace changes in these fluxes over time.

Mixing 1.3 with 1.4, Biogeochemistry of deep-sea benthic communities, could also be interesting, as it measures benthic oxygen fluxes. Potentially a knowledge graph with some basics about carbon cycling, oxygen respiration and their connection to organic carbon (orgC) remineralization could be used to link this O2 profile data to 1.3's genomic data (if the sampled stations overlap; I need to check this).
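A minimal sketch of that station-overlap check, assuming both datasets carry PANGAEA-style event labels of the form cruise/station-cast. The label format, the helper names and the example labels are hypothetical, not taken from the PS85 data:

```python
def station_key(event_label: str) -> str:
    """Reduce a label like 'PS85/0470-1' to 'PS85/470' (cruise/station).

    Assumes the hypothetical format cruise/station-cast with a numeric
    station; the cast/gear suffix after the hyphen is dropped, so different
    casts at the same station count as the same station.
    """
    cruise, rest = event_label.split("/", 1)
    station = rest.split("-", 1)[0]
    return f"{cruise}/{int(station)}"  # int() strips leading zeros


def overlapping_stations(labels_a, labels_b):
    """Return the sorted cruise/station keys sampled in both datasets."""
    keys_a = {station_key(label) for label in labels_a}
    keys_b = {station_key(label) for label in labels_b}
    return sorted(keys_a & keys_b)


# Hypothetical labels: benthic microbiology (1.3) vs. oxygen profiles (1.4)
genomic = ["PS85/0470-1", "PS85/0474-2"]
oxygen = ["PS85/470-5", "PS85/480-1"]
print(overlapping_stations(genomic, oxygen))
```

If the labels don't line up this cleanly, matching on coordinates and sampling dates would be the fallback.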

Mixing 1.3 with AWI ARCTIC NET COLLABORATION ON ARCTIC MARINE OBSERVATORIES could perhaps also work, as they have a variety of different types of sensor data from a deployed McLane Moored Profiler. With all of these I'd need to check whether the data is there, which reminds me that the point of this is to use the semantics to query for unknown data; I just need a clear enough test case (going into a test example already knowing that there are two data types which can be linked).

Sections 1.6 PLANKTON ECOLOGY AND BIOGEOCHEMISTRY IN THE CHANGING ARCTIC OCEAN (PEBCAO GROUP) and 1.7 SEA ICE STUDIES ECOLOGICAL CONSEQUENCES OF CLIMATE CHANGE IN THE FRAM STRAIT, A KEY REGION OF THE TRANSPOLAR DRIFT (PEBCAO GROUP) also have a variety of biogeochemical parameters, 1.6 for the water column and 1.7 for sea ice (the other potential ecosystem for the question).

20.09.17

Current goals are to find some AWI datasets relevant to the PACES 2 framework, hosted on PANGAEA, which can be used as the test case for the semantic research.

LTER Observatory HAUSGARTEN: 21 permanent research stations in the Fram Strait, in the transition zone between the northern North Atlantic and the central Arctic Ocean, in place since 1999.

FRAM, the FRontiers in Arctic marine Monitoring observatory, which is also a FixO3 observatory, includes the HAUSGARTEN stations plus additional stations such as the mooring in the West Spitsbergen Current. More information is in the FRAM brochure and on the FRAM interactive page. They claim that the information will be available through an AWI data portal which is under construction. I found this page with AWI data portals, which I will investigate later.

Hopefully my work can expand on FRAM's efforts to:

FRAM enhances sustainable knowledge for science, society and maritime economy as it enables truly year round observations from surface to depth in the remote and harsh arctic sea.

Hence I could perhaps refine my question to:

Can semantic research be used to support the FRAM scientific and societal goals, as well as UN Sustainable Development Goal 14 concerning ocean sustainability, by connecting and mobilizing various FRAM data to assess natural capital and answer unasked questions?

Can community-developed standards and community projects be synthesized into standards useful to a wider community, building sustainable infrastructure?

useful reference

to get some datasets

[PANGAEA data tools](https://pangaea.de/tools/)

In the search bar, search for FRAM and check the HAUSGARTEN and FRAM projects; this yields 262 results. I can use the similar datasets tab to get additional datasets relevant to the one of interest, and I can also search geographically using coordinates. So I can start with the molecular data from Katja Metfies, Josephine Rapp and Marianne Jacob, then search for other types of data which coincide with it. I should try to find something like physical oceanography data, biogeochemical data, molecular data and something like Benthic megafaunal composition at the Arctic deep-sea observatory HAUSGARTEN, the count data from which could be ontologized.
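As a sketch of the geographic search step, the snippet below filters dataset records by a latitude/longitude bounding box. The records, their coordinates and the box are hypothetical placeholders roughly around the Fram Strait; in practice the coordinates would come from each dataset's PANGAEA metadata. It does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    title: str
    latitude: float   # decimal degrees, north positive
    longitude: float  # decimal degrees, east positive


def in_box(record, lat_min, lat_max, lon_min, lon_max):
    """True if the record's position lies inside the box (edges included)."""
    return (lat_min <= record.latitude <= lat_max
            and lon_min <= record.longitude <= lon_max)


def co_located(records, lat_min, lat_max, lon_min, lon_max):
    """Titles of all records inside the box, in input order."""
    return [r.title for r in records
            if in_box(r, lat_min, lat_max, lon_min, lon_max)]


# Hypothetical records; coordinates are placeholders, not real station positions
records = [
    DatasetRecord("molecular data (hypothetical)", 79.1, 4.3),
    DatasetRecord("inorganic nutrients (hypothetical)", 79.0, 4.2),
    DatasetRecord("radiosonde (hypothetical)", 85.5, 30.0),
]
print(co_located(records, 78.5, 80.0, 0.0, 10.0))
```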

If I can't find these all together in a real example, I could pull together datasets from different spatiotemporal locations. So from the 262 FRAM and HAUSGARTEN datasets I'll look through the most common ones, list them here, and narrow down which ones could be ontologized into a knowledge graph.

common data set titles:

Bacterial sequence information for surface sediments of HAUSGARTEN during Polarstern: should include one, even though there are only 5 of these

Benthic megafaunal composition

Biochemical investigation of multicorer sediment profile

Biogenic particle flux at AWI HAUSGARTEN from mooring

Diversity of benthic megafaunal communities

Dominant pelagic amphipods collected as swimmers from mooring time-series station

High resolution movies along OFOS profile

Inorganic nutrients measured on water bottle samples at AWI HAUSGARTEN during POLARSTERN cruise

Oxygen profiles from in situ measurements below the carcasses: less common but interesting

Pteropod sedimentation at AWI HAUSGARTEN northern station

Physical oceanography and current meter data

Physical oceanography raw data from moorings recovered during POLARSTERN cruise: check if different from the previous entry

Radiosonde PS85/36045 during POLARSTERN cruise PS85

Sea-bed photographs

Sea-bottom video taken along TVMUC deployment PS101/219-1 during POLARSTERN cruise

Sea ice drift from autonomous measurements from buoy 2016P37, deployed during POLARSTERN cruise PS101

Sea ice drift, surface temperature, and barometric pressure from autonomous measurements

Sea ice coverage at HAUSGARTEN station

Snow thickness measurements at sea ice station

Snow height on sea ice and sea ice drift from autonomous measurements

a shorter concise list with example links (to begin with):

Bacterial sequence information: 22 parameters, definitely relevant

Biochemical investigation of multicorer sediment profile: less important, perhaps drop this

Biogenic particle flux: 10 parameters, seems quite relevant

Diversity of benthic megafaunal communities: 10 parameters, relevant for ocean biogeochemistry

Inorganic nutrients: 12 parameters, relevant for ocean biogeochemistry

Physical oceanography raw data from moorings: requires a login, so perhaps not this? OR Physical oceanography and current meter data: 11 parameters, seems quite relevant

Sea-bed photographs: looks a little tricky to me, lacking a way of identifying anything relevant in a photo without some sort of image detection. AND/OR Sea-bottom video: perhaps a similar problem to the photos, as this only contains raw video files, which may be difficult to infer meaning from, unless of course you ask a question like "do I have any video of such a sample?". Perhaps I will deal with these two later on.

Sea ice drift: this data is only the time plus lat/long of what I infer is drifting ice, or the position of the buoy placed on first-year ice (so perhaps not very interesting to add)

Snow thickness measurements: only lat/long and thickness

Sea ice drift, surface temperature, and barometric pressure from autonomous measurements: similar to sea ice drift but with a few more parameters

Sea ice coverage at HAUSGARTEN station N3: only 2 parameters, Date/Time and ice coverage

Snow height on sea ice and sea ice drift from autonomous measurements: 11 parameters; height is repeated from multiple sensors, but there is also some other data, such as air temperature and pressure, which could be interesting to link up.
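Even a time-plus-position series like the sea ice drift data can yield a derived quantity. A minimal sketch, assuming the fixes are (time, latitude, longitude) tuples sorted by time and treating the Earth as a sphere (haversine great-circle distance); the function names and the example fixes are hypothetical:

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def drift_speeds_kmh(fixes):
    """Mean drift speed (km/h) between each pair of consecutive fixes.

    fixes: list of (datetime, lat, lon) sorted by time, with no duplicate times.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        speeds.append(haversine_km(lat0, lon0, lat1, lon1) / hours)
    return speeds


# Hypothetical buoy fixes: 0.1 degree of latitude northward in 24 hours
start = datetime(2016, 7, 1, 0, 0)
fixes = [(start, 80.0, 0.0), (start + timedelta(hours=24), 80.1, 0.0)]
print(drift_speeds_kmh(fixes))  # about 0.46 km/h
```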

working list

Bacterial sequence information: 22 parameters, definitely relevant

Biogenic particle flux: 10 parameters, seems quite relevant

Diversity of benthic megafaunal communities: 10 parameters, relevant for ocean biogeochemistry

Inorganic nutrients: 12 parameters, relevant for ocean biogeochemistry

Physical oceanography and current meter data: 11 parameters, seems quite relevant

Snow height on sea ice and sea ice drift from autonomous measurements: 11 parameters; height is repeated from multiple sensors, but there is also some other data, such as air temperature and pressure, which could be interesting to link up.

21.09.17

Made a new page for the working datasets

Playing with the 6 datasets and their parameters to begin to see the possibilities for a semantic model. I will start with what I know from my database classes, creating database tables to see what can link up that way.

I wrote up a quick whiteboard schematic showing ontology semantic connections to try to link up data between different datasets. I'm on the right track with this, so I should try to integrate SDGIO and microplastics. Next, create a digital version of such a diagram (in VUE) which could serve as a supplement for the thesis, and which I can present to experts such as the AWI physicists and marine litter researchers to get them to validate the semantic model. All this is to create the semantic infrastructure that allows a user to perform a SPARQL query of the form SELECT ?x without needing to know the semantic linkages: the model knows them and can help link together the information contained within various AWI databases.
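To make the SELECT ?x idea concrete, here is a toy sketch of how a basic graph pattern is matched: the user only asks for datasets connected to primary production, and the join over the model's triples does the linking. This is plain Python standing in for a real triple store and SPARQL engine; the triples and the relation names (measures, participates_in) are hypothetical, not actual ENVO or RO terms.

```python
# Hypothetical knowledge graph: (subject, relation, object) triples
TRIPLES = {
    ("dataset:nutrients", "measures", "nitrate"),
    ("nitrate", "participates_in", "primary production"),
    ("dataset:sequences", "measures", "phytoplankton community"),
    ("phytoplankton community", "participates_in", "primary production"),
    ("dataset:radiosonde", "measures", "air temperature"),
}


def is_var(term):
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match_pattern(triples, pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in triples:
        new = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                if new.setdefault(p_term, t_term) != t_term:
                    break  # variable already bound to a different value
            elif p_term != t_term:
                break  # constant does not match
        else:
            yield new


def select(triples, patterns, var):
    """Join the patterns left to right and return the distinct values of `var`."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings
                    for b2 in match_pattern(triples, pattern, b)]
    return sorted({b[var] for b in bindings})


# Roughly: SELECT ?x WHERE { ?x measures ?y . ?y participates_in "primary production" }
query = [("?x", "measures", "?y"), ("?y", "participates_in", "primary production")]
print(select(TRIPLES, query, "?x"))
```

The user never names nitrate or the phytoplankton community; those intermediate links live in the graph, which is the point of the semantic model.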

I think this thesis can help with the UN SDG 14 target:

Increase scientific knowledge, develop research capacity and transfer marine technology, taking into account the Intergovernmental Oceanographic Commission Criteria and Guidelines on the Transfer of Marine Technology, in order to improve ocean health and to enhance the contribution of marine biodiversity to the development of developing countries, in particular small island developing States and least developed countries

Following the ESKP litter database page (link for the database itself here), perhaps it would be interesting to ontologize the various types of marine litter; see their photo with the caption Global composition of marine litter. It could be cool to assign some PATO qualities to the different types of pollutants, e.g. resin pellets are denser than water so they sink to the seafloor, whereas styrofoam is less dense and thus dominates microplastics at the sea surface.

litterbase paper here

This, plus nutrient pollution, is perhaps a way I can integrate SDG 14 targets. I could possibly integrate it with the Inorganic nutrients data, as it deals with nitrogen and phosphate species, whose cycles are dangerously out of balance according to the Planetary Boundaries: Exploring the Safe Operating Space for Humanity paper.

22.09.17

Created a page for potential references

DOOS report (Deep Ocean Observing Strategy), which

Provides the science and societal justification for physical, biogeochemical, and biological observations of the deep-sea.

In their terms of reference section there are a couple of goals which we could help with:

3 Coordinate observations to: Utilize existing platforms for new sensors or integration of physical, biogeochemical and biological sensors in order to improve observing efficiency. Document the state of deep-ocean observing. Identify standards and best practices for observing the deep sea.

My master's thesis will attempt to address the former, and MESO potentially the latter.

6 Foster availability, discoverability, and usability of deep ocean data. Promote fit-for-purpose data

Which (besides availability) I'm trying to work on.

Dr. Felix Janssen from AWI is a DOOP member, so I guess that means we could potentially give some input?

DOOS is within the framework developed by GOOS, the Global Ocean Observing System, which I should probably also check out.

from the DOOS questions page

How does deep pelagic ecology respond to natural variation and multiple climate change stressors, including warming, deoxygenation, acidification, changes in biological production, as well as industrial activities?

Maybe we could try to answer part of this by interlinking the AWI data? At least the industrial activities part via the marine plastics story.

What drives variations in seafloor fluxes of heat, nutrients, tracers, oxygen and carbon? How are these quantities connected to greater ocean dynamics? This includes the longer-term links between seafloor fluxes and greater oceanic physical and biogeochemical properties.

How might natural and anthropogenic change influence the functional importance of animals and microbes in the deep sea and at the seafloor? What environmental variations do they experience in space and time? This includes consideration of benthic storms and currents, turbidity, T, pH, O2, and POC flux. This will inform spatial planning and impact assessment for seabed mining, bottom trawling and oil and gas extraction.

Maybe during this work I can think about answering subsets of such questions.

I should also check out the Framework for Ocean Observing final document

They keep mentioning EOV which stands for Essential Ocean Variables. This page could potentially serve as a rich source of terms to harvest.

Related is the World Meteorological Organization (WMO), with a similar Global Climate Observing System initiative.

25.09.17

notes from Friday's meeting with Ruth Duerr

ESIP telecon calendar

esip-semanticweb portal which I signed up to.

Earth Science Information Partners (ESIP): a NASA-associated collaborative project now focused on managing environmental science data. Their strategic vision and 2015-2020 Strategic Plan are neat, and I think my master's work could serve as an example of using an OBO ontology and semantics to link together earth science data. They like the idea of a graph, so my upcoming semantic connection graph linking AWI datasets could be of interest to them. I also need to look deeper into web semantics.

ESIP collaborates with the SWEET ontology. Checking out the SWEET/ENVO alignment thread: Pier has a grand vision of integrating OBO and SWEET to get the life sciences and earth sciences to collaborate via the potential eventual merger of such ontologies, cool stuff. I guess we'll see where this goes. There I also see a link to the alignment between SWEET and the sea ice ontologies, which I could at some point possibly ontologize into ENVO (although we'll have to see where the SWEET/ENVO alignment ends up going).

Also from Ruth: a workshop for an Open Knowledge Network hosted by the Big Data Interagency Working Group (IWG). They have some pretty game-changing proposals for new ways to better manage information in their white paper. It's interesting that some of the ideas mentioned there are quite reminiscent of cryptocurrency talks given by Andreas Antonopoulos, such as this one I recently watched. I want to continue to explore the idea of an open knowledge network that is open access and decentralized, so as to democratize information and people's ability to ask questions. I could see this ranging from environmental questions to government and policy questions to environmental justice and sustainability. I think it would be really cool to find some sort of side project on such topics during my Ph.D., analogous to how Pier took over governance of ENVO during his.

Specifically, I like this quote from the white paper:

The current situation with knowledge networks is reminiscent of the mid-1980’s in computer networking—at that time many proprietary, disconnected islands of networking technology were in existence (e.g., AOL, Prodigy, CompuServe, IBM, DEC, etc). The subsequent advent of the Internet and the Web, with their open protocols, generated an explosion in innovation across all aspects of networking. This enabled “permission-less innovation, ” with anyone able to create and publish a website and thus become a part of the web.

They also state the need for

maintaining provenance (the history of where a concept or relationship came from)

Perhaps this sort of information is best suited by the use of a blockchain technology.

Also from Ruth: she invited us to participate in the Arctic Data Committee working group (ADC-IARPC Vocabularies and Semantics WG).

Perhaps I should import the info from Ruth's sea ice ontology into ENVO? I'll ask Pier about this.

Created the working outline page, and with Pier's help came up with a thesis title:

Interconnecting Arctic observatory data through machine-actionable knowledge representation: are ontologies fit for purpose?

This can lead to competency questions, then to experiments that test against those questions, which will help guide the development of this project.

26.09.17

Data lake: a system or repository which stores data in its natural format, all stored together. It holds all the data for an enterprise, including structured data (formatted for relational databases), semi-structured data like CSV files, and unstructured data like PDFs and emails. You call it a data swamp if the lake deteriorates to being useless to its users.

Data mart: a smaller repository of attributes of interest extracted from raw data; a subset of a data warehouse, e.g. for a single department of an enterprise.

Data warehouse: an integrated central repository for data from different sources. Such data is arranged into hierarchical groups called dimensions.

I began making the awi_dataset connections graph

What makes typical seawater typical, and what should we call this? (having it be a subclass of seawater)

27.09.17

My VUE has an annoying bug where it won't open an old VUE file unless multiple maps are open as tabs. To reopen one I just need to press File > New, and then for whatever reason the old one will open.

awi_dataset connections graph figure added in commit #860e9b2bc556b88de5c76115591e634022329c3d accessible from here

Working on the awi_dataset connections graph notes here.

ENVO water flow process: had the mistaken axioms has participant some land and has participant some lava, which I believe Pier fixed.

ENVO: marine current: is a continuous, directed movement of marine water generated by the forces acting upon this mean flow ...

To link marine current and water flow process I could use the RO relation contains process, defined as:

a relation between an independent continuant and a process, in which the process takes place entirely within the independent continuant

Does a water flow process take place entirely within a marine current? Not entirely: a water flow process could happen in places other than a marine current.

RO: occurs in is also for a process taking place entirely within an independent continuant. RO: occurrent part of is between occurrents.

Perhaps RO participates in, or has participant: a marine current participates in a water flow process. That sounds right.
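Since participates in and has participant are inverses in RO, stating one lets a reasoner derive the other. A minimal sketch of that inference over plain (subject, relation, object) tuples; class labels are used as shorthand for instances, and the INVERSES table is hand-written here rather than read from RO:

```python
# Hand-written inverse pairs (RO declares these two as inverses of each other)
INVERSES = {
    "participates in": "has participant",
    "has participant": "participates in",
}


def add_inverses(triples):
    """Return the triples plus the inverse of every triple whose relation has one."""
    closed = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSES.get(relation)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


asserted = {("marine current", "participates in", "water flow process")}
for triple in sorted(add_inverses(asserted)):
    print(triple)
```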

A mass liquid flow: A process whereby a volume of liquid moves due to a disequilibrium of physical forces. It has movement as part of its definition, so perhaps we can add a connection to fluid flow rate.

fluid flow rate: A physical quality inhering in a fluid (liquid or gas) by virtue of the amount of fluid which passes through a given surface per unit time.

PATO: has quality: is between an independent continuant (the bearer) and a quality, so I can't describe the mass liquid flow process as having the quality flow rate.

Marine current

a continuous, directed movement of marine water generated by the forces acting upon this mean flow, such as breaking waves, wind, Coriolis force, temperature and salinity differences and tides caused by the gravitational pull of the Moon and the Sun. Depth contours, shoreline configurations and interaction with other currents influence a current's direction and strength.

By its definition it could have qualities like direction and fluid flow rate.
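The has quality restriction noted above (the bearer must be an independent continuant) can be checked mechanically by walking up the is_a hierarchy. A minimal sketch with a tiny, hypothetical and heavily simplified hierarchy; placing marine current under material entity follows the reading in these notes that it participates in processes and bears qualities, and is not taken from ENVO's actual axioms:

```python
# Toy child -> parent is_a links (hypothetical, simplified BFO-style upper classes)
PARENT = {
    "marine current": "material entity",
    "material entity": "independent continuant",
    "independent continuant": "continuant",
    "mass liquid flow": "process",
    "process": "occurrent",
}


def ancestors(cls):
    """All superclasses of `cls`, nearest first."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result


def can_bear_quality(cls):
    """Domain check for 'has quality' as stated above: bearer must be an independent continuant."""
    return cls == "independent continuant" or "independent continuant" in ancestors(cls)


print(can_bear_quality("marine current"))    # True
print(can_bear_quality("mass liquid flow"))  # False
```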

Ask the librarian to order me some books on OWL, SPARQL and the semantic web

28.09.17

Notes from Pier:

Add sea ice controlled bloom as a subclass of algal bloom (Janout et al. 2016 paper for reference). Include zooplankton bloom as causally downstream of a phytoplankton bloom. Also make a note to add ice-associated communities. Add sea ice primary production. Needs coordination with ECOCORE.

make terms for zooplankton and phytoplankton blooms, link to ECOCORE, link to chlorophyll A data

03.10.17

I am wondering about encoding the idea of endangered species somewhere. My roommate (a bird-focused ecologist) was talking about spending his day looking over endangered species lists and having to do lots of manual research to get information about the different species he'll encounter in certain locations in the Amazon. I was wondering whether encoding endangered species or similar concepts could be paired with some literature searching to mobilize data useful for generating policy reports. Pier thinks it could go into PCO.

04.10.17

From Pier: https://github.com/ktym/d3sparql http://crubier.github.io/Hexastore

Reorganized the to-ontologize page into separate pages for ENVO, PATO and PCO, and put the links on the main page. Also make a main tab for projects: AWI datasets interlinking and plankton ecology.

For the AWI physical oceanography and current meter dataset, checking for papers that use this data.

Group PI Wilken-Jon von Appen publishes a lot; try to find a paper which actually uses the right dataset. Checked so far, none of which use it: https://doi.org/10.1002/2017JC012974, http://onlinelibrary.wiley.com/doi/10.1002/2016JC012462/abstract, http://dx.doi.org/10.1002/2016JC012228, http://dx.doi.org/10.1002/2016JC012121

Eduard Bauerfeind's publications are also a potentially useful source of info.

05.10.17

Reading Chapter 3 of A Biologist's Guide to Mathematical Modeling in Ecology and Evolution

Adding appropriate classes to ontologize as part of the plankton ecology project

PCO already has some basic terms for me to work off.

  • quality of a population:

    • population birth rate

    • population death rate

    • population growth rate

    • carrying capacity

  • population process

    • population growth

      • exponential population growth

      • logistic population growth

Perhaps I can link these to be able to describe the dynamics of a bloom.
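The PCO terms above map onto the textbook population models: population growth rate as the per-capita rate r (birth rate minus death rate) and carrying capacity as K. A minimal sketch of the closed-form exponential and logistic solutions, assuming constant r and K; the parameter values in the usage are arbitrary:

```python
import math


def exponential_growth(n0, r, t):
    """N(t) = N0 * exp(r t): solution of dN/dt = r N."""
    return n0 * math.exp(r * t)


def logistic_growth(n0, r, k, t):
    """N(t) = K / (1 + (K - N0)/N0 * exp(-r t)): solution of dN/dt = r N (1 - N/K)."""
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))


# Arbitrary example: N0 = 10 cells, r = 0.5 per day, K = 1000 cells
for day in (0, 5, 10, 20):
    print(day, round(exponential_growth(10, 0.5, day), 1),
          round(logistic_growth(10, 0.5, 1000, day), 1))
```

While N is far below K the two curves nearly coincide; as N approaches K the logistic curve levels off, which is the part of a bloom's dynamics that exponential growth alone cannot describe.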

Kai:

Perhaps I could even try to define all of lag phase, log phase/exponential phase, stationary phase, and death phase using the terms from the logistic growth equation in the definition.

Pier:

we would have "lagged pop growth", "exp pop growth", etc where we can get lagged and exponential rates into PATO
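One way to define the phases using the logistic terms is to compare an observed per-capita growth rate with the logistic expectation r(1 - N/K). A minimal sketch; the tolerance and the rule ordering are hypothetical choices, and the logistic model itself has no lag or death phase, so those are defined here as departures from it:

```python
def classify_phase(per_capita_rate, n, r, k, tol=0.1):
    """Assign a growth phase label from an observed per-capita growth rate.

    per_capita_rate: observed (1/N) dN/dt; n: current population size;
    r, k: logistic growth rate and carrying capacity; tol: hypothetical tolerance.
    """
    expected = r * (1 - n / k)  # logistic per-capita rate at size n
    if per_capita_rate < -tol * r:
        return "death phase"        # population clearly shrinking
    if n >= (1 - tol) * k:
        return "stationary phase"   # at or near carrying capacity
    if per_capita_rate < (1 - tol) * expected:
        return "lag phase"          # growing well below the logistic expectation
    # This also lumps the decelerating part of the logistic curve in here
    return "exponential phase"


for obs in [(0.05, 10), (0.49, 10), (0.0, 950), (-0.2, 500)]:
    print(obs, classify_phase(*obs, r=0.5, k=1000))
```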

work temporarily stored here

06.10.17

Pier:

we can think of phases as temporal regions during which certain processes are unfolding

So I'll look into adding terms like lag population growth phase, exponential population growth phase, stationary population growth phase, and death population growth phase as temporal regions linking to their accompanying PATO processes.

I now have the basic population growth classes organized: each of the x population growth processes (lagged, exponential, stationary, death/decline) would go to PCO (as a subclass of PCO:population growth); in PATO there would be the accompanying x population growth rate (as a PATO:growth quality of occurrent); and in ENVO the x population growth phase (as a spatiotemporal region).

To axiomatize the x population growth phases as temporal regions, I'd use the occurrent part of relation to link each one to the appropriate x population growth process.

Alternatively, if we make the phases spatiotemporal regions, we could use the RO:contains process relation between the x population growth process, a material entity (a population) and a spatiotemporal region (the phase class). We'll go with this one.
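The chosen pattern can be sketched as generated triples, one set per phase. Labels stand in for IRIs, the population is a placeholder, and the ontology each term would live in (PCO, PATO, ENVO) is only recorded in comments; this illustrates the modelling choice above and is not the actual ontology files:

```python
# (process prefix, phase prefix) pairs for the four growth phases
PHASES = [
    ("lagged", "lag"),
    ("exponential", "exponential"),
    ("stationary", "stationary"),
    ("death", "death"),
]


def phase_triples(population="population"):
    """Label-level triples linking each growth phase to its process and participant."""
    triples = []
    for process_prefix, phase_prefix in PHASES:
        process = f"{process_prefix} population growth"    # PCO process
        phase = f"{phase_prefix} population growth phase"  # ENVO spatiotemporal region
        rate = f"{process_prefix} population growth rate"  # PATO growth quality of occurrent
        triples += [
            (process, "subClassOf", "population growth"),
            (rate, "subClassOf", "growth quality of occurrent"),
            (phase, "contains process", process),
            (process, "has participant", population),
        ]
    return triples


for triple in phase_triples()[:4]:
    print(triple)
```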

09.10.17

review of work and questions:

work thus far:

plankton ecology outline, AWI datasets and VUE figure

side project crypto

Questions:

should I create ENVO: plankton, phytoplankton, zooplankton classes?

Would it make sense to create terms for processes like remineralisation of organic matter, nutrient depletion, water stagnation?

Is Diatom Phenology in the Southern Ocean: Mean Patterns, Trends and the Role of Climate Oscillations (associated with Tillman satellite chlorophyll data) the type of source for the classes we want to create for the plankton ecology project?

Report on ocean health indicators for the SDGs linking phytoplankton ecology; use geo for food webs; briefly link phenomena about primary productivity to fisheries economies; perhaps report on a common Arctic fish of some economic importance.

Illustrate that these representations, pulled together, are of direct relevance for associating this scientific research with the SDGs.

Ask Katja Metfies about metadata for the microbial observatory in PANGAEA.

Most important task: find phenomena in the AWI paper and get the most important ones ontologized, in order of importance. Try to get a sufficient bulk of this done this week, so we can push it to the ontologies and move on to mobilizing data. Either the existing data or if necessary.

I think I could use the data from the Biogeography and Photosynthetic Biomass of Arctic Marine Pico-Eukaroytes during Summer of the Record Sea Ice Minimum 2012 paper for my thesis as part of a dummy data set with chlorophyll sea ice coverage and sequence data.

Check out the Hexastore data reasoning link Pier sent me on Skype, and d3sparql.

Diapycnal diffusion

added ontology terms raw workup.xlsx

10.10.17

Work on concepts from the Diatom Phenology in the Southern Ocean: Mean Patterns, Trends and the Role of Climate Oscillations and Biogeography and Photosynthetic Biomass of Arctic Marine Pico-Eukaroytes during Summer of the Record Sea Ice Minimum 2012 papers. Attempting to piece together the puzzle around marine water masses: their mixing processes, stratification, the halocline, the light penetration depth, and nutrient re-equilibration through mixing of water masses. The latter leads to the resupply of nutrients for phytoplankton, but having a class like nutrient resupply would be very anthropocentric (or "planktocentric"); perhaps it could be a synonym for a marine water mass re-equilibration process. We'll see how this turns out.
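To illustrate nutrient resupply through mixing, a standard two end-member mixing model (a technique swapped in here, not one taken from the two papers) treats salinity as a conservative tracer: it gives the fraction of each water mass in a sample and the nutrient concentration expected from mixing alone. A measured value below that expectation hints at biological uptake. All numbers below are hypothetical:

```python
def mixing_fraction(s_sample, s_a, s_b):
    """Fraction of end-member A in a sample, from conservative salinity mixing."""
    return (s_sample - s_b) / (s_a - s_b)


def conservative_concentration(f_a, c_a, c_b):
    """Concentration expected from mixing alone (no sources or sinks)."""
    return f_a * c_a + (1 - f_a) * c_b


# Hypothetical end members: saltier Atlantic-type water (A), fresher polar surface water (B)
S_A, S_B = 35.0, 33.0      # salinity
NO3_A, NO3_B = 12.0, 2.0   # nitrate, µmol/L

f = mixing_fraction(34.0, S_A, S_B)
expected = conservative_concentration(f, NO3_A, NO3_B)
measured = 5.0
print(f, expected, expected - measured)  # a deficit > 0 suggests uptake
```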

An example from today's work: marine water mass is currently a marine pelagic feature; perhaps it could be moved under water body, as the dbxref definition of water mass identifies it as a body of water.

Work to define these classes temporarily being stored here

11.10.17

old version:

I am researching polar, microbial and oceanographic semantics, in order to interconnect disparate FRAM and HAUSGARTEN AWI data, through a machine-actionable knowledge representation of the underlying processes and phenomena captured within the data.

rewrite:

I am researching how ontologies and semantics can be used to better connect various different types of FRAM and HAUSGARTEN data such as molecular observatories, physical oceanography, inorganic nutrients and sea ice coverage data. Through the semantic representation of such phenomena I will attempt to ask and answer novel questions of the existing data such as: "what metabolic functions are associated with plankton communities living in regions of decreased ice coverage?".

rewrite 2: I am researching how the representation of expert knowledge about marine and polar environments, within an ontology, can help to interconnect polar observatory data. Leveraging FRAM and HAUSGARTEN projects such as microbial observatories, physical oceanography and drift from ice moorings, and satellite chlorophyll sensors, I will attempt to demonstrate the feasibility of using an ontology to answer novel questions such as: "What changes in polar ecosystem services will result from decreasing sea ice coverage?".

rewrite 3: I am researching how the representation of expert knowledge about marine and polar environments, within an ontology, can help to interconnect polar observatory data. Leveraging FRAM and HAUSGARTEN projects such as microbial observatories and physical oceanography from ice moorings, I will attempt to demonstrate the feasibility of using an ontology to answer novel questions such as: "What changes in polar ecosystem services will result from decreasing sea ice coverage?".

rewrite 4: I am researching how the representation of expert knowledge about marine and polar environments, within an ontology, can help to interconnect polar observatory data. Leveraging FRAM and HAUSGARTEN projects such as microbial observatories and physical oceanography from ice moorings, I will attempt to demonstrate the feasibility of using an ontology to answer novel questions such as: "What ecosystem services derive from the metabolic functions of microbial communities which live in regions of decreased sea ice coverage?"

I had written a bunch of ENVO classes, but the wifi went out and my computer glitched, so I lost a whole day's worth of material !!!!!! FUCK!!!!!

But I managed to re-write a couple of the lost classes.

Here is an idea which may help me get to a competency question. Could we use the physical oceanography data (salinity and temperature depth profiles) to infer whether the water body is stratified? We know that a stratified water body has layers of distinct water masses with characteristic temperature and salinity, which together affect the water's density. Do you think it would be possible to query such data for potential density stratification patterns? We also know that density stratification hinders vertical mixing of marine water masses, and could therefore limit primary productivity. Do you think semantic linkages along these lines could be used to help infer primary productivity based on physical oceanography data, or is this unfeasible?
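A minimal sketch of how the stratification question could be checked numerically before any semantics are involved. It assumes a simplified linear equation of state with illustrative textbook-style coefficients (a real analysis should use TEOS-10, for example through the gsw package). It computes a density for each temperature/salinity sample, then the squared buoyancy frequency N² = (g/ρ0)·dρ/dz between adjacent depths, with z positive downward. Larger positive N² means stronger density stratification. The `stratified_layers` threshold of 1e-4 s⁻² is a hypothetical cutoff, not a standard value.

```python
from dataclasses import dataclass

G = 9.81              # gravitational acceleration, m s^-2
RHO0 = 1027.0         # reference density, kg m^-3 (illustrative)
T0, S0 = 10.0, 35.0   # reference temperature (degC) and salinity (g/kg), illustrative
ALPHA = 2e-4          # thermal expansion coefficient, 1/K (illustrative; really varies with T)
BETA = 7.6e-4         # haline contraction coefficient, kg/g (illustrative)


@dataclass
class Sample:
    depth_m: float    # positive downward
    temp_c: float
    salinity: float


def density(temp_c, salinity):
    """Linear equation of state: a rough stand-in for TEOS-10."""
    return RHO0 * (1 - ALPHA * (temp_c - T0) + BETA * (salinity - S0))


def buoyancy_frequency_sq(profile):
    """Return (mid-depth, N^2) for each pair of adjacent samples.

    With depth positive downward, N^2 = (g / rho0) * d(rho)/d(depth).
    Positive values mean denser water lies below lighter water (stable).
    """
    ordered = sorted(profile, key=lambda s: s.depth_m)
    result = []
    for upper, lower in zip(ordered, ordered[1:]):
        drho = density(lower.temp_c, lower.salinity) - density(upper.temp_c, upper.salinity)
        dz = lower.depth_m - upper.depth_m
        result.append(((upper.depth_m + lower.depth_m) / 2, G / RHO0 * drho / dz))
    return result


def stratified_layers(profile, threshold=1e-4):
    """Mid-depths where N^2 exceeds a hypothetical stratification threshold (s^-2)."""
    return [z for z, n2 in buoyancy_frequency_sq(profile) if n2 > threshold]
```

With a made-up profile of fresher meltwater over colder, saltier water, the sharp step between 10 m and 20 m is flagged, while the nearly uniform deep layer is not.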

12.10.17

Chris Mungall is trying to drum up funding for a project Alliance for Microbial Ontology Curation (AMOC).

Link to his Google Doc. He also cites the BERAC-Grand-Challenges-Draft-Report on the need for a greater ontology presence to represent microbes. It sounds like interesting and relevant work for me to contribute to; however, I'm concerned it could distract from what should be my main focus, linking AWI Arctic data.

reading Particle sedimentation patterns in the eastern Fram Strait during 2000–2005; worked a bit on classes for it here

Focusing the work around plankton blooms and HABs helps give me direction on what kinds of classes I should harvest, namely those connected to a plankton bloom: connecting sea ice and water mass data in relation to how they can affect plankton; connecting sea ice melting and light availability to blooms and to the carbon pump; and the physical forces shaping water mass mixing, which affect nutrient concentrations, which in turn affect blooms. I will have plankton blooms as the central theme, which should help the other elements fall into place.

13.10.17

finishing up work here. Moving ideas from there into the ENVO page to consolidate classes.

link to ECOCORE

In terms of water masses, Pier says I should make subclasses of water mass for water masses with distinct properties, such as 'saline water mass', 'cold water mass' and 'warm water mass'. I should then use these classes to create a combined class, 'cold salty water mass' (in OWL terms an intersection of the cold and saline classes, since it must be both), and annotate it so that it can be used in GAZ for specific instances of water masses. For example, the coldest, saltiest one would be 'Antarctic bottom water'. I need to figure out the details of these water masses to come up with the differentia (temperature, salinity). I'm not sure whether these classes should be subclasses of 'marine water mass' or of the 'marine cold/warm-water sphere' classes (it's an easy fix either way).
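One detail worth keeping straight when building the combined class: a 'cold salty water mass' that must be both cold and saline corresponds to an OWL intersection (in Manchester syntax, `EquivalentTo: 'cold water mass' and 'saline water mass'`), whereas owl:unionOf would admit water that is cold or saline. The plain-Python sketch below mimics that membership logic. The temperature and salinity thresholds are hypothetical placeholders, not the real differentia, which still need to be worked out.

```python
# Hypothetical thresholds, for illustration only; the real differentia should come
# from the oceanographic literature for each named water mass.
COLD_MAX_C = 0.0    # "cold water mass": temperature below this (degC)
SALINE_MIN = 34.5   # "saline water mass": salinity above this (g/kg)


def is_cold(temp_c):
    return temp_c < COLD_MAX_C


def is_saline(salinity):
    return salinity > SALINE_MIN


def classify(temp_c, salinity):
    """Return the set of (hypothetical) water mass classes an observation falls into."""
    classes = set()
    if is_cold(temp_c):
        classes.add("cold water mass")
    if is_saline(salinity):
        classes.add("saline water mass")
    # 'cold salty water mass' behaves like an intersection: membership in BOTH is required.
    if {"cold water mass", "saline water mass"} <= classes:
        classes.add("cold salty water mass")
    return classes
```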

Pier would also like me to focus on SPARQL and finish the tutorial which I'll do this weekend.

14.10.17

going through sparql tutorial notes here

from pier: good reading from a somewhat different perspective

16.10.17

Today and tomorrow I have to attend the MarMic soft skills presentation course.

closed some issues 4, 5.

Checked out the Hexastore data reasoning link pier sent me on skype, and the d3sparql see sparql page

17.10.17

Pier would like me to wrap up the first ontology pull request this week or early next week, so I can move on to querying and working with the data. He suggested I pull all the data together, make some SPARQL queries to access the data making use of the bloom-related ENVO terms, and then write a Python script to pull the query results together into my own data space. For this I should use hashes (the data structure) as primary keys that record which columns the values came from, etc.
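A minimal sketch of the "hashes as primary keys" idea. It assumes the queries return results in the W3C SPARQL 1.1 JSON results format and always bind the term IRI to ?x, as the ENVO subclass queries in these notes do. Python dicts are hash tables, so each term IRI becomes a key, and each cell is stored under a (source, variable) pair that records which query and which column it came from. The source labels and example IRIs are hypothetical.

```python
import json
from collections import defaultdict


def index_results(results_json, source):
    """Index SPARQL JSON results by the IRI bound to ?x.

    Returns {term IRI: {(source, variable): set of values}}, so every value
    remembers which query/dataset and which column it came from.
    """
    data = json.loads(results_json)
    index = defaultdict(lambda: defaultdict(set))
    for row in data["results"]["bindings"]:
        key = row["x"]["value"]  # the term IRI acts as the primary key
        for var in data["head"]["vars"]:
            if var != "x" and var in row:  # unbound variables are absent from a row
                index[key][(source, var)].add(row[var]["value"])
    return index


def merge(*indexes):
    """Combine several indexes into one data space keyed by term IRI."""
    merged = defaultdict(lambda: defaultdict(set))
    for idx in indexes:
        for key, columns in idx.items():
            for column, values in columns.items():
                merged[key][column] |= values
    return merged
```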

To do: finish the remaining papers/concepts for this first round.

[Biogeography of Deep-Sea Benthic Bacteria at Regional Scale (LTER HAUSGARTEN, Fram Strait, Arctic)]

[Physical and ecological processes at a moving ice edge in the Fram Strait as observed with an AUV]

From [Exchange of warming deep waters across Fram Strait], all that really matters is a process about water mass mixing.

And maybe the marine litter paper, Citizen scientists reveal: Marine litter pollutes Arctic beaches and affects wild life, though I don't think this is necessary for the first release according to Pier's timeframe. They mention "Fisheries-related debris", which could come in handy later if I manage to make a connection to the UN Sustainable Development Goals by addressing the economic dimension of fishing, with plankton blooms being the food base for fish catches. For now it's a bit of a stretch, so I'll leave it out. Later this could be interesting, as they have some citations about increasing fishing activities near Svalbard. Perhaps I can later use that knowledge to find out what is fished and by whom; perhaps there is an indigenous peoples' fishing connection, or perhaps that is not to be found here.

18.10.17

to do: read the remaining papers / find concepts from:

[Biogeography of Deep-Sea Benthic Bacteria at Regional Scale (LTER HAUSGARTEN, Fram Strait, Arctic)]

[Physical and ecological processes at a moving ice edge in the Fram Strait as observed with an AUV]

[Influence of snow depth and surface flooding on light transmission through Antarctic pack ice]

read paper and check if signal strength has to do with light transmission.

check if I have harvested sufficient ontology terms for the rest of the other datasets:

Diatom Phenology in the Southern Ocean: Mean Patterns, Trends and the Role of Climate Oscillations (satellite chlorophyll, ice cover)

Biogenic particle flux: selected data fields to be represented have been added to the list to be harvested into ENVO.

Inorganic nutrients: ions exist in ChEBI, but I may need to deal with:

Date/Time of event, Latitude, Longitude, Elevation

Physical oceanography and current meter data: selected data fields to be represented have been added to the list to be harvested into ENVO.

Snow height on sea ice and sea ice drift from autonomous measurements: 'temperature of air' exists in ENVO; other selected data fields to be represented have been added to the list to be harvested into ENVO.

19.10.17

made many changes to terms in both ENVO and plankton ecology pages.

20.10.17

Ontobee SPARQL tutorial

Example of how to run a SPARQL query to find all subclasses of an ontology term:

PREFIX obo-term: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?x ?label
FROM <http://purl.obolibrary.org/obo/merged/ENVO>
WHERE {
  ?x rdfs:subClassOf obo-term:ENVO_00002200 .
  ?x rdfs:label ?label .
}
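To run the query from a script rather than the Ontobee web form, the SPARQL 1.1 Protocol allows sending it as a `query` parameter in an HTTP GET, with JSON results requested via the Accept header. The sketch below only builds the request with the standard library. The endpoint URL is an assumption and should be checked against Ontobee's current documentation. Passing the returned request to `urllib.request.urlopen` would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Ontobee's public SPARQL endpoint; verify the current URL before use.
ENDPOINT = "http://sparql.hegroup.org/sparql/"

SUBCLASS_QUERY = """
PREFIX obo-term: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?x ?label
FROM <http://purl.obolibrary.org/obo/merged/ENVO>
WHERE {
  ?x rdfs:subClassOf obo-term:ENVO_00002200 .
  ?x rdfs:label ?label .
}
"""


def sparql_get_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})  # percent-encodes '#', '<', newlines, etc.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```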

I could try something where I query for the 'created by' relation and get the ORCID page, then see if I can get their contact info, perhaps using a wrapper Python script. This could be a potential use case to demonstrate the utility of semantic querying for AWI.

I want to try to find a list of the OWL relations so I can use them in queries. Answer: it's kind of all over the place; some are RDFS, some OWL, some RO, some IAO. I can use Protégé to look at the annotation properties and get the PURL, so that I know what to set as the prefix to search for something.

for example, 'term editor' is http://purl.obolibrary.org/obo/IAO_0000117

I want to make a query to get all term editors of classes which are subclasses of ENVO:sea ice.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?x ?label ?author
FROM <http://purl.obolibrary.org/obo/merged/ENVO>
WHERE {
  ?x rdfs:subClassOf obo:ENVO_00002200 .
  ?x rdfs:label ?label .
  ?x obo:IAO_0000117 ?author .
}

This worked. Pier wants me to do a simple example task where I answer the question:

"Can we find the author of an ontology term and if so give me their contact information"

To do this I'll need to look at the ORCID API (Application Programming Interface) and figure out how to send it a request, perhaps wrapping it in a Python or bash script.

ORCID API

could do it with the API,

or with a bash script: wget what's needed, grep for the important part, sed to clean it up. Unfortunately this won't work, as the vital data is encrypted, so instead I'll use the ORCID API.
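Before calling the ORCID API, the term editor values returned by the query above have to be turned into ORCID iDs. A minimal sketch, assuming some IAO:0000117 values contain an iD (many are likely to be plain names, which this simply skips). It pulls iD-shaped strings out with a regular expression and keeps only those whose final character passes the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. The function names are my own.

```python
import re

# Four groups of four characters; the last may be 'X' (check value 10).
ORCID_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b")


def orcid_checksum_ok(orcid):
    """Validate the final check character using ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def orcids_from_editors(term_editor_values):
    """Pull unique, checksum-valid ORCID iDs out of free-text 'term editor' values."""
    found = []
    for value in term_editor_values:
        for match in ORCID_PATTERN.findall(value):
            if orcid_checksum_ok(match) and match not in found:
                found.append(match)
    return found
```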

Signing up for the developer tools (I think I have to do this to be able to use the sandbox). Name of application: Ontology class author search

Description of your application: An application for querying ontology classes and searching for the author information affiliated with the term editor. I sent them an email asking whether I could perform simple queries without having my own website.

So I'll try to do it in bash

to start with just with the result of one wget

The ORCID people responded: it's possible; see this tutorial on reading data from an ORCID record.

You can submit a two-legged OAuth authorization request to the ORCID API to get a /read-public access token, which lets you access an ORCID record.

curl example:

  URL: https://sandbox.orcid.org/oauth/token
  HEADER: accept:application/json
  DATA: client_id=[Your client ID] client_secret=[Your client secret] grant_type=client_credentials scope=/read-public

turns into:

curl -i -L -H "Accept: application/json" -d "client_id=APP-NPXKK6HFN6TJ4YYI" -d "client_secret=060c36f2-cce2-4f74-bde0-a17d8bb30a97" -d "scope=/read-public" -d "grant_type=client_credentials" "https://sandbox.orcid.org/oauth/token"

got it to run as:

curl "Accept: application/json" -d "client_id=APP-NPXKK6HFN6TJ4YYI" -d "client_secret=060c36f2-cce2-4f74-bde0-a17d8bb30a97" -d "scope=/read-public" -d "grant_type=client_credentials" "https://sandbox.orcid.org/oauth/token"

returned was:

curl: (6) Could not resolve host: Accept {"access_token":"2bd6d6b7-9438-4a5a-8f87-7e43d6eaac25","token_type":"bearer","refresh_token":"5d1ddab9-4bc3-48d8-8a41-97b1c4151ada","expires_in":631138518,"scope":"/read-public","orcid":null}

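which is what the tutorial said I should get. (The "Could not resolve host: Accept" message only appears because -H was left off, so curl treated "Accept: application/json" as a second URL; the token request itself still succeeded.) This token can now be used to perform multiple searches or read multiple ORCID records.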
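The same two-legged token request can be written with Python's standard library instead of curl. This sketch only builds the POST (passing a body makes urllib use POST, like `curl -d`) and parses the JSON reply. The client ID and secret are placeholders. Sending the request would be `urllib.request.urlopen(req).read()`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://sandbox.orcid.org/oauth/token"


def token_request(client_id, client_secret):
    """Build the client_credentials POST that asks for a /read-public token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "/read-public",
        "grant_type": "client_credentials",
    }).encode("ascii")
    # Supplying data makes urllib send a POST with a form-encoded body, as curl -d does.
    return Request(TOKEN_URL, data=body, headers={"Accept": "application/json"})


def access_token(response_text):
    """Extract the bearer token from the JSON the token endpoint returns."""
    return json.loads(response_text)["access_token"]
```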

that brings me back to the tutorial to read orcid records and here

 Method:  GET
  Content-type: application/vnd.orcid+xml
  Authorization type: Bearer
  Access token: [Your access token]
  URL: https://pub.sandbox.orcid.org/v2.0/search/?q=orcid

from the ORCID API GitHub example calls (https://github.com/ORCID/ORCID-Source/blob/master/orcid-model/src/main/resources/record_2.0/README.md#read-sections), there is this code for Emails (/read-limited or /read-public):

curl -i -H "Accept: application/vnd.orcid+xml" -H 'Authorization: Bearer dd91868d-d29a-475e-9acb-bd3fdf2f43f4' 'https://api.sandbox.orcid.org/v2.0/0000-0002-9227-8514/email'

try it with 2bd6d6b7-9438-4a5a-8f87-7e43d6eaac25 token

curl -i -H "Accept: application/vnd.orcid+xml" -H 'Authorization: Bearer 2bd6d6b7-9438-4a5a-8f87-7e43d6eaac25' 'https://api.sandbox.orcid.org/v2.0/0000-0002-9227-8514/email'

gives me the result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<email:emails path="/0000-0002-9227-8514/email" xmlns:internal="http://www.orcid.org/ns/internal" xmlns:funding="http://www.orcid.org/ns/funding" xmlns:preferences="http://www.orcid.org/ns/preferences" xmlns:address="http://www.orcid.org/ns/address" xmlns:education="http://www.orcid.org/ns/education" xmlns:work="http://www.orcid.org/ns/work" xmlns:deprecated="http://www.orcid.org/ns/deprecated" xmlns:other-name="http://www.orcid.org/ns/other-name" xmlns:history="http://www.orcid.org/ns/history" xmlns:employment="http://www.orcid.org/ns/employment" xmlns:error="http://www.orcid.org/ns/error" xmlns:common="http://www.orcid.org/ns/common" xmlns:person="http://www.orcid.org/ns/person" xmlns:activities="http://www.orcid.org/ns/activities" xmlns:record="http://www.orcid.org/ns/record" xmlns:researcher-url="http://www.orcid.org/ns/researcher-url" xmlns:peer-review="http://www.orcid.org/ns/peer-review" xmlns:personal-details="http://www.orcid.org/ns/personal-details" xmlns:bulk="http://www.orcid.org/ns/bulk" xmlns:keyword="http://www.orcid.org/ns/keyword" xmlns:email="http://www.orcid.org/ns/email" xmlns:external-identifier="http://www.orcid.org/ns/external-identifier">
    <common:last-modified-date>2016-10-26T17:03:35.739Z</common:last-modified-date>
    <email:email visibility="public" verified="true" primary="false">
        <common:created-date>2016-10-26T17:03:05.707Z</common:created-date>
        <common:last-modified-date>2016-10-26T17:03:35.739Z</common:last-modified-date>
        <common:source>
            <common:source-orcid>
                <common:uri>http://sandbox.orcid.org/0000-0002-9227-8514</common:uri>
                <common:path>0000-0002-9227-8514</common:path>
                <common:host>sandbox.orcid.org</common:host>
            </common:source-orcid>
            <common:source-name>Sofia Maria Hernandez Garcia</common:source-name>
        </common:source>
        <email:email>[email protected]</email:email>
    </email:email>
</email:emails>

which contains <email:email>[email protected]</email:email>, the email of the person associated with that ORCID iD, retrieved using the token I requested in the previous step.

now let me try to hack Pier's email out of his ORCID page using my Bearer token.

curl -i -H "Accept: application/vnd.orcid+xml" -H 'Authorization: Bearer 2bd6d6b7-9438-4a5a-8f87-7e43d6eaac25' 'https://api.sandbox.orcid.org/v2.0/0000-0002-4366-3088/email'

OK, it didn't work. I probably need to ask for a new token the right way, or do the token request again, as I think I've already used the previous token.
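Rather than reading the address out of the XML by eye, it can be extracted with the standard library's ElementTree, using the `email` namespace declared on the root element of the response. A minimal sketch, assuming the ORCID v2.0 /email layout shown above: an outer `email:email` per address with a `visibility` attribute, wrapping an inner `email:email` that holds the address. The addresses in the test are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "email": "http://www.orcid.org/ns/email",
    "common": "http://www.orcid.org/ns/common",
}


def public_emails(xml_text):
    """Return the addresses in an ORCID v2.0 /email response, skipping non-public ones."""
    root = ET.fromstring(xml_text)  # root is <email:emails>
    emails = []
    for entry in root.findall("email:email", NS):  # one outer element per address
        if entry.get("visibility") == "public":
            address = entry.find("email:email", NS)  # inner element holds the address text
            if address is not None and address.text:
                emails.append(address.text.strip())
    return emails
```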

curl -i "Accept: application/json" -d "client_id=APP-NPXKK6HFN6TJ4YYI" -d "client_secret=060c36f2-cce2-4f74-bde0-a17d8bb30a97" -d "scope=/read-public" -d "grant_type=client_credentials" "https://sandbox.orcid.org/oauth/token"

gave me the output:

{"access_token":"f11d91aa-903b-46e9-afad-0bfe8699e27f","token_type":"bearer","refresh_token":"3f805bac-4af2-4282-af50-d412f3ece17b","expires_in":631138518,"scope":"/read-public","orcid":null}

with new token: f11d91aa-903b-46e9-afad-0bfe8699e27f

Use this to try and hack Pier's contact info:

curl -i -H "Accept: application/vnd.orcid+xml" -H 'Authorization: Bearer f11d91aa-903b-46e9-afad-0bfe8699e27f' 'https://api.sandbox.orcid.org/v2.0/0000-0002-4366-3088/email'

error 404 again; it sends me to a troubleshooting link.

try a different request for 0000-0002-9227-8514's data, again using the first token; maybe it's the first step (the token request) that I need to figure out in order to get the right info.

curl -i -H "Accept: application/vnd.orcid+xml" -H 'Authorization: Bearer 2bd6d6b7-9438-4a5a-8f87-7e43d6eaac25' 'https://api.sandbox.orcid.org/v2.0/0000-0002-9227-8514/educations'

This returns Maria's info, so I guess I need to try to request token access to Pier's record?

The token request was:

curl -H "Accept: application/json" -d "client_id=APP-NPXKK6HFN6TJ4YYI" -d "client_secret=060c36f2-cce2-4f74-bde0-a17d8bb30a97" -d "scope=/read-public" -d "grant_type=client_credentials" "https://sandbox.orcid.org/oauth/token"

From their GitHub documentation,

I'm going to try to read the entire record: https://[HOST]/v2.0/[ORCID]/record

with [HOST] being pub.sandbox.orcid.org for the Public API on the ORCID Sandbox (/read-public scope only)

https://pub.sandbox.orcid.org/v2.0/0000-0002-4366-3088/record

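gives me the same error 404 no access issue. From the error codes, error 404 is ORCID ... not found. Most likely the iD simply isn't in the sandbox: sandbox.orcid.org is a separate test registry from the production orcid.org registry, so an iD registered on the production site does not exist there.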

when I try the same thing with Maria:

https://pub.sandbox.orcid.org/v2.0/0000-0002-9227-8514/record

it works without a token request.

try with mine (I just made my email public)

https://pub.sandbox.orcid.org/v2.0/0000-0002-3410-4655/record
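Since the public sandbox API returned Maria's record without a token, a record read only needs the `https://[HOST]/v2.0/[ORCID]/record` URL and an Accept header. The sketch below builds that GET with an optional bearer token. The function name and defaults are my own, not part of any ORCID client library. A 404 from this request would mean the iD was not found in that registry.

```python
from urllib.request import Request

PUBLIC_SANDBOX = "https://pub.sandbox.orcid.org/v2.0"


def record_request(orcid, token=None, base=PUBLIC_SANDBOX, fmt="application/vnd.orcid+xml"):
    """Build a GET for [base]/[ORCID]/record; the bearer token is optional on the public API."""
    headers = {"Accept": fmt}
    if token:
        headers["Authorization"] = "Bearer " + token
    return Request(f"{base}/{orcid}/record", headers=headers)
```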

This work is summarized here.

Finish up the Ontobee SPARQL tutorial (and here); check out queries 7 and 8.

21.10.17

Purely hypothetically, someone might be considering a Research Fellow position at the University of York Environment Department.

Stated in the Research Fellow job description, the candidate:

[Has as a main role] to participate actively in the planning and management of research projects, including supervising the work of others and providing expert advice and guidance

I assume my master's thesis would serve, to some extent, as proof of this.

To supervise postgraduate research students and mentor colleagues with less experience. Advising on their personal development and supporting them in developing their research techniques

As well as to look for grants.

The fully funded PhD programme in Adapting to the Challenges of a Changing Environment (ACCE) has PhD offerings: the ACCE NERC DTP Studentships, including 10 PhD vacancies along with one possible self-created allocation.

My notes on the 10 PhD vacancies:

Effects of habitat management on the mating system and reproductive success of European nightjar populations (PhD in Environmental Science)

Lots of birds and field work... Maybe I should tell my roommate about this, he may enjoy it, but certainly not for me.

Modelling exposure and effects of mycotoxins in fish (PhD in Environmental Science)

Poisoning fish ... mmm I don't think this is good for my Karma.

Influence of sea-level rise on carbon sequestration in US salt marshes (PhD in Environmental Geography)

Field work in Mississippi mmm honey git my shotgun ... (This is why this repo is private)

Environmental impacts of tidal energy generation on tidal environments and shorebirds (PhD in Environmental Geography)

A bit of field work (seems optional) and modeling; they're looking for either a modeler interested in ecology or an ecologist interested in modeling. I could see myself as a potential candidate for this, seeing as the lead supervisor, Dr Jon Hill, builds software to solve/investigate environmental problems and is interested in integrating phylogenetic information with environmental data. Perhaps he could be interested in a project using ENVO to link data, maybe with this project or another? A good lead to explore further. Jon's paper on global cooling affecting species diversification in response to climate change is interesting.

I'll check out the co-supervisors: Kathryn Arnold and Angus Garbutt. Angus studies restoration management and processes of intertidal habitats, sounds like a solid foundation ecologist. Kathryn Arnold is the senior lecturer in ecology and she studies biodiversity and ecosystem services. She also sounds quite competent.

Assessing the Impact of Microplastics on Terrestrial Ecosystem Services (PhD in Environmental Science)

I love the topic but this is more an analytic chemistry laboratory project, not the methods I want to pursue.

Links between the marine cycles of nitrogen and iodine (PhD in Environmental Science)

Mostly laboratory-based study, sounds like biogeochemistry style work.

Peering into the hidden world: Photosynthetic microbes as functionally-significant indicators of peatland degradation and restoration. (PhD in Environmental Geography)

Very MolEcol- and BioGeo-style work: cool, but not what I want.

The quantification of changes in supraglacial conditions on the Greenland Ice Sheet and implications for surface drainage development (PhD in Environmental Geography)

Goes nicely with the sea ice related ENVO contributions from my rotation with Pier. It uses satellite imagery and some field-based techniques to explore ice drainage. I think there are possibilities for this to overlap with ENVO contributions.

Microbes and Microplastics: Assessing the influence of ingesting microbially colonised plastics on the physiology of fish. (PhD in Environmental Science)

Very MolEcol style work not for me.

Unravelling methane capture in forests (PhD in Environmental Science)

Field and lab work stable isotopes measuring fluxes ... perhaps in a previous life

self-created York studentship

Also stated on the ACCE NERC DTP Studentships site is that:

One of the York studentships each year can be allocated to well qualified students who have devised their own research project, together with a York ACCE supervisor. The project must fall broadly within one of the ACCE themes. If you are interested in this route, please contact a potential supervisor in the departments of Biology, Environment, Archaeology or Chemistry to discuss your idea and help develop your project outline.

Perhaps I can work with Pier to write a proposal for a project which is integrated into the work he would be doing as a Research Fellow, assuming such a position qualifies as an eligible York ACCE supervisor.

The application deadline for all these positions is Sunday 7th January 2018 at 23h59.

Unfortunately it looks like as an international student I am not eligible for these ACCE NERC DTP Studentships. :(

Q: I am an international student and I would like to study in the UK. Am I eligible for NERC funding? A: In order to be eligible for NERC funding, you must be able to prove settled status in the UK as per the Immigration Act 1971. You must also be able to prove that you have a relevant connection to the UK, which requires that you have been ordinarily resident in the UK throughout the 3 year period preceding the date of the award. NERC does not currently fund international students outside of these criteria.

Alternatively, we could look into writing a grant for the Research Fellow from a funding agency such as the UK's Natural Environment Research Council.

23.10.17

Alainna from the ORCID team responded; she has a tutorial on using their API.

Attempting to register for the Public API so I can get a client ID and client secret, documented here.

Name of application: Ontology class author search

Website: https://thelonggameforum.wordpress.com/ (trying with a random website of mine to see if they give me access; then I just don't use the website)

Description of your application: An application for querying ontology classes and searching for the author information affiliated with the term editor. I sent them an email asking whether I could perform simple queries without having my own website.

Redirect URIs: just use one of the examples provided, like the Google one: https://developers.google.com/oauthplayground

Success!! They gave me the required fields.

Client ID: APP-GCONHGFI79VLYEYW Client secret: 40fa8efb-1fa6-4415-99eb-6126f440866b

Using a two-legged OAuth authorization request to send a token request to be able to query orcid pages:

curl -i -L -H "Accept: application/json" -d "client_id=APP-GCONHGFI79VLYEYW" -d "client_secret=40fa8efb-1fa6-4415-99eb-6126f440866b" -d "scope=/read-public" -d "grant_type=client_credentials" "https://sandbox.orcid.org/oauth/token"

gave error:

HTTP/1.1 401 Unauthorized
Date: Mon, 23 Oct 2017 12:29:18 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d8923de8daa4d1ef9fa7f14248967b5621508761757; expires=Tue, 23-Oct-18 12:29:17 GMT; path=/; domain=.orcid.org; HttpOnly
Cache-Control: no-store
Pragma: no-cache
WWW-Authenticate: Bearer realm="orcid", error="invalid_client", error_description="Client not found: APP-GCONHGFI79VLYEYW"
Server: cloudflare-nginx
CF-RAY: 3b24b37b78570c2f-AMS

{"error":"invalid_client","error_description":"Client not found: APP-GCONHGFI79VLYEYW"}

This and curl are giving me problems. Maybe I stop here for now and just leave that first competency question as solved, stating that the further work needed is to query the API, which is outside the scope of this project. (The "Client not found" error possibly means the new credentials are not registered in the sandbox, since the request went to sandbox.orcid.org rather than the environment I registered them in.)

moving on, I created a page for Competency questions

To sync my ENVO fork, follow the directions first here and then here to get it up to date, so I can try to import classes from NPO.

To be able to express light, there is an issue, NTRs: radiation, which uses NPO's way of modeling it. According to Chris on NPO issue 6, the NPO definition for radiation came from NCIT. Pier suggests I should fork ENVO and try to import the relevant classes (the electromagnetic radiation hierarchy) from NPO. NPO homepage here

I could also try to deal with the PAR issue New class: photosynthetically active radiation #364 while I'm at it.

To import from NPO I need to run the Makefile in envo/src, and to do that I need a working environment, which requires ROBOT (which I have) and possibly owltools. I believe what Pier said was that this will create the mirror files, which mirror other ontologies, so that I can specify the NPO IRIs (what BioPortal calls the ID, URIs like http://purl.bioontology.org/ontology/npo# ...), from which I can then get the _import.owl, _terms.txt or similar files.

When I run make it runs the command:

owltools http://purl.obolibrary.org/obo/pato.owl --remove-annotation-assertions -l -s -d --remove-axiom-annotations --remove-dangling-annotations --make-subset-by-properties -f BFO:0000050 BFO:0000051 RO:0002202 immediate_transformation_of RO:0002176 IAO:0000136 --set-ontology-id http://purl.obolibrary.org/obo/pato.owl -o mirror/pato.owl

which doesn't work because I don't have owltools. owltools build instructions. I have mvn so I can try to build it. Take it from here

mvn clean install

errors:

Results :

Failed tests: 
  BasicChecksRuleTest.testOutdatedIEAs:39 expected:<1> but was:<2>

Tests run: 45, Failures: 1, Errors: 0, Skipped: 2

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] OWLTools-Parent .................................... SUCCESS [  4.170 s]
[INFO] OWLTools-Core ...................................... SUCCESS [03:52 min]
[INFO] OWLTools-Annotation ................................ FAILURE [13:00 min]
[INFO] OWLTools-Oort ...................................... SKIPPED
[INFO] OWLTools-Sim ....................................... SKIPPED
[INFO] OWLTools-Web ....................................... SKIPPED
[INFO] Lego ............................................... SKIPPED
[INFO] OWLTools-Solr ...................................... SKIPPED
[INFO] OWLTools-Runner .................................... SKIPPED
[INFO] OWLTools-NCBI ...................................... SKIPPED
[INFO] Golr-Client ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:58 min
[INFO] Finished at: 2017-10-23T21:35:14+02:00
[INFO] Final Memory: 39M/505M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project OWLTools-Annotation: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/kai/Desktop/grad_school/marmic/master_thesis/code/owltools/OWLTools-Annotation/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :OWLTools-Annotation

A comment on an owltools GitHub issue about this error suggests:

Thanks -- I had to delete owltools and do a fresh svn checkout (svn update didn't do the trick), and then the build succeeded.

svn checkout https://github.com/owlcollab/owltools.git

that was getting all the branches,

so try from here

to my ~/.bashrc I added export PATH=$PATH:~/Desktop/grad_school/marmic/master_thesis/code/owltools

now I could run make in the ~/Desktop/grad_school/marmic/master_thesis/code/kai_envo_fork/envo/src/envo

unfortunately the make didn't work, error output message:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000055c900000, 1041760256, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1041760256 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/kai/Desktop/grad_school/marmic/master_thesis/code/kai_envo_fork/envo/src/envo/hs_err_pid8793.log
Makefile:260: recipe for target 'mirror/ncbitaxon.owl' failed
make: *** [mirror/ncbitaxon.owl] Error 1
rm imports/pco_combined_seed.tsv imports/uberon_combined_seed.tsv

24.10.17

Brandon's PhD defense was quite good; good insights into how the process works.

when I run make, the Makefile calls the following: OWLTOOLS_MEMORY=12G owltools ncbitaxon.obo --remove-annotation-assertions -l -s -d --remove-axiom-annotations --remove-dangling-annotations --set-ontology-id http://purl.obolibrary.org/obo/ncbitaxon.owl -o mirror/ncbitaxon.owl

specifically OWLTOOLS_MEMORY=12G, which may explain the error='Cannot allocate memory' issue (errno=12 is just the Linux code for ENOMEM, so that 12 is a coincidence, not the 12G setting).

From here, cat /proc/meminfo shows MemTotal: 8066808 kB, which is how much RAM I have (~8G, about 7.7 GiB), so that explains why my little computer can't handle it. I guess I could try editing the line OWLTOOLS_MEMORY=12G in the Makefile to something like OWLTOOLS_MEMORY=7G and see if that works. I'll try this later (when I have my charger; I forgot it at the moment).
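A small sketch of the arithmetic, in case the Makefile setting needs adjusting on other machines. It reads MemTotal from /proc/meminfo-style text (the kernel's "kB" there are really KiB) and suggests a whole-GiB heap that leaves some RAM for the operating system. The 0.75 fraction is a hypothetical rule of thumb, not an owltools or ROBOT recommendation. For this laptop it gives 5G, which may well be too small for NCBITaxon, pointing to running that step on a bigger machine.

```python
import math
import re


def mem_total_kib(meminfo_text):
    """Read MemTotal (reported in kB, i.e. KiB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def suggested_heap_gib(meminfo_text, fraction=0.75):
    """Whole GiB of heap leaving (1 - fraction) of RAM for the OS and other processes."""
    gib = mem_total_kib(meminfo_text) / (1024 * 1024)
    return math.floor(gib * fraction)
```

On Linux the input would come from `open("/proc/meminfo").read()`, and the result could be substituted as `OWLTOOLS_MEMORY={n}G`.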

working on plankton ecology classes link to ECOCORE ontology

trying owltools again with 7G to see if it builds.

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.semanticweb.owlapi.model.parameters.ConfigurationOptions.getValue(ConfigurationOptions.java:178)
	at org.semanticweb.owlapi.model.OWLOntologyLoaderConfiguration.isStrict(OWLOntologyLoaderConfiguration.java:234)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer.addType(OWLRDFConsumer.java:837)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer.addClassExpression(OWLRDFConsumer.java:732)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.TripleHandlers$TypeClassHandler.handleTriple(TripleHandlers.java:2504)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.TripleHandlers$HandlerAccessor.handleStreaming(TripleHandlers.java:177)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer.statementWithResourceValue(OWLRDFConsumer.java:1501)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.statementWithResourceValue(RDFParser.java:365)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.NodeElement.startElement(StartRDF.java:311)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.NodeElementList.startElement(StartRDF.java:374)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.startElement(RDFParser.java:196)
	at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.parse(RDFParser.java:140)
	at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:73)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:197)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1098)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1054)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1004)
	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1015)
	at org.obolibrary.robot.IOHelper.loadOntology(IOHelper.java:308)
	at org.obolibrary.robot.IOHelper.loadOntology(IOHelper.java:192)
	at org.obolibrary.robot.CommandLineHelper.getInputOntology(CommandLineHelper.java:294)
	at org.obolibrary.robot.CommandLineHelper.updateInputOntology(CommandLineHelper.java:351)
Makefile:161: recipe for target 'imports/ncbitaxon_import.owl' failed
make: *** [imports/ncbitaxon_import.owl] Error 1

Tried again with 8G.

Same issue.

25.10.17

Of relevance to me from last night's ESIP meeting: VirtualVoCamp, a virtual hackathon-like session to expand the semantics of glaciers, tentatively scheduled for 02.02.17. The VirtualVoCamp proposes quite a few ideas we already added to ENVO during my lab rotation. Pier says it will look good for my MSc outcomes. Meeting minutes

Link for the ESIP semantic telecons

To get Java to run owltools, Pier suggests the following links:

increasing heap size in java and increasing heap for intensive applications

For the AWI computer system: the password change page is not working, and neither is the "more info" page. Maybe I have to wait the 3 days for the accounts to be set up.

General data for all services
Username	kblumber
Password	[redacted]
Affiliation	Student
Expiration date	01.05.2018
Mail
Mail address	[email protected]
Additional mail address(es)	[email protected]
Webmail	https://owa.awi.de
Incoming mail server (IMAP/POP3)	imap.awi.de
Outgoing mail server (SMTP)	smtp.awi.de
Instructions	http://www2.awi.de/de/dienste/rechenzentrum_wissenschaftliches_rechnen/dienste/email/
Contact persons	Jörg Kosinski, +49(471)4831-1577
Siegfried Makedanz, +49(471)4831-1250
Potsdam: Heiko Gericke, +49(331)288-2172
Windows with Metaframe (New!)
Contact persons	Helpdesk, +49(471)4831-1403
Potsdam: Heiko Gericke, +49(331)288-2172
VPN (New!)
VPN username	[email protected]
VPN server: https://vpnasa.awi.de
Please follow the instructions on this page.
Contact person	Jens-Michael Schlüter, +49(471)4831-1416
Unix (New!)
UID (User ID)	7664
GID (Group ID)	210
Workgroup	BIO
Home directory	/home/bios1/kblumber
Contact persons	Herbert Liegmahl-Pieper, +49(471)4831-1269
Potsdam: Heiko Gericke, +49(331)288-2172

Account created: 20171025

Express the bloom hierarchy as subclasses of community

Build a similar hierarchy for population (going down to phytoplankton population bloom), with examples of Arctic-relevant phytoplankton (diatoms, Phaeocystis, etc.), so that later an instance class can be created using both the community bloom -> phytoplankton bloom class and one of the population bloom classes to represent specific AWI data. This could then be paired with links to other environmental contexts, so I can answer competency questions in the vein of (or more specific than):

"Can I find data about plankton blooms which involve different primary driving species and which occur under different environmental contexts?"

Algal bloom process phase: add a 'downstream of' or 'within' axiom pointing to algal bloom process, but keep it as a subclass of environmental system.

Look for specific GO terms to answer the algal bloom toxin production/degradation questions.

Idea: people are contracting leptospirosis (caused by Leptospira bacteria) from water contaminated with animal urine, as is happening now in Puerto Rico. Perhaps this is an example indicator species for human health/SDGs, under mitigating environmental hazards and climate shocks.

26.10.17

Leptospirosis would be at the level of mitigating environmental hazards and climate shocks.

Notes From Eddie:

Plankton classes and groups of interest:

Diatoms:

major group of microalgae; class: Bacillariophyceae, phylum: Bacillariophyta

Dinoflagellates

class: Dinophyceae

Syndiniales:

order of dinoflagellates

phaeocystis

genus of algae belonging to the Prymnesiophyte class

In PCO we have:

material entity
  environmental feature
    organismal entity 
      collection of organisms
        multi-species collection of organisms
          ecological community
            microbial community

        single species collection of organisms
          population of organisms 

process
  biological process
    multi-organism process 
  
  population process
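The PCO outline above is an indented is-a tree, so each class's full chain of superclasses can be read off by indentation depth. Below is a minimal sketch that parses such an outline, assuming two spaces per level as in the notes. The helper names are hypothetical.

```python
OUTLINE = """\
material entity
  environmental feature
    organismal entity
      collection of organisms
        multi-species collection of organisms
          ecological community
            microbial community
        single species collection of organisms
          population of organisms
process
  biological process
    multi-organism process
  population process
"""


def parse_outline(text: str, indent: int = 2) -> dict:
    """Map each class label to its parent, using indentation depth as is-a nesting."""
    parent = {}
    stack = []  # (depth, label) of the current chain of open ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        label = raw.strip()
        while stack and stack[-1][0] >= depth:
            stack.pop()  # close siblings and deeper branches
        parent[label] = stack[-1][1] if stack else None
        stack.append((depth, label))
    return parent


def ancestors(label: str, parent: dict) -> list:
    """Walk up the parent map, nearest superclass first."""
    chain = []
    node = parent.get(label)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain


PARENT = parse_outline(OUTLINE)
```

For example, `ancestors("microbial community", PARENT)` climbs through ecological community and the collection classes up to material entity.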
    

Could instead try having subclasses of phytoplankton community such as diatom dominated community, i.e. a community primarily composed of diatoms (get diatom from NCBITaxon). This removes the need for population of organisms classes such as populations of individual diatom species.

//make some classes for the following:
phytoplankton community: diatoms, dinoflagellates, Phaeocystis //check to make sure they are phytoplankton
plankton community: Syndiniales //check if they are phytoplankton or just plankton

microbial community: Flavobacteria, Alteromonas, SAR11, SAR202, Oceanospirillales

aphros

Connect to aphros, the bioinformatics server of the Deep Sea Ecology and Technology (AWI-MPI) workgroup. First run the Cisco AnyConnect Secure Mobility Client, then:

ssh kblumber@aphros

made myself a directory here: kblumber@aphros:/scratch2/kblumber

I will need to make a list of software to ask to have installed on aphros for ontology and SPARQL work, such as jena-arq.

27.10.17

Created issue 13 to add a "hole in ice" bloom class based on Eddie's presentation.

MPO, the Metadata Provenance Ontology (see "Automated Metadata, Provenance Catalog, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data"), represents scientific workflows in an ontology. Cool stuff.

Working on the plankton ecology page: subclasses of ecological community (microbial communities, phytoplankton, algae, etc.).

I had an interesting conversation with Ivo about using this kind of work to support GFBio. We'll see whether classes like gammaproteobacteria dominated microbial community, which link to NCBITaxon and, when appropriate, to other ontologies such as ENVO or ChEBI, are useful for annotating GFBio submissions.

30.10.17

Added a swarm class and related classes to serve as the superclass for plankton bloom (material entity); see here.

Idea from Pier

Make examples of both pre- and post-composing complicated classes such as cyanobacterial bloom in marine water body.

Could precompose a definition in PCO like I did for cyanobacterial dominated microbial community

or post-compose, for example in ENVO, by making a class a subclass of ecosystem that is 'determined by' some bloom population or community:

ecosystem of species X bloom 'overlaps' some 'marine water body' 'overlaps (location of (population/community bloom))' 'has active participant' some 'species X'

example class creation

Here I will create the class diatom-dominated lake community using both pre-composition and post-composition design patterns.

Pre-composition example: To specify a diatom dominated community I need the following classes in the ecological community hierarchy.

ecological community
  plankton community
    phytoplankton community
      diatom dominated community

As well as the ... hierarchy

community bloom 
  plankton bloom process
    phytoplankton bloom process

diatom-dominated lake community

is a 'diatom dominated community'
'located in' some lake
'composed primarily of' some NCBITaxon:Bacillariophyta
'formed as result of' some 'phytoplankton bloom process'

'diatom bloom'

Post-composition example:

diatom-dominated lake community

is an 'ecological community' //if removed, it will be placed as a subclass of Thing
overlaps some lake
overlaps some ('location of' some 'community bloom') // this works
'has active participant' some NCBITaxon:Bacillariophyta
'formed as result of' some 'community bloom'

For my thesis

Illustrate how, with semantic research, we can define complex ENVO classes such as ecosystem determined by lentic diatom community and ecosystem determined by lentic diatom community bloom. I will do it once as a pre-composition, where I push the whole class and get an IRI for it. Then I will illustrate how to post-compose such classes using other classes as building blocks, to show that it can be done either way.

Pier had a good idea about how to employ the entire scientific method for creating such defined classes: create an example class as the hypothesis, use smaller class fragments as the materials and methods, and query some examples for the results.

31.10.17

ESIP meeting notes

SWEET alignment issue

Semantic Sensor Network (SSN) ontology

http://purl.obolibrary.org/obo/ENVO_01001145 from Chris M to everyone: https://www.ebi.ac.uk/ols/ontologies/envo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FENVO_00001998

02.11.17

Created a page, thesis pieces, where I will write small sections of my thesis one snippet/competency question at a time.

Added the Any23 idea from Lewis to the SPARQL page, in case it helps with my attempt to pull all the data together into a little data repository for the project.

The paper A communal catalogue reveals Earth’s multiscale microbial diversity just came out. It uses EMPO, the Earth Microbiome Project Ontology, a lightweight application ontology built on ENVO; perhaps relevant to my thesis and to the Alliance for Microbial Ontology Curation (AMOC) project proposal.

PCO repository

Closed issue #14; see the PCO tracker.

03.11.17

Working on plankton ecology ENVO terms.

Ideas from Pier for axioms for the class sea ice melting:

Use the PATO 'degree of illumination' classes: 'has input' some sea ice, water column adjacent to sea ice, and atmosphere adjacent to sea ice (where the sea ice results in decreased illumination), then 'has output' a water column with increased illumination, so I can answer competency questions such as:

"What processes result in increased illumination of a marine water body?"

added to thesis pieces here

More useful information about sea ice and phytoplankton blooms:

//Sea-ice cover can influence phytoplankton blooms in a variety of ways. Firstly, sea ice reduces light penetration into the water column, which negatively affects the growth of algae in and under the sea ice.
//During the ice melt, sea-ice plankton, nutrients and trace elements are released into the upper ocean layer. Maybe try to make a class, or at least axioms, describing the release of sea-ice plankton?
//This process can accelerate the spring bloom.
//Melting of sea ice increases upper-ocean stability, since freshwater is released into the upper ocean layer.
//This can either promote blooms by keeping plankton closer to the surface, where light levels are favorable,
//or suppress them by increasing grazing pressure from zooplankton. dbxref

marginal ice zone //promotes phytoplankton blooms and enhanced biological productivity

A useful link on the marginal ice zone from the Norwegian Polar Institute. They describe how the marginal ice zone is the site of high productivity and plankton blooms, is essential for the energy budget of Arctic species (gulls, polar bears, seals, etc.), and serves as a habitat where many of these animals reproduce.

Maybe if I could find an example of an animal of cultural importance to an indigenous community in the Arctic marginal ice zone, I could use the satellite data and marginal ice zone semantics to link to the Sustainable Development Goals, e.g. an endangered seal species living in the MIZ which is hunted by indigenous group X.

meeting with Ivo

Ivo requests an example-based description of how GFBio should use the following terms to annotate data: environment, ecosystem, biome, habitat.

If you sample the surface of the desk what is the

Where does the biome end and the ecosystem begin? What's the proper way to understand the overlap?

Example use case of ontology term requests from the scientific community.

Annotating data with ontology classes: examples from GFBio submissions. Ideas for competency questions:

  1. Given an abstract/manuscript, what ontology terms could be suggested for annotating such a dataset (using a text-mining approach similar to what I did in the lab rotation)? Could do this with GFBio submissions, using the manually finished annotations as the "true" reference. Could alternatively do this on ENA datasets, the same subset Henny is working on in her review of the Nagoya Protocol: give me the IDs of all datasets of a certain type, then all samples from those datasets and all their properties of type environmental feature and environmental biome, and aggregate back to the dataset level ("the samples in this dataset have these ... biomes").

  2. Discuss how to propagate ontology annotations made on samples back up to the whole dataset, in order to be able to do a semantic search (for example with GFBio) for ontology classes associated with datasets. Currently the ENVO annotations are on the samples, not on the datasets. Does GFBio propagate all unique ENVO terms from all samples in a dataset up to the dataset level, or do we annotate a dataset only with higher-level classes?
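The first option in question 2, propagating every unique term from all samples up to the dataset, is a simple union per dataset. Below is a minimal sketch. The function name and the sample and dataset IDs are hypothetical, and the term labels are ENVO class names used only as placeholders. It does not describe how GFBio actually works.

```python
from collections import defaultdict


def propagate_to_datasets(sample_terms: dict, sample_dataset: dict) -> dict:
    """Union the ontology terms of every sample into its parent dataset.

    sample_terms:   {sample_id: set of term labels}
    sample_dataset: {sample_id: dataset_id}
    Returns {dataset_id: sorted list of unique terms}.
    """
    dataset_terms = defaultdict(set)
    for sample, terms in sample_terms.items():
        dataset_terms[sample_dataset[sample]].update(terms)
    return {ds: sorted(terms) for ds, terms in dataset_terms.items()}


# Hypothetical samples S1-S3 belonging to hypothetical datasets DS1 and DS2.
samples = {
    "S1": {"marine biome", "sea water"},
    "S2": {"marine biome", "sea ice"},
    "S3": {"marine biome"},
}
membership = {"S1": "DS1", "S2": "DS1", "S3": "DS2"}
result = propagate_to_datasets(samples, membership)
```

The second option, annotating a dataset only with higher-level classes, would replace the union with a step that maps each term to a shared superclass.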

06.11.17

Made an issue on the PCO tracker for PCO:carrying capacity to be ceded to ECOCORE.

For ECOCORE: creating the disposition classes for growth limitation. For my thesis I could describe the axiomatization needed to create a class such as light limitation of photosynthetic organisms in a marine water body, similar to the class growth-limiting nutrient, making use of ENVO:electromagnetic radiation and ECOCORE:disposition to limit the growth of an ecological assemblage.

Discussion with Pier about the Tilman satellite data: it's a bit sparse, but we can try to use the bloom semantics I created to represent parts of the data, demonstrating the value of the semantic work.

overflow from ECOCORE editing process:

limiting nutrient

A limiting factor which consists of a nutrient essential for the growth of a population of organisms within an ecosystem, which when lacking in an environment, will limit the growth of a population, despite the presence of other essential material or non material entities required for growth. 

'has part' some CHEBI:'nutrient' (http://purl.obolibrary.org/obo/CHEBI_33284)

light limitation

A limiting factor which consists of light from solar electromagnetic radiation, essential for the growth of a population of organisms within an ecosystem, which when lacking in an environment, will limit the growth of a population, despite the presence of other essential material or non material entities required for growth. 

can use ENVO:electromagnetic radiation
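Read operationally, both definitions above describe Liebig's law of the minimum: growth is limited by whichever essential resource is scarcest relative to need, regardless of what else is present. Below is a minimal sketch under that reading. The function name, resource names, amounts and requirements are all hypothetical and unitless.

```python
def limiting_factor(available: dict, required: dict) -> str:
    """Return the resource with the lowest available/required ratio.

    A resource missing from `available` counts as 0, so it is always limiting.
    """
    ratios = {name: available.get(name, 0.0) / need for name, need in required.items()}
    return min(ratios, key=ratios.get)


# Hypothetical, unitless amounts: light is furthest below its requirement.
available = {"nitrate": 2.0, "phosphate": 0.5, "light": 50.0}
required = {"nitrate": 1.0, "phosphate": 0.1, "light": 100.0}
```

With these numbers, `limiting_factor(available, required)` returns "light", which is the situation the light limitation class is meant to capture.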

To be able to express light, there is an issue, NTRs: radiation, which uses NPO's way of modeling it. According to Chris on NPO issue 6, the NPO definition of radiation came from NCIT. Pier suggests I fork ENVO and try to import the relevant classes (the electromagnetic radiation hierarchy) from NPO.

I could also try to deal with the PAR issue New class: photosynthetically active radiation #364 while I'm at it.

07.11.17

From Pier Ofert: some recommendations for applying for PhD funding.

CORDIS, the Community Research and Development Information Service, hosts material about funding opportunities, the most relevant being Horizon 2020.

Could also look for potential opportunities in here

This seems like the place to look for PhD funding opportunities. See the Work Programme 2018-2020 PDF, which lists the Marie Skłodowska-Curie actions available for the 2018-2020 time frame. Also see the applying and submitting grant proposals page.

For Marie Skłodowska-Curie actions, see the Work Programme 2018-2020 PDF (I believe p. 59 is what these are referencing). Eligibility and admissibility conditions:

The admissibility conditions are described in General Annex B of the work programme. The eligibility conditions for Marie Skłodowska-Curie actions apply. Please read the dedicated section in the Marie Skłodowska-Curie part of the work programme.

Evaluation criteria, scoring and threshold:

The selection criteria are described in General Annex H of the work programme. The award criteria, scoring and threshold for Marie Skłodowska-Curie actions apply. Please read the dedicated section in the Marie Skłodowska-Curie part of the work programme.

Evaluation Procedure:

The evaluation procedure for Marie Skłodowska-Curie actions applies. Please read the dedicated section in the Marie Skłodowska-Curie part of the work programme.

Research networks (ITN): support for Innovative Training Networks // for a cross-institutional degree; perhaps we could try to partner York with JCOMM, Antje's group, or another of Pier's collaborators.

ITN supports competitively selected joint research training and/or doctoral programmes, implemented by partnerships of universities, research institutions, research infrastructures, businesses, SMEs, and other socioeconomic actors from different countries across Europe and beyond.

Partnerships take the form of collaborative European Training Networks (ETN), European Industrial Doctorates (EID) or European Joint Doctorates (EJD)

focuses on inter-organizational collaboration.

Individual fellowships (IF) //Pier Ofert mentioned this; it could cover a couple of years of PhD salary, if mobility between Germany and the UK qualifies. Pier O said it might.

Co-funding of regional, national and international programmes that finance fellowships involving mobility to or from another country (COFUND) It says: "The scheme can support doctoral and fellowship programmes." This may have been what Pier O was referring to. Their doctoral programs:

Doctoral programmes address the development and broadening of the research competencies of early-stage researchers. The training follows the EU Principles on Innovative Doctoral Training. Substantial training modules, including digital ones, addressing key transferable skills common to all fields and fostering the culture of Open Science, innovation and entrepreneurship will be supported. Collaboration with a wider set of partner organisations, including from the non-academic sector, which may provide hosting or secondment opportunities or training in research or transferable skills, as well as innovative and interdisciplinary elements of the proposed programme, will be positively taken into account during the evaluation.

Also: "Each researcher must be enrolled in a doctoral programme" //I'm not sure whether this means I would first have to be accepted into a doctoral programme and could only then apply for this.

Research and Innovation Staff Exchanges (RISE): short-term funding, but check it anyway.

The RISE scheme promotes international and cross-sector collaboration through exchanging research and innovation staff, and sharing knowledge and ideas from research to market (and vice-versa).

Doesn't look as promising. I don't think this would apply to me as:

For RISE: supported staff members must be (early-stage or experienced) researchers or administrative, managerial or technical staff supporting the research and innovation activities under the action. They must be actively engaged in or linked to research and/or innovation activities for at least one month (full-time equivalent) at the sending institution, before the first period of secondment.

Takeaway: try to write a COFUND grant if Pier is accepted. Pier thinks it may be a non-issue.

From pier:

He will give me some best practice protocols to clean up, and convert into html.

08.11.17

Reading the SPARQL book while waiting for the best practice protocols to clean up.

Working on SPARQL queries of DBpedia; see the SPARQL page.

Idea from Pier: become a DBpedia/Wikipedia editor and add ENVO annotations to some sea ice environment pages (like the classes from my lab rotation), to demonstrate an outreach case for AWI knowledge being quickly disseminated to the public on Wikipedia.

09.11.17

Reading sparql book, trying queries.

from ex052.rq

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?propertyLabel ?value 

#FROM <http://xmlns.com/foaf/spec/index.rdf>
WHERE
{
  ?s ?property ?value . 
  ?property rdfs:label ?propertyLabel . 
}
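The two triple patterns in ex052.rq share the variable ?property, so together they form a join: the query keeps a triple only if its predicate itself has an rdfs:label. Below is a minimal stdlib sketch of that join over (s, p, o) tuples. The function name and the instance triple are hypothetical stand-ins for ex050.ttl, and the label triple is the kind of data the FOAF spec provides.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF = "http://xmlns.com/foaf/0.1/"


def property_labels(triples: list) -> list:
    """Evaluate the ex052.rq patterns over a list of (s, p, o) triples.

    Pattern 1: ?s ?property ?value              (matches every triple)
    Pattern 2: ?property rdfs:label ?propLabel  (the property must have a label)
    """
    labels = {}
    for s, p, o in triples:
        if p == RDFS_LABEL:
            labels.setdefault(s, []).append(o)
    rows = []
    for _s, p, o in triples:
        for label in labels.get(p, []):  # join on ?property
            rows.append((label, o))
    return rows


# Hypothetical data: one instance triple plus a label for foaf:name.
data = [
    ("http://example.org/kai", FOAF + "name", "Kai"),
    (FOAF + "name", RDFS_LABEL, "name"),
]
```

This is also why the query needs the FOAF spec as a second data source: without the label triples, pattern 2 matches nothing and the join is empty.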

I learned you can run such a query locally and pull in data from a URL. You run it like this:

arq --data ex050.ttl --data http://xmlns.com/foaf/spec/index.rdf --query ex052.rq

arq can take multiple --data arguments, each either a local file or a URL to an RDF resource. I guess the same would apply to a link to a .owl ontology file?

This is like the line I used to get ENVO from the Ontobee SPARQL endpoint: FROM <http://purl.obolibrary.org/obo/merged/ENVO>

10.11.17

This is a neat use of +, borrowed from regular expressions, on the predicate of a SPARQL triple pattern. It finds every ?s that cites paperA, and recursively the papers that cite papers that cite paperA, etc. Perhaps this could be used to link relations that are related to other relations for some of my competency questions.

# filename: ex078.rq

PREFIX : <http://learningsparql.com/ns/papers#>
PREFIX c: <http://learningsparql.com/ns/citations#> 

SELECT ?s
WHERE { ?s c:cites+ :paperA . }
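`c:cites+` is a transitive closure: follow one or more cites links backwards from paperA until no new papers appear. Below is a minimal breadth-first sketch over a plain dict. The function name and the citation data are hypothetical and do not come from the book's example files.

```python
from collections import deque


def cites_plus(cites: dict, target: str) -> set:
    """All ?s such that ?s c:cites+ target (one or more citation hops)."""
    # Invert the edges so we can walk from a paper to the papers citing it.
    cited_by = {}
    for src, dsts in cites.items():
        for dst in dsts:
            cited_by.setdefault(dst, set()).add(src)
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for citer in cited_by.get(node, ()):
            if citer not in seen:  # the visited set also makes cycles safe
                seen.add(citer)
                queue.append(citer)
    return seen


# Hypothetical data: B cites A; C cites A and B; D cites C; E cites nothing.
CITES = {"B": ["A"], "C": ["A", "B"], "D": ["C"], "E": []}
```

Here D is found only through C, which is the "papers that cite papers that cite paperA" case.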

Can also do sequence paths:

WHERE { ?s c:cites/c:cites/c:cites :paperA . } This works much like a path in a Unix directory structure and asks for papers exactly three citation links away.

Can also use ^ as the inverse operator to flip the subject and object around. This could be useful for a predicate relationship that only goes in one direction, e.g. if the relation "cites" exists but "cited by" doesn't in the relations ontology.

SELECT ?s
WHERE { :paperA ^c:cites ?s }

We can mix the ^ and property path like so:

?s c:cites/^c:cites :paperF .

Get me any paper that cites a paper also cited by paperF (paperF itself matches too). Cool stuff for traversing knowledge graphs.

Try an example using FILTER to get Canadian rock guitarists born between 1970 and 1990.
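Sequence paths (/) and inverse steps (^) can be evaluated by moving a set of nodes forward one step at a time. Below is a minimal sketch. The function name and the triples are hypothetical. Results are returned as sets, like SELECT DISTINCT.

To answer `?s c:cites/^c:cites :paperF`, the sketch starts at paperF and walks the reversed path. Reversing p/^q gives q/^p, so here it goes cites forward, then cites inverted.

```python
def eval_path(triples: list, start: str, steps: list) -> set:
    """Follow a SPARQL-style sequence path from `start`.

    `steps` is a list of (predicate, inverse) pairs; inverse=True plays the role of ^.
    """
    frontier = {start}
    for pred, inverse in steps:
        nxt = set()
        for s, p, o in triples:
            if p != pred:
                continue
            if not inverse and s in frontier:
                nxt.add(o)  # forward step: subject -> object
            elif inverse and o in frontier:
                nxt.add(s)  # ^ step: object -> subject
        frontier = nxt
    return frontier


# Hypothetical citation triples.
T = [
    ("F", "cites", "X"), ("G", "cites", "X"), ("H", "cites", "Y"),
    ("B", "cites", "A"), ("C", "cites", "B"), ("D", "cites", "C"),
]
```

`eval_path(T, "F", [("cites", False), ("cites", True)])` gives {F, G}. To find papers exactly three links away from paperA (`?s c:cites/c:cites/c:cites :paperA`), start at A and take three inverse steps.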

PREFIX d: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX crg: <http://dbpedia.org/resource/Category:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?guitarist ?birthdate
WHERE {
?guitarist dcterms:subject crg:Canadian_rock_guitarists  .
?guitarist d:birthDate ?birthdate .
#FILTER(datatype(?birthdate) = <http://www.w3.org/2001/XMLSchema#dateTime>)
#coalesce(xsd:datetime(?birthdate), '1000-01-01')
#FILTER ( datatype(?birthdate) = xsd:datetime )
#FILTER ( coalesce(xsd:datetime(str(?birthdate)), '!') != '!')
#FILTER (xsd:dateTime(?birthdate) >= "1970-01-01"^^xsd:date)

FILTER (str(?birthdate) >= "1970")
FILTER (str(?birthdate) <= "1990")
}

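The commented-out lines didn't work. My guess was that probably just one incorrect data value threw off the datatype = xsd:datetime test. I tried to FILTER out non-xsd:dateTime objects and to coalesce away bad objects, as suggested here and here, but neither worked. The hackier solution of comparing date strings did work for the DBpedia data, because ISO 8601 dates with four-digit years sort lexicographically in chronological order. Note that some of the commented attempts spell the datatype xsd:datetime, but XSD names are case-sensitive (xsd:dateTime). In any case, DBpedia types these birth dates as xsd:date, which is what the query below filters on. Presumably the same structure, with a line such as FILTER (?birthdate >= "1970-01-01T00:00:00"^^xsd:dateTime), should work for properly typed xsd:dateTime values.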
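The coalesce idea amounts to parsing each value as a real date, skipping values that don't parse, and comparing dates rather than strings. Below is a minimal sketch of that logic in Python. The function name and rows are hypothetical. Slicing to 10 characters is an assumption that lets both xsd:date and xsd:dateTime lexical forms through.

```python
from datetime import date


def born_between(rows: list, start: date, end: date) -> list:
    """Keep names whose value parses as an ISO date within [start, end].

    Values that do not parse are skipped, which is what the
    coalesce attempts in the SPARQL query were meant to do.
    """
    kept = []
    for name, value in rows:
        try:
            d = date.fromisoformat(value[:10])  # "YYYY-MM-DD", also the prefix of a dateTime
        except ValueError:
            continue  # malformed value: drop it instead of failing the whole filter
        if start <= d <= end:
            kept.append(name)
    return kept


# Hypothetical rows: one in range, one too early, one malformed, one dateTime.
rows = [
    ("a", "1980-05-17"),
    ("b", "1969-12-31"),
    ("c", "unknown"),
    ("d", "1985-01-01T00:00:00"),
]
```

`born_between(rows, date(1970, 1, 1), date(1990, 1, 1))` keeps a and d.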

PREFIX d: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX crg: <http://dbpedia.org/resource/Category:>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?guitarist ?birthdate
WHERE {
?guitarist dcterms:subject crg:Canadian_rock_guitarists  .
?guitarist d:birthDate ?birthdate .
FILTER ( datatype(?birthdate) = xsd:date)
FILTER (xsd:date(?birthdate) >= "1970-01-01"^^xsd:date && xsd:date(?birthdate) <= "1990-01-01"^^xsd:date )
}

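Fixed the date vs. dateTime issue: the DBpedia birth dates are typed xsd:date, so the query above filters on datatype xsd:date and compares against xsd:date literals.

Try using Any23 to convert some AWI Excel data to RDF and query it, as an example of what I'll be doing in the future.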

try installing any23-cli-2.1.tar.gz

also trying with the apache-any23-2.1-src.tar.gz

13.11.17

The SPARQL book has its exercises available online to query from http://learningsparql.com/2ndeditionexamples/

For example http://learningsparql.com/2ndeditionexamples/ex012.ttl, which I could use to figure out Any23.

trying to use the Apache Any23 REST Service

local directory on my computer ~/Desktop/grad_school/marmic/master_thesis/sparql_exercises/testing

and some inorganic nutrients AWI data

The REST server can take several formats, including HTML, so I'll take the data from this HTML link (I clicked "view as HTML" on PANGAEA). Alternatively, the server can take CSV, so I could download the file as TSV, convert it to CSV, and use that; to try later perhaps.

format: http://<any23-service-host>/<output-format>/<input-uri>
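The REST pattern above can be assembled in code. Below is a minimal sketch that only concatenates the pieces of the documented pattern. The helper name is hypothetical, and the service root http://any23.org/any23 is taken from the URLs tried in these notes. It drops the #download fragment, since HTTP clients never send fragments to the server anyway. The ?format=html query string is kept, and the error log shows Any23 did forward it to PANGAEA, which answered 400.

```python
from urllib.parse import urldefrag


def any23_url(service_root: str, output_format: str, input_uri: str) -> str:
    """Build http://<any23-service-host>/<output-format>/<input-uri> (hypothetical helper)."""
    # Fragments (#...) are client-side only and never reach the server, so drop them.
    uri, _fragment = urldefrag(input_uri)
    return f"{service_root.rstrip('/')}/{output_format}/{uri}"
```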

Trying GET http://any23.org/ttl/https://doi.pangaea.de/10.1594/PANGAEA.834685?format=html#download and going to the URL any23.org/any23/turtle/https://doi.pangaea.de/10.1594/PANGAEA.834685?format=html#download both give the same error:

Could not fetch input.
================================================================
java.io.IOException: Failed to fetch https://doi.pangaea.de/10.1594/PANGAEA.834685?format=html: 400 Bad request
	at org.apache.any23.http.DefaultHTTPClient.openInputStream(DefaultHTTPClient.java:104)
	at org.apache.any23.source.HTTPDocumentSource.ensureOpen(HTTPDocumentSource.java:66)
	at org.apache.any23.source.HTTPDocumentSource.openInputStream(HTTPDocumentSource.java:73)
	at org.apache.any23.source.MemCopyFactory.createLocalCopy(MemCopyFactory.java:47)
	at org.apache.any23.extractor.SingleDocumentExtraction.ensureHasLocalCopy(SingleDocumentExtraction.java:525)
	at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:211)
	at org.apache.any23.Any23.extract(Any23.java:300)
	at org.apache.any23.Any23.extract(Any23.java:452)
	at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:114)
	at org.apache.any23.servlet.Servlet.doGet(Servlet.java:79)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:475)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
	at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:403)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:796)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1366)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
================================================================

http://any23.org/any23/turtle/https://doi.pangaea.de/10.1594/PANGAEA.834685 also didn't work

Could not parse input.
================================================================

------------ BEGIN Exception context ------------
ExtractionContext(urn:x-any23:rdf-jsonld:root-extraction-result-id:https://doi.pangaea.de/10.1594/PANGAEA.834685)
Errors {
}
------------ END   Exception context ------------

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:109)
...

Perhaps it's because of the input format?

Trying with the link from "Download dataset as tab-delimited text":

http://any23.org/any23/turtle/https://doi.pangaea.de/10.1594/PANGAEA.834685?format=textfile gives the same issue again: Could not fetch input.

I'll try downloading it as TSV then maybe ...

I will go back to trying the command-line tool: any23 rover [options] input IRIs {<url>|<file>}+

Options: -f for the output format (try something other than JSON) and -e for the extractors. I should try an example with any web data to see if I can extract from it; perhaps it even takes local files.

Running any23 rover https://doi.pangaea.de/10.1594/PANGAEA.834685 gives the error:

Apache Any23 FAILURE

Execution terminated with errors: Illegal character in path at index 12: Colorometric autoanalysis

Maybe I should try with a simple CSV (or a cleaned-up version of this one) and see if I can convert it to RDF that way. Perhaps the current format needs cleaning up before conversion (I think Pier said something to this effect). I made a simpler CSV file and will try it.

any23 rover test_inorganic_chem.csv works, but I don't understand the output. I'll try a different output format than JSON and write it to a file.

any23 rover -f turtle test_inorganic_chem.csv outputs Turtle, which is good because I can query over that.
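The error message points at index 12 of "Colorometric autoanalysis", which is the space between the two words. Any23 apparently tried to turn that string into an IRI, and spaces are not allowed in IRIs. Assuming the string came from a column header, here is a minimal cleaning sketch using the stdlib csv module. The cleaning rule (drop bracketed units, replace other non-alphanumerics with _) and the function names are hypothetical, not something Any23 prescribes.

```python
import csv
import io
import re


def clean_header(name: str) -> str:
    """Turn a column header into an IRI-safe local name (hypothetical rule)."""
    name = re.sub(r"\[.*?\]", "", name).strip()  # drop units such as [µmol/l]
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")  # spaces etc. -> _


def clean_csv(text: str) -> str:
    """Rewrite only the header row of a CSV string; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(text)))
    rows[0] = [clean_header(h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```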

I will switch to working in ~/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing so the files will be tracked on github.

any23 rover -f turtle -o test_inorganic_chem.ttl test_inorganic_chem.csv

OK this works, I have the data as in turtle format. The subjects are of relative position on my computer like <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem.csvEvent2> It recognizes some data types such as http://www.w3.org/2000/01/rdf-schema#label in the predicate or "3"^^<http://www.w3.org/2001/XMLSchema#integer> in the object position. however their are no prefixes. Perhaps I can see if specifying prefixes is one of the rover options.
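Any23 emits full IRIs, and PREFIX declarations in a SPARQL query (like the file: prefix used for this data later on) are what make them readable. Below is a minimal sketch of the same longest-namespace-match compaction in Python. The helper name and prefix map are hypothetical, and this is not an Any23 option.

```python
def compact(iri: str, prefixes: dict) -> str:
    """Shorten an IRI to prefix:local using the longest matching namespace."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(prefixes[best])):
            best = prefix
    if best is None:
        return f"<{iri}>"  # no namespace matches: keep the full IRI
    return f"{best}:{iri[len(prefixes[best]):]}"


# Hypothetical prefix map; the file: namespace is the CSV's own location.
PREFIXES = {
    "file": "file:/home/kai/Desktop/grad_school/marmic/master_thesis/"
            "kblumberg_masters_thesis/testing/test_inorganic_chem.csv",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
```

With this map, the Event2 subject above compacts to file:Event2.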

The idea now: download a bunch of AWI data, clean it up, use ontology terms where possible, and query the local datastore as an example of what semantically queryable data should look like?

I asked Antonio if such a querying schema already exists. He sent links from the micro3 grant agreement and SeaDataNet, which searches PANGAEA and other data providers; it is a higher-level GUI, like PANGAEA, to "shop" for data. The Geo-Seas data shop is similar, and there is a similar GUI data access portal from NOAA. These probably use machine-interoperable querying (something like SPARQL) in the back end, but that doesn't seem to be available to end users. So I'm still unsure whether using my master's as a test case of querying for data (converted from PANGAEA to RDF) is redundant. I will have to discuss this with Pier, but for now I will try to query this example inorganic chem data.

Query the local ttl of inorganic chem data with something along the lines of:

Get me all sampling events where Nitrate is at most 10 [µmol/l] and Latitude is at most 79 degrees.

PREFIX ietf: <http://tools.ietf.org/html/>
PREFIX file: <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem_cleaned.csv> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?Event ?Latitude ?Longitude ?Nitrate

FROM <test_inorganic_chem_cleaned.ttl> 

WHERE
{ 
?s a ietf:rfc4180Row .
?s file:Event ?Event .
?s file:Nitrate ?Nitrate .
?s file:Latitude ?Latitude .
?s file:Longitude ?Longitude .

FILTER(xsd:float(?Nitrate) <= 10)
FILTER(xsd:float(?Latitude) <= 79)
}  

arq --query test_inorganic_chem_cleaned.rq --data test_inorganic_chem_cleaned.ttl

It works. Output:

------------------------------------------------------------------------
| Event | Latitude           | Longitude           | Nitrate           |
========================================================================
| "S3"  | "78.62"^^xsd:float | "5.0168"^^xsd:float | "5.67"^^xsd:float |
| "S3"  | "78.62"^^xsd:float | "5.0168"^^xsd:float | "2.58"^^xsd:float |
| "S3"  | "78.62"^^xsd:float | "5.0168"^^xsd:float | "6.04"^^xsd:float |
| "S3"  | "78.62"^^xsd:float | "5.0168"^^xsd:float | "8.05"^^xsd:float |
| "S3"  | "78.62"^^xsd:float | "5.0168"^^xsd:float | "3.82"^^xsd:float |
------------------------------------------------------------------------

For Pier's document: save it as docx and html, and use the header formats H1, H2, etc. I did it (see today's commit) and sent it back to him.

I may need to make more Turtle files to house assertions such as "nitrate and phosphate are nutrients", to load as additional FROM files when querying my local datastore. This would enable queries such as "get me all instances of nutrients which are in an ice mass", because the query could then infer that the nitrate and phosphate data are about nutrients.

Use the IAO "is about" relation and the "quality of" relations in my Turtle files (see the board in the kitchen); try to replace the current relations in my test_inorganic_chem_cleaned.csv file with those, to link it to the ontology graph structure.

Another idea: if those other things succeed and I want to automate building my local datastore, I could write Python/bash scripts to do wgets on the list of PANGAEA data files (or even a more sophisticated web-crawling script), and use the Python script to clean up/remove the headers and rename the columns. Then have it call any23 rover ... to make the ttl file; perhaps I could even do it just once for each data type. Then have the script go into the ttl files and find/replace the necessary relations.
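The cleanup step of such a script could look roughly like this. A minimal stdlib sketch, assuming that a PANGAEA tab-delimited download starts with a metadata block opening with `/*` and closing on a line that starts with `*/`, followed by one tab-separated header row; `pangaea_tab_to_csv` and the rename map are hypothetical, and the download and any23 steps are left out.

```python
import csv
import io


def pangaea_tab_to_csv(text, rename=None):
    """Turn a PANGAEA tab-delimited download into a plain csv string.

    Assumption (not a format specification): the file may start with a
    metadata block that opens with '/*' and closes on a line starting with
    '*/'; everything after it is a tab-separated table with one header row.
    """
    rename = rename or {}
    lines = text.splitlines()
    if lines and lines[0].startswith("/*"):
        # Skip the metadata block, including its closing '*/' line.
        end = next((i for i, line in enumerate(lines) if line.startswith("*/")), None)
        if end is None:
            raise ValueError("metadata block is never closed")
        lines = lines[end + 1:]
    rows = list(csv.reader(lines, delimiter="\t"))
    # Rename columns (e.g. long PANGAEA headers) to short, query-friendly names.
    header = [rename.get(name, name) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()
```

The earlier "Illegal character in path at index 12" error points at the space in "Colorometric autoanalysis" (index 12), which suggests any23 turns text into IRIs; short, space-free column names in the rename map are probably the safer choice.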

14.11.17

The csv data is converted to triples as follows:

Each header in the first row becomes the URI of a local object, with an rdfs:label and a column position, for example:

<file:/...c.csvEvent>
   rdfs:label "Event";
   rfc:cposition 0. 

We want to change these file:/...c.csvEvent objects to be a new quality class, created with the same design pattern as ENVO:concentration of nitrate in groundwater.

The next rows (after the first header row) are data objects, for example:

<file:/...c.csvrow0/>
   a html:row;
   <file:/...c.csvEvent> "HG_I" ;
   <file:/...c.csvDatetime> "2013-06-26T06:04" ;
   <file:/...c.csvLatitude> "79.1338" ;
   <file:/...c.csvNitrate> "1.64" ;
    ...

We could change "a html:row" to "a IAO:data item" or "a IAO:information content entity".

These file:/...c.csvEvent, <c.csv.X> predicates are pointers to the column header objects in the first row. They could be replaced with the IAO:is about relation.

This is followed by two more lines:

<file:/path/to/c.csv> html:row <file:/...c.csvrow0/> .

<file:/...c.csvrow0/> html:rowPosition "0" .

The first line says the csv file points to an html row object which is the first row object. This could be changed to ... not sure...

The second line says the first row object has a row position of 0. Change to inheres in?
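The replacement idea above (swapping the row type, which in the real any23 output is `ietf:rfc4180Row`, for an IAO class) could be scripted as a plain find/replace over the any23 N-Triples output. A minimal stdlib sketch; the mapping to IAO data item (`IAO_0000027`, the class the later queries on this page use) is one of the options being weighed here, not a settled schema, and `rewrite_ntriples` is a hypothetical helper.

```python
# Hypothetical mapping: any23's row class -> IAO 'data item'.
REPLACEMENTS = {
    "<http://tools.ietf.org/html/rfc4180Row>": "<http://purl.obolibrary.org/obo/IAO_0000027>",
}


def rewrite_ntriples(lines, replacements=REPLACEMENTS):
    """Swap whole IRI terms in N-Triples lines.

    The angle brackets are part of each key, so a longer IRI that merely
    starts with the same text is left alone. A literal that happened to
    contain the exact bracketed IRI would also be changed.
    """
    rewritten = []
    for line in lines:
        for old, new in replacements.items():
            line = line.replace(old, new)
        rewritten.append(line)
    return rewritten
```

Applied to a whole file, the lines would come from reading test_basic.nt and calling `.splitlines()`, and the result would be written back out joined with newlines.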


NamedIndividual

owl:NamedIndividual a rdfs:Class ;
     rdfs:label "NamedIndividual" ;
     rdfs:comment "The class of named individuals." ;
     rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
     rdfs:subClassOf owl:Thing . 

So I guess in Turtle:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

then use it as: owl:NamedIndividual

def:

Is_about is a (currently) primitive relation that relates an information artifact to an entity.

Domain: information content entity

Range: entity

inheres in: a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence

Domain: specifically dependent continuant (in this case: quality)

Range: independent continuant

Since the range is independent continuant, the quality cannot inhere in a row of a csv file, which is an instance of IAO:information content entity, as Pier had suggested. So I will have to rethink the schema.

information content entity: A generically dependent continuant that is about some thing.

Domain: generically dependent continuant

15.11.17

Trying a variety of things to improve the csv-to-RDF conversion, still using Turtle format. I tried transposing the csv, but this may be a dead end. I found a website that converts csv to triples via an upload; on their example page they used N-Triples format, which seems more logical than what I currently have with Turtle. So I tried their example data with any23:

any23 rover -f ntriples -o network_planet_example1.nt network_planet_example1.csv

The output is not as clean as their example; however, I think it makes more sense to do it this way. I can try modifying flags to make it better. I tried, and it doesn't seem super necessary; just go with

any23 rover -t -p -f ntriples -o test_basic.nt test_basic.csv for now.

Side note: I found a slide show, "Apache Any23 - Anything to Triples"; not super useful.

Now try to query the test_basic.nt file.

The example csv file is very simple, just:

Event, Nitrate

A, 1.64

B, 4.1

Clearing out all the unnecessary lines, we are left with the substantial information:

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0> <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvEvent> "A" .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0> <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate> "1.64"^^<http://www.w3.org/2001/XMLSchema#float> .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1> <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvEvent> "B" .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1> <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate> "4.1"^^<http://www.w3.org/2001/XMLSchema#float> .

I created a setup where I have the csv file data to query, plus a supplemental ttl file where I specify that the test_basic.csv file IAO:is about CHEBI:nitrate:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv> <http://purl.obolibrary.org/obo/IAO_0000136> <http://purl.obolibrary.org/obo/CHEBI_17632> .

In ~/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing run:

arq --query test_basic_1.rq --data test_basic.nt --data test_basic_supplemental.ttl

This is intended to link the csv object as IAO:is about nitrate, so I can query for objects which have some part about nitrate. (I also want to try making the Nitrate column in the test_basic.csv file IAO:is about CHEBI:nitrate.)

However when I run the query (above) I get

-------------------------------------------------------------------------------------------------------
| s     | p1                                           | o                                            |
=======================================================================================================
| file: | <http://purl.obolibrary.org/obo/IAO_0000136> | <http://purl.obolibrary.org/obo/CHEBI_17632> |
-------------------------------------------------------------------------------------------------------

Here ?s returns just file:, not the entire local file path file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv. Hence it seems I can't do SPARQL queries with links to local data, unless I'm missing something.
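A possible explanation, offered as a hypothesis rather than checked against arq's source, and assuming test_basic_1.rq declares the same `file:` prefix as the later queries on this page: that prefix is declared as exactly the csv file's IRI, so a result printer that shortens IRIs with the query's prefixes would render that IRI as `file:` with an empty local name. The p1 and o columns, which match no declared prefix, are printed in full, which fits that reading. The stdlib sketch below only mimics the abbreviation rule; `abbreviate` is a hypothetical helper, not part of Jena.

```python
def abbreviate(iri, prefixes):
    """Write iri as prefix:local using the longest matching namespace, else <iri>.

    Illustration only: real serializers also check that the local part is a
    legal name, which this sketch skips.
    """
    for name, ns in sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if iri.startswith(ns):
            return f"{name}:{iri[len(ns):]}"
    return f"<{iri}>"


# The file: prefix as declared in the later queries on this page.
PREFIXES = {
    "file": "file:/home/kai/Desktop/grad_school/marmic/master_thesis/"
            "kblumberg_masters_thesis/testing/test_basic.csv",
}

csv_iri = PREFIXES["file"]  # the full IRI stored in the data
print(abbreviate(csv_iri, PREFIXES))              # file:
print(abbreviate(csv_iri + "Nitrate", PREFIXES))  # file:Nitrate
print(abbreviate("http://purl.obolibrary.org/obo/CHEBI_17632", PREFIXES))
# <http://purl.obolibrary.org/obo/CHEBI_17632>
```

If this is what is happening, selecting `(STR(?s) AS ?iri)` instead of `?s` should print the full path as a plain string.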

Searching around for similar issues, I found this Stack Overflow post, "Python Sparql Querying Local File". So I'm wondering whether I should switch to the SPARQLWrapper Python package to run my queries.

Looking into whether I can do so in Apache Jena.

I wonder if this is the territory for --namedgraph FILE ("The data to query. It will be included as a named graph."). From the arq named graphs page and the SPARQL book, it seems named graphs live at web URIs, so this doesn't seem like it would work with local data.

Tried the Python SPARQL route a bit. I need the rdfextras module (pip install rdfextras) so that I can call rdfextras.registerplugins() and run Graph.query() (i.e. query the graph object), but

pip install rdfextras doesn't work:

Exception:
Traceback (most recent call last):
  File "/home/kai/.local/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/home/kai/.local/lib/python2.7/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/home/kai/.local/lib/python2.7/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/home/kai/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/home/kai/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/home/kai/.local/lib/python2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/home/kai/.local/lib/python2.7/site-packages/pip/wheel.py", line 323, in clobber
    shutil.copyfile(srcfile, destfile)
  File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pyparsing.py'

I went to this Stack Overflow page and tried the chmod o+r trick, but it isn't solving the problem yet.

sudo pip install rdfextras did the trick, but the module still isn't callable.

Try this again later.

16.11.17

Suggestion from Antje during the presentation: I need to provide a use-case example for the thesis, preferably some genomic data (Katje's eukaryote data or some of the FRAM microbial observatory data) plus satellite data and/or inorganic chem data.

Suggestion from Felix: go to the upcoming FRAM meeting to get some insights about examples of data to interlink, and possibly talk to Katje.

Antje said it was a good presentation and she thinks the work is necessary.

Playing with the test_basic.py file to query locally. When trying to query a single graph object I got the error message: You performed a query operation requiring a dataset (i.e. ConjunctiveGraph), but operating currently on a single graph. So perhaps I need to create a ConjunctiveGraph (which I believe is a local dataset) and then query the ConjunctiveGraph object. Checking out the rdflib documentation (the rdflib Read the Docs page and the JSON format documentation).

I have managed to get test_basic.py to perform the same query as arq and get the same results; I just need to figure out how to clean up the output:

(rdflib.term.URIRef(u'file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0'), rdflib.term.Literal(u'1.64', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef(u'file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1'), rdflib.term.Literal(u'4.1', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#float')))

This is great because I can now just use a Python script with the SPARQLWrapper and rdflib modules to create my own graph/datastore in a rdflib.graph.ConjunctiveGraph() object, adding as many RDF files (from CSVs or other sources) as I like into the master graph, and query it all together! (This simulates all this data being available as a SPARQL endpoint.) The same setup can also query a remote endpoint: I got it to run queries against DBpedia. Thus I can query my datastore against the Ontobee SPARQL endpoint to access ENVO and other OBO ontology terms. I think the project is coming along in decent shape.

17.11.17

Doing this in Python is proving to be a good choice. I initialize the ConjunctiveGraph, which will function as the datastore (possibly for the entire project): graph = g.ConjunctiveGraph()

Then I can easily add triples to the ConjunctiveGraph with graph.parse('test_basic.nt', format='ntriples').

I had previously made an example with the files test_basic_supplemental.ttl and test_basic.nt, where the former creates a link saying the csv file (in the latter) IAO:is about CHEBI:nitrate. The datastore now contains both files within one graph object, and when I parse them the information from both files is merged, because both use the same IRI for the csv object. For example, both RDF files mention the csv object but say different things about it: one has the link to ChEBI, and the other has the csv file's number of rows and columns and the links to its specific rows.

The individual rows of the csv files are represented as additional objects with links to their data.

@prefix ns1: <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/> .
@prefix ns2: <http://tools.ietf.org/html/> .
@prefix ns3: <http://purl.obolibrary.org/obo/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:test_basic.csv ns3:IAO_0000136 ns3:CHEBI_17632 ;
    ns2:rfc4180numberOfColumns 2 ;
    ns2:rfc4180numberOfRows 2 ;
    ns2:rfc4180row <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0>,
        <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1> .

ns1:test_basic.csvEvent rdfs:label "Event" ;
    ns2:rfc4180columnPosition 0 .

ns1:test_basic.csvNitrate rdfs:label "Nitrate" ;
    ns2:rfc4180columnPosition 1 .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0> a ns2:rfc4180Row ;
    ns1:test_basic.csvEvent "A" ;
    ns1:test_basic.csvNitrate "1.64"^^xsd:float ;
    ns2:rfc4180rowPosition "0" .

<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1> a ns2:rfc4180Row ;
    ns1:test_basic.csvEvent "B" ;
    ns1:test_basic.csvNitrate "4.1"^^xsd:float ;
    ns2:rfc4180rowPosition "1" .

Running the script with the query:

PREFIX file: <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ietf: <http://tools.ietf.org/html/>

SELECT ?o ?link ?something
FROM <test_basic.nt>
FROM <test_basic_supplemental.ttl>
WHERE 
{
?s ?p <http://purl.obolibrary.org/obo/CHEBI_17632> .
?s ?p1 ?o .
?link ?label "Nitrate" .
?o ?link ?something .
}

we get:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/1 | file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate | 4.1
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvrow/0 | file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate | 1.64

Essentially this query says: get me any object which has some relation to CHEBI:nitrate; for that object (the csv file), get its sub-objects (the rows of the csv file); and fetch the data those rows hold under the column labeled "Nitrate".

It needs cleaning up in terms of what should be about the nitrate. I think it would perhaps be better to say that the column of the csv file, file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate, is about the nitrate. Or that a class like concentration of nitrate in seawater inheres in this column (I don't think this fits the range of inheres in). Or that the column is about the concentration of nitrate in seawater. Regardless, I'll start small with a simple example of the links functioning, then try to get the semantics correctly aligned with OBO/ENVO.

next example

I will try another example where the csv file is about some ENVO:marine water body and the nitrate column of the csv file is about some concentration of nitrate in seawater. In the query I will ask for any object which is about an ENVO:marine water body which has data about nitrate (IAO:is about CHEBI:Nitrate?).

Tried making concentration of nitrate in seawater an RDF blank node, but that didn't seem to work.

So instead I'm doing a mock example with ENVO:concentration of nitrate in groundwater.

Finally got test_basic_2.rq to work, giving us:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv | 1.64
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv | 4.1

20.11.17

Now that I have that query working (see the previous log), I will try it again, this time annotating the nitrate column either with a dummy concentration of nitrate in seawater class using a blank node, or post-compositionally using the concentration of class along with an axiom like

'concentration of' and ('inheres in' some (ammonium and ('part of' some soil)))

In the supplemental ttl file I added:

#the csvNitrate column is about a concentration of class and inheres in some nitrate and part of some seawater
<file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csvNitrate> <http://purl.obolibrary.org/obo/IAO_0000136> <http://purl.obolibrary.org/obo/PATO_0000033> ;
<http://purl.obolibrary.org/obo/RO_0000052> <http://purl.obolibrary.org/obo/CHEBI_17632>;
<http://purl.obolibrary.org/obo/BFO_0000050> <http://purl.obolibrary.org/obo/ENVO_00002149> .

and with the query

PREFIX file: <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_basic.csv> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ietf: <http://tools.ietf.org/html/>
PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?NitrateColumn ?data

FROM <test_basic.nt>
FROM <test_basic_supplemental_2.ttl>

# WHERE { ?s c:cites+ :paperA . }

WHERE 
{
#give me any things that are linked to nitrate
#in this case any data columns about nitrate
?NitrateColumn ?link obo:CHEBI_17632 .

# give me something that is about a marine water body, and get me its subobjects.
?s obo:IAO_0000136 obo:ENVO_00001999 ;
	?has ?subObject .
#get data from the nitrate column of the subobject of the thing about a marine water body
?subObject ?NitrateColumn ?data . 
}

We were able to retrieve the csv data.

Now I want to test whether I can use Python to create the triples instead of any23. The short answer looks like no; however, I could call any23 from Python by running it as a bash command. I'll deal with this later if I need to.
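If the Python route is taken later, there is no need to go through a shell string: `subprocess.run` with an argument list calls any23 directly and avoids quoting problems with paths. A minimal sketch reusing exactly the rover flags from this page; `any23_command` and `convert` are hypothetical helper names, and `any23` is assumed to be on the PATH.

```python
import subprocess


def any23_command(csv_path, out_path, fmt="ntriples"):
    """Argument list for the same rover call used on this page."""
    return ["any23", "rover", "-t", "-p", "-f", fmt, "-o", out_path, csv_path]


def convert(csv_path, out_path):
    """Run any23 (assumed to be on PATH).

    check=True raises CalledProcessError on a non-zero exit status; whether
    any23 sets one on every 'Apache Any23 FAILURE' is not verified here.
    """
    subprocess.run(any23_command(csv_path, out_path), check=True)
```

For example, `convert("test_inorganic_chem_cleaned.csv", "test_inorganic_chem_cleaned.nt")` would mirror the command used a few lines below.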

Tested whether parsing the converted csv file as n3 or ttl makes a difference; the answer seems to be no. OK, cool.

Moving on to testing the inorganic chem data again. Working in /home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem

any23 rover -t -p -f ntriples -o test_inorganic_chem_cleaned.nt test_inorganic_chem_cleaned.csv

See query_1 in this folder: I was able to access the data as I wanted and filter by latitude.

Now try linking the data with some dummy data sourced from a subset of this physical oceanography and current meter dataset, which I will modify to have latitudes/longitudes and date-times that overlap with the other dataset on inorganic nutrients.

I changed the Date Time column's values in the test_inorganic_chem_cleaned.csv file to match ones in test_phys_oce_current.csv. I'll also add lat/long metadata to phys_oce_current matching values in inorg_chem:

Latitude 79.1338, Longitude 6.0925. I did this by adding a column for each to the test_phys_oce_current.csv file, with the value repeated on every row.
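Adding the repeated Latitude/Longitude columns by hand could also be scripted. A minimal stdlib sketch; `add_constant_columns` is a hypothetical helper, and it assumes the csv has a single header row.

```python
import csv
import io


def add_constant_columns(csv_text, constants):
    """Append one column per (name, value) pair, repeating the value on every row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + list(constants))  # new column names
    for row in reader:
        writer.writerow(row + list(constants.values()))  # same values on every row
    return out.getvalue()
```

Called with `{"Latitude": "79.1338", "Longitude": "6.0925"}` on the text of test_phys_oce_current.csv, it would produce the same two extra columns as the manual edit.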

any23 rover -t -p -f ntriples -o test_phys_oce_current.nt test_phys_oce_current.csv

In query_2 I try to link the phys and chem data.

Backing up: I read in the SPARQL 1.1 documentation that you can do queries like:

SELECT ?x ?name 
  {
     ?x  foaf:mbox <mailto:alice@example> .
     ?x  foaf:knows [ foaf:knows [ foaf:name ?name ]]. 
  }

So maybe I should use this pattern (the nested [ [ ] ] idea) to redo some of the previous queries, like getting the sub-thing of an object that is linked to nitrate, etc.

21.11.17

[PolarSemantics Google group link](https://drive.google.com/drive/folders/0B9jyOTDPBdhPWi1HWWNaUWNhRTA?usp=sharing) from Ruth Duerr.

I've been working on a better design pattern for accessing the csv data files. Unfortunately, the way the data is stored as triples makes it a bit awkward to query.

The csv file gets the IRI file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem/test_inorganic_chem_cleaned.csv, and each column of the file, such as the Nitrite column, gets the IRI file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem/test_inorganic_chem_cleaned.csvNitrite. Unfortunately, I don't know whether it's possible within a SPARQL query to query for something, get the csv path, and then append "Nitrite" to the end of that URI. Perhaps there's a way; I looked briefly into using SPARQL VALUES and BIND from here. But for now I may have to accept the somewhat awkward query pattern for accessing data stored as triples from csv. For the purposes of this project, I guess that means setting some conventions, such as that data items will always be imported from csv and therefore follow this design pattern, so that it's acceptable for me to use a query in the form:

# give me a data item that is about a marine water body, and get me its rows. 
?s rdf:type obo:IAO_0000027;
	obo:IAO_0000136 obo:ENVO_00001999 ;
	ietf:rfc4180row ?MarWatBodRows .

If I don't specify ietf:rfc4180row here, then I don't get links to all the columns of the rows of ?MarWatBodRows.

I also specified in the supplemental ttl file that

#the csv file is a data item, and is about a marine water body
file: a obo:IAO_0000027 ;
	obo:IAO_0000136 obo:ENVO_00001999 .

What I don't like about this is that it means we need some a priori knowledge about the structure of the data we are querying, which defeats the purpose of being able to perform SPARQL queries without a priori knowledge. But for the sake of this example I may have to do it this way, unless I can create RDF files from HTML instead of csv. I had briefly tried and failed to do so, but I will give it another quick attempt.

Doing it here: /home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_inorganic_chem/html

wget -O test_inorg_chem.html https://doi.pangaea.de/10.1594/PANGAEA.834685?format=html#download

any23 rover -t -p -f ntriples -o test_inorg_chem.nt test_inorg_chem.html

Apache Any23 FAILURE

Execution terminated with errors: Illegal character in path at index 12: Colorometric autoanalysis

Total time: 5s
Finished at: Tue Nov 21 12:15:30 CET 2017
Final Memory: 18M/168M
------------------------------------------------------------------------

This was the same error I fixed in the csv data by removing the header lines.

This isn't working: the HTML from PANGAEA is too messy and would require a lot of cleaning. I don't want to waste too much time on this; however, I'd like to try converting the csv to JSON or HTML, then the HTML to ttl.

any23 rover -t -p -f rdfxml -o test_inorganic_chem_cleaned.xml test_inorganic_chem_cleaned.csv

any23 rover -t -p -f ntriples -o test_inorganic_chem_cleaned.nt test_inorganic_chem_cleaned.xml

It ends up exactly the same, so I'll most likely need to stick to csv to RDF.

Back to query_1_1

SELECT DISTINCT ?has 

FROM <test_inorganic_chem_cleaned.nt>
FROM <test_inorganic_chem_cleaned_sup.ttl>

WHERE {
# give me a data item that is about a marine water body, and get me its rows. 
?s rdf:type obo:IAO_0000027;
	obo:IAO_0000136 obo:ENVO_00001999 ;
	ietf:rfc4180row ?MarWatBodRows .
	#?has ?MarWatBodRows .

?MarWatBodRows ?has ?stuff .
}

Via ?has, this shows which columns exist within a ?MarWatBodRows object.

In query_2 I was able to query two objects, the physical_oceanography data and the inorganic_nutrient data, and pull each one separately or both at the same time. However, I was unable to perform what would be an inner join in SQL, i.e. matching rows that have the same Lat, Long and DateTime. Perhaps such an example (getting data of both types at the same time and place and joining it into a new table with data from both datasets), although it would be cool if it worked, may not be appropriate given the large heterogeneity of RDF data available on the web.

I had tried to get it to work with the GRAPH construct (see the Stack Overflow post on joining graphs; similar material is documented here), but I can't figure this out.

Looking back at the SmartProtocols competency questions (issue #11), they make use of the group_concat function quite a lot. Perhaps that would help with the current issue. I found a similar description on this page.

22.11.17

SPARQL tutorial videos accompanying the Learning SPARQL book.

Also, from this page on SPARQL RDF queries, there seems to be the ability to perform joins in SPARQL.

Attempting more SPARQL functions: tried GROUP BY, ORDER BY and HAVING. For whatever reason, both the Python SPARQL setup and Apache Jena don't like HAVING.

I also tried to play with GeoSPARQL, but that cannot be called from a local version of SPARQL; you need other tools to access their API and other resources, such as MarkLogic, suggested here.

Trying to edit query_2 to be able to ask for any inorganic chem nitrate data about a marine water body which is at the same lat/long as phys ocean data about a marine current. I've nearly got the group_concat and GROUP BY clauses working, but I still need to get the filtering working, i.e. filter the inorg chem data down to the lat and long specified for the phys ocean data.

However, filtering for the phys ocean data's lat/long in the same query makes the lat/long part execute as many times as the part that gets the rows of data. So in query_2_s (s for simplified, as I removed the UNION and the fetching of info from the phys ocean data) I tried using a sub-query to get the lat/long info about the marine current object. But when I ran it this way, together with the query to get a data item that is about a marine water body and its subclasses etc., the runtime became very bad, and it still evaluated ?marCurObj as many times as the other query. So my next thought is to move the query about the rows of data into a subquery.

This is being annoying. In theory you should be able to nest subqueries and assign limits, as documented in this Stack Overflow post. However, the behavior is still odd.

Moving back to query_2, I was able to solve the issue in a hacky way by manually filtering with the Latitude value 79.1338. To do this I had to use two FILTER lines (>= and <=), because for whatever reason a simple = didn't work.

FILTER(xsd:float(?nLat) >= xsd:float(79.1338))
FILTER(xsd:float(?nLat) <= xsd:float(79.1338))

Now I just have to figure out how to grab the lat/long values from the data item that is about a marine current without getting a long list (or somehow reduce the list down to one), and feed those values into my hack above. With the hack as it stands, we get:

None |  | 323.55 , 339.72 , 338.52 , 378.54 , 367.78 , 369.28 , 368.93 , 334.57 , 360.02 , 376.94 , 333.89 , 329.85 , 327.13 , 358.1 , 361.86 , 341.24 , 368.73 , 363.01 , 310.17 , 374.51 , 381.54 , 361 , 340.27 , 368.46 , 340.6 , 356.78 , 340.63 , 350.18 , 302.03 , 338.49 , 377.6 , 372.47 , 375.51 , 335.4 , 374.09 , 365.99 , 338.41 , 365.6 , 372.88 , 332.9 , 331.56 , 374.4 , 378.53 , 375.43 , 337.89 , 370.75 , 338.75 , 336.4 , 366.24 , 358 , 371.88 , 364.51 , 338.92 , 327.8 , 371.95 , 356.41 , 331.07 , 338.03 , 340.98 , 333.45 , 338.01 , 369.43 , 341.43 , 337.77 , 368.44 , 340.21 , 368.94 , 341.68 , 376.93 , 319.56 , 346.01 , 336.46 , 370.64 , 377.61 , 372.72 , 333.14 , 377.29 , 330.55 , 334.19 , 336.96 , 341.05 , 366.6 , 376.5 , 375.17 , 354.82 , 336.4 , 372.98 , 373.3 , 341.87 , 338.85 , 339.17 , 332.84 , 370.47 , 371.4 , 335.14 , 336.66 , 323.5 , 337.54 , 373.4 , 342.2 , 336.02 , 341.2 , 357.91 , 338.02 , 335.15 , 339.94 , 339.67 , 336.01 , 340.19 , 360.17 , 375.38 , 339.4 , 335.01 , 338.59 , 366.54 , 340.28 , 369.24 , 333 , 324.4 , 340.07 , 371.37 , 379.77 , 375.93 , 323.01 , 331.9 , 373.57 , 355.13 , 335.27 , 339.46 , 372.51 , 340.23 , 328.61 , 367.6 , 378.88
79.1338 | 14.76 , 14.74 , 9.66 , 11.29 , 5.61 , 4.1 , 11.65 , 12.73 , 1.64 , 7.27 |

Here I've used group_concat to get values from the two datasets: the None | row holds the oxygen values from one dataset at the latitude in question, and the 79.1338 | row holds the nitrate values.

Back in query_2_s (simple) I ended up writing both queries, 1) "give me a data item that is about a marine current and get me its Lat and Lon" and 2) "give me a data item that is about a marine water body, and get its nitrate and latitude", as separate subqueries joined with a UNION. This solves the problem of query 1) being repeated as many times as query 2) runs. So now I can get just one value of the lat and long data from the metadata, but I still need to integrate this back into query_2, which already has a UNION between two parts: perhaps three queries with two UNION statements, or a UNION between two queries, one of which has a subquery. I'll play around with this now.

When I move the conserved lines such as

#give me any things that somehow linked to nitrate
?NitrateColumn ?no3link obo:CHEBI_17632 .

outside of the SELECT, the runtime becomes horrible. I'm not sure it would ever finish, so I just killed it.

When I run the query without a GROUP BY or ORDER BY, it randomly selects nLat rows plus their GROUP_CONCATed data, so eventually I'll randomly get the right one, but I want to be able to control for it. I remember examples where people sorted the results (ORDER BY ... ASC) and then used OFFSET and/or LIMIT to grab the right row.

The correct result I want is:

79.1338 | 79.1338 | 1.64 , 4.1 , 12.61 , 14.41 , 14.43 , 13.28 , 14.98 , 12.25 , 14.76 , 5.27 , 14.96 , 0.14 , 11.63 , 10.52 , 12.73 , 6.22 , 0.26 , 6.04 , 14.7 , 14.63 , 13.73 , 13.52 , 0.24 , 0.24 , 8.33 , 12.28 , 14.97 , 0.33 , 9.66 , 2.58 , 7.27 , 3.82 , 14.14 , 11.29 , 8.05 , 14.05 , 5.67 , 10.4 , 11.97 , 14.74 , 5.61 , 11.65 , 11.71 , 11.18 , 9.96 , 12.28 | 302.03 , 361 , 336.02 , 333.89 , 356.41 , 342.2 , 333 , 338.75 , 375.38 , 381.54 , 323.55 , 374.4 , 340.23 , 373.4 , 337.54 , 372.98 , 375.51 , 378.53 , 374.09 , 336.96 , 370.75 , 338.01 , 335.27 , 369.43 , 367.78 , 371.37 , 372.72 , 340.28 , 369.24 , 341.43 , 338.41 , 370.47 , 338.02 , 333.45 , 341.24 , 360.02 , 340.19 , 323.5 , 365.6 , 356.78 , 358 , 338.03 , 371.95 , 335.01 , 340.98 , 368.46 , 377.6 , 378.54 , 335.4 , 336.46 , 372.51 , 331.9 , 341.87 , 372.47 , 338.92 , 371.88 , 339.94 , 368.94 , 364.51 , 363.01 , 340.63 , 350.18 , 319.56 , 366.24 , 338.59 , 338.52 , 377.29 , 378.88 , 336.66 , 376.93 , 341.68 , 365.99 , 366.54 , 334.19 , 336.4 , 332.9 , 310.17 , 366.6 , 336.01 , 368.93 , 346.01 , 327.8 , 339.4 , 369.28 , 358.1 , 340.21 , 373.57 , 337.77 , 340.6 , 375.93 , 328.61 , 339.67 , 333.14 , 341.05 , 376.5 , 377.61 , 329.85 , 337.89 , 360.17 , 335.14 , 338.49 , 379.77 , 334.57 , 374.51 , 327.13 , 331.07 , 367.6 , 370.64 , 324.4 , 368.73 , 338.85 , 340.27 , 323.01 , 339.17 , 339.46 , 354.82 , 330.55 , 331.56 , 335.15 , 341.2 , 340.07 , 372.88 , 355.13 , 375.17 , 332.84 , 373.3 , 371.4 , 357.91 , 361.86 , 336.4 , 339.72 , 375.43 , 368.44 , 376.94

which I got here by chance.

In a new file, query_2_s_1.rq, I will try to filter the ?nLat pulled out of the nitrate data using the Lat value of the oxygen data.

SUCCESS!!!!! Using this Jena tutorial on FILTER and OPTIONAL in SPARQL, I was able to solve the issue. Instead of having a separate SELECT block, and/or a UNION with an individual block, to get the known latitude out of the file in question, such as:

?marCurObj rdf:type obo:IAO_0000027;
		obo:IAO_0000136 obo:ENVO_01000067 ;
		obo:OBI_0001620 ?oLat .

I just wrapped it in OPTIONAL, which got the ?oLat value without iterating it as many times as the code above it. Then I could pass ?oLat into a FILTER statement, which returned the rows from the marine water body object whose latitude matches ?oLat (pulled from the marine current object via its OBI:latitude value ?oLat).

The final (desired) output is:

79.1338 | 14.76 , 11.65 , 12.73 , 14.74 , 5.61 , 7.27 , 4.1 , 11.29 , 1.64 , 9.66 | 371.88 , 328.61 , 346.01 , 377.61 , 339.46 , 367.78 , 364.51 , 368.44 , 358.1 , 310.17 , 319.56 , 369.43 , 370.75 , 371.95 , 340.98 , 376.93 , 337.89 , 371.37 , 336.4 , 358 , 373.57 , 356.41 , 335.27 , 372.72 , 335.4 , 341.87 , 340.21 , 375.93 , 324.4 , 372.51 , 365.6 , 336.96 , 332.9 , 340.19 , 370.64 , 341.68 , 355.13 , 370.47 , 330.55 , 372.88 , 302.03 , 338.01 , 338.41 , 323.55 , 340.63 , 369.24 , 335.15 , 357.91 , 381.54 , 335.14 , 336.4 , 378.53 , 334.19 , 331.07 , 340.23 , 374.51 , 354.82 , 374.09 , 327.13 , 339.67 , 376.5 , 338.59 , 339.4 , 332.84 , 365.99 , 363.01 , 329.85 , 338.03 , 340.28 , 374.4 , 368.93 , 340.07 , 338.92 , 368.46 , 339.94 , 333.89 , 377.29 , 377.6 , 350.18 , 361 , 331.56 , 339.72 , 338.49 , 333.45 , 338.52 , 372.47 , 366.24 , 360.17 , 336.46 , 327.8 , 367.6 , 366.6 , 378.88 , 339.17 , 334.57 , 378.54 , 375.51 , 360.02 , 341.05 , 356.78 , 338.02 , 337.54 , 366.54 , 371.4 , 369.28 , 375.43 , 341.43 , 335.01 , 379.77 , 323.5 , 333 , 375.17 , 336.02 , 338.85 , 336.66 , 342.2 , 331.9 , 341.2 , 372.98 , 333.14 , 337.77 , 336.01 , 340.6 , 323.01 , 373.3 , 373.4 , 341.24 , 361.86 , 338.75 , 375.38 , 340.27 , 376.94 , 368.94 , 368.73

Here 79.1338 is the latitude in question, the values | 14.76 ... 9.66 | are the nitrate data from the marine water mass data object, and the values from | 371.88 ... to the end are the oxygen values from the marine current object, which contains the latitude 79.1338 in its metadata, represented in test_phys_oce_current_sup.ttl (the supplemental file).

The final cleaned-up version of the query is in query_2_FINISHED.rq

I also changed test_phys_oce_current.csv to not have lat and lon columns, and instead represent that info in the metadata (as it is in Pangaea); see the latest commit.

23.11.17

I showed Pier, and he approves. Now I need to clean up the annotations and add the data's units, either as a CSV or in the supplemental annotation/metadata file.

Pier suggested I write in the outlook that, although most people won't know how to use SPARQL queries to perform such semantic searches, future work could involve creating a web app/GUI which automates the search process with a SPARQL backend, possibly in service of an upcoming AWI data science interlinking grant.

He says Antje wrote an internal Helmholtz grant to address the issue of data interlinking, which could possibly fund my PhD (if the grant is successful). Part of that work could be to design a SPARQL front end for accessing the data, while interlinking the data and contributing to ENVO.

In my thesis I can demonstrate the semantic research aspect of the project with an example where a column in an AWI data file isn't annotated: someone (like an expert on the subject) could go to the ENVO issues page and open an issue about needing a class to represent phenomenon x.

Had a meeting about the cryoMIxS paper; see notes there.

iao

Make an issue about needing a new IAO term to describe a dataset of aggregated properties, as is commonly used in the life sciences. Issue temporarily here.

24.11.17

Working on the ontology terms for the cryoMIxS paper; see the project's Google Drive folder.

I noticed that for the 'concentration of' classes, for example concentration of ammonium in soil, the axiom:

'concentration of'
 and (
  'inheres in' some (
    ammonium
     and (
       'part of' some 
        soil
    )
  )
 )

is wrong, as stated in the editor's note of BFO:part of:

A specifically dependent continuant cannot be part of an independent continuant: use 'inheres in'.

Pier suggests these classes may be better defined:

'concentration of'
 and ('inheres in' some (soil and ('has part' some ammonium)))


    (ammonium
     and ('part of' some soil)))

inheres in min 2 ammonium

ChEBI represents ammonium as an ammonium molecule, whereas ENVO represents portions of material, e.g. a portion of seawater. Could we have an RO relation such as 'composed entirely of'? But that seems wrong.

My mistake, I misread it. The axiom says there is some ammonium which is part of some soil, and the concentration inheres in this ammonium (the one which is part of some soil). So it works as is.

27.11.17

Today I'm going to create a mini local datastore with one example from most of the AWI datasets I think I'll end up using. This is done in the path /kblumberg_masters_thesis/datastore

I'm not sure about the genomic data yet: the Fram microbial observatory data from Eddie won't be available until January, and Katje's data isn't available on Pangaea. So I'll deal with that later, perhaps by making a faux Pangaea Excel file using the data from Katje's paper, Biogeography and Photosynthetic Biomass of Arctic Marine Pico-Eukaroytes during Summer of the Record Sea Ice Minimum 2012, to create a dataset like the Bacterial sequence information data.

I'll also add an example of Chlorophyll a measured on water bottle samples

Work done here in new datastore page

This page is useful for XML date and time types. Unfortunately, the date-time format Pangaea uses, for example 2014-06-20T09:00, doesn't seem to be compatible with any of the XML standards: it's not xsd:dateTime, as the seconds are missing. Nor is it datetime, smalldatetime, date, time, or datetime2. For now I will leave these columns unannotated.

also useful for xml date time

28.11.17

Continuing the search for date time. Pier suggested either OWL-Time or BFO:temporal region.

I also checked the OWL-Time ontology.

So I either go with :inDateTime from OWL-Time (@prefix : <http://www.w3.org/2006/time#> .):

:inDateTime
  rdf:type owl:ObjectProperty ;
  rdfs:comment "Position of an instant, expressed using a structured description"@en ;
  rdfs:domain :Instant ;
  rdfs:label "in date-time description"@en ;
  rdfs:range :GeneralDateTimeDescription ;
  rdfs:subPropertyOf :inTemporalPosition ;
  skos:definition "Position of an instant, expressed using a structured description"@en .

BFO:temporal region.

editor note: Temporal region doesn't have a closure axiom because the subclasses don't exhaust all possibilites. An example would be the mereological sum of a temporal instant and a temporal interval that doesn't overlap the instant. In this case the resultant temporal region is neither 0-dimensional nor 1-dimensional

BFO CLIF specification label: TemporalRegionBFO

OWL specification label: t-region

elucidation: A temporal region is an occurrent entity that is part of time as defined relative to some reference frame. (axiom label in BFO2 Reference: [100-001])

has associated axiom(fol): (forall (r) (if (TemporalRegion r) (occupiesTemporalRegion r r))) // axiom label in BFO2 CLIF: [119-002] ; (forall (x) (if (TemporalRegion x) (Occurrent x))) // axiom label in BFO2 CLIF: [100-001] ; (forall (x y) (if (and (TemporalRegion x) (occurrentPartOf y x)) (TemporalRegion y))) // axiom label in BFO2 CLIF: [101-001]

has associated axiom(nl): All parts of temporal regions are temporal regions. (axiom label in BFO2 Reference: [101-001]); Every temporal region t is such that t occupies_temporal_region t. (axiom label in BFO2 Reference: [119-002])

isDefinedBy: bfo.owl

I've decided to go with BFO classes:

temporal instant ("A connected temporal region comprising a single moment of time.") for Date Time columns, and

temporal interval ("A connected temporal region lasting for more than a single moment of time.") for any Time Duration column.

In order to post-compose classes (as opposed to pre-composing them all as new classes), I'm making an example of this for global_chlorophyll_a.

done in folder kblumberg_masters_thesis/testing/test_rdf

testing post composition of a class like diatom chlorophyll a to use to annotate the Diatom Chl A column. It was suggested to try with blank nodes.

#mock example to deal with post composition of a class diatom chlorophyll a to use to annotate the Diatom Chl A column

rdf:type a rdf:Property ;
	rdfs:isDefinedBy <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
	rdfs:label "type" ;
	rdfs:comment "The subject is an instance of a class." ;
	rdfs:range rdfs:Class ;
	rdfs:domain rdfs:Resource .

in the supplemental ttl file I added:

#mock example to deal with post composition of a class diatom chlorophyll a to use to annotate the Diatom Chl A column
#column is about this class which is a blank node _:b1
file:DiatomChlA obo:IAO_0000136 _:b1 .

# blank node _:b1 represents this post compositionally created class diatom chlorophyll a
# the blank node is an instance of CHEBI:chlorophyll a, and is part of some Coccolithales
_:b1 rdf:type obo:CHEBI_18230 ;
	obo:BFO_0000050 obo:NCBITaxon_418917.

The result:

ns1:global_chlorophyll_a.csvDiatomChlA rdfs:label "Diatom Chl A" ;
    obo:IAO_0000136 [ a obo:CHEBI_18230 ;
            obo:BFO_0000050 obo:NCBITaxon_418917 ] ;
    ns2:rfc4180columnPosition 6 .

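This says that the DiatomChlA column is about an instance of CHEBI:chlorophyll a, which is part of some Coccolithales. (Strictly, the triple links the blank node directly to the NCBITaxon class IRI with 'part of', rather than stating an OWL 'part of' some restriction.)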

@prefix owl: <http://www.w3.org/2002/07/owl#> .

Perhaps I could use owl:unionOf:

owl:unionOf a rdf:Property ;
     rdfs:label "unionOf" ;
     rdfs:comment "The property that determines the collection of classes or data ranges that build a union.";
     rdfs:domain rdfs:Class ;
     rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
     rdfs:range rdf:List . 

I exported ENVO to RDF in Turtle (ttl) format to understand how to properly post-compose a class. For example, examining the class sea ice, we have the following code:

###  http://purl.obolibrary.org/obo/ENVO_00002200
<http://purl.obolibrary.org/obo/ENVO_00002200> rdf:type owl:Class ;
                                               rdfs:subClassOf <http://purl.obolibrary.org/obo/ENVO_01000277> ,
                                                               [ rdf:type owl:Restriction ;
                                                                 owl:onProperty <http://purl.obolibrary.org/obo/RO_0002473> ;
                                                                 owl:someValuesFrom <http://purl.obolibrary.org/obo/ENVO_00002149>
                                                               ] ;
                                               <http://purl.obolibrary.org/obo/IAO_0000115> "A frozen portion of sea water." ;
                                               <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "SWEETRealm:SeaIce"^^xsd:string ,
                                                                                                        "https://en.wikipedia.org/wiki/Sea_ice"^^xsd:string ;
                                               <http://www.geneontology.org/formats/oboInOwl#inSubset> "envoPolar" ;
                                               rdfs:label "sea ice"^^xsd:string .

I have a solution for post-composing classes (for now); I'll have to test it to make sure it is queried in the same way as it would be against the ontobee endpoint.

In the supplemental ttl file:

#HaptophytaChlA column is about an owl class which is a subclass of chlorophyll a and is part of some Coccolithales
file:HaptophytaChlA obo:IAO_0000136 _:b4 .

_:b4 rdf:type owl:Class ;
	rdfs:subClassOf obo:CHEBI_18230,
		[obo:BFO_0000050 obo:NCBITaxon_418917] .

which is parsed into the datastore graph and expressed in ttl as:

ns1:global_chlorophyll_a.csvHaptophytaChlA rdfs:label "Haptophyta Chl A" ;
    obo:IAO_0000136 [ a owl:Class ;
            rdfs:subClassOf [ obo:BFO_0000050 obo:NCBITaxon_418917 ],
                obo:CHEBI_18230 ] ;
    ns2:rfc4180columnPosition 7 .

which is intended to be the equivalent of creating a class:

coccolithophore chlorophyll a

Chlorophyll a which is part of some Coccolithales.

subclassOf 'chlorophyll a' 'part of' some 'Coccolithales'

The next thing to work on is to query ontobee, for example to get any classes which are subclasses of sea ice, and then look in my local datastore for any data about things annotated with these subclasses of sea ice. This shouldn't take too long: I just have to figure out how to retrieve from both the SPARQL endpoint and the local datastore simultaneously. I have done both separately, so it should work.

I'll do this with a smaller version of the data in ice-algal-chlorophyll.csv, where I look for any local data which has a column with a link to an obo class multiyear ice which is a subclass of sea ice. Should be done quickly and demonstrate that I can query the obo knowledge graph to access data (in my local datastore).

30.11.17

Attempting the query proposed at the end of the 28.11.17 notes just above, in the folder /kblumberg_masters_thesis/testing/test_ice_algal_chlorophyll. I'm doing it on a small subset of data with 2 relevant columns: one with data values for snow depth, and one for Sea Ice Type, which is annotated with the ENVO PURL http://purl.obolibrary.org/obo/ENVO_03000073 for multiyear ice. I'm not sure what the best design pattern for this would be when hosting data in Pangaea: whether you'd want a column for Sea Ice Type with values like multiyear ice, a column like Sea Ice Type PURL with values like http://purl.obolibrary.org/obo/ENVO_03000073, or both. With the former you'd need a query involving synonyms and/or exact label matching to find the data which is about a specific PURL. I'll ask Pier, but just write a query to get these data for now.

From this video about the SPARQL book, we learn about the SERVICE keyword, along with the book's example code for querying DBpedia from inside a query:

# filename: ex167.rq

PREFIX cat:     <http://dbpedia.org/resource/Category:>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>

SELECT ?p ?o 
WHERE
{
  SERVICE <http://DBpedia.org/sparql>
  { SELECT ?p ?o 
    WHERE { <http://dbpedia.org/resource/Joseph_Hocking> ?p ?o . }
  }
}

When I try running this with SPARQLWrapper and rdflib, it doesn't work; it raises the error:

raise Exception('ServiceGraphPattern not implemented')
Exception: ServiceGraphPattern not implemented

However, it does work with Jena ARQ running the same query, so it seems that ServiceGraphPattern is not implemented in the Python package (rdflib). I will instead use the SPARQLWrapper package to send queries directly to each endpoint.

endpoint = SPARQLWrapper("http://dbpedia.org/sparql") this works to query dbpedia.

endpoint = SPARQLWrapper("http://sparql.hegroup.org/sparql/") this works to query ontobee.

running the query:

SELECT ?p ?o 
WHERE
{
<http://purl.obolibrary.org/obo/ENVO_00002200> ?p ?o . 
}

I was able to get

http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Class
http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Class
http://www.w3.org/2000/01/rdf-schema#label sea ice
http://www.w3.org/2000/01/rdf-schema#label sea ice
http://www.w3.org/2000/01/rdf-schema#subClassOf http://purl.obolibrary.org/obo/ENVO_01000277
http://www.w3.org/2000/01/rdf-schema#subClassOf http://purl.obolibrary.org/obo/ENVO_01000277
http://www.w3.org/2000/01/rdf-schema#subClassOf nodeID://b43828016
http://www.w3.org/2000/01/rdf-schema#subClassOf nodeID://b43879224
http://www.geneontology.org/formats/oboInOwl#inSubset envoPolar
http://www.geneontology.org/formats/oboInOwl#inSubset envoPolar
http://www.geneontology.org/formats/oboInOwl#hasDbXref SWEETRealm:SeaIce
http://www.geneontology.org/formats/oboInOwl#hasDbXref SWEETRealm:SeaIce
http://www.geneontology.org/formats/oboInOwl#hasDbXref https://en.wikipedia.org/wiki/Sea_ice
http://www.geneontology.org/formats/oboInOwl#hasDbXref https://en.wikipedia.org/wiki/Sea_ice
http://www.geneontology.org/formats/oboInOwl#hasOBONamespace ENVO
http://www.geneontology.org/formats/oboInOwl#hasOBONamespace ENVO
http://www.geneontology.org/formats/oboInOwl#id ENVO:00002200
http://www.geneontology.org/formats/oboInOwl#id ENVO:00002200
http://purl.obolibrary.org/obo/IAO_0000115 A frozen portion of sea water.
http://purl.obolibrary.org/obo/IAO_0000115 A frozen portion of sea water.

That's the info about the class sea ice. Now to get its subclasses.

This was successful

using the query:

SELECT ?s 
WHERE
{
?s rdfs:subClassOf+ obo:ENVO_01000277 . 
}

I asked for anything that is a subclass of water ice (obo:ENVO_01000277), as well as anything which is a subclass of those subclasses, etc., using the + property-path operator after rdfs:subClassOf (one or more rdfs:subClassOf steps).

This returns everything as it should.

Pier suggests I have some trivial competency questions, like: can I always query the latest version of the ontologies to get knowledge? Answer: yes, using the ontobee SPARQL endpoint, which is updated regularly.

Can I query an older version of an ontology? Answer: yes, using a locally downloaded older release of the ontology, for example from GitHub.

Now, to build the bridge, I need to figure out how to use Python to take this list of subclasses generated by the query against the endpoint and feed it into another query asking for data annotated with any of these subclasses. I think I can pass the list of Python objects into the next query using the SPARQL VALUES keyword, which is described in the SPARQL book on page 91. However, I need to be able to pass the external vector of PURLs into the query, which is a separate file. Maybe I need to concatenate a string comprising all the individual query parts, including the VALUES filtering on the PURLs obtained from the list.
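A minimal sketch of the string-concatenation idea: turn a Python list of PURLs into a VALUES block and splice it into the query text. The ?column/?filter variable names and the IAO "is about" pattern are illustrative assumptions, not the exact query used here.

```python
def values_clause(variable, purls):
    """Build a SPARQL VALUES block that restricts ?variable to the given PURLs.

    Surrounding whitespace (e.g. a trailing newline read from a file) is stripped
    from each PURL so it cannot end up inside the <...> IRI.
    """
    rows = " ".join(f"(<{purl.strip()}>)" for purl in purls)
    return f"VALUES (?{variable}) {{ {rows} }}"


subclasses = [
    "http://purl.obolibrary.org/obo/ENVO_03000073",
    "http://purl.obolibrary.org/obo/ENVO_03000072",
]
# Hypothetical query shape: columns that are about one of the listed classes.
query = (
    "SELECT ?column WHERE {\n"
    "  ?column <http://purl.obolibrary.org/obo/IAO_0000136> ?filter .\n"
    f"  {values_clause('filter', subclasses)}\n"
    "}"
)
print(query)
```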

Using this page on python string filtering I was able to get it to work!!

Now I will extend the example with 2 more dummy datasets first_year_ice.csv and smog_data.csv to make sure I'm pulling only columns that are subclasses of sea ice and not other things.

any23 rover -t -p -f ntriples -o first_year_ice.nt first_year_ice.csv

any23 rover -t -p -f ntriples -o smog_data.nt smog_data.csv

I modified the example to add these, and it worked as expected, pulling the data about first-year ice and multiyear ice but not the smog data.

This could be added as a competency question which I've added to the list.

Next I'll play with a python script which can, in a modular fashion, put together and execute a sparql query on my datastore and on ontobee.

I'm starting this in the same directory as for the question above, /kblumberg_masters_thesis/testing/test_ice_algal_chlorophyll, in the file test_modular_sqparql_query_script.py, to try to get the same answer as before, but with the class whose subclasses we want specified in a more general way, outside of query_1.

01.12.17

Working on test_modular_sqparql_query_script.py, I wrote several functions. The triple_merge() function takes a list of my RDF .nt and .ttl data files and puts them all together into one file, datastore.ttl, so that I only need one FROM line to specify the triple store. Pier mentioned that a real endpoint would only have one triple store, so I will do it this way.

useful page on efficient python string concatenation

I also created functions to generate SPARQL PREFIX, FROM, SELECT, and WHERE clauses, as well as a function subclass_query_function() which takes an input ontobee PURL and calls the above functions (with the input PURL going to the WHERE function). Together these generate a SPARQL query for the ontobee SPARQL endpoint asking for any classes which are a subclass+ of the desired input class.

I'm doing this in a modular fashion so that I can reuse these same sparql PREFIX, FROM, SELECT, and WHERE clause generating functions to begin building the back end for a sparql GUI.
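A sketch of what such clause-generating functions might look like; the function names are hypothetical and this is not the code in test_modular_sqparql_query_script.py. Each function returns one clause as a string, and subclass_query() assembles them in the order SPARQL requires (PREFIX, SELECT, optional FROM, WHERE).

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "obo": "http://purl.obolibrary.org/obo/",
}


def prefix_clause(prefixes):
    """One PREFIX line per (prefix, namespace IRI) pair."""
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())


def select_clause(variables, distinct=True):
    """SELECT [DISTINCT] ?v1 ?v2 ..."""
    return "SELECT " + ("DISTINCT " if distinct else "") + " ".join(f"?{v}" for v in variables)


def from_clause(graph_iri):
    """FROM line for a local store; empty when querying an endpoint such as ontobee."""
    return f"FROM <{graph_iri}>" if graph_iri else ""


def where_clause(patterns):
    """Wrap triple patterns in a WHERE block."""
    return "WHERE {\n" + "\n".join(f"  {p}" for p in patterns) + "\n}"


def subclass_query(purl, graph_iri=None):
    """Query for every class that is a subclass+ of the input PURL."""
    parts = [
        prefix_clause(PREFIXES),
        select_clause(["s"]),
        from_clause(graph_iri),
        where_clause([f"?s rdfs:subClassOf+ <{purl}> ."]),
    ]
    return "\n".join(part for part in parts if part)


print(subclass_query("http://purl.obolibrary.org/obo/ENVO_00002200"))
```

The resulting string could then be handed to SPARQLWrapper via setQuery(), as in the working ontobee example above.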

Pier suggests writing a series of Python scripts to do individual query pieces, such as query for subclasses of x, query for processes linked to x, query for data concerning x, etc., each as an individual Python script. For processes linked to x or similar, I'd need to figure out how to make an interactive interface which prompts questions and answers on the command line about what the user wants to search, to mimic a GUI. Next is parts and mereotopology.

I was able to completely recreate in Python the same queries as in test_ice_algal_chlorophyll.py, in the test_modular_sqparql_query_script.py script. Now that I have some ideas about how to make the necessary modules, I can use them to write the set of scripts Pier suggested.

I will start with query for subclasses of x where x is specified via a command line input, and can be run by something like:

./query_for_subclasses_of.py http://purl.obolibrary.org/obo/ENVO_00002200

useful python command line args tutorials here, and here

I will probably eventually need to use the getopt package; however, I can start in a simpler, more understandable way using only sys.argv for the inputs. So I will start with that in a new file:

query_for_subclasses_of.py

Got this to work taking command line args, when running it as above

./query_for_subclasses_of.py http://purl.obolibrary.org/obo/ENVO_00002200
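When sys.argv alone stops being enough, the standard library's argparse is a friendlier alternative to getopt (a swapped-in suggestion, not what these notes use). A sketch of the argument handling for a script like query_for_subclasses_of.py; the -o/--outfile option is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line for a query_for_subclasses_of-style script.

    argv=None means argparse reads sys.argv[1:]; passing a list is handy for testing.
    """
    parser = argparse.ArgumentParser(
        description="Query ontobee for all subclasses of the given class PURL(s)."
    )
    parser.add_argument("purls", nargs="+", help="one or more class PURLs")
    parser.add_argument(
        "-o", "--outfile", default=None,
        help="file to write the resulting PURLs to (hypothetical option)",
    )
    return parser.parse_args(argv)
```

Called as parse_args() inside the script, it would accept the same command line as above and print a usage message automatically when no PURL is given.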

Now try to do the second query, searching for data about any of the classes on the input list.

new file query_for_associated_data.py

./query_for_associated_data.py http://purl.obolibrary.org/obo/ENVO_03000073 http://purl.obolibrary.org/obo/ENVO_03000072 http://purl.obolibrary.org/obo/ENVO_03000071

Both work separately now try running both the query_for_subclasses_of.py and query_for_associated_data.py scripts sequentially.

./query_for_subclasses_of.py http://purl.obolibrary.org/obo/ENVO_00002200 | ./query_for_associated_data.py

Unfortunately this isn't working. It is probably not piping the output of the first the way I want it to.

Instead of trying to figure out how to pipe between bash stdout and Python, Pier suggests I just pass results between files. That way it can run on any system, is easier for me to set up, and can be diagnosed via the intermediate files.

04.12.17

Created a new directory script_prototyping path:

kblumberg_masters_thesis/testing/test_ice_algal_chlorophyll/script_prototyping to put the testing versions of the modular python scripts which do specific functions while I write them for now.

Got query_for_subclasses_of.py to write out to file. Now to get query_for_associated_data.py to read in from file.

First I want to make the script which will build the datastore, called merge_triples_to_datastore.py.

test run it with:

./merge_triples_to_datastore.py *.nt works. Also try:

./merge_triples_to_datastore.py *.nt *.ttl, which also works, due to the line in_args = sys.argv[1:], which takes all the input arguments after sys.argv[0] (the name of the Python file itself).

I want to try making this find the in_args internally, instead of the user having to explicitly pass them in as in the previous command, so I'll look into reading the files in a directory from Python. Got this to work: currently it takes all files in the current working directory, filters for .nt and .ttl files, and prints their contents to datastore.ttl. It would be nice if this also used the rdflib module to write out an already-parsed datastore.ttl file. This isn't necessary for now, but it would make examining the ttl file easier and cleaner, and would probably be a more correct simulation of building a triple datastore. Later I could add a function to add individual CSVs to the master triple datastore, to simulate iteratively adding data, but for now I'll stick with building one from the .ttl and .nt files in the directory.

Got it to work, I just make use of a temp file to write all the lines of all the triple files to then read the temp file parse it and delete the temp file.

Pier suggests noting the version of Python and of all packages as part of my materials and methods.

For query_for_associated_data.py, the issue is the line breaks in the generated query:

filter (?c != ?p) 
VALUES (?filter) { (<http://purl.obolibrary.org/obo/ENVO_03000073
>)(<http://purl.obolibrary.org/obo/ENVO_03000072
>)(<http://purl.obolibrary.org/obo/ENVO_03000071
>) }
}

I think that while reading the infile I need to get rid of the line breaks. Reading it into a list shows the problem, ['http://purl.obolibrary.org/obo/ENVO_03000073\n', ...]: I think it's the line break in there.

fixed this issue with the line in_args += [line.rstrip('\n') for line in in_file]
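A slightly more defensive variant of that fix, as a sketch: strip() also removes the \r left by Windows line endings, and blank lines (e.g. a trailing empty line) are dropped instead of ending up as an empty <> IRI in the VALUES block. The function name is hypothetical.

```python
def clean_purls(lines):
    """Return the non-empty lines with surrounding whitespace (including newlines) removed."""
    return [line.strip() for line in lines if line.strip()]


# Works on an open file handle as well as on a plain list of strings, e.g.:
# with open("query_for_subclasses_of_out.txt") as in_file:
#     in_args += clean_purls(in_file)
print(clean_purls(["http://purl.obolibrary.org/obo/ENVO_03000073\n", "\n"]))
```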

Now I need to take the input from the command line instead of just hard coded from a specific .txt file. Got it!.

I had to fix query_for_associated_data.py, but now it works. You can run it with ./query_for_associated_data.py query_for_subclasses_of_out.txt, making sure the .txt file is a list of PURLs. The next step for this script is to have it print to an output file, and possibly to use a slightly modified SPARQL query to choose more appropriate columns, such as the CSV file the data came from, the label of the csvColumn, and the data values. //TODO

Next task: make a python script which calls the any23 program for every csv file in an input directory to generate .nt triple files for each csv dataset. Prototyping done in /kblumberg_masters_thesis/testing/script_prototyping

call the script: create_rdf_triples_from_csv_files.py

This Stack Overflow page on %s taught me about a really cool function, raw_input() (Python 2; the Python 3 equivalent is input()), which prompts the shell for user input. I could use something along these lines to make a decision tree for a user query in a demonstration, e.g. for the example Pier suggested about what is mereotopologically related to processes of a given class.

Using this Stack Overflow page for a helper function to send bash commands from Python, I got the create_rdf_triples_from_csv_files.py script to work.
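A sketch of the same idea with the standard subprocess module (not the Stack Overflow helper used in the script): build the any23 command for each CSV, reusing exactly the flags from the commands earlier in these notes, and run it. Passing the command as a list avoids shell quoting problems with unusual file names. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def any23_command(csv_path):
    """any23 call used in these notes: CSV in, N-Triples file with the same stem out."""
    csv_path = Path(csv_path)
    nt_path = csv_path.with_suffix(".nt")
    return ["any23", "rover", "-t", "-p", "-f", "ntriples", "-o", str(nt_path), str(csv_path)]


def create_rdf_triples_from_csv_files(directory="."):
    """Run any23 once per .csv file in `directory`; stop on the first failure."""
    for csv_path in sorted(Path(directory).glob("*.csv")):
        subprocess.run(any23_command(csv_path), check=True)
```

check=True makes a failed any23 run raise CalledProcessError instead of silently leaving a missing or empty .nt file behind.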

mereotopologically related to

Next competency question suggested from Pier:

for a given process give me any subclasses of that process.

For each of those processes, check if it has any participates in relations mainly 'input of' 'output of' and get those continuants. Then look for any data about any of those continuants.

could vary this to use causal relations instead of participates in relations. But start with the participates in (input output for now).

I'll do it with 3 scripts. The first gets anything that's a subclass+ of a given input class (I already have a script for this). The second takes all subproperties of an object property; for example, 'participates in' has subproperties 'output of', 'formed as result of', and 'input of'.

Then a third script takes both the list of subprocesses generated by the first script and the list of subproperties generated by the second, and looks for the continuant classes which stem from pairing the first and second lists. As a triple it would be like inputClass[1:n] inputProperties[1:n] ?predicateClass

Then return back a table like:

subject predicate object
inputClass[1] inputProperty[1] continuant[1]
inputClass[1] inputProperty[2] continuant[2]
inputClass[1] inputProperty[2] continuant[3]
inputClass[2] inputProperty[1] continuant[4]

etc.

Finally, I search the list of continuants in the object column to see if I have any data about them and/or columns about them in my datastore.

05.12.17

Suggestion from Pier on how to do the data-about-processes-and-their-participants question (above): use a nested for loop to pair each element of inputClassList with each element of inputPropertyList, and pipe all these combinations into one big WHERE clause.
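A sketch of the nested-loop idea: every (process, relation) pair becomes one row of a two-variable VALUES block, so a single query covers all combinations. The final triple pattern follows the naive shape described above and is an assumption; in OWL exports, class-level relations are usually stored as owl:Restriction blank nodes under rdfs:subClassOf (as in the sea ice export), and 'input of'/'output of' point from the continuant to the process, so the pattern may need to become something like ?continuant rdfs:subClassOf [ owl:onProperty ?relation ; owl:someValuesFrom ?process ].

```python
def process_relation_values(processes, relations):
    """Pair every process PURL with every relation PURL (nested loop) as VALUES rows."""
    rows = []
    for process in processes:       # e.g. subclasses of ENVO_01000950
        for relation in relations:  # e.g. subproperties of RO_0000056 ('participates in')
            rows.append(f"(<{process}> <{relation}>)")
    return "VALUES (?process ?relation) {\n  " + "\n  ".join(rows) + "\n}"


def participant_query(processes, relations):
    """Naive pattern from these notes: process, relation, continuant as one triple."""
    return (
        "SELECT DISTINCT ?process ?relation ?continuant WHERE {\n"
        + process_relation_values(processes, relations)
        + "\n?process ?relation ?continuant .\n}"
    )


print(participant_query(
    ["http://purl.obolibrary.org/obo/ENVO_01000950"],
    ["http://purl.obolibrary.org/obo/RO_0002353"],
))
```

The number of VALUES rows is len(processes) * len(relations), so very long input lists may eventually need to be split across several queries to the endpoint.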

First I need to modify the query_for_subclasses_of.py script to print out the results to a file with the name of the input purl. Got it.

run it with: ./query_for_subclasses_of_input_purl.py http://purl.obolibrary.org/obo/ENVO_00002200

now try running the subclasses script with RO:participates in

./query_for_subclasses_of.py http://purl.obolibrary.org/obo/RO_0000056 gives no results. It's probably because properties use a different relation: Subproperty Of instead of rdfs:subClassOf.

Looking up the sub-property relation in RDFS, here we go: it's rdfs:subPropertyOf.

now I will make a new script query_for_subproperties_of_input_purl.py mirror of query_for_subclasses_of_input_purl.py but with rdfs:subPropertyOf instead of subclass.

run it: ./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0000056

This mostly works; however, some of the results are links to non-OBO PURLs, like http://www.obofoundry.org/ro/ro.owl#agent_in or http://www.ebi.ac.uk/swo/SWO_0000088, which don't resolve. So I modified the script to filter for results starting with http://purl.obolibrary.org/obo/. Got this to work; now

./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0000056 produces the file subproperties_of_RO_0000056.txt containing the list:

http://purl.obolibrary.org/obo/RO_0002461
http://purl.obolibrary.org/obo/OOSTT_00000042
http://purl.obolibrary.org/obo/ERO_0002095
http://purl.obolibrary.org/obo/RO_0002217
http://purl.obolibrary.org/obo/RO_0002353
http://purl.obolibrary.org/obo/RO_0002352
http://purl.obolibrary.org/obo/IDO_0100200
http://purl.obolibrary.org/obo/OBI_0000312
http://purl.obolibrary.org/obo/OBI_0000295
http://purl.obolibrary.org/obo/RO_0002354
http://purl.obolibrary.org/obo/ERO_0001520
http://purl.obolibrary.org/obo/RO_0002463
http://purl.obolibrary.org/obo/RO_0002462
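The OBO-only filter can live either in the query itself or in Python afterwards. Below is a sketch of both. It assumes the endpoint supports the SPARQL 1.1 STRSTARTS function, and the function names are hypothetical. The query asks only for direct subproperties; writing rdfs:subPropertyOf+ instead would follow the hierarchy transitively.

```python
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def subproperty_query(purl):
    """SPARQL for the direct subproperties of purl, restricted to OBO PURLs."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s
WHERE {{
  ?s rdfs:subPropertyOf <{purl}> .
  FILTER(STRSTARTS(STR(?s), "{OBO_PREFIX}"))
}}"""


def keep_obo(purls):
    """Python-side equivalent: drop results such as ...ro.owl#agent_in."""
    return [purl for purl in purls if purl.startswith(OBO_PREFIX)]
```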

Next step: make a script that will query all inputClasses and inputProperties. This work will be done in the folder /kblumberg_masters_thesis/testing/script_prototyping/classes_processes_participate_in

It will take as input subproperties_of_RO_0000056.txt ('participates in') and the subclasses of ENVO 'water ice formation process'. This example is relevant to AWI's Arctic research and makes good use of the work I did in my lab rotation.

So first run ./query_for_subclasses_of_input_purl.py http://purl.obolibrary.org/obo/ENVO_01000950 to get my list of processes: subclasses_of_ENVO_01000950.txt

Create the script query_for_classes_which_participate_in_input_processes.py. For now the script takes both .txt files as parameters and must be called as:

./query_for_classes_which_participate_in_input_processes.py subclasses_of_ENVO_01000950.txt subproperties_of_RO_0000056.txt

Later I can figure out how to use getopt so that the script runs with proper named input parameters.
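Here is a sketch of the command-line handling using argparse from the standard library, swapped in for getopt, which is lower level. The two positional arguments mirror the current call. The -o/--output option and its default are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two input lists plus an optional output path (sketch)."""
    parser = argparse.ArgumentParser(
        description="Find classes that participate in the input processes.")
    parser.add_argument("processes",
                        help="file with one process PURL per line, "
                             "e.g. subclasses_of_ENVO_01000950.txt")
    parser.add_argument("properties",
                        help="file with one property PURL per line, "
                             "e.g. subproperties_of_RO_0000057.txt")
    parser.add_argument("-o", "--output", default="results.csv",
                        help="where to write the results (hypothetical default)")
    return parser.parse_args(argv)  # argv=None means use sys.argv[1:]
```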

I've got it taking in the two input files as lists. Now I'm going to temporarily shift gears to the script query_for_associated_data.py. I want it to write to an output file instead of the console, as I will need this or similar functionality in query_for_classes_which_participate_in_input_processes.py

./query_for_associated_data.py subclasses_of_ENVO_00002200.txt

First I need to strip the term ID from the input and use it to name the output file, similar to what I just did in query_for_subproperties_of_input_purl.py

Right now the results returned from a local query are of type <class 'rdflib.plugins.sparql.processor.SPARQLResult'>. I need to deal with the multi-column data returned from these local queries. I propose a list of tuples: the list has one tuple per row, and each tuple holds that row's x columns.

i.e. list( row1(column1, column2), row2(column1, column2) ... ). I'll see if I can pipe the SPARQLResult object into a list of tuples using %s. This turned out to be unnecessary: iterating over the result already yields one tuple-like row at a time, so I just used:

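for row in results:  # each rdflib result row is a tuple of column values
    f.write("%s,%s\n" % row)  # f: the open output file; assumes exactly 2 columns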

I got this working. I renamed the script from query_for_associated_data.py to query_data_columns_associated_with_subclasses_of_input_purl.py. Run it with:

./query_data_columns_associated_with_subclasses_of_input_purl.py subclasses_of_ENVO_00002200.txt
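The "%s,%s" format in the loop above assumes exactly two columns. Below is a sketch that handles rows of any width using the standard csv module. It assumes each row is a tuple-like sequence, as rdflib result rows are. The name rows_to_csv is hypothetical. Unbound variables, which rdflib returns as None, become empty cells rather than the text 'None'.

```python
import csv
import io


def rows_to_csv(rows, header=None):
    """Return CSV text for rows of any width; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header:
        writer.writerow(header)
    for row in rows:
        writer.writerow(["" if value is None else str(value) for value in row])
    return buffer.getvalue()
```

To write it out: `with open(outfile, "w", newline="") as f: f.write(rows_to_csv(results))`. The csv module also quotes values that contain commas, which the plain "%s,%s" format does not.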

Back to query_for_classes_which_participate_in_input_processes.py. I notice that the generated list of subproperties contains properties like 'input of' but not their inverses like 'has input', which we used to annotate the subclasses of 'water ice formation process'. So I'll need a way of adding the inverse properties. I can't find an inverseOf property in rdf or rdfs. On Ontobee, the subproperties of 'participates in' give the inverse as text in their definition rather than as a linked term, e.g. Definition: Inverse of has input

I wonder if I could run the same subproperties function with the inverse of 'participates in', which should be 'has participant' (PURL: http://purl.obolibrary.org/obo/RO_0000057). I want to see if I get the desired 'has input' and 'has output' properties.

./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0000057

The results contain the desired properties, such as 'has input' and 'has output'. For now, I run query_for_classes_which_participate_in_input_processes.py with both subproperties files as inputs and join the lists inside the function. This could be done elsewhere, preferably within the query_for_subproperties_of_input_purl.py script, which would make use of owl:inverseOf. I found it defined in the OWL vocabulary as an rdf:Property:

owl:inverseOf a rdf:Property ;
     rdfs:label "inverseOf" ;
     rdfs:comment "The property that determines that two given properties are inverse." ;
     rdfs:domain owl:ObjectProperty ;
     rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
     rdfs:range owl:ObjectProperty . 

I will modify query_for_subproperties_of_input_purl.py to also search for subproperties of the inverse of the input property, if an inverse (owl:inverseOf) of the input exists.

Running the following query against the Ontobee SPARQL endpoint:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT DISTINCT ?s
WHERE { 
?s owl:inverseOf <http://purl.obolibrary.org/obo/RO_0000057> .
#?s owl:inverseOf <http://purl.obolibrary.org/obo/RO_0000056> .
}

I am able to use either RO_0000056 or RO_0000057 and retrieve the other class (as expected).

I successfully modified the function to return the complete list of subproperties of both the input property and its inverse. I checked, and the result list is as expected when run as ./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0000057. It runs a bit slower, most likely because it sends out 3 queries: one for the inverse of the input term, then 2 for the subproperties of the input term and of its inverse.
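The three round trips could in principle be folded into one query. Below is a sketch that has not been tested against Ontobee. The first UNION branch finds subproperties of the input property. The second branch looks up its inverse in either direction, using the SPARQL 1.1 path owl:inverseOf|^owl:inverseOf, and then finds that inverse's subproperties. The function name is hypothetical.

```python
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def subproperties_with_inverse_query(purl):
    """One SPARQL query for subproperties of purl and of its inverse (sketch)."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?sub
WHERE {{
  {{ ?sub rdfs:subPropertyOf <{purl}> . }}
  UNION
  {{ <{purl}> owl:inverseOf|^owl:inverseOf ?inverse .
     ?sub rdfs:subPropertyOf ?inverse . }}
  FILTER(STRSTARTS(STR(?sub), "{OBO_PREFIX}"))
}}"""
```

The alternation `|` with `^` (inverse path) covers both directions, so the query works whether the endpoint stores the owl:inverseOf triple on RO_0000056 or on RO_0000057.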

Now back to query_for_classes_which_participate_in_input_processes.py run as

./query_for_classes_which_participate_in_input_processes.py subclasses_of_ENVO_01000950.txt subproperties_of_RO_0000057.txt

Now I need to structure SPARQL queries so they can reach the subclass axioms of OBO ontologies, which are encoded in OWL in a more complex manner. This Stack Overflow post may help, as the queries need to handle rdfs:subClassOf statements whose objects are an owl:Restriction and/or an owl:unionOf. The example uses '/' (a SPARQL property path) to make a triple pattern follow a longer, deeper path. I'll try it.

I'll use the file /home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/working/envo_rdf.owl (envo.owl exported as RDF), as well as the Ontobee SPARQL endpoint, to figure out how to write the queries.

Along these lines I created the page 'post composition annotation' to formalize how to deal with these issues.

The following query makes incremental progress. It returns the rdf:type (?o) of each superclass expression of ENVO_03000056, which includes owl:Restriction.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>
SELECT ?o
WHERE {
<http://purl.obolibrary.org/obo/ENVO_03000056> rdfs:subClassOf/rdf:type ?o .
} 

06.12.17

Trying to figure out how to properly query OWL ontologies, using 'new ice formation process' as the example class, to get its inputs and outputs.

I'm going to try exporting envo.owl in another RDF serialization instead of Turtle. Perhaps with .nt or .n3 I can see how the nested OWL triples get unnested when written as plain triples. Alternatively, if I can't export ENVO that way, I could use a Python script to parse envo.ttl and print it out as .nt to see how it's done.

Protégé exports to RDF/XML, Turtle, OWL/XML, OWL functional syntax, Manchester OWL syntax, OBO format, LaTeX and JSON-LD.

So I'll convert the envo_rdf.owl Turtle file with a script instead; see /kblumberg_masters_thesis/testing/test_sparql_on_owl

I made the script test_sparql_on_owl.py to parse ENVO and write it out as .nt. Success: the output file is envo_owl.nt, where the 'new ice formation process' block seemed to start on line 15862. Unfortunately, it doesn't. The triples for that block are distributed throughout the file, and they use blank nodes, which makes sense because N-Triples cannot nest triples.

for example the line:

<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:ub1bL65707C64 .

leads to:

_:ub1bL65707C64 <http://www.w3.org/2002/07/owl#someValuesFrom> _:ub1bL65709C85 .

leads to:

_:ub1bL65709C85 <http://www.w3.org/2002/07/owl#unionOf> _:f7df0c95ff9804381bdb281d4c42fe8b3b1363 .

leads to:

_:f7df0c95ff9804381bdb281d4c42fe8b3b1363 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:f7df0c95ff9804381bdb281d4c42fe8b3b1364 .

leads to:

_:f7df0c95ff9804381bdb281d4c42fe8b3b1364 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:f7df0c95ff9804381bdb281d4c42fe8b3b1365 .

leads to:

_:f7df0c95ff9804381bdb281d4c42fe8b3b1365 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000076> .

where http://purl.obolibrary.org/obo/ENVO_03000076 is 'slush ice'. It is a member of the union class in the 'has output' restriction on 'new ice formation process', not a subclass of it.
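Chasing the blank nodes by hand, as above, can be scripted. Below is a minimal sketch with two hypothetical functions. parse_ntriples reads N-Triples lines with a regular expression; it handles IRIs, blank nodes and simple literals, not the full N-Triples grammar. follow walks the chain from a start node. In the real file every blank node has several outgoing triples, so follow keeps only the first object seen for each subject and reproduces one path rather than the whole structure.

```python
import re

# One N-Triples term: an IRI, a blank node label, or a (tagged/typed) literal.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?)'
TRIPLE = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*(?:#.*)?$")


def parse_ntriples(lines):
    """Return (subject, predicate, object) tuples for the lines that match."""
    return [m.groups() for m in map(TRIPLE.match, lines) if m]


def follow(triples, start):
    """Follow subject -> object links from start until a node has no outgoing triple."""
    outgoing = {}
    for s, _p, o in triples:
        outgoing.setdefault(s, o)  # keep only the first object seen per subject
    path = [start]
    # stop at a node with no outgoing triple, or if the chain would loop
    while path[-1] in outgoing and outgoing[path[-1]] not in path:
        path.append(outgoing[path[-1]])
    return path
```

Fed the six lines quoted above, follow goes from <http://purl.obolibrary.org/obo/ENVO_03000056> through the five blank nodes and ends at <http://purl.obolibrary.org/obo/ENVO_03000076>.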


To keep this simpler, I will try to get the path from 'new ice formation process' to 'has input' some water.

http://purl.obolibrary.org/obo/ENVO_03000056 has rdfs:subClassOf links to 2 blank nodes: _:ub1bL65707C64 and _:ub1bL65703C64. I followed the former above; now for the latter.

_:ub1bL65703C64 <http://www.w3.org/2002/07/owl#onProperty> <http://purl.obolibrary.org/obo/RO_0002233> .

where http://purl.obolibrary.org/obo/RO_0002233 is 'has input'

other lines from this blank node:

_:ub1bL65703C64 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:ub1bL65703C64 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://purl.obolibrary.org/obo/ENVO_00002006> .

where http://purl.obolibrary.org/obo/ENVO_00002006 is water. So now I know that the blank node _:ub1bL65703C64 corresponds to the OWL class expression 'has input' some water, and that 'new ice formation process' is a subclass of it.

lines of interest:

<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:ub1bL65703C64 .
_:ub1bL65703C64 <http://www.w3.org/2002/07/owl#onProperty> <http://purl.obolibrary.org/obo/RO_0002233> .
_:ub1bL65703C64 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:ub1bL65703C64 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://purl.obolibrary.org/obo/ENVO_00002006> .

when I run the following query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT ?nodeProp
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/2002/07/owl#onProperty> ?nodeProp .
}

I get:

http://purl.obolibrary.org/obo/RO_0002233
http://purl.obolibrary.org/obo/RO_0002234
http://purl.obolibrary.org/obo/RO_0002233
http://purl.obolibrary.org/obo/RO_0002234

'has input' and 'has output' each appear twice (I'm not sure why yet), but I am getting the inputs and outputs as desired.

This query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT ?nodeProp ?someClass
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .

?node <http://www.w3.org/2002/07/owl#onProperty> ?nodeProp .

?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .

?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?someClass .
}

with output:

nodeProp	                                   someClass
http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43832476
http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43883684

This works as desired: each ?node is either 'has input' some water, 'has output' some nodeID://b43832476, or 'has output' some nodeID://b43883684.

Now to figure out what nodeID://b43832476 and nodeID://b43883684 are. I'm hoping one of them is the class expression from the OWL subclass axiom:

'has output' some 
    ('frazil ice' or shuga or 'slush ice')

It seems to be so, and the union class seems to be stored as an rdf list.

the list starts with:

node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000046> .  # 'frazil ice'

node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> node2 .

node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000075> .  # 'shuga'

node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> node3 .

node3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000076> .  # 'slush ice'

Finally, the list ends with: node3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .

I figured this out using variations on the query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT ?p ?something
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/2002/07/owl#onProperty> ?nodeProp .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?someClass .
<nodeID://b43832476> <http://www.w3.org/2002/07/owl#unionOf> ?o .
?o <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000046> .
?o <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ?o2 .
?o2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ?o3 .
?o3 ?p ?something
}

Now to investigate nodeID://b43883684, the other blank node in a 'has output' restriction on 'new ice formation process' ('new ice formation process' 'has output' some nodeID://b43883684).

nodeID://b43883684 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> and nodeID://b43883684 <http://www.w3.org/2002/07/owl#unionOf> <nodeID://b43883687>

Let's find out what the node nodeID://b43883687 is. It's the beginning of the union class's list, with rdf:first 'frazil ice' and rdf:rest pointing to the next node (node2).


Back to querying from 'new ice formation process'. The following query looks for the superclass expressions of 'new ice formation process' (the objects of its rdfs:subClassOf triples) that are of type owl:Restriction.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT ?node ?p ?o
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node ?p ?o . 
}

This gives us:

nodeID://b43832474	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://www.w3.org/2002/07/owl#Restriction
nodeID://b43832475	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://www.w3.org/2002/07/owl#Restriction
nodeID://b43883682	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://www.w3.org/2002/07/owl#Restriction
nodeID://b43883683	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://www.w3.org/2002/07/owl#Restriction
nodeID://b43832474	http://www.w3.org/2002/07/owl#onProperty	http://purl.obolibrary.org/obo/RO_0002233
nodeID://b43832475	http://www.w3.org/2002/07/owl#onProperty	http://purl.obolibrary.org/obo/RO_0002234
nodeID://b43883682	http://www.w3.org/2002/07/owl#onProperty	http://purl.obolibrary.org/obo/RO_0002233
nodeID://b43883683	http://www.w3.org/2002/07/owl#onProperty	http://purl.obolibrary.org/obo/RO_0002234
nodeID://b43832474	http://www.w3.org/2002/07/owl#someValuesFrom	http://purl.obolibrary.org/obo/ENVO_00002006
nodeID://b43832475	http://www.w3.org/2002/07/owl#someValuesFrom	nodeID://b43832476

The class has 4 blank nodes of type owl:Restriction.

The first, nodeID://b43832474, has owl:onProperty 'has input' and owl:someValuesFrom water.

The second, nodeID://b43832475, has owl:onProperty 'has output' and owl:someValuesFrom a new node, nodeID://b43832476, which I believe is the union class. This node is of type owl:Class and has owl:unionOf nodeID://b43832479. That node has rdf:first 'frazil ice' and rdf:rest pointing to nodeID://b43832478 ... the nodes in the RDF list for the union class. The same members are also reached from the 4th node.

The third, nodeID://b43883682, is also an owl:Restriction, with owl:onProperty 'has input' and owl:someValuesFrom water. So this node represents the same information as the first node. I'm not sure exactly why there are two different blank nodes of type owl:Restriction storing the same information. I'm guessing it has to do with the nested nature of OWL syntax, but I'm not sure.

The fourth, nodeID://b43883683, is an owl:Restriction on the property 'has output' with owl:someValuesFrom the blank node nodeID://b43883684. That node is of type owl:Class and is the owl:unionOf the node nodeID://b43883687, the first element of the union class's list. nodeID://b43883687 has <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_03000046> ('frazil ice') and <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <nodeID://b43883686>, a pointer to the next node in the list. That node has rdf:first 'shuga' and rdf:rest pointing to a node with rdf:first 'slush ice' and rdf:rest rdf:nil (ending the list).

Since nodes 1 and 3 look the same, as do nodes 2 and 4, I will check.

For nodes 1 and 3, I use a simple query, inserting each node in turn (node 3 shown here):

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT ?p ?o
WHERE { 
<nodeID://b43883682> ?p ?o . 
}

I get exactly the same results.

For nodes 2 and 4, running the same query gives nearly identical results. The only difference is that node 2 gives someValuesFrom nodeID://b43832476 and node 4 gives someValuesFrom nodeID://b43883684.

Now I'll check both of those nodes using the same query format as above. The results are again nearly identical, but with different owl:unionOf blank nodes. When followed through the union list, these turn out to be duplicates leading to the same members. I'm not sure why this is duplicated into two sets of blank nodes representing the class expression:

'has output' some 
    ('frazil ice' or shuga or 'slush ice')

In the end, both subclass axioms, the 'has output' one above and 'has input' some water, are duplicated, hence the 4 blank nodes. I presume SELECT DISTINCT is the way to overcome this when querying for them. This only works as long as the blank nodes themselves are not among the selected variables; otherwise the duplicate rows differ and are kept.

Querying from the subclass axiom 'has input' some water is relatively straightforward. Querying from a union class may be trickier, as we don't know how many classes are in the union. I'm also not sure how to write the SPARQL code to chase through the RDF list from pointer to pointer. Perhaps something along the lines of http://www.w3.org/1999/02/22-rdf-syntax-ns#rest+.

I'll start with the easy case: 'has input' some water

This query gets me halfway there:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT DISTINCT ?property ?value
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
}

which returns:

property	                                 value
http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43832476
http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43883684

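To filter this down to only 'has input' some water, use the query below. Requiring ?value to have an rdfs:label keeps only named classes, which drops the anonymous union classes: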

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT DISTINCT ?property ?value 
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value <http://www.w3.org/2000/01/rdf-schema#label> ?label
}  

which returns

property	                                value
http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006

i.e. 'has input' and water. In the query_for_classes_which_participate_in_input_processes.py example, however, we want to pass in a list of subclasses including http://purl.obolibrary.org/obo/ENVO_03000056 and a list of properties including http://purl.obolibrary.org/obo/RO_0002233. That should return water, http://purl.obolibrary.org/obo/ENVO_00002006.

I want to try passing in the value http://purl.obolibrary.org/obo/RO_0002233 as part of a VALUES block, so that the code can do this over a list of inputs.

Got that to work here:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>
SELECT DISTINCT ?value 
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value <http://www.w3.org/2000/01/rdf-schema#label> ?label
VALUES(?property){(<http://purl.obolibrary.org/obo/RO_0002233>)}
}  

I've expanded it a bit to take two VALUES blocks: the first a list of properties, the second a list of input PURLs.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>

SELECT DISTINCT ?input_purl ?property ?value 
WHERE { 
?input_purl <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value <http://www.w3.org/2000/01/rdf-schema#label> ?label

VALUES(?property){
(<http://purl.obolibrary.org/obo/RO_0002233>)
(<http://purl.obolibrary.org/obo/RO_0002234>)
}

VALUES(?input_purl){
(<http://purl.obolibrary.org/obo/ENVO_03000056>)
(<http://purl.obolibrary.org/obo/ENVO_03000103>)
(<http://purl.obolibrary.org/obo/ENVO_03000047>)
}}  

This works as expected giving:

input_purl	                                    property	                             value
http://purl.obolibrary.org/obo/ENVO_03000047	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/ENVO_03000103	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_03000027
http://purl.obolibrary.org/obo/ENVO_03000047	http://purl.obolibrary.org/obo/RO_0002234	http://purl.obolibrary.org/obo/ENVO_03000046

where:
The first row is: 'frazil ice formation' has 'has input' water
The second row is: 'new ice formation process' has 'has input' water
The third row is: 'aeolian transport of snow' has 'has input' 'powdery snow'
The fourth row is: 'frazil ice formation' has 'has output' 'frazil ice'
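Rows of bare PURLs are tedious to read, so a small lookup table helps. Below is a sketch with the labels hard-coded from this page; in practice they would come from an rdfs:label query. describe() is a hypothetical helper.

```python
LABELS = {
    "http://purl.obolibrary.org/obo/ENVO_03000047": "frazil ice formation",
    "http://purl.obolibrary.org/obo/ENVO_03000056": "new ice formation process",
    "http://purl.obolibrary.org/obo/ENVO_03000103": "aeolian transport of snow",
    "http://purl.obolibrary.org/obo/RO_0002233": "has input",
    "http://purl.obolibrary.org/obo/RO_0002234": "has output",
    "http://purl.obolibrary.org/obo/ENVO_00002006": "water",
    "http://purl.obolibrary.org/obo/ENVO_03000027": "powdery snow",
    "http://purl.obolibrary.org/obo/ENVO_03000046": "frazil ice",
    "http://purl.obolibrary.org/obo/ENVO_03000075": "shuga",
    "http://purl.obolibrary.org/obo/ENVO_03000076": "slush ice",
}


def describe(rows, labels=LABELS):
    """Render (process, property, participant) PURL rows as readable axioms."""
    def name(purl):
        return labels.get(purl, purl)  # fall back to the raw PURL if unlabeled
    return [f"'{name(s)}' '{name(p)}' some '{name(o)}'" for s, p, o in rows]
```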

Great, now I can pass in a list of PURLs and properties and query for the simple subclass axioms of the form: PURL property some class, e.g. 'frazil ice formation' 'has input' some water.

Now I need to be able to handle the more complex subclass axioms which involve unions. 'has output' some ('frazil ice' or shuga or 'slush ice')

This is a very first stab at it. It gets the data, but only because it hard-codes the union class list as having length 3.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>
SELECT DISTINCT ?value_1 ?value_2 ?value_3 ?node4
WHERE { 
?input_purl <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
?value <http://www.w3.org/2002/07/owl#unionOf> ?node1 .
?node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ?value_1 .
?node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ?node2 .
?node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ?value_2 .
?node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ?node3 .
?node3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> ?value_3 .
?node3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> ?node4 .

VALUES(?property){
(<http://purl.obolibrary.org/obo/RO_0002233>)
(<http://purl.obolibrary.org/obo/RO_0002234>)
}
VALUES(?input_purl){
(<http://purl.obolibrary.org/obo/ENVO_03000056>)
(<http://purl.obolibrary.org/obo/ENVO_03000103>)
(<http://purl.obolibrary.org/obo/ENVO_03000047>)
}}

returning:

value_1	                                        value_2	                                           value_3	                                        node4
http://purl.obolibrary.org/obo/ENVO_03000046	http://purl.obolibrary.org/obo/ENVO_03000075	http://purl.obolibrary.org/obo/ENVO_03000076	http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

The Stack Overflow post 'OWL ontology: SPARQL query a range or domain of an ObjectProperty when they're unionOf classes' may contain what I need. The most relevant part is the property path that accesses what's inside the unionOf: ?p rdfs:domain/(owl:unionOf/rdf:rest*/rdf:first)* ?d

Using the code from that page, I was able to access all the members of the union with the query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
SELECT DISTINCT ?o
WHERE { 
<http://purl.obolibrary.org/obo/ENVO_03000056> <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value owl:unionOf/rdf:rest*/rdf:first ?o .
}  

which returns:

http://purl.obolibrary.org/obo/ENVO_03000076
http://purl.obolibrary.org/obo/ENVO_03000075
http://purl.obolibrary.org/obo/ENVO_03000046

the desired ('frazil ice' or shuga or 'slush ice')

This happens in the most important line, ?value owl:unionOf/rdf:rest*/rdf:first ?o . It goes from the union class to its RDF list and follows 0 or more rdf:rest links (the * operator). It then takes the rdf:first of every node reached (i.e. the members) and binds them to ?o. OMG THIS IS BRILLIANT!!!!!! Thanks Joshua Taylor for the pro tips!!!
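To make the path concrete, the explicit loop below visits the same list nodes that rdf:rest* reaches and collects their rdf:first values. It is a sketch over plain (subject, predicate, object) tuples. rdf_list_members is a hypothetical helper, and node1 to node3 are the placeholder names used above for the 'frazil ice' / shuga / 'slush ice' list.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_members(triples, head):
    """Return the rdf:first values of an RDF list, starting at node head.

    Walks rdf:rest links until rdf:nil, visiting the same nodes that the
    SPARQL path rdf:rest* reaches from head; each node's rdf:first is a member.
    """
    first, rest = {}, {}
    for s, p, o in triples:
        if p == RDF + "first":
            first[s] = o
        elif p == RDF + "rest":
            rest[s] = o
    members, node, seen = [], head, set()
    while node in first and node not in seen:  # stops at rdf:nil or a broken list
        seen.add(node)
        members.append(first[node])
        node = rest.get(node, RDF + "nil")
    return members
```

The SPARQL results above came back in no particular order. The loop, in contrast, keeps list order ('frazil ice', shuga, 'slush ice').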

When I merged the two working cases with a UNION, it's SUPER CLOSE to working, just not quite there:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>
SELECT DISTINCT ?input_purl ?property ?value ?o
WHERE { 
?input_purl <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
{
#direct case
?value <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
UNION 
{
#union class case:
?value owl:unionOf/rdf:rest*/rdf:first ?o .
}
VALUES(?property){
(<http://purl.obolibrary.org/obo/RO_0002233>)
(<http://purl.obolibrary.org/obo/RO_0002234>)
}
VALUES(?input_purl){
(<http://purl.obolibrary.org/obo/ENVO_03000056>)
(<http://purl.obolibrary.org/obo/ENVO_03000103>)
(<http://purl.obolibrary.org/obo/ENVO_03000047>)
}
}

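This query doesn't work in Python. I think it's because the UNION leaves the ?o column with values in some rows but not others, though I'm not exactly sure. It does work against the Ontobee endpoint, yielding the results below. Note that in the union rows, ?value is still the blank node, and the members only appear in ?o: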

input_purl	                                    property	                                          value	                                o
http://purl.obolibrary.org/obo/ENVO_03000047	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006	
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_00002006	
http://purl.obolibrary.org/obo/ENVO_03000103	http://purl.obolibrary.org/obo/RO_0002233	http://purl.obolibrary.org/obo/ENVO_03000027	
http://purl.obolibrary.org/obo/ENVO_03000047	http://purl.obolibrary.org/obo/RO_0002234	http://purl.obolibrary.org/obo/ENVO_03000046	
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43832476			http://purl.obolibrary.org/obo/ENVO_03000076
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43832476			http://purl.obolibrary.org/obo/ENVO_03000075
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43832476			http://purl.obolibrary.org/obo/ENVO_03000046
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43883684			http://purl.obolibrary.org/obo/ENVO_03000076
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43883684			http://purl.obolibrary.org/obo/ENVO_03000075
http://purl.obolibrary.org/obo/ENVO_03000056	http://purl.obolibrary.org/obo/RO_0002234	nodeID://b43883684			http://purl.obolibrary.org/obo/ENVO_03000046

Finally I have it working!!!!! The query is saved as /kblumberg_masters_thesis/testing/script_prototyping/classes_processes_participate_in/test_1.rq:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX html: <http://tools.ietf.org/html/>

SELECT DISTINCT ?input_purl ?property ?value 
WHERE { 
?input_purl <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?node .
?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
?node <http://www.w3.org/2002/07/owl#onProperty> ?property . 
{
#direct case
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?value .
?value <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
UNION 
{
#union class case:
?node <http://www.w3.org/2002/07/owl#someValuesFrom> ?v .
?v owl:unionOf/rdf:rest*/rdf:first ?value .
}
VALUES(?property){
(<http://purl.obolibrary.org/obo/RO_0002233>)
(<http://purl.obolibrary.org/obo/RO_0002234>)
}
VALUES(?input_purl){
(<http://purl.obolibrary.org/obo/ENVO_03000056>)
(<http://purl.obolibrary.org/obo/ENVO_03000103>)
(<http://purl.obolibrary.org/obo/ENVO_03000047>)
} }

which yields:

PURL                                          Property                                 participant
http://purl.obolibrary.org/obo/ENVO_03000047 http://purl.obolibrary.org/obo/RO_0002233 http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/ENVO_03000056 http://purl.obolibrary.org/obo/RO_0002233 http://purl.obolibrary.org/obo/ENVO_00002006
http://purl.obolibrary.org/obo/ENVO_03000103 http://purl.obolibrary.org/obo/RO_0002233 http://purl.obolibrary.org/obo/ENVO_03000027
http://purl.obolibrary.org/obo/ENVO_03000047 http://purl.obolibrary.org/obo/RO_0002234 http://purl.obolibrary.org/obo/ENVO_03000046
http://purl.obolibrary.org/obo/ENVO_03000056 http://purl.obolibrary.org/obo/RO_0002234 http://purl.obolibrary.org/obo/ENVO_03000076
http://purl.obolibrary.org/obo/ENVO_03000056 http://purl.obolibrary.org/obo/RO_0002234 http://purl.obolibrary.org/obo/ENVO_03000075
http://purl.obolibrary.org/obo/ENVO_03000056 http://purl.obolibrary.org/obo/RO_0002234 http://purl.obolibrary.org/obo/ENVO_03000046

This works perfectly using the test list of 3 input PURLs ('frazil ice formation', 'new ice formation process', and 'aeolian transport of snow') and 2 subproperties ('has input' and 'has output').

The next step (which should be relatively easy) is to assemble a query like this one that puts the PURLs in subclasses_of_ENVO_01000950.txt into the ?input_purl VALUES block and the PURLs from subproperties_of_RO_0000057.txt into the ?property VALUES block, then runs it against Ontobee. In other words, clean up and finish the query_for_classes_which_participate_in_input_processes.py script.
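A minimal sketch of that step, assuming each input file holds one PURL per line (as the .txt outputs of the other two scripts appear to). The names read_purls, values_block and build_query are hypothetical helpers, not the contents of the real script; the query body mirrors test_1.rq above but uses the single-variable VALUES form, which SPARQL 1.1 also allows.

```python
# Hypothetical sketch: fill the two VALUES blocks of the test_1.rq query
# from lists of class and property PURLs.

QUERY_TEMPLATE = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?input_purl ?property ?value
WHERE {{
  ?input_purl rdfs:subClassOf ?node .
  ?node rdf:type owl:Restriction ;
        owl:onProperty ?property .
  {{
    # direct case: a named filler class (it has a label)
    ?node owl:someValuesFrom ?value .
    ?value rdfs:label ?label .
  }}
  UNION
  {{
    # union class case: walk the members of the owl:unionOf list
    ?node owl:someValuesFrom ?v .
    ?v owl:unionOf/rdf:rest*/rdf:first ?value .
  }}
  VALUES ?property {{ {properties} }}
  VALUES ?input_purl {{ {classes} }}
}}"""


def read_purls(path):
    """Read one PURL per line from a text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def values_block(purls):
    """Wrap each PURL in angle brackets for a SPARQL VALUES block."""
    return " ".join("<%s>" % purl for purl in purls)


def build_query(class_purls, property_purls):
    """Return the full query text for the given classes and properties."""
    return QUERY_TEMPLATE.format(classes=values_block(class_purls),
                                 properties=values_block(property_purls))
```

Usage would be something like build_query(read_purls("subclasses_of_ENVO_01000950.txt"), read_purls("subproperties_of_RO_0000057.txt")), with the resulting string sent to the Ontobee endpoint.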

07.12.17

I finished the query_for_classes_which_participate_in_input_processes.py script; it runs as expected and we can get lots of data!

Now to test the scripts query_for_subclasses_of_input_purl.py, query_for_subproperties_of_input_purl.py and query_for_classes_which_participate_in_input_processes.py and try to break them, to see where their limitations are. The tests for the 3 scripts are now in the folder /kblumberg_masters_thesis/testing/test_querying_for_classes_which_participate_in_input_processes. I'll test them with CHEBI and SDGIO classes.

For CHEBI, try the PURL http://purl.obolibrary.org/obo/CHEBI_84735 'algal metabolite', which doesn't make sense with the script as it's a role (a specifically dependent continuant), not a process. It looks to me like CHEBI doesn't have processes. SDGIO also doesn't have any processes, which is fine; I'll just need to modify the script to accept other types of inputs besides processes and properties. I'll try this later; first I'll try other ENVO classes.

Pier had suggested making a similar script for 'mereotopologically related to' to get subproperties like 'has part', 'part of' etc. In theory I should be able to run my script using 'mereotopologically related to' as the property. Let's give it a go. I'll test it with ENVO:storm http://purl.obolibrary.org/obo/ENVO_01000876

./query_for_subclasses_of_input_purl.py http://purl.obolibrary.org/obo/ENVO_01000876

This works as expected.

./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0002323

This breaks:

error:

Traceback (most recent call last):
  File "./query_for_subproperties_of_input_purl.py", line 72, in <module>
    inverse_of_in_arg = inverse_list[0]
IndexError: list index out of range

This is probably because 'mereotopologically related to' doesn't have an inverse, so inverse_list comes back empty. I need to modify the script so that the search for an inverse is conditional (an if statement).
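A sketch of that guard. Only the names inverse_list and inverse_of_in_arg come from the traceback; first_inverse and properties_to_query are hypothetical, and how the real script uses the inverse afterwards is assumed. Binding inverse_of_in_arg in both cases also means later code that checks it can't hit a NameError.

```python
def first_inverse(inverse_list):
    """Return the first inverse PURL, or None if the property has none."""
    # Indexing an empty list blindly is what raised the IndexError.
    return inverse_list[0] if inverse_list else None


def properties_to_query(input_purl, inverse_list):
    """Collect the input property plus its inverse, when one exists."""
    inverse_of_in_arg = first_inverse(inverse_list)
    purls = [input_purl]
    # inverse_of_in_arg is always bound (possibly to None), so code that
    # tests it later cannot fail with a NameError.
    if inverse_of_in_arg is not None:
        purls.append(inverse_of_in_arg)
    return purls
```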

With the inverse-of query commented out for now, I also notice the script doesn't work when called for subproperties of http://purl.obolibrary.org/obo/RO_0002323 'mereotopologically related to', nor for http://purl.obolibrary.org/obo/RO_0002131 'overlaps'. The error is:

Response:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory.  use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool

I will deal with this later. First, add an if statement so it works for PURLs that don't have an inverse. I'll test this with http://purl.obolibrary.org/obo/RO_0002434 'interacts with', which has subproperties but no inverse, and with http://purl.obolibrary.org/obo/RO_0000057 'has participant', which does.

Fixed that. I then tried to run queries such as this:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?s
WHERE { ?s rdfs:subPropertyOf+ obo:BFO_0000051 . }

# alternative WHERE clause: exactly two subPropertyOf steps
#WHERE { ?s rdfs:subPropertyOf/rdfs:subPropertyOf obo:RO_0002323 . }

to get all subproperties (via the + property path) of RO properties such as http://purl.obolibrary.org/obo/RO_0002323 'mereotopologically related to'. For properties that are too high in the hierarchy we run into the Virtuoso error; it works for properties with shallower hierarchies. Pier says to include this in my discussion, contact the Ontobee team about this issue, and see whether it can be improved to allow such queries.
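One possible workaround, untested against Ontobee: instead of a single transitive + query, ask only for direct subproperties and walk the hierarchy breadth-first on the client. get_direct_subproperties is a hypothetical callable standing in for a one-level query; the trade-off is many small HTTP requests for deep hierarchies such as 'overlaps'.

```python
from collections import deque


def all_subproperties(root, get_direct_subproperties):
    """Collect every transitive subproperty of root, one level at a time.

    get_direct_subproperties(purl) is assumed to run a non-transitive query
    such as  SELECT ?s WHERE { ?s rdfs:subPropertyOf <purl> }  and return
    the matching PURLs as a list.
    """
    seen = {root}
    found = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in get_direct_subproperties(current):
            # Skip properties reached via two parents, and guard against cycles.
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found
```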

Check the RO properties for ones that don't work with the inverse-of query. I think this happened for 'has input' / 'input of', or something like that. If I can find even one of these, it would be worth reporting to RO.

I found it and posted the issue to the ontobee tracker here

08.12.17

In the script query_for_subproperties_of_input_purl.py I filter for OBO PURLs, but I don't in the subclasses script. I believe this is logical, as subproperties should only come from RO or another OBO ontology, whereas the input to query_for_subclasses_of_input_purl.py could/should be a class from anywhere, e.g. CHEBI (I've got to run now; next I'll try SDGIO).

Try it with SDGIO:'exposure process':

./query_for_subclasses_of_input_purl.py http://purl.unep.org/sdg/SDGIO_00000014

I get this error (kind of as expected):

Traceback (most recent call last):
  File "./query_for_subclasses_of_input_purl.py", line 68, in <module>
    f = open(outstring, 'w')
IOError: [Errno 2] No such file or directory: 'subclasses_of_http://purl.unep.org/sdg/SDGIO_00000014.txt'

It can't write to the file 'subclasses_of_http://purl.unep.org/sdg/SDGIO_00000014.txt' because the name still contains the slashes of the PURL, which weren't removed by the regex substitution line since that only targets Ontobee PURLs. The quick fix is to add the line namestring = re.sub('http://purl.unep.org/sdg/', '', namestring). A better long-term fix is a regular expression that strips the leading part of any relevant PURL we may face (EBI, UNEP, ... whatever), compensating for format etc. Alternatively, it could keep only the SDGIO_######## part, which is more easily done with a regex. I could do this later perhaps.
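A sketch of the "keep only the local ID" option, assuming the ID is whatever follows the last / or # in the PURL (true for the ENVO, RO and SDGIO PURLs in these notes). purl_to_filename is a hypothetical helper, not the script's current code.

```python
import re

# Assumption: the local ID is the text after the last '/' or '#',
# e.g. ENVO_01000876 or SDGIO_00000014.
LOCAL_ID = re.compile(r"[^/#]+$")


def purl_to_filename(prefix, purl):
    """Build an output file name such as subclasses_of_SDGIO_00000014.txt."""
    match = LOCAL_ID.search(purl.strip())
    if match is None:
        # e.g. a PURL ending in '/' has no local ID to keep
        raise ValueError("no local ID at the end of %r" % purl)
    return "%s_%s.txt" % (prefix, match.group(0))
```

This works regardless of the host (obolibrary, unep, ebi), at the cost of collisions if two ontologies ever reuse the same local ID.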

With the quick fix the script executes but doesn't return anything. I believe this is because SDGIO is not hosted on Ontobee, which is what we are querying, so we won't get back any results. I'll check whether http://uneplive.unep.org has a SPARQL endpoint. It doesn't look like it. I have the impression that the current UNEP Live web intelligence portal isn't really that useful or semantically connected; it seems to just run text-based keyword searches, mostly over journal articles. Perhaps my work could serve as a demonstration of how a SPARQL endpoint could help this portal?

For example, when I search their portal for "exposure lead africa",

the first result is an article about the NHL (National Hockey League) here.

When I search the portal for exposure process I only get journal articles.

When I search the UN Environment Programme main page for "exposure process", the results are:

Publications
Total results 182
Air quality in Europe: 2015 report - publication
https://wedocs.unep.org/bitstream/handle/20.500.1182...

Minimum Specifications for Health Care Waste Incineration - publication
https://wedocs.unep.org/bitstream/handle/20.500.1182...

Africa Environment Outlook 3: our environment, our health - publication
https://wedocs.unep.org/bitstream/handle/20.500.1182...

Health effects of black carbon - publication
https://wedocs.unep.org/bitstream/handle/20.500.1182...

Maps
Total results 4
Disasters - Population Exposure (PREVIEW) - - map
http://environmentlive.unep.org/global/data/GL#maps|...

Disasters - Past Events (PREVIEW) - - map
http://environmentlive.unep.org/global/data/GL#maps|...

Disaster Risk (PREVIEW) - - map
http://environmentlive.unep.org/global/data/GL#maps|...

Disasters (PREVIEW) - - map
http://environmentlive.unep.org/global/data/GL#maps|... 

Some of their maps are interesting, but none of the results seem to use the SDGIO PURL for 'exposure process'. As I suspected, right now those PURLs aren't connected to anything; we should fix that.

Next I will test the subproperties script on subproperties of 'mereotopologically related to'.

Try it with 'overlaps' http://purl.obolibrary.org/obo/RO_0002131:

./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/RO_0002131

We get the Virtuoso 42000 error.

Try it with 'has part' http://purl.obolibrary.org/obo/BFO_0000051:

./query_for_subproperties_of_input_purl.py http://purl.obolibrary.org/obo/BFO_0000051

I get the error:

Traceback (most recent call last):
  File "./query_for_subproperties_of_input_purl.py", line 78, in <module>
    query_for_subproperty = subproperty_query_function(inverse_of_in_arg)
NameError: name 'inverse_of_in_arg' is not defined

Same when I run it with 'participates in'. Found the error and fixed it. Now try 'has part' again: it executes and returns. I notice, however, that some PURLs have a different format, for example http://purl.obolibrary.org/obo/tao#integral_part_of or http://purl.obolibrary.org/obo/omp#member_of. I wonder whether using this list of subproperties to run the query_for_classes_which_participate_in_input_processes.py script will cause an error. I also need to rename that script, as it's not just for processes and 'participates in' but for classes and properties in general (I'm pretty sure; I should check that as well).

./query_for_classes_which_participate_in_input_processes.py subclasses_of_ENVO_01000950.txt subproperties_of_BFO_0000051.txt

I get the error:

Traceback (most recent call last):
  File "./query_for_classes_which_participate_in_input_processes.py", line 82, in <module>
    results = endpoint.query().convert()
  File "/home/kai/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 601, in query
    return QueryResult(self._query())
  File "/home/kai/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 581, in _query
    raise e
urllib2.HTTPError: HTTP Error 414: Request-URI Too Large

I'm not sure whether this is due to the #s in the PURLs or because the query is large; I suspect the latter, since HTTP 414 means the request URL was too long. I will try a couple of things: 1) query with a much smaller list of subproperties (not including #s), 2) then query with the small list including #s.
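If the 414 is caused by the query being packed into a GET URL, two options seem worth trying. SPARQLWrapper can send the query in the request body via setMethod(POST), though whether Ontobee accepts POST queries is untested here. Alternatively, split the VALUES lists into batches, run one query per (class batch, property batch) pair, and merge the results client-side. The helpers below sketch the batching; the batch sizes are arbitrary guesses, not measured URL limits.

```python
from itertools import product


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batched_pairs(class_purls, property_purls, class_batch=50, property_batch=20):
    """Every (class batch, property batch) combination; each pair would
    fill the two VALUES blocks of one smaller query."""
    return list(product(chunked(class_purls, class_batch),
                        chunked(property_purls, property_batch)))
```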

I'll use a mock file, subproperties_of_RO_0002131.txt (for 'overlaps'), which I currently can't generate automatically but can make by hand, to test whether properties like 'has part' and 'part of' can be found within subclasses of ENVO:sand, which has subclasses that use the 'part of' relation.

which I will do here /kblumberg_masters_thesis/testing/test_querying_for_classes_which_participate_in_input_processes

subproperties_of_RO_0002131.txt will include:

http://purl.obolibrary.org/obo/BFO_0000050
http://purl.obolibrary.org/obo/BFO_0000051
http://purl.obolibrary.org/obo/RO_0002473
http://purl.obolibrary.org/obo/RO_0002229

./query_for_subclasses_of_input_purl.py http://purl.obolibrary.org/obo/ENVO_01000017

./query_for_classes_which_participate_in_input_processes.py subclasses_of_ENVO_01000017.txt subproperties_of_RO_0002131.txt

This works and returns, as hoped for, (the PURL version of):

Class            Property   Related class
desert sand      part of    desert
acid dune sand   part of    dune
beach sand       part of    sandy beach

Now I know that the query_for_classes_which_participate_in_input_processes.py works generally with class and property as I expected, so I should change the name of the script.

Now I will try again, adding some of the weird PURLs that were subproperties of 'has part' to subproperties_of_RO_0002131.txt:

http://purl.obolibrary.org/obo/tao#integral_part_of
http://purl.obolibrary.org/obo/omp#member_of

The weird PURLs don't break it, so I think we're OK for now. I will modify the ./query_for_classes_which_participate_in_input_processes.py script to reflect the fact that it works for any class or property, and rename it:

query_for_classes_linked_by_input_classes_and_input_properties.py

test with: ./query_for_classes_linked_by_input_classes_and_input_properties.py subclasses_of_ENVO_01000950.txt subproperties_of_RO_0000057.txt

Revisions made.

I'm satisfied with these scripts for now. The next step is to retrieve data about any of these classes related by these properties. To do this I'm going to need annotated data to query. I would like to annotate the data in an OWL-compliant way, so that querying it works the same way as querying Ontobee. This means I need to learn how to post-compose classes in mock OWL in either N3 or Turtle format. Once I figure out how to do this I can annotate all the data in my datastore this way, which is an important milestone.

I will refer to the folder /kblumberg_masters_thesis/testing/test_sparql_on_owl, where I have the .nt and .ttl versions of ENVO. I'll examine some simple classes like those in the ENVO:snow hierarchy.

Starting with a bare-bones class: 'powdery snow'

In the Turtle (.ttl) OWL file, this is the entire code block for the class:

###  http://purl.obolibrary.org/obo/ENVO_03000027
<http://purl.obolibrary.org/obo/ENVO_03000027> rdf:type owl:Class ;
                                               rdfs:subClassOf <http://purl.obolibrary.org/obo/ENVO_01000406> ;
                                               <http://purl.obolibrary.org/obo/IAO_0000115> "Uncompacted snow containing trapped atmospheric gases." ;
                                               <http://purl.obolibrary.org/obo/IAO_0000117> "http://orcid.org/0000-0002-3410-4655" ,
                                                                                            "http://orcid.org/0000-0002-4366-3088" ;
                                               <http://www.geneontology.org/formats/oboInOwl#inSubset> "envoPolar" ;
                                               rdfs:comment "Described as having a fluffy appearance." ;
                                               rdfs:label "powdery snow"@en .

[ rdf:type owl:Axiom ;
   owl:annotatedSource <http://purl.obolibrary.org/obo/ENVO_03000027> ;
   owl:annotatedProperty <http://purl.obolibrary.org/obo/IAO_0000115> ;
   owl:annotatedTarget "Uncompacted snow containing trapped atmospheric gases." ;
   <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "https://en.wikipedia.org/wiki/Types_of_snow" ,
                                                            "https://nsidc.org/cryosphere/snow/science/characteristics.html"
 ] .

Examining the OWL axiom (the [ rdf:type owl:Axiom ; ... ] part), which appears to be about the definition, in N-Triples:

_:ub1bL64977C1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Axiom> .
_:ub1bL64977C1 <http://www.w3.org/2002/07/owl#annotatedSource> <http://purl.obolibrary.org/obo/ENVO_03000027> .
_:ub1bL64977C1 <http://www.w3.org/2002/07/owl#annotatedProperty> <http://purl.obolibrary.org/obo/IAO_0000115> .
_:ub1bL64977C1 <http://www.w3.org/2002/07/owl#annotatedTarget> "Uncompacted snow containing trapped atmospheric gases." .
_:ub1bL64977C1 <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "https://en.wikipedia.org/wiki/Types_of_snow" .
_:ub1bL64977C1 <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "https://nsidc.org/cryosphere/snow/science/characteristics.html" .

I wonder what this http://www.w3.org/2002/07/owl#annotatedTarget is.

owl:annotatedTarget a rdf:Property ;
     rdfs:label "annotatedTarget" ;
     rdfs:comment "The property that determines the object of an annotated axiom or annotated annotation." ;
     rdfs:domain rdfs:Resource ;
     rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
     rdfs:range rdfs:Resource . 

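My understanding is that this is OWL's way of reifying an annotated axiom: the blank node points at the axiom's subject (owl:annotatedSource), predicate (owl:annotatedProperty) and object (owl:annotatedTarget), so extra annotations can be attached to that axiom.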

This axiom (the blank node _:ub1bL64977C1) has the target "Uncompacted snow containing trapped atmospheric gases.", which is the definition. I bet this is how a database cross-reference gets attached to the definition. I'll quickly check this with another simple class whose definition isn't annotated: 'sea sand'.
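To check that reading, here is a small sketch that finds the dbxrefs hung off a class's definition axiom. It assumes the .nt file has already been parsed into (subject, predicate, object) string tuples, with literals stored without their quotes; the parsing step and the name definition_xrefs are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DEFINITION = "http://purl.obolibrary.org/obo/IAO_0000115"
HAS_DBXREF = "http://www.geneontology.org/formats/oboInOwl#hasDbXref"


def definition_xrefs(triples, class_purl):
    """Return the hasDbXref values on the owl:Axiom that annotates
    class_purl's definition (IAO_0000115)."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    xrefs = []
    for pairs in by_subject.values():
        is_definition_axiom = (
            (RDF_TYPE, OWL + "Axiom") in pairs
            and (OWL + "annotatedSource", class_purl) in pairs
            and (OWL + "annotatedProperty", DEFINITION) in pairs
        )
        if is_definition_axiom:
            xrefs.extend(o for p, o in pairs if p == HAS_DBXREF)
    return sorted(xrefs)
```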

There was only one blank node for 'sea sand', from which I figured out how an equivalence axiom is encoded in OWL.

Starting from the line:

_:ub1bL20362C91 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://purl.obolibrary.org/obo/ENVO_00002118> .

_:ub1bL20362C91 is the only blank node referencing 'sea sand', and it's an OWL restriction: 'part of' some 'sea sand'. But where does it come from?

After a bit of triple sorting I managed to make sense of the following: this blank node is part of the axiom on the class 'grain of sea sand':

'grain of sand'
 and ('part of' some 'sea sand')

This line states that 'grain of sea sand' (ENVO_00000347) has the equivalent class _:ub1bL20361C68:

<http://purl.obolibrary.org/obo/ENVO_00000347> <http://www.w3.org/2002/07/owl#equivalentClass> _:ub1bL20361C68 .

These lines state that _:ub1bL20361C68 (the equivalent class) is of type owl:Class, and that it is the intersection (an AND, denoted by owl:intersectionOf) of the members of the RDF list headed by another blank node, _:f7df0c95ff9804381bdb281d4c42fe8b3b122.

_:ub1bL20361C68 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
_:ub1bL20361C68 <http://www.w3.org/2002/07/owl#intersectionOf> _:f7df0c95ff9804381bdb281d4c42fe8b3b122 .

Although I can't find a line stating this explicitly (RDF lists don't need an rdf:type rdf:List triple), the node _:f7df0c95ff9804381bdb281d4c42fe8b3b122 is the head of an RDF list, which I encountered previously when parsing the union class in 'new ice formation process'. This list has as its first element ENVO:'grain of sand' and as its rest a pointer to _:f7df0c95ff9804381bdb281d4c42fe8b3b123.

_:f7df0c95ff9804381bdb281d4c42fe8b3b122 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://purl.obolibrary.org/obo/ENVO_00000340> .
_:f7df0c95ff9804381bdb281d4c42fe8b3b122 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:f7df0c95ff9804381bdb281d4c42fe8b3b123 .

The node _:f7df0c95ff9804381bdb281d4c42fe8b3b123 has as its rdf:first the blank node _:ub1bL20362C91 (a pointer to another object) and rdf:rest rdf:nil, which ends the list.

_:f7df0c95ff9804381bdb281d4c42fe8b3b123 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:ub1bL20362C91 .
_:f7df0c95ff9804381bdb281d4c42fe8b3b123 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
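Walking the list by hand like this is the client-side equivalent of the rdf:rest*/rdf:first path used in the union-class query. A minimal sketch, again assuming the triples are already parsed into (subject, predicate, object) string tuples; walk_rdf_list is a hypothetical name.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def walk_rdf_list(triples, head):
    """Return the members of the RDF list starting at head, in order."""
    first, rest = {}, {}
    for s, p, o in triples:
        if p == RDF + "first":
            first[s] = o
        elif p == RDF + "rest":
            rest[s] = o
    members, node, visited = [], head, set()
    while node != RDF + "nil":
        if node in visited:  # a well-formed list never revisits a cell
            raise ValueError("cyclic RDF list at %s" % node)
        visited.add(node)
        members.append(first[node])  # KeyError here means a broken list
        node = rest[node]
    return members
```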

The _:ub1bL20362C91 node we were pointed to is an owl:Restriction on the property 'part of' with owl:someValuesFrom 'sea sand', i.e. this node is the axiom part ('part of' some 'sea sand'):

_:ub1bL20362C91 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Restriction> .
_:ub1bL20362C91 <http://www.w3.org/2002/07/owl#onProperty> <http://purl.obolibrary.org/obo/BFO_0000050> .
_:ub1bL20362C91 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://purl.obolibrary.org/obo/ENVO_00002118> .

Altogether, these lines specify the equivalence axiom of the class 'grain of sea sand':

'grain of sand'
 and ('part of' some 'sea sand')

This can be written out in mock N-Triples format (with prefixes and descriptive blank-node labels) as follows:

<Some thing> owl:equivalentClass _:blankNodeForOwlClass .

_:blankNodeForOwlClass rdf:type owl:Class .
_:blankNodeForOwlClass owl:intersectionOf _:blankNodeForSubjectOfRDFList .

_:blankNodeForSubjectOfRDFList rdf:first obo:ENVO_00000340 .
_:blankNodeForSubjectOfRDFList rdf:rest _:blankNodeForPointerToOwlRestriction .

_:blankNodeForPointerToOwlRestriction rdf:first _:blankNodeForOwlRestriction .
_:blankNodeForPointerToOwlRestriction rdf:rest rdf:nil .

_:blankNodeForOwlRestriction rdf:type owl:Restriction .
_:blankNodeForOwlRestriction owl:onProperty obo:BFO_0000050 .
_:blankNodeForOwlRestriction owl:someValuesFrom obo:ENVO_00002118 .

In Turtle the same thing is expressed as:

<http://purl.obolibrary.org/obo/ENVO_00000347> rdf:type owl:Class ;
                                               owl:equivalentClass [ owl:intersectionOf ( <http://purl.obolibrary.org/obo/ENVO_00000340>
                                                                                          [ rdf:type owl:Restriction ;
                                                                                            owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ;
                                                                                            owl:someValuesFrom <http://purl.obolibrary.org/obo/ENVO_00002118>
                                                                                          ]
                                                                                        ) ;
                                                                     rdf:type owl:Class
                                                                   ] ;
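Since every "X EquivalentTo A and (P some B)" axiom has the same ten-triple shape, it can be generated rather than written by hand. A sketch under the assumption that blank-node labels are built from a prefix that must be unique per axiom within one file; genus_differentia_ntriples is a hypothetical name.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def genus_differentia_ntriples(cls, genus, prop, filler, prefix="b"):
    """N-Triples for: cls EquivalentTo genus and (prop some filler).

    Blank nodes are labelled _:<prefix>0 ... _:<prefix>3, so use a
    different prefix for each axiom written to the same file.
    """
    eq, head, tail, restr = ["_:%s%d" % (prefix, i) for i in range(4)]
    triples = [
        ("<%s>" % cls, "<%sequivalentClass>" % OWL, eq),
        (eq, "<%stype>" % RDF, "<%sClass>" % OWL),
        (eq, "<%sintersectionOf>" % OWL, head),
        # two-element RDF list: (genus, restriction)
        (head, "<%sfirst>" % RDF, "<%s>" % genus),
        (head, "<%srest>" % RDF, tail),
        (tail, "<%sfirst>" % RDF, restr),
        (tail, "<%srest>" % RDF, "<%snil>" % RDF),
        # the restriction: prop some filler
        (restr, "<%stype>" % RDF, "<%sRestriction>" % OWL),
        (restr, "<%sonProperty>" % OWL, "<%s>" % prop),
        (restr, "<%ssomeValuesFrom>" % OWL, "<%s>" % filler),
    ]
    return "\n".join("%s %s %s ." % t for t in triples)
```

Called with ENVO_00000347, ENVO_00000340, BFO_0000050 and ENVO_00002118 it should reproduce the 'grain of sea sand' axiom above, with generic blank-node labels.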

I've figured it out, it's currently on the board.

Inside a [ ] blank node you only include the predicate and object, for example [ a owl:Class ].

Now my question for Pier: would it be better to post-compositionally annotate classes with subclass axioms or with equivalence axioms? For example:

coccolithophore chlorophyll a

def: Chlorophyll a which is part of some Coccolithales.

Should the axiom be:

rdfs:subClassOf 'chlorophyll a' and ('part of' some 'Coccolithales')

or

owl:equivalentClass 'chlorophyll a' and ('part of' some 'Coccolithales')

or perhaps I should do some examples of both?

09.12.17

I may want to express the csv files as instances of a data matrix. To see how OWL handles instances, I'm going to export MESO as OWL/RDF and look at some of the instances we created during my lab rotation. I could also look for some instances in ENVO.

For example, MESO:(http://purl.obolibrary.org/obo/MESO_00000018) has the instance MESO:(http://purl.obolibrary.org/obo/MESO_00000019). Examining the latter we have:

###  http://purl.obolibrary.org/obo/MESO_00000019
<http://purl.obolibrary.org/obo/MESO_00000019> rdf:type owl:NamedIndividual ,
                                                        <http://purl.obolibrary.org/obo/MESO_00000018> ;
                                               <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "https://github.com/EnvironmentOntology/meso/wiki/FixO3-Best-Practices#nitrate-calibration" ;
                                               rdfs:label "FixO3 best practice specification for calibrating nitrate sensors"@en .

OWL declares an instance with two rdf:type triples: <instance> rdf:type owl:NamedIndividual, and <instance> rdf:type <class it instantiates>.

For my purposes I will put a blank node in the <class it instantiates> position of the triple, which will be an equivalence class such as:

'data matrix' 
and ('is about' some seawater)

I can find OWL code similar to this in the envo_rdf.owl file, in the class 'grain of sea sand'.

I've created the folder /kblumberg_masters_thesis/testing/test_annotation_of_data_files in which to test this with the global_chlorophyll_a.csv data.

In the supplemental ttl file we annotate the csv file:

#the csv file is a data item.
path:global_chlorophyll_a.csv a owl:NamedIndividual ;
                              a _:blankNodeForOwlClass .


_:blankNodeForOwlClass rdf:type owl:Class ;
                                               owl:equivalentClass [ owl:intersectionOf ( <http://purl.obolibrary.org/obo/OBCS_0000120>
                                                                                          [ rdf:type owl:Restriction ;
                                                                                            owl:onProperty <http://purl.obolibrary.org/obo/IAO_0000136> ;
                                                                                            owl:someValuesFrom <http://purl.obolibrary.org/obo/ENVO_00002149>
                                                                                          ]
                                                                                        ) ;
                                                                     rdf:type owl:Class
                                                                   ] .

In English, this annotation says: the csv file is an instance of an OWL class that is equivalent to the intersection of 'data matrix' (OBCS_0000120) and the OWL restriction 'is about' some 'sea water'.

After parsing the annotation file together with the triple version of the csv's data into one graph in a Python script, the graph printed out as Turtle is:

path:global_chlorophyll_a.csv a [ a owl:Class ;
            owl:equivalentClass [ a owl:Class ;
                    owl:intersectionOf ( obo:OBCS_0000120 [ a owl:Restriction ;
                                owl:onProperty obo:IAO_0000136 ;
                                owl:someValuesFrom obo:ENVO_00002149 ] ) ] ],
        owl:NamedIndividual ;
    ns1:rfc4180numberOfColumns 9 ;
    ns1:rfc4180numberOfRows 2 ;
    ns1:rfc4180row <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvrow/0>,
        <file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvrow/1> .

As far as I can tell, this maintains the OWL structure. The parser read that the csv is typed by a blank node, and that this blank node specifies the desired equivalence class for annotation, and merged the equivalence class into the csv's triples. An advantage of doing it this way (stating csv a _:blanknode on one line, and _:blanknode owl:equivalentClass [...] in another set of lines) is that it will make it easier for me to annotate the data manually, and perhaps automatically in the future.
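The automatic route could start from a simple template that fills in the csv IRI and the class it 'is about'. This is a sketch, not the existing annotation code: annotation_ttl and the default blank-node label csvAnnotation are hypothetical, and the label must differ for each file annotated in the same Turtle document.

```python
OBCS_DATA_MATRIX = "http://purl.obolibrary.org/obo/OBCS_0000120"
IAO_IS_ABOUT = "http://purl.obolibrary.org/obo/IAO_0000136"

ANNOTATION_TEMPLATE = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<{data_file}> a owl:NamedIndividual ;
    a _:{node} .

_:{node} rdf:type owl:Class ;
    owl:equivalentClass [ rdf:type owl:Class ;
        owl:intersectionOf ( <{data_class}>
            [ rdf:type owl:Restriction ;
              owl:onProperty <{is_about}> ;
              owl:someValuesFrom <{about}> ] ) ] .
"""


def annotation_ttl(data_file, about, node="csvAnnotation",
                   data_class=OBCS_DATA_MATRIX):
    """Turtle stating that data_file instantiates
    data_class and ('is about' some about)."""
    return ANNOTATION_TEMPLATE.format(data_file=data_file, node=node,
                                      data_class=data_class,
                                      is_about=IAO_IS_ABOUT, about=about)
```

For example, annotation_ttl("global_chlorophyll_a.csv", "http://purl.obolibrary.org/obo/ENVO_00002149") yields the same structure as the hand-written annotation above, with full IRIs in place of the path: prefix.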

In principle I should use a similar OWL-compliant structure to annotate the columns. In a previous discussion Pier suggested that the columns be modeled as qualities. I think this follows how concentration classes such as 'concentration of ammonium in soil' are qualities. This definitely works for at least some of the columns from some of the datasets; for example, the global_chlorophyll_a dataset has columns such as "Chlorophyll a, total" with the units µg/l. Thus a quality class such as 'concentration of chlorophyll a in seawater' would be appropriate here.

The equivalence axiom for such a class would be:

'concentration of'
 and ('inheres in' some 
    ('chlorophyll a'
     and ('part of' some 'sea water')))

However I need to figure out the structure by which to add this. I need the right relationship between the column about chl A concentration and the quality class concentration of chlorophyll a in seawater.

Perhaps I need to use the relation 'inheres in', defined as:

a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence

The quality column would be the specifically dependent continuant, and the independent continuant would be a class along the lines of 'measured data item' or 'measured value'.

There is also the property 'towards', which Pier has previously mentioned and which may solve the problem. It has the editor's note:

This relation is provided in order to support the use of relational qualities such as 'concentration of'; for example, the concentration of C in V is a quality that inheres in V, but pertains to C.

Something like

<Data Column About CHLA Concentration> a owl:NamedIndividual ;    # an instance of
                                       towards _blankNodeForColumn .

_blankNodeForColumn -> owl:equivalentClass [ owl:intersectionOf ( concentration of chl A in seawater ) ]

Except I realize this won't work, as 'towards' is supposed to be used as quality towards entity. Perhaps we could use the 'is about' relation? But to be consistent with the pattern used in the annotation of the csv file, the 'is about' would need to be nested within the blank-node equivalence class.

Perhaps something along these lines?

<Data Column About CHLA Concentration> a owl:NamedIndividual ;
                                       a _blankNodeForColumn .

_blankNodeForColumn -> owl:equivalentClass: 'data matrix column' and ('is about' some 'concentration of chla in seawater')

I would probably do this with a series of blank nodes so as not to get super confused (apologies for mixing pseudo-Turtle with pseudo-Protégé OWL). This would need a new OBCS class for 'data matrix column' or something to that effect.

Additionally, I need to work out how to describe a data item about something that should itself be described using an equivalence class. For example, the global_chlorophyll_a dataset should be annotated by an equivalence class such as 'chlorophyll a in a marine water body'. The equivalence axiom being:

'chlorophyll a' 
and ('part of' some 'marine water body')

The joint equivalence class would presumably be something like:

'data matrix' 
 and ('is about' some 
    ('chlorophyll a' 
     and ('part of' some 'marine water body')))

I can find OWL code similar to this in the envo_rdf.owl file, in the class 'concentration of ammonium in soil'.

10.12.17

I have an idea about how to solve the column issue: as there isn't currently a term like OBCS:'data matrix column', we could instead express it as ('part of' some 'data matrix').

And we could express a data column as for example:

<Data Column About CHLA Concentration> a owl:NamedIndividual ;
                                       a _blankNodeForColumn .

_blankNodeForColumn -> owl:equivalentClass: ('part of' some 'data matrix') and ('is about' some 'concentration of chla in seawater')

This way we're saying that the Data Column About CHLA Concentration is an instance of an equivalence class: the intersection of ('part of' some 'data matrix') and ('is about' some 'concentration of chla in seawater').

I'm going off the logic that a data column from a csv file is part of that csv file.

So we could add triples (written in pseudocode) such as:

global_chlorophyll_a.csv obo:haspart global_chlorophyll_a.csvTotalChlA
global_chlorophyll_a.csv obo:haspart global_chlorophyll_a.csvProkChlA

Currently any23 doesn't add a relation linking a csv file to its columns. It sort of does this through its naming scheme: csv file global_chlorophyll_a.csv, column from that csv file global_chlorophyll_a.csvProkChlA. I've previously managed to query triples using the triple pattern in which any23 doesn't explicitly link columns to their csv.

I was able to do so in a long, roundabout way: I looked for csvs annotated as a 'data matrix', then looked at their rows, filtered for ones matching an input list of PURLs, then fetched the columns and their associated values from such rows, cleaning and filtering along the way to get rid of unnecessary information. I suspect this complex query strategy has a pretty bad runtime, because it needs to look at every csv that is a data matrix, then at all their rows to see whether any of them have columns we are interested in, and only then bring back those columns and their data.

I think a better query would look for data matrices that have columns matching the desired inputs, then fetch rows only for those data matrices and columns. I'm hoping this would improve the runtime, and make it easier to ask for data from a csv file that is either about some desired input class or has columns about such classes. Perhaps later I could compare the runtimes of both strategies, but for now I want to get the latter (which I think is the better strategy) to work. So in the file /kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.ttl I'll add 'has part' relations between the csv and its columns.
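Writing these triples by hand won't scale, so here is a sketch that derives one 'has part' triple per header column. The pseudocode obo:haspart above is taken to mean BFO_0000051 'has part', the ID used for 'has part' earlier in these notes. The column-IRI rule (file IRI plus percent-encoded header name) only imitates names like global_chlorophyll_a.csvProkChlA and is an assumption; the real any23 IRIs should be copied from its output. has_part_triples is a hypothetical name.

```python
import csv
import io
from urllib.parse import quote

HAS_PART = "http://purl.obolibrary.org/obo/BFO_0000051"


def has_part_triples(file_iri, csv_text):
    """N-Triples lines linking a csv file to each of its header columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    lines = []
    for column in header:
        # Assumption: column IRI = file IRI + header name; quote() escapes
        # spaces and other characters that are not allowed in an IRI.
        column_iri = file_iri + quote(column.strip())
        lines.append("<%s> <%s> <%s> ." % (file_iri, HAS_PART, column_iri))
    return lines
```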

Now to test whether I can use this 'has part' relation to get columns with a given annotation. The columns currently have RDF labels, so I will try to query for those labels.

In the file /kblumberg_masters_thesis/testing/test_annotation_of_data_files/test_column_query.rq

I've played around with it a bit and I'm convinced this makes it easier to access the data matrix's columns and then filter them for ones annotated with a list of terms of interest.

Now to figure out the query pattern to get just the data item that is about a thing of interest. In this case there is only one data matrix, but I can still set up tests where I do and do not get back this data matrix, which is about some ENVO_00002149 (sea water).

In the file /kblumberg_masters_thesis/testing/test_annotation_of_data_files/test_query_for_equivlance_class_about.rq

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?s ?value
FROM <datastore.ttl>
WHERE {

?s rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first obo:OBCS_0000120 ; 
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?value.
}

This works: on the data as I currently have it, the query returns the csv file and the class that it 'is about'.

--------------------------------------------------
| s                          | value             |
==================================================
| <global_chlorophyll_a.csv> | obo:ENVO_00002149 |
--------------------------------------------------

I probably need to adapt this to cases where the file is not just 'is about' some seawater but, for example, 'is about' some 'concentration of chla in seawater'. For now, though, I can filter for csvs about given input targets. This is done in the file: /kblumberg_masters_thesis/testing/test_annotation_of_data_files/test_query_for_input_equivalence_class_about.rq

I finally got this script to work. I forgot that I can't use Jena's arq to properly run these queries against the local csv files (which is why I switched to Python in the first place).

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?s 
FROM <datastore.ttl>
WHERE {

?s rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first obo:OBCS_0000120 ; 
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?filterValue.

VALUES (?filterValue) {
(<http://purl.obolibrary.org/obo/ENVO_00002149>)
(<http://purl.obolibrary.org/obo/ENVO_01001037>)
}
}

When I run it, it returns file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv. When I run it without the line (<http://purl.obolibrary.org/obo/ENVO_00002149>), it returns nothing. So it finds only the csv of interest based on an input list of PURLs. I can therefore use this to find data that is about any of a list of input PURLs (once I have a datastore annotated in this way).
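To make the matching logic concrete, here is a toy, stdlib-only emulation of what this query does. Each fixed property path, such as rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first, is followed one predicate at a time, and the VALUES block acts as a set of allowed objects. The in-memory triples are a hand-written, simplified stand-in for datastore.ttl, with prefixed names and blank-node labels as plain strings. This is a sketch for reasoning about the query, not a replacement for running it with rdflib.

```python
from collections import defaultdict

# Simplified, hand-written stand-in for datastore.ttl (not the real file):
# prefixed names and blank-node labels are plain strings.
TRIPLES = {
    ("global_chlorophyll_a.csv", "rdf:type", "owl:NamedIndividual"),
    ("global_chlorophyll_a.csv", "rdf:type", "_:cls"),
    ("_:cls", "owl:equivalentClass", "_:eq"),
    ("_:eq", "owl:intersectionOf", "_:l1"),
    ("_:l1", "rdf:first", "obo:OBCS_0000120"),           # 'data matrix'
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "_:r"),
    ("_:l2", "rdf:rest", "rdf:nil"),
    ("_:r", "owl:onProperty", "obo:IAO_0000136"),         # 'is about'
    ("_:r", "owl:someValuesFrom", "obo:ENVO_00002149"),   # 'sea water'
}

HEAD = ["rdf:type", "owl:equivalentClass", "owl:intersectionOf"]
MATRIX_PATH = HEAD + ["rdf:first"]
PROPERTY_PATH = HEAD + ["rdf:rest", "rdf:first", "owl:onProperty"]
FILLER_PATH = HEAD + ["rdf:rest", "rdf:first", "owl:someValuesFrom"]


def index(triples):
    """Map (subject, predicate) to the set of matching objects."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def follow(idx, start, path):
    """Evaluate a fixed sequence path p1/p2/.../pn starting from one node."""
    frontier = {start}
    for predicate in path:
        frontier = {o for node in frontier for o in idx.get((node, predicate), ())}
    return frontier


def subjects_about(triples, filter_values):
    """Subjects typed as ('data matrix' and ('is about' some V)), V in filter_values.

    As in the SPARQL query, the three path checks are independent, so the
    onProperty and someValuesFrom hits need not come from the same restriction.
    """
    idx = index(triples)
    wanted = set(filter_values)  # plays the role of the VALUES block
    hits = []
    for s in sorted({s for s, p, _ in triples if p == "rdf:type"}):
        if ("obo:OBCS_0000120" in follow(idx, s, MATRIX_PATH)
                and "obo:IAO_0000136" in follow(idx, s, PROPERTY_PATH)
                and follow(idx, s, FILLER_PATH) & wanted):
            hits.append(s)
    return hits


if __name__ == "__main__":
    print(subjects_about(TRIPLES, ["obo:ENVO_00002149", "obo:ENVO_01001037"]))
    print(subjects_about(TRIPLES, ["obo:ENVO_01001037"]))
```

The first call prints ['global_chlorophyll_a.csv'] and the second prints []. This mirrors what happens when the ENVO_00002149 line is dropped from the VALUES block.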

Now to attempt to annotate a column. As an example I will try to annotate the column ...

with the OWL equivalence class ('part of' some 'data matrix') and ('is about' some 'concentration of ammonium in soil'), because that concentration class already exists. Later I can try to change the annotation to a blank node that is the post-compositional creation of a class 'concentration of chla in seawater'.

I'm doing this in the file column_query_annotated.rq.

Got it to work:

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?c ?value
FROM <datastore.ttl>
WHERE {

#get me something which is a data matrix about seawater
#could put a ?filterValue in place of obo:ENVO_00002149 to screen for input values.
?s rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first obo:OBCS_0000120 ; 
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom obo:ENVO_00002149 ;
   obo:BFO_0000051 ?c .

#get me something (a column) which is part of some data matrix and is about some value, and return me that value
#could put a ?filterValue for ?value to screen for input values. 
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:onProperty obo:BFO_0000050 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:someValuesFrom obo:OBCS_0000120 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?value .
}

returns

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006

I can also look for both csvs and csv columns that are about any term in an input list:

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?s ?c ?filterValue
FROM <datastore.ttl>
WHERE {


{
#get me something which is a data matrix about one of the input values
#?filterValue here takes the place of the hard-coded obo:ENVO_00002149
?s rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first obo:OBCS_0000120 ; 
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?filterValue .#obo:ENVO_00002149 .
   #obo:BFO_0000051 ?c .
} UNION
{
#get me something (a column) which is part of some data matrix and is about some value, and return me that value
#could put a ?filterValue for ?value to screen for input values. 
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:onProperty obo:BFO_0000050 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:someValuesFrom obo:OBCS_0000120 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?filterValue .
}

VALUES (?filterValue) {
(<http://purl.obolibrary.org/obo/ENVO_00002149>)
(<http://purl.obolibrary.org/obo/ENVO_09200006>)
#(<http://purl.obolibrary.org/obo/PATO_0001595>)
(<http://purl.obolibrary.org/obo/ENVO_01001037>)
}
}

Which returns:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv | None | http://purl.obolibrary.org/obo/ENVO_00002149
None | file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006

When I uncomment the line #(<http://purl.obolibrary.org/obo/PATO_0001595>), the query also returns the column annotated with PATO_0001595.

11.12.17

Now I will try to post-compositionally annotate a column with a more complex class annotation and query for it.

Currently I have the global_chlorophyll_a csv's TotalChlA column annotated with the equivalence class ('part of' some 'data matrix') and ('is about' some 'concentration of ammonium in soil'). This would work if we had a class 'concentration of chlorophyll a in seawater', but as we don't, I want to try post-composing it:

('part of' some 'data matrix')
  and ('is about' some
        ('concentration of'
          and ('inheres in' some
                ('chlorophyll a'
                  and ('part of' some 'sea water')))))

Unfortunately the ENVO OWL file I exported as Turtle doesn't contain the 'concentration of ammonium in soil' classes or the specifically dependent continuant hierarchy. However, I found similar axioms in the class 'acidification of an aquatic environment' and examined its rather complex 'has output' axiom. Using this, I learned piece by piece how the axiom algebra works and assembled the above axiom. Then I used this large, deeply nested axiom to annotate the global_chlorophyll_a.csvDiatomChlA column:

path:global_chlorophyll_a.csvDiatomChlA a owl:NamedIndividual ;
                                        a _:blankNodeForglobal_chlorophyll_a.csvDiatomChlA .


_:blankNodeForglobal_chlorophyll_a.csvDiatomChlA rdf:type owl:Class ;
                                               owl:equivalentClass [ owl:intersectionOf ( [ rdf:type owl:Restriction ;
                                                                                            owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ;
                                                                                            owl:someValuesFrom <http://purl.obolibrary.org/obo/OBCS_0000120>
                                                                                          ]
                                                                                          

                                                                                          [ rdf:type owl:Restriction ;
                                                                                            owl:onProperty <http://purl.obolibrary.org/obo/IAO_0000136> ;
                                                                                            owl:someValuesFrom  [ owl:intersectionOf ( <http://purl.obolibrary.org/obo/PATO_0000033>
                                                                                                                                        [ rdf:type owl:Restriction ;
                                                                                                                                         owl:onProperty <http://purl.obolibrary.org/obo/RO_0000052> ;
                                                                                                                                         owl:someValuesFrom [ owl:intersectionOf ( <http://purl.obolibrary.org/obo/CHEBI_18230>
                                                                                                                                                                                    [ rdf:type owl:Restriction ;
                                                                                                                                                                                      owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ;
                                                                                                                                                                                      owl:someValuesFrom <http://purl.obolibrary.org/obo/ENVO_00002149>
                                                                                                                                                                                    ]
                                                                                                                                                                                  ) ;
                                                                                                                                                               rdf:type owl:Class
                                                                                                                                                            ]
                                                                                                                                        ]
                                                                                                                                      ) ;
                                                                                                                    rdf:type owl:Class
                                                                                                                ]
                                                                                          ]
                                                                                        ) ;
                                                                     rdf:type owl:Class
                                                                   ] .

The rdflib graph object seems to have successfully parsed this column's annotation.

Now I want to see if I can get rid of the blank node which I was previously using as:

path:global_chlorophyll_a.csvWaterDepth a owl:NamedIndividual ;
                                        a _:blankNodeForglobal_chlorophyll_a.csvWaterDepth .

Yes, this was successful: just replace it with an anonymous blank node, a [ <REST INSIDE HERE> ].

path:global_chlorophyll_a.csvWaterDepth a owl:NamedIndividual ;
                                        a [ rdf:type owl:Class ;
                                            owl:equivalentClass [ owl:intersectionOf ( [ rdf:type owl:Restriction ;
                                                                                         owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ;
                                                                                         owl:someValuesFrom <http://purl.obolibrary.org/obo/OBCS_0000120>
                                                                                       ]                                           

                                                                                       [ rdf:type owl:Restriction ;
                                                                                         owl:onProperty <http://purl.obolibrary.org/obo/IAO_0000136> ;
                                                                                         owl:someValuesFrom <http://purl.obolibrary.org/obo/PATO_0001595>
                                                                                       ]
                                                                                      ) ;
                                                                   rdf:type owl:Class
                                                                ]
                                          ].

Replaced the unnecessary named blank nodes in the annotations I have with anonymous ones.

Now to query this monster. I'm doing it in the file /kblumberg_masters_thesis/testing/test_annotation_of_data_files/query_deeply_annotated_column.rq

I've gotten it to work preliminarily

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?c ?value
FROM <datastore.ttl>
WHERE {
#get me something (a column) which is part of some data matrix and is about some value, and return me that value
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:onProperty obo:BFO_0000050 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:someValuesFrom obo:OBCS_0000120 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 .
{
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first ?value .
}
UNION
{
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first/owl:someValuesFrom ?value.
}
}

which returns:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/CHEBI_18230
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL52C98
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/ENVO_00002149

I'd need to clean this in Python to remove the blank node ub2bL52C98. Otherwise, it tells us that this column is about chl a and seawater.
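A minimal sketch of that cleanup step follows. It assumes the result rows have already been turned into (column, value) strings. rdflib prints blank nodes as bare labels like ub2bL52C98, so any value without an http(s) scheme is dropped. If the rows still hold rdflib term objects, isinstance(value, rdflib.BNode) is a sturdier test than a string prefix. Adding FILTER(!isBlank(?value)) to the query should do the same job on the SPARQL side.

```python
def drop_blank_nodes(rows):
    """Keep only (column, value) rows whose value is an http(s) IRI.

    Assumes values were already converted to strings; rdflib prints blank
    nodes as bare labels such as 'ub2bL52C98', which have no URI scheme.
    """
    return [(col, val) for col, val in rows if val.startswith(("http://", "https://"))]


if __name__ == "__main__":
    column = "global_chlorophyll_a.csvDiatomChlA"
    results = [
        (column, "http://purl.obolibrary.org/obo/CHEBI_18230"),
        (column, "ub2bL52C98"),
        (column, "http://purl.obolibrary.org/obo/ENVO_00002149"),
    ]
    for col, val in drop_blank_nodes(results):
        print(col, "|", val)
```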

As of now I'm able to UNION together queries that search annotations for PURLs of interest at different depths, but it's done in a very directed way. I know the structure of the annotation beforehand and write a query that gets specifically that. What I'd really like is for the query to recursively follow a path without any prior knowledge of the structure of the data.

Something like ?c ./.* <FILTER PURL of INTEREST>
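SPARQL has no "." wildcard predicate. A commonly cited idiom for one is ?c (<>|!<>)+ ?value: !<> is a negated property set matching every predicate except <>, so the alternation should match any predicate. The stdlib sketch below shows what such a query computes, using a plain breadth-first search over a heavily simplified, hand-written version of the DiatomChlA annotation. The triples are illustrative, not the real graph.

```python
from collections import defaultdict, deque

# Heavily simplified sketch of the DiatomChlA annotation (not the real graph).
TRIPLES = [
    ("csvDiatomChlA", "rdf:type", "_:cls"),
    ("_:cls", "owl:equivalentClass", "_:eq"),
    ("_:eq", "owl:intersectionOf", "_:l1"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "_:about"),
    ("_:about", "owl:someValuesFrom", "_:conc"),
    ("_:conc", "owl:intersectionOf", "_:m1"),
    ("_:m1", "rdf:first", "obo:PATO_0000033"),               # 'concentration of'
    ("_:m1", "rdf:rest", "_:m2"),
    ("_:m2", "rdf:first", "_:inheres"),
    ("_:inheres", "owl:someValuesFrom", "obo:CHEBI_18230"),  # 'chlorophyll a'
]


def reachable(triples, start):
    """Breadth-first search: every node reachable from start in one or more
    steps, whatever the predicate (like ?start ./.* ?node)."""
    out = defaultdict(set)
    for s, _predicate, o in triples:
        out[s].add(o)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, ()):
            if nxt not in seen:  # the seen set also protects against cycles
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mentions(triples, start, purls):
    """Which of the input PURLs occur anywhere inside start's annotation."""
    return sorted(reachable(triples, start) & set(purls))


if __name__ == "__main__":
    print(mentions(TRIPLES, "csvDiatomChlA", ["obo:CHEBI_18230", "obo:ENVO_09200006"]))
```

One caution: if the ontology itself were loaded into the same graph, following every predicate would also walk from the matched terms into their own axioms and superclasses, so results could over-match. Restricting the alternation to the list and restriction predicates (owl:intersectionOf, owl:unionOf, owl:someValuesFrom, rdf:first, rdf:rest) is probably the safer variant.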

This Stack Overflow page, "how to use Union/or in sparql path with arbitrary length" (also answered by Joshua Taylor), may offer some hints, but I don't understand it yet.

The other idea is that rdfs:subClassOf+ gives us transitivity for free, so perhaps we could make use of everything being linked by subclass relations. I think Pier mentioned that, when developing an ontology, querying is easier if everything in a complex axiom is just a subclass. We don't want to dumb down the ontologies, but perhaps annotating with subclasses instead of equivalence classes would simplify things.

Another page, on the length of query paths in Protégé (it looks like Protégé has a SPARQL plugin), says:

For more complex cases that involve choices e.g. the lack of property path expressions imposes some inconvenience and queries such as { ?x rdfs:label | dce:title ?y }, will need to be written by the user, if possible.

Perhaps I could redo the annotation exercise using rdfs:subClassOf instead of owl:equivalentClass, but depth isn't the issue here; I guess that's more for subclass-of-subclass chains. I wonder whether querying triplestores annotated in OWL with ontology terms is even how this is done in practice. It'd be really nice to get some industry insider knowledge here; too bad the rich control everything and public science receives very little funding.

The SPARQL 1.1 docs on property paths are also kind of useful. Apparently you can use | as an "or" in a path, so some |'s combined with some *'s may do it.

You can use ()'s and |'s and *'s (similar to regex). Perhaps I can build something like

?c rdf:type/owl:equivalentClass/((owl:intersectionOf | owl:someValuesFrom)/(rdf:rest | rdf:first))* ?value

It is actually possible to do something like this. I played around with it a bit on the Ontobee SPARQL endpoint

with the query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
PREFIX obo: <http://purl.obolibrary.org/obo/> 

SELECT ?o2 #?p ?o3
WHERE { 
obo:ENVO_01000630 rdfs:subClassOf/owl:someValuesFrom/( owl:unionOf | owl:intersectionOf )/( (rdf:first) | (owl:unionOf) | (rdf:rest)* )* ?o2. 
}

Something along these lines could work; it just gets a little messy trying to tackle too many cases at once. I feel like concise code for this would require working through a lot of annotation examples and writing separate sub-cases for the common possibilities.

Returning to this Stack Overflow page on unions and VALUES: you can't use property paths as variables, so you can't pass paths in via a VALUES block. I found this out from here and by trying the following:

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX :path1 <rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first>
PREFIX :path2 <rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first/owl:someValuesFrom>
SELECT ?c ?value
FROM <datastore.ttl>
WHERE {
#get me something (a column) which is part of some data matrix and is about some value, and return me that value
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:onProperty obo:BFO_0000050 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:someValuesFrom obo:OBCS_0000120 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 .
?c ?paths ?value .
VALUES(?paths){
(:path1)
(:path2)
} 
}

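This doesn't work. A PREFIX declaration can only bind a prefix name (which must end in a colon, so :path1 isn't valid) to a namespace IRI. A variable in predicate position, such as ?paths, only ever matches a single predicate IRI, never a whole path.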
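Since paths can't be passed in through VALUES, one workaround is to splice them into the query text before sending it. The sketch below builds the UNION-of-paths query from a Python list of path strings. It is plain string templating, so the paths should come from my own code, not untrusted input. The resulting string would still need to be run with something like rdflib's Graph.query. The names union_query, PATH_1 and PATH_2 are hypothetical. It doesn't solve the runtime concern, because the generated query contains the same UNIONs.

```python
PREFIXES = (
    "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def union_query(paths, subject="?c", obj="?value"):
    """Build a SELECT query with one UNION branch per property-path string.

    The paths are spliced in as text, because SPARQL cannot bind a property
    path to a variable or pass one in through VALUES.
    """
    branches = [f"{{ {subject} {path} {obj} . }}" for path in paths]
    body = "\nUNION\n".join(branches)
    return f"{PREFIXES}SELECT {subject} {obj}\nWHERE {{\n{body}\n}}\n"


PATH_1 = "rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom"
PATH_2 = PATH_1 + "/owl:intersectionOf/rdf:first"

if __name__ == "__main__":
    print(union_query([PATH_1, PATH_2]))
```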

When I did a big UNION of all the possible paths to grab annotation pieces from, it worked, but I'm concerned about the runtime of having so many UNION blocks.

PREFIX obo: <http://purl.obolibrary.org/obo/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 

SELECT ?c ?value
FROM <datastore.ttl>
WHERE {
#get me something (a column) which is part of some data matrix and is about some value, and return me that value
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:onProperty obo:BFO_0000050 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:first/owl:someValuesFrom obo:OBCS_0000120 ;
   rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:onProperty obo:IAO_0000136 .
{
?c  rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom ?value .
}
UNION
{
?c  rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:first ?value .
}
UNION
{
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first ?value .
}
UNION
{
?c rdf:type/owl:equivalentClass/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest/rdf:first/owl:someValuesFrom/owl:intersectionOf/rdf:rest*/rdf:first/owl:someValuesFrom ?value.
}
}

Which returns the following results:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvWaterDepth | http://purl.obolibrary.org/obo/PATO_0001595
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL48C52
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/PATO_0000033
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/CHEBI_18230
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL52C98
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006

12.12.17

Nearly all the datasets I'm interested in representing will probably be annotated with a single term. Perhaps ice_algal_chlorophyll will need to be annotated with an or: 'snow depth' or 'ice depth'.

In the file /kblumberg_masters_thesis/testing/test_annotation_of_data_files/query_for_data_about_csv_and_columns.rq I was able to use UNIONs, GROUP BY and GROUP_CONCAT to condense the results down to 2 columns. The first column gives the csv file or csv column the results come from, and the second a list of the terms used in its annotation.

The results are:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv | http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL48C52 , http://purl.obolibrary.org/obo/PATO_0000033 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL52C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvWaterDepth | http://purl.obolibrary.org/obo/PATO_0001595
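The query file isn't reproduced here, so this assumes it does roughly GROUP BY ?s with GROUP_CONCAT(?value; SEPARATOR=" , "). The stdlib sketch below shows what that aggregation does to the flat (subject, value) rows seen earlier. Note that SPARQL does not guarantee the order of values inside a group, whereas this sketch keeps first-seen order.

```python
def group_concat(rows, separator=" , ", distinct=False):
    """Group (subject, value) rows by subject and join each group's values.

    Roughly what GROUP BY ?s with GROUP_CONCAT(?value; SEPARATOR=" , ") does;
    distinct=True mimics GROUP_CONCAT(DISTINCT ...).
    """
    grouped = {}  # dicts keep insertion order, so subjects stay in first-seen order
    for subject, value in rows:
        values = grouped.setdefault(subject, [])
        if distinct and value in values:
            continue
        values.append(value)
    return {subject: separator.join(values) for subject, values in grouped.items()}


if __name__ == "__main__":
    rows = [
        ("global_chlorophyll_a.csvDiatomChlA", "http://purl.obolibrary.org/obo/PATO_0000033"),
        ("global_chlorophyll_a.csvTotalChlA", "http://purl.obolibrary.org/obo/ENVO_09200006"),
        ("global_chlorophyll_a.csvDiatomChlA", "http://purl.obolibrary.org/obo/CHEBI_18230"),
    ]
    for subject, joined in group_concat(rows).items():
        print(subject, "|", joined)
```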

I want to try to clean this up by using or statements ("|") to deal with the different cases, so as not to need so many UNION blocks.

I did this in the file query_for_data_about_csv_and_columns_2.rq. It works, uses much less code, and needs only one UNION, between the csv file and csv column cases.

Trying it (in the same file as above) with a VALUES filtering block, the results are:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv | http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | http://purl.obolibrary.org/obo/PATO_0000033 , http://purl.obolibrary.org/obo/CHEBI_18230 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvWaterDepth | http://purl.obolibrary.org/obo/PATO_0001595

This is great because it tells me what I get back when I pass in arguments via the filtering block. The blank nodes also drop out, since they can't match any input PURL. This can be used to get csvs or csv columns about an input list of terms and retrieve their data.

The query patterns I currently have for a csv's data columns let me match an equivalence class that is part of a data matrix and is about some:

 X

or is about some:

 'X'
 and ('any property' some 
       ('Y'
          and ('any property' some 'Z')
        )
      )

Thus I need to add the case:

'X'
 and ('any property' some 
       ('Y')
     )

To be able to create an annotation such as:

depth and ('inheres in' some 'water body')

I will test this by annotating the global_chlorophyll_a csv's global_chlorophyll_a.csvLongitude column

with longitude and ('inheres in' some 'water body')

Got it to work; the modified query_for_data_about_csv_and_columns_2.rq returns (without the VALUES filtering):

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv | http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvLongitude | ub2bL71C52 , http://purl.obolibrary.org/obo/OBI_0001620 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL48C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL51C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL52C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvWaterDepth | http://purl.obolibrary.org/obo/PATO_0001595

These are the correct values for all three cases. I can now annotate data and query for it in the following forms:

is about some:

'X'

or

'X'
 and ('any property' some 'Y')

or

'X'
 and ('any property' some
       ('Y'
          and ('any property' some 'Z')
        )
      )
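These three shapes are all nested "class and (property some filler)" expressions, so a small recursive helper can render any of them. That may be handy when filling in annotation templates by hand. This is a hypothetical sketch: the tuple encoding is my own assumption, and it only produces the Manchester-syntax string, not the Turtle/RDF encoding of the class.

```python
def manchester(expr):
    """Render a nested class expression as a Manchester-syntax string.

    Hypothetical encoding: expr is either a quoted class label (str) or a
    tuple (label, property, filler) meaning: label and (property some filler).
    """
    if isinstance(expr, str):
        return expr
    label, prop, filler = expr
    inner = manchester(filler)
    if not isinstance(filler, str):
        inner = f"({inner})"  # a nested filler needs its own parentheses
    return f"{label} and ({prop} some {inner})"


if __name__ == "__main__":
    print(manchester("'X'"))
    print(manchester(("'X'", "'any property'", "'Y'")))
    print(manchester(("'X'", "'any property'", ("'Y'", "'any property'", "'Z'"))))
    # The concrete case from above:
    print(manchester(("depth", "'inheres in'", "'water body'")))
```

Wrapping every nested filler in parentheses avoids relying on Manchester-syntax precedence between "some" and "and".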

I made a file: /kblumberg_masters_thesis/testing/test_annotation_of_data_files/example_annotations/example_annotations.ttl to use as a template to annotate data using these design patterns.

Annotating a csv file as being about a single class works for nearly all the cases I want in my datastore. The exception is the dataset ice_algal_chlorophyll, which is about ('snow depth' or 'ice depth'). Thus I will figure out how to annotate a csv file with an 'OR' (owl:unionOf) of two classes, and how to query for it.

I'm doing this in the same folder, /kblumberg_masters_thesis/testing/test_annotation_of_data_files. I'm adding another csv dataset, snow_height.csv, a heavily reduced 2-column, 2-row version of the original dataset, to be annotated with the axiom ('firn' or 'neve'). I successfully annotated it with a union class and queried it.

Output is:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/snow_height.csv | ub2bL43C52 , http://purl.obolibrary.org/obo/ENVO_03000002 , http://purl.obolibrary.org/obo/ENVO_03000000
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csv | http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/snow_height.csvSnowHeightMean | http://purl.obolibrary.org/obo/PATO_0000915
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvLongitude | ub2bL85C52 , http://purl.obolibrary.org/obo/OBI_0001620 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/snow_height.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvTotalChlA | http://purl.obolibrary.org/obo/ENVO_09200006
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvDiatomChlA | ub2bL62C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL65C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL66C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/global_chlorophyll_a.csvWaterDepth | http://purl.obolibrary.org/obo/PATO_0001595

I've updated /kblumberg_masters_thesis/testing/test_annotation_of_data_files/example_annotations/example_annotations.ttl for the csv union case.

13.12.17

Added some suggestions from Masi for seawater types in ENVO to the cryomix ontology page.

To annotate the csv's going into the datastore, I have some new axiom patterns to deal with:

for chlorophyll_a and global_chlorophyll_a

I have the case: 'chlorophyll a' and ('part of' some 'marine water body')

More generally:

X and ('any property' some Y)

For the influence_snow_depth I have: 'physical quality' and ('inheres in' some ('marine water body' and ('adjacent to' some 'sea ice')))

More generally:

X and ('any property' some (Y and ('any property' some Z)))

for ice_algal_chlorophyll we have: ('chlorophyll a' and 'sea ice')

More generally:

(X and Y)

For this case I want to see whether specifying only one of X or Y in a VALUES query still returns this result, because I want the match to behave as and/or rather than a strict and. If so, I will simply use (X and Y).

To test the (X and Y) case I will add an ice_algal_chlorophyll.csv to /kblumberg_masters_thesis/testing/test_annotation_of_data_files and annotate it with ('chlorophyll a' and 'sea ice'). It works: if we pass it only one of X or Y, it still returns the csv. Because I used the path rdf:rest*/rdf:first/owl:someValuesFrom, it gets everything in the and statement (the * matches zero or more rdf:rest steps). Thus this also covers the case (X and Y and Z and ...).
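Why matching one member is enough can be mimicked in plain Python. A minimal sketch, assuming the intersection list is stored as linked `(rdf:first, rdf:rest)` cells keyed by made-up blank-node IDs; it models only the `rdf:rest*/rdf:first` part of the path (the trailing `owl:someValuesFrom` step, which applies to restriction members, is left out), and the function names are hypothetical.

```python
RDF_NIL = "rdf:nil"


def list_members(cells, head):
    """Follow rdf:rest zero or more times and collect each rdf:first,
    mimicking the SPARQL path rdf:rest*/rdf:first."""
    members = []
    node = head
    while node != RDF_NIL:
        first, rest = cells[node]
        members.append(first)
        node = rest
    return members


def matches(cells, head, query_values):
    """True if any list member is among the VALUES passed in (and/or behaviour)."""
    return any(member in query_values for member in list_members(cells, head))


# Made-up blank nodes for ('chlorophyll a' and 'sea ice')
cells = {
    "_:b0": ("chlorophyll a", "_:b1"),
    "_:b1": ("sea ice", RDF_NIL),
}
```

Because the walk visits every cell, the same check works unchanged for (X and Y and Z and ...).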

Now to do the X and ('any property' some Y) case. Should mirror the similar case for the column. Got it.

Now for X and ('any property' some (Y and ('any property' some Z))): same as the code I had before; I should have realized that earlier.

Next I need to deal with some new column annotation cases from the ice_algal_chlorophyll dataset.

  1. depth and ('inheres in' some ('sea ice' or 'snow'))
     X and ('any property' some (Y or ...))
  2. 'part of' some 'sea ice'
     'any property' some X
  3. 'concentration of' and ('inheres in' some ('chlorophyll a' and ('part of' some ('sea water' and 'part of' some 'water-based planetary surface'))))
     X and
        ('any property' some
           (Y and
              ('any property' some
                 (Z and 'any property' some W)
              )
           )
        )
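Cases 1 and 3 (and the earlier X / Y / Z patterns) share one recursive shape: a class, followed by ('property' some ...) wrapping the next level. A small, hypothetical helper that renders that shape in Manchester syntax for any depth; it is only an illustration of the pattern, not part of the annotation scripts, and it does not cover the union (Y or ...) variant.

```python
def nest(head, steps):
    """Render head and (p1 some (c1 and (p2 some ... cN))) in Manchester syntax.

    steps is a list of (property label, class label) pairs; each pair is
    nested inside the class introduced by the previous one.
    """
    if not steps:
        return head
    (prop, cls), rest = steps[0], steps[1:]
    inner = nest(cls, rest)
    if rest:  # a nested intersection needs its own parentheses
        inner = f"({inner})"
    return f"{head} and ({prop} some {inner})"


# Case 3 above, with its real property labels
case_3 = nest("'concentration of'", [
    ("'inheres in'", "'chlorophyll a'"),
    ("'part of'", "'sea water'"),
    ("'part of'", "'water-based planetary surface'"),
])
```

The rendered case 3 differs from the hand-written one only by an extra pair of parentheses around the innermost restriction, which does not change its meaning.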

Pick this up tomorrow, and also update the column names for the global_chl csv (which I changed) on the datastore page. Once I have annotation axioms for all columns of all csv's, plus annotations of the csv's themselves, I could make template annotation.ttl files for each dataset, so the annotations can be reused across multiple instances of a dataset. I could write a Python script to wget a list of datasets, parse each one with sed (or something similar) to get rid of the header lines and add my own header line, and print out a clean csv version of each. For example, template: some_file.csv; files: some_file1.csv, some_file2.csv, etc. A template some_file.ttl would then be used to produce some_file1.ttl, some_file2.ttl, etc. Finally, write all of those to a folder and call the merge-triples script to build a massive datastore.
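A minimal sketch of the cleaning and templating steps (the wget step is left out). It assumes the downloaded files carry a metadata block closed by a line reading `*/`, as PANGAEA text exports do (treat that as an assumption), followed directly by the original column-name row; `clean_lines` and `fill_template` are hypothetical names and the example rows are made up.

```python
def clean_lines(lines, new_header, end_marker="*/"):
    """Strip a metadata block and the original column names, then add my header.

    Assumes the metadata block ends with a line equal to end_marker and that the
    original column-name row comes directly after it.
    """
    start = 0
    for i, line in enumerate(lines):
        if line.strip() == end_marker:
            start = i + 1
            break
    data_rows = lines[start + 1:]  # skip the original column-name row
    return [new_header] + data_rows


def fill_template(template_ttl, template_csv, instance_csv):
    """Reuse a dataset's template annotations for another instance of it."""
    return template_ttl.replace(template_csv, instance_csv)


# Made-up input, only to show the shape of the transformation
raw = [
    "/* DATA DESCRIPTION:",
    "Citation: example",
    "*/",
    "Date/Time,Depth water [m],Chl a [ug/l]",
    "2016-07-01T10:00,5,0.8",
]
clean = clean_lines(raw, "DateTime,WaterDepth,ChlorophyllA")

template = '<some_file.csv> rdfs:comment "template" .'
instances = {name: fill_template(template, "some_file.csv", name)
             for name in ["some_file1.csv", "some_file2.csv"]}
```

Each string in `instances` would be written out as some_file1.ttl, some_file2.ttl, and so on, ready for the merge step.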

14.12.17

For the EVENT ID fields: 'centrally registered identifier' and ('is about' some ('observing process' or 'specimen collection process'))

It could be either 'observing process' or 'specimen collection process', depending on whether the data has saved specimens. That way we can use this to differentiate between actual samples and mere data about something, and answer a competency question such as whether anyone has a specimen of something I am interested in. This could later be connected to sample and data management systems (outlook).

silicate(4-)

velocity and ('inheres in' some ('marine current' and ('has quality' some PATO:horizontal)))

Flux classes:

Seston Flux

flux and ('inheres in' some 'marine snow')

Calcium Carbonate Flux

flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('calcium carbonate' and 'part of' some 'organic molecular entity') ))

Particulate Organic Carbon Flux

flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('carbon atom' and 'part of' some 'organic molecular entity') ))

Particulate Organic Nitrogen Flux

flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('nitrogen atom' and 'part of' some 'organic molecular entity') ))

Particulate Silicon Flux

flux and ('inheres in' some ('part of' some 'marine snow') and ('composed primarily of' some ('silicon atom' and 'part of' some 'organic molecular entity') ))

Snow Height Mean:

'expected value' and ('is about' min 2 ('data item' and ('is about' some (thickness and ('inheres in' some snow)))))

If I can't query the min 2, use some instead. Note: cardinality needs special handling in queries. I may also need a synonym for 'data item'.

sea ice type:

'categorical label' is about 'sea ice'

Sea Ice Type Portion:

'fiat object part' and 'part of' some 'sea ice'

Add another dummy dataset that is an OTU table and label it something like (is about a microbial community part of sea ice). Ask Mariana if she has such a thing I can use, preferably also some functional genomic data to link to processes via GO terms.

Notes from conversation with Pier: a PhD proposal could be that I create this data-interlinking system within the context of the FRAM research projects, to help bridge knowledge between projects and make use of multiple datasets of various types to report on a given phenomenon or process.

For the inorganic_nutrients csv:

Elevation and Water Depth columns.

(elevation or depth) and ('inheres in' some 'marine water body' )

Make an issue in the ENVO tracker about HYDROFORMS vs. water body 'located in' vs. 'part of'.

For the GEAR ID column: 'centrally registered identifier' and 'manufactured product'

Add comment to PATO cardinal directions issue about expressing the current velocities and cross reference the NCIT cardinal directions.

Check whether archaea have chlorophyll a; if not, prokaryote Chl a is only about bacteria.

Check for an ENVO issue about start time and end time for the Date Time End and Duration columns in biogenic_particle_flux.

Look in IAO for a class like 'sample label' and use it for local labels (as opposed to 'centrally registered identifier').

For the ice_algal_chlorophyll site, use BFO:site.

For the salinity columns, check whether the data is only about seawater; if so, use the seawater annotation. If it is potentially 'fresh water' or 'melt water', check and use an or statement covering all of them. Check the salinity values: if they are around 30-35 (PSU, roughly parts per thousand, not %), it is seawater.
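A rough sketch of that check. It assumes the values are on the practical salinity scale (roughly parts per thousand) and uses an illustrative cut-off of 30; both the threshold and the function names are hypothetical, not taken from the datasets.

```python
def salinity_water_types(values, seawater_min=30.0):
    """Suggest which water type(s) a salinity column is about.

    Assumes practical-salinity values; the 30.0 cut-off is illustrative only.
    """
    if not values:
        raise ValueError("empty salinity column")
    low = [v for v in values if v < seawater_min]
    if not low:
        return ["sea water"]
    if len(low) == len(values):
        return ["fresh water", "melt water"]
    # Mixed values: keep every candidate so the axiom becomes an or statement
    return ["sea water", "fresh water", "melt water"]


def as_or_statement(labels):
    """Turn water-type labels into a Manchester-syntax union."""
    quoted = [f"'{label}'" for label in labels]
    return quoted[0] if len(quoted) == 1 else "(" + " or ".join(quoted) + ")"
```

The output of `as_or_statement` can be dropped into the ('part of' some ...) slot of the salinity axiom.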

18.12.17

Goal: add and test annotation patterns for the remaining column data. I couldn't find any ENVO issue about start time and end time, so I went with

for Date Time and Date Time End 'zero-dimensional temporal region'

and for Duration 'one-dimensional temporal region'

I couldn't find any IAO or related class such as 'sample label', so I will post-compose it using:

'categorical label' and ('is about' some specimen)

I changed the Salinity column for the ice_algal_chlorophyll dataset to be about meltwater instead of seawater, after seeing that the values were much lower than the ~30-35 PSU (about 30-35 ‰) expected for seawater. Axiom:

osmolarity and ('inheres in' some ('salt' and ('part of' some meltwater)))

Try to condense the code in query_for_data_about_csv_and_columns_2.rq

OWL (Turtle) code to handle min 1:

[ rdf:type owl:Restriction ;
  # RO_0000057 = has participant
  owl:onProperty <http://purl.obolibrary.org/obo/RO_0000057> ;
  owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
  owl:onClass <http://purl.obolibrary.org/obo/ENVO_01000760>
] ;
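One way to let a query treat the min 1 restriction above (or the min 2 for Snow Height Mean) like a some restriction is to accept the filler from either `owl:someValuesFrom` or `owl:onClass`, provided the cardinality is at least 1. A minimal plain-Python sketch over a made-up set of triples; the blank-node IDs and the function name are hypothetical.

```python
def restriction_fillers(triples, restriction):
    """Filler classes of a restriction that guarantees at least one related thing.

    Covers 'some' (owl:someValuesFrom) and 'min n' / 'exactly n' qualified
    cardinalities with n >= 1 (owl:onClass). Assumes one value per predicate.
    """
    props = {p: o for s, p, o in triples if s == restriction}
    if "owl:someValuesFrom" in props:
        return {props["owl:someValuesFrom"]}
    n = props.get("owl:minQualifiedCardinality", props.get("owl:qualifiedCardinality"))
    if "owl:onClass" in props and n is not None and int(n) >= 1:
        return {props["owl:onClass"]}
    return set()  # e.g. 'max n' restrictions say nothing about existence


triples = {
    # the min 1 restriction above
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "obo:RO_0000057"),
    ("_:r1", "owl:minQualifiedCardinality", "1"),
    ("_:r1", "owl:onClass", "obo:ENVO_01000760"),
    # the same idea written with 'some'
    ("_:r2", "owl:onProperty", "obo:RO_0000057"),
    ("_:r2", "owl:someValuesFrom", "obo:ENVO_01000760"),
    # a 'max 2' restriction, which should not count
    ("_:r3", "owl:onProperty", "obo:RO_0000057"),
    ("_:r3", "owl:maxQualifiedCardinality", "2"),
    ("_:r3", "owl:onClass", "obo:ENVO_01000760"),
}
```

In SPARQL the same idea would be an alternative path such as owl:someValuesFrom|owl:onClass plus a check on the cardinality predicate.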

19.12.17

Yesterday I addressed all the notes except emailing Mariana about omics data.

Working on new annotation/querying cases for columns in the datastore. I double-checked and I have all the necessary cases for annotating the csv files themselves; now I just need the column cases.

I've decided I will iteratively (for each dataset) make the annotation.ttl file while testing the annotations/querying so that I don't do double work, and can use the actual annotations as the test cases to make sure I annotate and query properly for those cases.

I will do this in the folder /kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries

with script: annotation_of_data_files.py and query query_annotation_of_data_files.rq

In doing so I found some bugs in the merge_triples_to_datastore.py script that I needed to debug. It had two: 1) it wasn't filtering for .csv files correctly, fixed by adding [.] to the pattern; 2) it wasn't deleting old datastore.ttl files properly, fixed with an if statement using functions from the os module. The os module itself exists in both Python 2 and 3, but I should keep track of which Python version I'm using if I make a demo for people to use this.
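The two fixes can be sketched as follows. This is not the actual merge_triples_to_datastore.py code: the function names are hypothetical, and it only illustrates why [.] matters (an unescaped . in a regex matches any character, so ".csv$" would also match a name like "datacsv") and a guarded delete with os.path.exists / os.remove.

```python
import os
import re

CSV_PATTERN = re.compile(r"[.]csv$")  # [.] matches a literal dot only


def list_csv_files(folder):
    """Return the .csv files in a folder, sorted for reproducible merges."""
    return sorted(f for f in os.listdir(folder) if CSV_PATTERN.search(f))


def remove_old_datastore(folder, name="datastore.ttl"):
    """Delete a previous datastore file, if any, before rebuilding it."""
    path = os.path.join(folder, name)
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```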

I've decided to always put the rdf:type owl:Class line above the owl:... line, to keep things consistent and easy to read, and hopefully easier to automate in the future.

a [ rdf:type owl:Class ;
   owl:equivalentClass [ ...

I've successfully annotated the inorganic_nutrients data in the file: inorganic_nutrients.ttl and can query for all classes used in annotation using the query: query_annotation_of_data_files.rq

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvPhosphate | ub2bL195C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL196C79 , ub2bL198C72 , http://purl.obolibrary.org/obo/CHEBI_26020 , ub2bL199C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvEvent | ub2bL124C52 , http://purl.obolibrary.org/obo/IAO_0000578 , ub2bL125C78 , ub2bL127C72 , http://purl.obolibrary.org/obo/BCO_0000003 , http://purl.obolibrary.org/obo/OBI_0000659
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvNitrate | ub2bL161C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL162C79 , ub2bL164C72 , http://purl.obolibrary.org/obo/CHEBI_17632 , ub2bL165C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvLongitude | http://purl.obolibrary.org/obo/OBI_0001621
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csv | http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvNitrite | ub2bL178C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL179C79 , ub2bL181C72 , http://purl.obolibrary.org/obo/CHEBI_16301 , ub2bL182C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvLatitude | http://purl.obolibrary.org/obo/OBI_0001620
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvAmmonium | ub2bL81C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL82C79 , ub2bL84C72 , http://purl.obolibrary.org/obo/CHEBI_28938 , ub2bL85C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvSilicate | ub2bL212C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL213C79 , ub2bL215C72 , http://purl.obolibrary.org/obo/CHEBI_29241 , ub2bL216C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvElevation | ub2bL109C52 , ub2bL110C62 , ub2bL111C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/inorganic_nutrients.csvWaterDepth | ub2bL229C52 , ub2bL230C62 , ub2bL231C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999

Finished .ttl files put into the folder: /kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/completed_annotation_files

Finished physical_oceanography.ttl

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvNorth-southCurrentVelocity | ub2bL87C52 , http://purl.obolibrary.org/obo/PATO_0002242 , ub2bL88C79 , http://purl.obolibrary.org/obo/ENVO_01000067
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvHorizontalCurrentVelocity | ub2bL72C52 , http://purl.obolibrary.org/obo/PATO_0002242 , ub2bL73C79 , http://purl.obolibrary.org/obo/ENVO_01000067 , http://purl.obolibrary.org/obo/PATO_0001855
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvGearIdentificationNumber | ub2bL62C52 , http://purl.obolibrary.org/obo/IAO_0000578 , http://purl.obolibrary.org/obo/ENVO_00003074
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvPressure | ub2bL114C52 , http://purl.obolibrary.org/obo/PATO_0001025 , ub2bL115C79 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvWaterDepth | ub2bL153C52 , http://purl.obolibrary.org/obo/PATO_0001595 , ub2bL154C79 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvEast-westCurrentVelocity | ub2bL50C52 , http://purl.obolibrary.org/obo/PATO_0002242 , ub2bL51C79 , http://purl.obolibrary.org/obo/ENVO_01000067
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvOxygen | ub2bL99C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL100C79 , ub2bL102C72 , http://purl.obolibrary.org/obo/CHEBI_15379 , ub2bL103C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvTemperature | ub2bL141C52 , http://purl.obolibrary.org/obo/PATO_0000146 , ub2bL142C79 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csv | http://purl.obolibrary.org/obo/ENVO_01000067
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvCurrentDirection | http://purl.obolibrary.org/obo/PATO_0000039
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/physical_oceanography.csvSalinity | ub2bL126C52 , http://purl.obolibrary.org/obo/PATO_0001655 , ub2bL127C79 , ub2bL129C72 , http://purl.obolibrary.org/obo/CHEBI_24866 , ub2bL130C98 , http://purl.obolibrary.org/obo/ENVO_00002149

got chlorophyll_a to work.

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvElevation | ub2bL55C52 , ub2bL56C62 , ub2bL57C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csv | ub2bL12C52 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL13C78 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvLatitude | http://purl.obolibrary.org/obo/OBI_0001620
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvLongitude | http://purl.obolibrary.org/obo/OBI_0001621
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvChlorophyllA | ub2bL31C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL32C79 , ub2bL34C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL35C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvWaterDepth | ub2bL99C52 , ub2bL100C62 , ub2bL101C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/chlorophyll_a.csvEvent | ub2bL68C52 , http://purl.obolibrary.org/obo/IAO_0000578 , ub2bL69C78 , ub2bL71C72 , http://purl.obolibrary.org/obo/BCO_0000003 , http://purl.obolibrary.org/obo/OBI_0000659

and fixed the Water Depth column in physical_oceanography to be (elevation or depth) and ('inheres in' some 'marine water body')

Finished global_chlorophyll_a.ttl

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvHaptophyteChlorophyllA | ub2bL57C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL58C79 , ub2bL60C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL61C98 , http://purl.obolibrary.org/obo/NCBITaxon_418917
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvTotalChlorophyllA | ub2bL117C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL118C79 , ub2bL120C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL121C98 , http://purl.obolibrary.org/obo/ENVO_00002149
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvLongitude | http://purl.obolibrary.org/obo/OBI_0001621
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvOrdinalNumber | ub2bL90C52 , http://purl.obolibrary.org/obo/OBI_0000963 , ub2bL91C78 , http://purl.obolibrary.org/obo/OBI_0100051
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvLatitude | http://purl.obolibrary.org/obo/OBI_0001620
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvDiatomChlorophyllA | ub2bL42C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL43C79 , ub2bL45C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL46C98 , http://purl.obolibrary.org/obo/NCBITaxon_2836
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvProkaryoteChlorophyllA | ub2bL102C52 , http://purl.obolibrary.org/obo/PATO_0000033 , ub2bL103C79 , ub2bL105C72 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL106C98 , http://purl.obolibrary.org/obo/NCBITaxon_2
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvWaterDepth | ub2bL132C52 , ub2bL133C62 , ub2bL134C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csv | ub2bL12C52 , http://purl.obolibrary.org/obo/CHEBI_18230 , ub2bL13C78 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/global_chlorophyll_a.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021

Finished influence_snow_depth.ttl

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvLongitude | http://purl.obolibrary.org/obo/OBI_0001621
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvRelativeDistanceX | ub2bL61C52 , http://purl.obolibrary.org/obo/PATO_0000040 , ub2bL62C79 , http://purl.obolibrary.org/obo/ENVO_00002200
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvSeaIceThickness | ub2bL85C52 , http://purl.obolibrary.org/obo/PATO_0000915 , ub2bL86C79 , http://purl.obolibrary.org/obo/ENVO_00002200
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvLatitude | http://purl.obolibrary.org/obo/OBI_0001620
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000021
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvRelativeDistanceY | ub2bL73C52 , http://purl.obolibrary.org/obo/PATO_0000040 , ub2bL74C79 , http://purl.obolibrary.org/obo/ENVO_00002200
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csv | ub2bL12C52 , http://purl.obolibrary.org/obo/PATO_0001018 , ub2bL13C79 , ub2bL15C72 , http://purl.obolibrary.org/obo/ENVO_00001999 , ub2bL16C100 , http://purl.obolibrary.org/obo/ENVO_00002200
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/influence_snow_depth.csvSignalStrength | ub2bL97C52 , http://purl.obolibrary.org/obo/PATO_0015013 , ub2bL98C79 , http://purl.obolibrary.org/obo/ENVO_00002200

20.12.17

I had an idea for a competency question: query the datastore for anything relevant to silicon biogeochemical cycling. If I could query Ontobee for anything that has some relation to silica and retrieve both 'silicon atom' and 'silicate(4-)', I could then use those terms to query my datastore and retrieve the Silicate column from the inorganic_nutrients dataset and the Particulate Silicon Flux column from the biogenic_particle_flux dataset. This would be a great question, as I could actually try some basic real ecological analysis: both datasets have a water depth, so I could standardize the units (z-scores or something similar) and see how the two vary with either continuous depth or categorical depth (surface and seafloor). Unfortunately, I can't find any obvious axioms in either of the ChEBI classes that would allow us to query for both of them knowing they are about the element silicon.
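Standardizing each column as z-scores and comparing depth groups could look like this. A minimal sketch: the (depth, value) rows and the depth_split between 'surface' and 'seafloor' are made-up placeholders, not values from inorganic_nutrients or biogenic_particle_flux, and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a column so variables with different units can be compared.

    Uses the sample standard deviation, so at least two values are needed.
    """
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def mean_z_by_depth(rows, depth_split):
    """rows are (depth in m, value) pairs; return the mean z-score per depth group."""
    z = z_scores([value for _, value in rows])
    groups = {"surface": [], "seafloor": []}
    for (depth, _), zi in zip(rows, z):
        groups["surface" if depth < depth_split else "seafloor"].append(zi)
    return {name: mean(zs) for name, zs in groups.items() if zs}
```

Running this on the Silicate column and on the Particulate Silicon Flux column puts both on the same unitless scale before comparing them by depth.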

Here is a link between them. It's not as direct as "anything that has part silicon", but that's how ChEBI currently is:

silicate(4−) subClassOf{3} silicon coordination entity subClassOf silicon molecular entity

We have silicon molecular entity, which has part some silicon atom.

silicon molecular entity    ->   has part some silicon atom
  silicon coordination entity
    silicon oxoanion
      silicate ion
        silicate(4-)

This query against the ontobee sparql endpoint:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s  
WHERE { 
?s rdfs:subClassOf/rdfs:subClassOf/rdfs:subClassOf/rdfs:subClassOf <http://purl.obolibrary.org/obo/CHEBI_26677> .
}

yields:

s
http://purl.obolibrary.org/obo/CHEBI_48125
http://purl.obolibrary.org/obo/CHEBI_29241
http://purl.obolibrary.org/obo/CHEBI_48122
http://purl.obolibrary.org/obo/CHEBI_48124
http://purl.obolibrary.org/obo/CHEBI_48137
http://purl.obolibrary.org/obo/CHEBI_53333
http://purl.obolibrary.org/obo/CHEBI_53334
http://purl.obolibrary.org/obo/CHEBI_53565
http://purl.obolibrary.org/obo/CHEBI_48137
http://purl.obolibrary.org/obo/CHEBI_53333

where http://purl.obolibrary.org/obo/CHEBI_29241 is the desired silicate(4−)

Curiously, when I run this query against Ontobee with different values inside the {} I get different levels, but subClassOf+, which should return all subclasses and their subclasses, does not return the desired silicate(4−).

?s rdfs:subClassOf{4} <http://purl.obolibrary.org/obo/CHEBI_26677> . does return it, because silicate(4−) is at the 4th subclass level down from silicon molecular entity. I wonder whether I will get different results with subClassOf+ if I run it against the proper (non-test) SPARQL endpoint, http://sparql.hegroup.org/sparql/.

Do this in folder: /kblumberg_masters_thesis/testing/script_prototyping/query_for_subclasses_of_input_purl

It does work: subClassOf+ pulls in values from each of the X levels in rdfs:subClassOf{X}. I think the front-end Ontobee SPARQL endpoint limits these queries, probably for efficiency reasons.
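The difference between a fixed-depth path and subClassOf+ can be reproduced on a toy copy of the hierarchy above. A minimal sketch: it keeps only the single branch from silicon molecular entity down to silicate(4-) (real ChEBI has many more subclasses at each level), and the function names are hypothetical.

```python
from collections import deque

# Parent -> direct subclasses, following the ChEBI branch shown above
SUBCLASSES = {
    "silicon molecular entity": ["silicon coordination entity"],
    "silicon coordination entity": ["silicon oxoanion"],
    "silicon oxoanion": ["silicate ion"],
    "silicate ion": ["silicate(4-)"],
}


def subclasses_at_depth(tree, root, depth):
    """Like ?s rdfs:subClassOf{depth} root: only classes exactly `depth` steps down."""
    level = {root}
    for _ in range(depth):
        level = {child for parent in level for child in tree.get(parent, [])}
    return level


def all_subclasses(tree, root):
    """Like ?s rdfs:subClassOf+ root: every class one or more steps down."""
    seen, queue = set(), deque([root])
    while queue:
        for child in tree.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

subClassOf+ is the union of every fixed level, which is why it should contain silicate(4-) whenever the {4} query does.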

This still doesn't get us silicon atom, but we can make use of the axiom on silicon molecular entity: subClassOf 'has part' some 'silicon atom'.

I figured out how this could work:

I noticed that the subclasses of carbon group molecular entity all have 'has part' some X atom.

To get the 'has part' X atom (for example, from silicon molecular entity, CHEBI_26677) I can use the query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix obo: <http://purl.obolibrary.org/obo/> 

SELECT DISTINCT ?o
WHERE { 
  # bind both conditions to the same restriction node ?r
  obo:CHEBI_26677 rdfs:subClassOf ?r .
  ?r owl:onProperty obo:BFO_0000051 ;   # BFO_0000051 = has part
     owl:someValuesFrom ?o .
}

which returns http://purl.obolibrary.org/obo/CHEBI_27573 (silicon atom)
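The has-part lookup is only reliable if owl:onProperty and owl:someValuesFrom are matched on the same restriction node; otherwise a filler from an unrelated restriction on CHEBI_26677 could also come back. A plain-Python sketch contrasting the bound and unbound joins; the blank-node IDs and the second restriction (`ex:otherProperty`, `ex:SomeClass`) are hypothetical.

```python
def has_part_fillers(triples, cls, has_part="obo:BFO_0000051"):
    """Fillers of restrictions R with cls subClassOf R, R onProperty has_part,
    and R someValuesFrom ?o, all bound to the same R."""
    restrictions = {o for s, p, o in triples if s == cls and p == "rdfs:subClassOf"}
    out = set()
    for r in restrictions:
        props = {(p, o) for s, p, o in triples if s == r}
        if ("owl:onProperty", has_part) in props:
            out |= {o for p, o in props if p == "owl:someValuesFrom"}
    return out


def unbound_fillers(triples, cls, has_part="obo:BFO_0000051"):
    """The looser join: the two patterns may match different restrictions."""
    restrictions = {o for s, p, o in triples if s == cls and p == "rdfs:subClassOf"}
    if not any((r, "owl:onProperty", has_part) in triples for r in restrictions):
        return set()
    return {o for s, p, o in triples if s in restrictions and p == "owl:someValuesFrom"}


triples = {
    ("obo:CHEBI_26677", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "owl:onProperty", "obo:BFO_0000051"),
    ("_:r1", "owl:someValuesFrom", "obo:CHEBI_27573"),
    ("obo:CHEBI_26677", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "owl:onProperty", "ex:otherProperty"),
    ("_:r2", "owl:someValuesFrom", "ex:SomeClass"),
}
```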

I could make a version of the script query_for_subclasses_of_input_purl.py for ChEBI PURLs that does a union between its current subClassOf+ query and the query above to get the atom of interest. For silicon molecular entity, for example, this would return a list of all its subclasses plus silicon atom, which I could then use as input to the script that gets any data about a list of input PURLs (which I was going to write next anyway).

Essentially I'll make a special CHEBI version of the query_for_subclasses_of_input_purl.py script which also gets the atom, then use that to pull data about silica and silicon and do a mini ecological analysis about the Si cycle from that as a model for how these semantics can help to study biogeochemical cycling.

The query in the script would be like:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
prefix owl: <http://www.w3.org/2002/07/owl#> 
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix obo: <http://purl.obolibrary.org/obo/> 

SELECT DISTINCT ?s
WHERE { 
  {
    # every subclass, at any depth, of silicon molecular entity
    ?s rdfs:subClassOf+ obo:CHEBI_26677 .
  }
  UNION
  {
    # plus the atom it 'has part' (BFO_0000051) some of
    obo:CHEBI_26677 rdfs:subClassOf ?r .
    ?r owl:onProperty obo:BFO_0000051 ;
       owl:someValuesFrom ?s .
  }
}
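How the special ChEBI version of query_for_subclasses_of_input_purl.py might wrap this query: a minimal sketch, not the script's actual code. It fills an input PURL into the UNION query and sends it with a plain HTTP GET following the SPARQL 1.1 Protocol, asking for JSON results. Whether the Ontobee endpoint honours the Accept header this way is an assumption, and build_query / run_query are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql.hegroup.org/sparql/"  # Ontobee endpoint mentioned above

QUERY_TEMPLATE = """prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?s WHERE {{
  {{ ?s rdfs:subClassOf+ <{purl}> . }}
  UNION
  {{ <{purl}> rdfs:subClassOf ?r .
     ?r owl:onProperty obo:BFO_0000051 ;
        owl:someValuesFrom ?s . }}
}}"""


def build_query(purl):
    """Fill the input PURL into the subclass-plus-atom query."""
    return QUERY_TEMPLATE.format(purl=purl)


def run_query(purl, endpoint=ENDPOINT):
    """Send the query via HTTP GET and return the ?s bindings as a list of PURLs."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": build_query(purl)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [binding["s"]["value"] for binding in data["results"]["bindings"]]
```

The list returned by run_query would be the input to the script that gets any data about a list of PURLs.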

21.12.17

Added page for Vocamp.

Add more dataset annotations, starting with biogenic_particle_flux.

new case:

W and ('any property' some ('any property' some X) and ('any property' some (Y and 'any property' some Z)))

Finished biogenic_particle_flux.ttl

The query returns all annotation classes:

file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvCalciumCarbonateFlux | ub2bL55C52 , http://purl.obolibrary.org/obo/PATO_0001030 , ub2bL56C79 , ub2bL58C72 , ub2bL59C82 , ub2bL61C112 , http://purl.obolibrary.org/obo/ENVO_01000158 , ub2bL63C92 , http://purl.obolibrary.org/obo/CHEBI_3311 , ub2bL64C117 , http://purl.obolibrary.org/obo/CHEBI_50860
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvSestonFlux | ub2bL175C52 , http://purl.obolibrary.org/obo/PATO_0001030 , ub2bL176C79 , http://purl.obolibrary.org/obo/ENVO_01000158
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csv | ub2bL13C52 , http://purl.obolibrary.org/obo/ENVO_03000010 , ub2bL14C80 , http://purl.obolibrary.org/obo/ENVO_01000158
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvParticulateOrganicCarbonFlux | ub2bL109C52 , http://purl.obolibrary.org/obo/PATO_0001030 , ub2bL110C79 , ub2bL112C72 , ub2bL113C82 , ub2bL115C112 , http://purl.obolibrary.org/obo/ENVO_01000158 , ub2bL117C92 , http://purl.obolibrary.org/obo/CHEBI_27594 , ub2bL118C118 , http://purl.obolibrary.org/obo/CHEBI_50860
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvParticulateOrganicNitrogenFlux | ub2bL130C52 , http://purl.obolibrary.org/obo/PATO_0001030 , ub2bL131C79 , ub2bL133C72 , ub2bL134C82 , ub2bL136C112 , http://purl.obolibrary.org/obo/ENVO_01000158 , ub2bL138C92 , http://purl.obolibrary.org/obo/CHEBI_25555 , ub2bL139C118 , http://purl.obolibrary.org/obo/CHEBI_50860
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvDateTimeEnd | http://purl.obolibrary.org/obo/BFO_0000148
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvDuration | http://purl.obolibrary.org/obo/BFO_0000038
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvParticulateSiliconFlux | ub2bL151C52 , http://purl.obolibrary.org/obo/PATO_0001030 , ub2bL152C79 , ub2bL154C72 , ub2bL155C82 , ub2bL157C112 , http://purl.obolibrary.org/obo/ENVO_01000158 , ub2bL159C92 , http://purl.obolibrary.org/obo/CHEBI_27573 , ub2bL160C118 , http://purl.obolibrary.org/obo/CHEBI_50860
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvWaterDepth | ub2bL189C52 , ub2bL190C62 , ub2bL191C105 , http://purl.obolibrary.org/obo/PATO_0001687 , http://purl.obolibrary.org/obo/PATO_0001595 , http://purl.obolibrary.org/obo/ENVO_00001999
file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/biogenic_particle_flux.csvDateTime | http://purl.obolibrary.org/obo/BFO_0000148
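Each result line pairs an annotated column (the text after `.csv` in the subject IRI) with a comma-separated mix of blank-node labels (such as ub2bL55C52) and OBO PURLs. To compare annotations across columns, it can help to keep only the ontology term IDs. Below is a minimal stdlib-only Python sketch, assuming the `subject | term , term` layout shown above. The function names are hypothetical. The dataset-level line, which has nothing after `.csv`, is keyed as `<dataset>`.

```python
from typing import Dict, List, Tuple

OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def parse_result_line(line: str) -> Tuple[str, List[str]]:
    """Split 'subject | term , term , ...' into (column, OBO term IDs).

    Blank-node labels such as ub2bL55C52 are dropped; only OBO PURLs are kept.
    """
    subject, _, terms = line.partition(" | ")
    # The column name is whatever follows '.csv' in the subject IRI.
    column = subject.rsplit(".csv", 1)[-1] or "<dataset>"
    ids = [
        term[len(OBO_PREFIX):]
        for term in (t.strip() for t in terms.split(","))
        if term.startswith(OBO_PREFIX)
    ]
    return column, ids


def parse_results(text: str) -> Dict[str, List[str]]:
    """Parse every 'subject | terms' line of a result dump into a dict."""
    results: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if " | " in line:
            column, ids = parse_result_line(line)
            results[column] = ids
    return results


DATASET = (
    "file:/home/kai/Desktop/grad_school/marmic/master_thesis/kblumberg_masters_thesis/"
    "testing/test_annotation_of_data_files/build_datastore_annotations_and_queries/"
    "biogenic_particle_flux.csv"
)
sample = (
    f"{DATASET}SestonFlux | ub2bL175C52 , {OBO_PREFIX}PATO_0001030 , "
    f"ub2bL176C79 , {OBO_PREFIX}ENVO_01000158\n"
    f"{DATASET}DateTime | {OBO_PREFIX}BFO_0000148"
)
print(parse_results(sample))
# {'SestonFlux': ['PATO_0001030', 'ENVO_01000158'], 'DateTime': ['BFO_0000148']}
```

For example, the SestonFlux line reduces to PATO_0001030 and ENVO_01000158, which makes the shared ENVO_01000158 term across the flux columns easier to spot.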

Next: snow_height.

Make the annotation for Snow Height Mean:

'expected value' and 'is about' min 2 ('data item' and 'is about' some (thickness and 'inheres in' some 'snow'))

However, I haven't yet been able to query it successfully; I'll come back to this later.
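One possible reason the query fails, assuming the annotation is serialized with the standard OWL 2 RDF mapping: a qualified cardinality restriction such as `'is about' min 2 (...)` is written with `owl:minQualifiedCardinality` and `owl:onClass`, not `owl:someValuesFrom`. A query that follows only `owl:someValuesFrom` will therefore never reach its filler. The stdlib-only Python sketch below uses plain tuples in place of a triple store. The node names `_:r` and `_:c` and the helper functions are hypothetical, and labels stand in for the actual OBO IRIs.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples for the restriction part of
#   'expected value' and 'is about' min 2 ('data item' and ...)
# laid out per the OWL 2 RDF mapping for a qualified min cardinality.
# Labels stand in for the real OBO IRIs; _:c is the ('data item' and ...) class.
RESTRICTION: List[Triple] = [
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", "'is about'"),
    ("_:r", "owl:minQualifiedCardinality", '"2"^^xsd:nonNegativeInteger'),
    ("_:r", "owl:onClass", "_:c"),
]

# Predicates that can carry a restriction's class filler.
FILLER_PREDICATES = ("owl:someValuesFrom", "owl:allValuesFrom", "owl:onClass")


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o of triples (subject, predicate, o)."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def restriction_filler(triples: List[Triple], restriction: str) -> Optional[str]:
    """Return the class filler of a restriction, whichever kind it is."""
    for predicate in FILLER_PREDICATES:
        found = objects(triples, restriction, predicate)
        if found:
            return found[0]
    return None


# A someValuesFrom-only lookup finds nothing for this restriction...
print(objects(RESTRICTION, "_:r", "owl:someValuesFrom"))  # []
# ...while also checking owl:onClass reaches the nested class expression.
print(restriction_filler(RESTRICTION, "_:r"))  # _:c
```

In SPARQL, the same idea could be expressed with a `UNION` over `owl:someValuesFrom` and `owl:onClass` on the restriction node.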
