Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine default graph and optional graphs via UNION to return all the types available in the dataset #1327

Open
wants to merge 1 commit into
base: skosmos-2
Choose a base branch
from

Conversation

kinow
Copy link
Collaborator

@kinow kinow commented Jun 1, 2022

Reasons for creating this PR

Only the types of the default graph are returned in the REST API, unless the user enables the option in Jena that includes all graphs.

Link to relevant issue(s), if any

Description of the changes in this PR

In the parts of the query where it references types (like rdfs label, or a subclass, etc) I have added a sibling UNION that includes the GRAPH ?g { same_statement }, to simulate what Jena does with that setting.

With the changes in this branch, after clearing the Local Storage in my browser, I can see all the types in both the REST response, and also in the UI (e.g. search auto-complete, see #1323 ), without needing to enable that option to merge the default graph in Jena Fuseki.

Known problems or uncertainties in this PR

Does it need a test? Does it sound like a valid solution? I thought about doing a simpler { { query_as_is } UNION { GRAPH ?g { query _as is } } } but since the query wasn't very long I opted for this current approach.

Checklist

  • phpUnit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works (if not, explain why below)
  • The PR doesn't introduce unintended code changes (e.g. empty lines or useless reindentation)

@codecov
Copy link

codecov bot commented Jun 1, 2022

Codecov Report

Merging #1327 (5d375f4) into master (5d193c2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master    #1327   +/-   ##
=========================================
  Coverage     70.68%   70.68%           
- Complexity     1646     1647    +1     
=========================================
  Files            32       32           
  Lines          3786     3787    +1     
=========================================
+ Hits           2676     2677    +1     
  Misses         1110     1110           
Impacted Files Coverage Δ
model/Model.php 80.81% <100.00%> (ø)
model/sparql/GenericSparql.php 91.28% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d193c2...5d375f4. Read the comment docs.

@osma
Copy link
Member

osma commented Jun 2, 2022

Thanks for giving this a shot @kinow , this has been a longstanding issue that has bitten many users (including yourself)!

However, I'm not sure this is the right approach, adding lots of UNION clauses to the query which seems overcomplicated to me. The intent of the types query is simply to list all the concept types across all the vocabularies. The result should be similar, if not identical, to the result of performing a types query separately for each vocabulary and merging them into one list.

There are at least a few ways of doing that that I can think of:

  1. Relying on the union default graph (current status quo). This is really a hack and should be avoided. For one, it relies on the Fuseki setting. Second, this isn't really accurate as the dataset may include other graphs that are not configured as Skosmos vocabularies but may still contain concept type definitions that will end up in the results.
  2. Adding a single GRAPH ?g { block around the whole query. I think that this should be equivalent to the above, at least in sensible scenarios (each graph contains both the type definition and its label). It's still a bit problematic since types from graphs that are not configured as vocabularies may be included.
  3. Same as 2, but selecting only those graphs that are configured as vocabularies and using VALUES ?g to enumerate those, thus excluding irrelevant graphs.
  4. Same idea as 3, but using FROM NAMED clauses for selecting individual graphs instead of GRAPH ?g with VALUES ?g.

Would you like to try some of the approaches 2-4? I think any of them would be an improvement over the current situation, as long as the query still performs well.

@kinow
Copy link
Collaborator Author

kinow commented Jun 2, 2022

Would you like to try some of the approaches 2-4? I think any of them would be an improvement over the current situation, as long as the query still performs well.

Sounds like fun! I will give it a try later after or in between the other JQuery/Bootstrap work. Will be a good way to brush up SPARQL and Jena Fuseki :)

Do you have any suggestion on which item to try first (or to start thinking about), between options 2, 3, and 4?

Thanks!

@osma
Copy link
Member

osma commented Jun 2, 2022

I think 3 or 4 would be most preferable as they don't suffer from the problem of irrelevant data being picked up from other possible named graphs in the dataset.

model/sparql/GenericSparql.php Outdated Show resolved Hide resolved
private function generateQueryTypesQuery($lang) {
$fcl = $this->generateFromClause();
private function generateQueryTypesQuery(string $lang): string {
$vocabs = $this->model->getVocabularies();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no vocabs, then the query will return all the types from all the graphs. Would it be better to return an empty result int that case?

@kinow
Copy link
Collaborator Author

kinow commented Jun 5, 2022

I decided to try FROM NAMED clauses since we already had the code for that. We actually called the function for that creating the $fcl variable, but without passing any vocabularies, so it wasn't returning anything.

Here's an example of what the query may look like. It is from my development environment with a few vocabularies configured.

PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX  isothes: <http://purl.org/iso25964/skos-thes#>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  finaf: <http://urn.fi/URN:NBN:fi:au:finaf:>

SELECT DISTINCT  ?type ?label ?superclass
FROM NAMED <http://skos.um.es/unescothes/>
FROM NAMED <http://zbw.eu/stw/>
FROM NAMED <http://vocabs.rossio.fcsh.unl.pt/tesauro/>
FROM NAMED <http://www.yso.fi/onto/yso/>
FROM NAMED finaf:
WHERE
  { GRAPH ?g
      { {   { BIND(skos:Concept AS ?type) }
          UNION
            { BIND(skos:Collection AS ?type) }
          UNION
            { BIND(isothes:ConceptGroup AS ?type) }
          UNION
            { BIND(isothes:ThesaurusArray AS ?type) }
          UNION
            { ?type rdfs:subClassOf/(rdfs:subClassOf)* skos:Concept }
          UNION
            { ?type rdfs:subClassOf/(rdfs:subClassOf)* skos:Collection }
        }
        OPTIONAL
          { ?type  rdfs:label  ?label
            FILTER langMatches(lang(?label), "en")
          }
        OPTIONAL
          { ?type  rdfs:subClassOf  ?superclass }
        FILTER EXISTS { ?s  a               ?type ;
                            skos:prefLabel  ?prefLabel
                      }
      }
  }

image

And the REST API response:

{
  "@context": {
    "skos": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#",
    "uri": "@id",
    "type": "@type",
    "rdfs": "http:\\/\\/www.w3.org\\/2000\\/01\\/rdf-schema#",
    "onki": "http:\\/\\/schema.onki.fi\\/onki#",
    "label": "rdfs:label",
    "superclass": {
      "@id": "rdfs:subClassOf",
      "@type": "@id"
    },
    "types": "onki:hasType",
    "@language": "en",
    "@base": "http:\\/\\/localhost:9090\\/rest\\/v1\\/"
  },
  "uri": "",
  "types": [
    {
      "uri": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept",
      "label": "Concept"
    },
    {
      "uri": "http:\\/\\/zbw.eu\\/namespaces\\/zbw-extensions\\/Descriptor",
      "label": "Descriptor",
      "superclass": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept"
    },
    {
      "uri": "http:\\/\\/zbw.eu\\/namespaces\\/zbw-extensions\\/Thsys",
      "label": "Thsys",
      "superclass": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept"
    },
    {
      "uri": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Collection",
      "label": "Collection"
    },
    {
      "uri": "http:\\/\\/purl.org\\/iso25964\\/skos-thes#ConceptGroup",
      "label": "Concept group"
    },
    {
      "uri": "http:\\/\\/purl.org\\/iso25964\\/skos-thes#ThesaurusArray",
      "label": "Array of sibling concepts"
    },
    {
      "uri": "http:\\/\\/www.yso.fi\\/onto\\/yso-meta\\/Concept",
      "label": "General concept",
      "superclass": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept"
    },
    {
      "uri": "http:\\/\\/www.yso.fi\\/onto\\/yso-meta\\/Individual",
      "label": "Individual concept",
      "superclass": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept"
    },
    {
      "uri": "http:\\/\\/www.yso.fi\\/onto\\/yso-meta\\/Hierarchy",
      "label": "Hierarchical concept",
      "superclass": "http:\\/\\/www.w3.org\\/2004\\/02\\/skos\\/core#Concept"
    }
  ]
}

),
'http://www.w3.org/2004/02/skos/core#Collection' => array(),
'http://purl.org/iso25964/skos-thes#ThesaurusArray' => array(),
'http://purl.org/iso25964/skos-thes#ConceptGroup' => array()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look at testconfig.ttl, and I think the function shuold return these new types now as they are types used in vocabularies there. These extra values are added now since the query is searching types in all graphs for all configured vocabs.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jun 5, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot E 1 Security Hotspot
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

'http://www.w3.org/2004/02/skos/core#Concept',
'http://www.w3.org/2004/02/skos/core#Collection',
'http://purl.org/iso25964/skos-thes#ThesaurusArray',
'http://purl.org/iso25964/skos-thes#ConceptGroup',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change in this array is the order of ThesaurusArray and ConceptGroup. Not sure why these two swapped places in this array.

@kinow
Copy link
Collaborator Author

kinow commented Jun 5, 2022

The SonarCloud security issue reported is for a prefix URL, so that shouldn't be a blocker, I think. Ready for review again 👍

@osma osma self-assigned this Sep 12, 2022
FILTER EXISTS {
?s a ?type .
?s skos:prefLabel ?prefLabel .
GRAPH ?g {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong to me. You are already using $fcl i.e. a generated block of FROM clauses. There is no need for the GRAPH ?g { block.

@osma
Copy link
Member

osma commented Sep 13, 2022

I can try to finish this off now that @kinow you are busy with the qtip-to-CSS PR #1324 ...

@kinow
Copy link
Collaborator Author

kinow commented Sep 13, 2022

I can try to finish this off now that @kinow you are busy with the qtip-to-CSS PR #1324 ...

That'd be great, please :D Thank you @osma!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The REST API types query relies on union default graph
2 participants