Sparql data stored by database-ms has subject, predicate and object as URL's ( Edit: Issue Description Updated) #60

Open
prasanthhs opened this issue Jun 28, 2018 · 16 comments

@prasanthhs
Contributor

prasanthhs commented Jun 28, 2018

Are there any extractors in the current code which index labels against URLs in the SPARQL DB? I have tried the Open IE MS and it returns URLs for all three fields of the triples. I am not sure about the other extractors.

We had a discussion with Ricardo about this and he mentioned that Auto Index will index URL vs Label and not URL vs URL.

Some sample FOX answers (shared by Ricardo):

Query : Barack Obama is born in Hawaii
FOX_output_for__Barack_Obama_is_born_in_Hawaii__.txt

Currently, the extractors store

foxr:1530176121568  a  oa:Annotation , rdf:Statement , foxo:Relation ;
        rdf:object     dbr:Hawaii ;
        rdf:predicate  foxo:stanford_livein ;
        rdf:subject    dbr:Barack_Obama ;

but should also create this:
dbr:Barack_Obama rdfs:label "Barack Obama" .

Finally, Label vs URL has to be indexed in Auto Index and not URL vs URL.
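
To make the "Label vs URL" point concrete, here is a minimal Python sketch. It is not the Auto Index implementation: it assumes the triples are already available as plain (subject, predicate, object) string tuples, the helper name `label_index` is hypothetical, and the FOX predicate URI is a placeholder. Only `rdfs:label` triples produce index entries, so a relation triple with URLs in all three positions contributes nothing.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_index(triples):
    """Map each label literal to the set of URIs that carry it.

    Hypothetical helper: `triples` is an iterable of
    (subject, predicate, object) string tuples. Triples whose
    predicate is not rdfs:label are ignored.
    """
    index = {}
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL:
            index.setdefault(obj, set()).add(subject)
    return index


triples = [
    # Relation triple: URL vs URL, nothing to index here.
    ("http://dbpedia.org/resource/Barack_Obama",
     "http://example.org/fox/ontology#stanford_livein",  # placeholder URI
     "http://dbpedia.org/resource/Hawaii"),
    # Label triple: this is what makes the entity searchable by name.
    ("http://dbpedia.org/resource/Barack_Obama", RDFS_LABEL, "Barack Obama"),
]

index = label_index(triples)
```

Without the second triple the index would be empty, which is the situation described above.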

Any suggestions on how to proceed further?

[Additional Information] : The following Query must work against database-ms's sparql db

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n
?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}". If it does, then isEntityCustomized and EntitySelectQuery can be omitted from the parameters passed, and just the endpoint URL would suffice.
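
A rough Python sketch of that idea follows. Only the two parameter names `isEntityCustomized` and `EntitySelectQuery` come from this issue; the dictionary shape, the `endpoint_url` key, the function name and the example endpoint are assumptions made for illustration and do not describe the actual Auto Index request format.

```python
def build_index_request(endpoint_url, entity_select_query=None):
    """Assemble parameters for an indexing request (hypothetical shape).

    If no custom query is given, only the endpoint URL is sent and the
    receiver is expected to fall back to its default query.
    """
    params = {"endpoint_url": endpoint_url}  # hypothetical key name
    if entity_select_query is not None:
        params["isEntityCustomized"] = True
        params["EntitySelectQuery"] = entity_select_query
    return params


# Placeholder endpoint, not a real deployment.
default_request = build_index_request("http://localhost:3030/example/sparql")
custom_request = build_index_request(
    "http://localhost:3030/example/sparql",
    "SELECT DISTINCT ?key1 ?key2 WHERE { ?key1 rdfs:label ?key2 . }",
)
```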

[EDIT 05.07.2018] : Current Query which works with Auto Index and is passed by database-ms to AutoIndex.

SELECT DISTINCT ?key1 ?key2
WHERE{
?key1 rdfs:label ?key2 .
}
Current Tested Extractor : FOX.

EXPECTED Query to be passed to Auto Index:

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n
?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}".

@KHaack
Contributor

KHaack commented Jun 28, 2018

Hmm, one possible solution is to use Jena to parse the data. The second is that we parse the output of every extractor before we insert the data into the database, or select another output format where possible (and sometimes it's not possible).

@RicardoUsbeck
Contributor

Parse it into a Jena Model and then run a query. Otherwise too much data ends up as garbage in the score. That should also concern @hjshah142.

@sepidetari
Collaborator

sepidetari commented Jun 28, 2018

ok so you need something like this?

subject (as uri) -----predicate (as label)-----> object (as uri)

@RicardoUsbeck
Contributor

From "Barack Obama is born in Hawaii" the following should be sent to the triple store:

dbr:Barack_Obama foxo:stanford_livein dbr:Hawaii.
dbr:Barack_Obama rdfs:label "Barack Obama" .
dbr:Hawaii rdfs:label "Hawaii" .
foxo:stanford_livein rdfs:label "born in".
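
As a rough illustration of the output above, the Python sketch below emits one relation triple plus an `rdfs:label` line for every term that has a known label. It assumes the `dbr:`, `foxo:` and `rdfs:` prefixes are declared elsewhere in the Turtle document and that the labels are supplied by the caller. The function name `turtle_with_labels` is hypothetical, and this is not how any of the extractors is actually implemented.

```python
def turtle_with_labels(subject, predicate, obj, labels):
    """Return Turtle lines for one relation plus a label line per labelled term.

    `labels` maps a prefixed name to its human-readable label; terms
    without an entry simply get no label line.
    """
    lines = [f"{subject} {predicate} {obj} ."]
    for term in (subject, obj, predicate):
        if term in labels:
            # Escape backslashes and double quotes inside the literal.
            escaped = labels[term].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{term} rdfs:label "{escaped}" .')
    return lines


lines = turtle_with_labels(
    "dbr:Barack_Obama",
    "foxo:stanford_livein",
    "dbr:Hawaii",
    {
        "dbr:Barack_Obama": "Barack Obama",
        "dbr:Hawaii": "Hawaii",
        "foxo:stanford_livein": "born in",
    },
)
print("\n".join(lines))
```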

@sepidetari
Collaborator

@RicardoUsbeck Does it mean that we just have to store the Turtle format in the DB?

We parsed the TTL into a Jena model and stored it in the DB. The following pictures show the result. Is this fine?
db 1
db 2

@Suganya31
Contributor

@RicardoUsbeck @prasanthhs If this looks fine, we can try to make Open IE and Sorookin produce their output as TTL.

@RicardoUsbeck
Contributor

That would be great. Is that also possible for FOX, so that Harsh has three annotators for the ensemble learning?

@prasanthhs
Contributor Author

prasanthhs commented Jun 28, 2018

@Suganya Does this work against the Sparql Query I posted? If it doesn't return anything, please let me know. We need to check how we can get the data in this case.

The query is given below; add the prefixes for owl and rdfs before it. It is the same query that is executed on DBpedia and a few other SPARQL endpoints.

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n
?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}".
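
A small Python sketch of prepending the prefixes is shown here. The two namespace IRIs are the standard ones for `owl` and `rdfs`; the helper name `with_prefixes` is hypothetical and not part of database-ms or Auto Index.

```python
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def with_prefixes(query, prefixes=PREFIXES):
    """Return `query` with one PREFIX declaration per entry placed in front."""
    header = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()
    )
    return f"{header}\n{query}"


query = (
    "SELECT DISTINCT ?key1 ?key2 WHERE { "
    "?key1 a owl:Thing . ?key1 rdfs:label ?key2 . }"
)
full_query = with_prefixes(query)
print(full_query)
```

Passing a different dictionary as `prefixes` would cover extractor-specific namespaces in the same way.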

@RicardoUsbeck
Contributor

No, it does not, since there is no triple that can bind to ?key1 a owl:Thing .

SELECT DISTINCT ?key1 ?key2 
WHERE{
?key1 rdfs:label ?key2 .
FILTER(regex(str(?key1), "/ontology/[a-z]" ))}

This one should work and return labels for properties, for example.
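
To see which URIs such a FILTER keeps, the pattern can be tried outside the triple store. The Python sketch below assumes that an unanchored, case-sensitive substring search (`re.search`) behaves like SPARQL `regex()` without flags for this simple pattern. The example URIs follow the DBpedia convention of lowercase property names and uppercase class names and are only illustrative; FOX may use different namespaces.

```python
import re

# Same pattern as in the FILTER: "/ontology/" followed by a lowercase letter.
PATTERN = re.compile(r"/ontology/[a-z]")


def matches_filter(uri):
    """Return True if the pattern occurs anywhere in the URI string."""
    return PATTERN.search(uri) is not None


examples = [
    "http://dbpedia.org/ontology/birthPlace",  # property, lowercase local name
    "http://dbpedia.org/ontology/Person",      # class, uppercase local name
    "http://dbpedia.org/resource/Hawaii",      # resource, no /ontology/ segment
]
kept = [uri for uri in examples if matches_filter(uri)]
print(kept)
```

Only the first URI passes, so labels of resources would be excluded by this filter.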

@prasanthhs
Contributor Author

prasanthhs commented Jun 28, 2018

@RicardoUsbeck Yes, I know. But the problem is that this query is a generic default query which runs for all remote endpoints. Should I make the query you've given the default only for local SPARQL endpoints? What about the queries for classes and properties, which are also executed against this local database, since it was decided that we keep generic behaviour for both remote and local endpoints?

Plus, there's the case of missing prefixes. The extractors' data uses prefixes which are not included by default in Auto Index. Isn't it better if the query is passed along with the necessary prefixes instead (the opposite of what we discussed)?

@Suganya31
Contributor

@RicardoUsbeck I am not sure about FOX but I can try Sorookin and Open IE. @KHaack Is it possible for FOX also?

@RicardoUsbeck
Contributor

@prasanthhs The problem is that the SASK knowledge graph is not complete. But yes, you can also configure it so that the query is passed to Auto Index.

@prasanthhs
Contributor Author

@RicardoUsbeck Yes, I understand. Then it is probably better for database-ms to pass the new query posted here instead of configuring Auto Index, since this is SASK-related work in progress. Once the complete implementation is done, the custom queries can be removed.

@Suganya31
Contributor

Fox stores the proper TTL in the database now.

@prasanthhs
Contributor Author

prasanthhs commented Jul 3, 2018

Snapshot of Elastic Search Repository for Data extracted by FOX extractor + Auto Index integration.

Input to the extractor : "Barack Obama is married to Michelle Obama."

Query received by Auto Index for the SPARQL endpoint: SELECT DISTINCT ?key1 ?key2 WHERE{?key1 rdfs:label ?key2 .}
screen shot 2018-07-03 at 13 03 56

@RicardoUsbeck
Contributor

If we could get the label of the property in the future, it would be even better, but currently FOX does not return it. Maybe @Suganya31 can open an issue for that.

@prasanthhs changed the title from "Sparql data stored by database-ms has subject, predicate and object as URL's" to "Sparql data stored by database-ms has subject, predicate and object as URL's ( Edit: [05.07.2018] Issue Description Updated)" on Jul 5, 2018
@prasanthhs changed the title from "Sparql data stored by database-ms has subject, predicate and object as URL's ( Edit: [05.07.2018] Issue Description Updated)" to "Sparql data stored by database-ms has subject, predicate and object as URL's ( Edit: Issue Description Updated)" on Jul 5, 2018
@AndreSonntag AndreSonntag removed their assignment Sep 19, 2018
6 participants