Cleaning rdf:type range in the enriched STO #1

Open
mmaltsev opened this issue Mar 22, 2018 · 6 comments

@mmaltsev
Collaborator

In DBpedia, within our area of interest, the range (the set of object values) of the rdf:type property sometimes includes irrelevant classes.

One example is sto:BBF_TR-069 -- rdf:type -- dbpcy:Rule106652242.
In this case, the unrelated object dbpcy:Rule106652242 is present only because dbpcy:Protocol106665108 is an rdfs:subClassOf dbpcy:Rule106652242.

Thus, the full chain dbpcy:Protocol106665108 < dbpcy:Rule106652242 < dbpcy:Direction106786629 < dbpcy:Message106598915 < dbpcy:Communication100033020 < dbpcy:Abstraction100002137 ends up in the rdf:type range of sto:BBF_TR-069.
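
To make the chain concrete, here is a minimal sketch of how following rdfs:subClassOf upwards produces it. The `SUBCLASS_OF` mapping and the `superclass_chain` helper are hypothetical names; the links are hard-coded from the chain above rather than queried from DBpedia, and the sketch assumes each class has a single direct superclass.

```python
# Direct rdfs:subClassOf links, copied from the chain quoted in this comment.
# Assumption: one direct superclass per class; a real run would load the
# links from DBpedia instead of hard-coding them.
SUBCLASS_OF = {
    "dbpcy:Protocol106665108": "dbpcy:Rule106652242",
    "dbpcy:Rule106652242": "dbpcy:Direction106786629",
    "dbpcy:Direction106786629": "dbpcy:Message106598915",
    "dbpcy:Message106598915": "dbpcy:Communication100033020",
    "dbpcy:Communication100033020": "dbpcy:Abstraction100002137",
}


def superclass_chain(cls, subclass_of):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    seen = {cls}
    while chain[-1] in subclass_of:
        parent = subclass_of[chain[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    return chain


chain = superclass_chain("dbpcy:Protocol106665108", SUBCLASS_OF)
print(" < ".join(chain))
```

Every class in `chain` after the first one shows up as an extra rdf:type value once the subclass links are materialized.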

The question is: should anything from such chains be removed from the enriched ontology?

Another example is sto:SCOR -- rdf:type -- dbpcy:Person100007846.

This case is easier: the concept is simply wrong, so we can exclude the whole chain containing dbpcy:Person100007846 from the enriched ontology.

@igrangel
Collaborator

To meet this requirement, there has to be something to compare against. The ontology could be one such reference, and instances could perhaps be checked for whether they are correct instantiations of a given class in the ontology. In that case, these classes should be removed from the full chain.
Still, in the end, we need a ground truth to compare with.

@mmaltsev
Collaborator Author

The only solution that came to mind was to narrow down the classes for each standard, that is, to exclude all superclasses and keep only the classes at the bottom level of the "DBpedia class tree". This approach was implemented here.

Applying it to OPC_UA (sto:IEC_62541) gives the following.
before:

sto:IEC_62541 a dbpcy:Abstraction100002137,
                dbpcy:Communication100033020,
                dbpcy:Direction106786629,
                dbpcy:Measure100033615,
                dbpcy:Message106598915,
                dbpcy:Protocol106665108,
                dbpcy:Rule106652242,
                dbpcy:Standard107260623,
                dbpcy:SystemOfMeasurement113577171,
                dbpcy:WikicatComputerStandards,
                dbpcy:WikicatNetworkProtocols,

after:

sto:IEC_62541 a dbpcy:WikicatComputerStandards,
                dbpcy:WikicatNetworkProtocols,

Applying it to the enriched ontology yields this. The process removes 429 triples overall. In addition, some class chains, such as WikicatBusinessModels -> ... -> PhysicalEntity100001930, were excluded entirely.
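
A minimal sketch of the bottom-level filtering idea, not the linked implementation itself. The helper names `ancestors` and `keep_bottom_level` are hypothetical. The Protocol chain is the one quoted earlier in this issue; the links from the two Wikicat classes and the Standard -> SystemOfMeasurement -> Measure chain are assumptions made only so the example reproduces the before/after listing above.

```python
# class -> direct superclasses (rdfs:subClassOf).
SUBCLASS_OF = {
    # Chain quoted earlier in this issue.
    "dbpcy:Protocol106665108": ("dbpcy:Rule106652242",),
    "dbpcy:Rule106652242": ("dbpcy:Direction106786629",),
    "dbpcy:Direction106786629": ("dbpcy:Message106598915",),
    "dbpcy:Message106598915": ("dbpcy:Communication100033020",),
    "dbpcy:Communication100033020": ("dbpcy:Abstraction100002137",),
    # Assumed links, for illustration only.
    "dbpcy:WikicatNetworkProtocols": ("dbpcy:Protocol106665108",),
    "dbpcy:WikicatComputerStandards": ("dbpcy:Standard107260623",),
    "dbpcy:Standard107260623": ("dbpcy:SystemOfMeasurement113577171",),
    "dbpcy:SystemOfMeasurement113577171": ("dbpcy:Measure100033615",),
    "dbpcy:Measure100033615": ("dbpcy:Abstraction100002137",),
}

# rdf:type values of sto:IEC_62541 before the cleaning (from the listing above).
BEFORE = [
    "dbpcy:Abstraction100002137", "dbpcy:Communication100033020",
    "dbpcy:Direction106786629", "dbpcy:Measure100033615",
    "dbpcy:Message106598915", "dbpcy:Protocol106665108",
    "dbpcy:Rule106652242", "dbpcy:Standard107260623",
    "dbpcy:SystemOfMeasurement113577171",
    "dbpcy:WikicatComputerStandards", "dbpcy:WikicatNetworkProtocols",
]


def ancestors(cls, subclass_of):
    """Return all strict superclasses of cls (transitive closure)."""
    found = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def keep_bottom_level(types, subclass_of):
    """Drop every type that is a superclass of another type in the list."""
    redundant = set()
    for t in types:
        redundant |= ancestors(t, subclass_of)
    return sorted(set(types) - redundant)


kept = keep_bottom_level(BEFORE, SUBCLASS_OF)
print(kept)
```

Under these assumed links, `kept` contains only the two Wikicat classes. Nothing is lost in principle, because the dropped superclasses can be re-derived from the subclass links whenever they are needed.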

@igrangel
Collaborator

igrangel commented Apr 15, 2018

The problem in this case is that we may be removing facts that are true. E.g., OPC UA can be considered both a dbpcy:Communication100033020 and a dbpcy:Standard107260623. To do this properly we need a Gold Standard, or at least access to the ontology.
Which criteria did you use to remove the triples? How can this be validated?

@mmaltsev
Collaborator Author

I excluded superclasses such as dbpcy:Communication100033020 and dbpcy:Standard107260623 for two reasons:

  1. They don't provide any additional information, because dbpcy:WikicatStandards, dbpcy:WikicatANSIStandards, or any other "bottom-level" class is automatically a dbpcy:Standard107260623.
  2. dbpcy:Standard107260623 itself is just an internal identifier inside DBpedia, which doesn't always mean "Standard" as we understand it. Moreover, this kind of information gives us no useful knowledge: we can't really use it.

Some of the classes were removed because their "top-level" superclass was PhysicalEntity100001930, which generally describes people, events, etc.

This solution might not be the best, because it excludes some classes that are true, but at least it narrows the result down to classes that are easy to check and whose origin is easy to understand.
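
The PhysicalEntity rule can be sketched as a filter on each class's lineage. This is an illustration under stated assumptions, not the actual cleaning script: `drop_blocked`, `BLOCKED_ROOTS`, `ANCESTORS`, and the `candidates` list are hypothetical, and the `ANCESTORS` sets list only the classes named in this thread, leaving out the intermediate classes that are elided here as well.

```python
# Top-level classes whose whole subtree is considered out of scope.
BLOCKED_ROOTS = {"dbpcy:PhysicalEntity100001930"}

# class -> transitive superclasses. Only classes named in this thread are
# listed; elided intermediate classes are deliberately left out.
ANCESTORS = {
    "dbpcy:WikicatBusinessModels": {"dbpcy:PhysicalEntity100001930"},
    "dbpcy:Protocol106665108": {
        "dbpcy:Rule106652242",
        "dbpcy:Direction106786629",
        "dbpcy:Message106598915",
        "dbpcy:Communication100033020",
        "dbpcy:Abstraction100002137",
    },
}


def drop_blocked(types, ancestors, blocked_roots):
    """Keep only types whose lineage contains no blocked top-level class."""
    kept = []
    for t in types:
        lineage = {t} | ancestors.get(t, set())
        if lineage.isdisjoint(blocked_roots):
            kept.append(t)
    return kept


# Hypothetical candidate list mixing one in-scope and one out-of-scope class.
candidates = ["dbpcy:WikicatBusinessModels", "dbpcy:Protocol106665108"]
in_scope = drop_blocked(candidates, ANCESTORS, BLOCKED_ROOTS)
print(in_scope)
```

Keeping the blocked roots in one explicit set makes the exclusion criterion easy to review and to extend if other top-level classes turn out to be out of scope.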

@igrangel
Collaborator

Can you evaluate the precision for just this example, with and without the removal? Check this.

@mmaltsev
Collaborator Author

For sto:IEC_62541, the precision depends on the point of view. From a human perspective, a standard is not a Communication (dbpcy:Communication100033020), Direction (dbpcy:Direction106786629), or Message (dbpcy:Message106598915), so 3 of the 11 assigned classes are wrong: the precision before the cleaning is p = 8/11, and after the cleaning it is p = 2/2 = 1. Again, this holds only under human interpretation.

From the perspective of DBpedia as a system, all of these classes, e.g. Communication100033020 or Message106598915, just represent different layers of the abstraction hierarchy for the DBpedia resource. Thus, in this case p = 1 either way (before and after the cleaning).
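
The human-perspective numbers can be reproduced with a short calculation. The class lists come from the before/after listing earlier in this issue; the `precision` helper and the `JUDGED_WRONG` set are hypothetical names for the manual judgement described above.

```python
from fractions import Fraction

# rdf:type values of sto:IEC_62541 before and after the cleaning.
BEFORE = [
    "dbpcy:Abstraction100002137", "dbpcy:Communication100033020",
    "dbpcy:Direction106786629", "dbpcy:Measure100033615",
    "dbpcy:Message106598915", "dbpcy:Protocol106665108",
    "dbpcy:Rule106652242", "dbpcy:Standard107260623",
    "dbpcy:SystemOfMeasurement113577171",
    "dbpcy:WikicatComputerStandards", "dbpcy:WikicatNetworkProtocols",
]
AFTER = ["dbpcy:WikicatComputerStandards", "dbpcy:WikicatNetworkProtocols"]

# Classes a human reader would reject for a standard (manual judgement).
JUDGED_WRONG = {
    "dbpcy:Communication100033020",
    "dbpcy:Direction106786629",
    "dbpcy:Message106598915",
}


def precision(assigned, wrong):
    """Share of assigned classes that were not judged wrong."""
    correct = [c for c in assigned if c not in wrong]
    return Fraction(len(correct), len(assigned))


p_before = precision(BEFORE, JUDGED_WRONG)
p_after = precision(AFTER, JUDGED_WRONG)
print(p_before, p_after)  # 8/11 1
```

From the DBpedia perspective the set of wrong classes is empty, so the same function returns 1 for both lists.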
