Consolidated diagrams Core Vocabularies #37
Thanks for sharing, especially the consolidated diagrams. Following version one I did some consolidation (privately) and included those V1 consolidated diagrams in (analysis) work for (government) customers. One striking difference is the omission in V2 of the attribute Identifier type. As I understand it, in V2 this type is assumed to be (somehow) part of the identifier, or derived from the issuing authority (let's call this an implicit type). On the other hand, the Identifier class is used for identifying persons, legal entities, public organisations (and beyond that bank accounts, vehicles, etc.).
However:
Use case to consider: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32018R1240
I love such ideas! While I'm new here, I think this idea of an explicit identifier type is closer to what they call an "application profile" (in the sense of direct use). This means that even for countries which can use English, the exact term varies (often depending on the issuing organization).
Another relevant point is that some subjects are sensitive (not for privacy reasons, but because collaborators become targets). So in addition to being viable (having data conventions with documentation that allow de facto implementation), on these issues even well-intentioned workers from government or companies may be forced (often by bureaucracy: complying with local laws, waiting for approval, ...) to delay, because such conventions can be used to track misbehavior. This is complicated even if the leadership (prime minister/president) wants it, but more so at the subnational or organizational level (think of police departments not implementing because, if they do while others do not, they become news and are punished).
I think this is CCCVE. Until 4 days ago on #39 (comment) I was not aware that this was a thing. Compared to CPV, CCCV is not trivial to implement with translations of only CCCVE.

**My approach on these 3 topics**

**TL;DR:** these points are a quick brainstorming after the @janbmgo comments. Point 3, which seems to be the Goossenaerts focus for production use, I am less sure about than part 1.

**Part 1**

At least for the Core Vocabularies (and maybe application profiles, if they get translations), I think in the coming months I could have a proof of concept of a multilingual format. The goal would be, based on templated files, to read a file and specify the target language for everything. Then it would not only be easier to generate CPV documentation in languages such as Portuguese, but also to reuse the output directly to create something closer to application profiles. In addition to the translations, we could also have language tags where BCP 47 explicitly labels the country and (for several standards inside one country) conventions on x- private-use extensions for BCP 47 (https://tools.ietf.org/rfc/bcp/bcp47#section-2.2.7). This could handle the fact that some countries are likely already using other terms. By reusing the BCP 47 logic, the only additional translations needed would be for terms the base translations do not already cover. This may be easier to explain in practice (which is why, even if the tooling allows it, it would take months to prepare translations). But my point here is to partially automate the generation of translations (and, as a side effect, make it reusable for data standards).

**Part 2**

For part 1 to become effective enough that translations are ready to use, we need to be realistic: even well-intentioned governments, and their regions willing to replace old local standards, can take time.
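The private-use idea above can be sketched with a small helper. This is a minimal illustration, not from any existing library: per RFC 5646 (BCP 47), a private-use extension is the singleton `x` followed by subtags of 1–8 alphanumeric characters, so a hypothetical subtag like `dpf` (invented here for illustration) could mark a country- or agency-specific terminology variant of `pt-BR`:

```python
import re

# RFC 5646: a private-use extension is "x" followed by one or more
# subtags of 1-8 alphanumeric ASCII characters each.
_PRIVATE_SUBTAG = re.compile(r"^[A-Za-z0-9]{1,8}$")

def with_private_use(base_tag: str, *subtags: str) -> str:
    """Append an x- private-use extension to a BCP 47 language tag."""
    for sub in subtags:
        if not _PRIVATE_SUBTAG.match(sub):
            raise ValueError(f"invalid private-use subtag: {sub!r}")
    return "-".join([base_tag, "x", *subtags])

def private_use(tag: str) -> list[str]:
    """Return the private-use subtags of a tag (empty list if none)."""
    parts = tag.split("-")
    for i, part in enumerate(parts):
        if part.lower() == "x":
            return parts[i + 1:]
    return []
```

So `with_private_use("pt-BR", "dpf")` yields `pt-BR-x-dpf`, and a translation pipeline could fall back from the private-use variant to plain `pt-BR` when no agency-specific term exists.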
Also, since in the meantime they may be using other terms to mean the same thing, it would make sense to have a public list of terms in use (ideas such as "data by request" do not work in practice). But when the subject is sensitive, then, unless the local terms are already compiled by some cross-national organization on that topic and SEMICeu republishes them, I think mentioning the names of persons (or even "the proposal for this term used locally comes from government") will attract less collaboration. First, it is unrealistic to wait for every government to have someone to submit terms. Then there is a problem with collaborators who could get exposed and eventually be removed from their jobs and replaced with someone who will avoid helping with this. I think a win-win situation would allow anyone to suggest local terms as long as some documents attest that such terms exist (like a public form or a webpage). In general, most issues worth discussing here belong to part 1 (the more technical ones). But especially for concepts needed as evidence (see part 3), we would not only need an easier way to collaborate on source concepts (and some code to keep translations usable even if the source term changes), but also a way to accommodate the fact that the best specialists may not want public notoriety from this. It is also very important that the terms get published and can be found, otherwise the audience will not use them. On the practical side, we at @HXL-CPLP even use simple online spreadsheets (which could be hidden behind http://proxy.hxlstandard.org/) to generate everything else (from data in other formats like JSON and XML to strings used in scripts).

**Part 3**

My current thinking on translating CCCVE to be closer to directly usable (as in tabular data exchange) is that it is actually better expressed as narrow data.
In similar issues I found (EticaAI/hxltm#11, use case https://github.com/EticaAI/tico-19-hxltm/blob/gh-pages/data/original/tico-19-terminology-google.csv), the columns are fixed, but most of the meaning is part of the data in the columns. For example, both Facebook and Google collaborate on word lists ("terminologies" without definitions) for over 100 languages, but the variables (column names) are just a few. The wide-format equivalent (which is friendly for humans to work with directly, and also for optimized software access) would have over 100 more columns, and each new column means the database schema would need to change. I don't think I would have a proof of concept anytime soon of something like CCCVE using this type of transposition, but with narrow-format storage (which is friendly for exchange between systems) the new need would be some place to document/translate every term (concept) that is not already machine-parseable (such as a date). For example, if some "passport" can be evidence, then a term (or code) would need to be stored for "passport" (and it cannot be RDF or semantic web; this really needs to be a code, even if the code actually is... a full URL). But even with narrow data, the constraints could either be stored together with the data (very repetitively) or in separate tables, with the code used to find those constraints. One reason for allowing both is that a government may be sending data (narrow data is often generated by computers, not manually) without having had time to translate the concept; in the worst case, the additional fields in the tabular format would explain logic the computer could validate. Like I said, I don't plan a proof of concept for this part, but a narrow-data approach is easier to implement and make production-ready.
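The narrow-to-wide transposition described above can be sketched in a few lines of pure Python. The column names (`term`, `lang`, `text`) and the sample rows are invented for illustration; a real terminology file like the TICO-19 one linked above has its own columns:

```python
from collections import defaultdict

def narrow_to_wide(rows, key="term", col="lang", value="text"):
    """Pivot narrow rows (one observation per row) into a wide mapping
    (one entry per key, one column per distinct `col` value).

    Adding a new language in narrow form is just more rows; in wide
    form it would mean altering the schema with a new column.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[row[key]][row[col]] = row[value]
    return dict(wide)

narrow = [
    {"term": "passport", "lang": "en", "text": "passport"},
    {"term": "passport", "lang": "pt", "text": "passaporte"},
    {"term": "passport", "lang": "fr", "text": "passeport"},
]
```

Here `narrow_to_wide(narrow)` collapses the three rows into a single `"passport"` entry with one field per language, which is exactly the schema change a relational wide table would require per added language.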
But the new focus would be to make it as easy as possible for others to publish the codes used (like the language codes used to transpose from narrow to wide at https://github.com/EticaAI/tico-19-hxltm/tree/gh-pages/data/original). PS: Since I am not sure what the "One striking difference is the omission in V2 of the attribute Identifier type" mentioned in the Goossenaerts comment refers to, maybe there is something I am not aware of that could be in V2 itself, without being derived work. If that is the case, then I actually endorse explicit identifiers. In fact, this was one of the reasons for #39 (which was not written to be a release blocker for 2.00, since it could take time if it was decided to go with some structured numeric approach).
@janbmgo I created an issue in the adms-ap space to relate it to a possible update of adms:Identifier. In short, it concerns the structural handling of the properties dct:identifier and adms:identifier (which points to a class adms:Identifier). The first, dct:identifier, can only store the value. Using the RDF typed-literal approach, a little information about the value space can be provided. But none of the above addresses the metadata description needs you expressed; that is addressed by using (often in addition) adms:identifier.
This offers a structure to add any necessary metadata about the identifier, for instance the party responsible for it.
adms:Identifier already provides support for
The second corresponds to your IdentifierType request, and the last two address the ownership of the schema. If this already satisfies the base of your request, you can add e.g. dct:issued to indicate the creation moment of the identifier. Which one to use depends on your usage context. If the usage context determines all the metadata of the identifier, dct:identifier is probably the way to go; in a broader context where this is unclear, adms:identifier is the way to go. Both can also be combined. The Core Vocabularies do not take a stand on this. In DCAT-AP we are holding webinars on identifiers, where not only the above but also the expectation of whether there is one identifier per entity is being discussed, because in your explanation of the issue this is an unspoken constraint. Core Vocabularies do not express cardinality constraints, but in practice this is a topic.
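The contrast between the two modelling options can be illustrated with a small Turtle sketch. The namespaces and properties (skos:notation, adms:schemaAgency, dct:issued) are standard ADMS/DCT terms, but the resources and the ex:PassportNumber datatype are hypothetical, invented here for illustration:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# Option 1: plain value, optionally with a typed literal hinting at
# the value space (ex:PassportNumber is a made-up datatype).
ex:person dct:identifier "AB1234567"^^ex:PassportNumber .

# Option 2: structured identifier carrying metadata about itself.
ex:person adms:identifier ex:person-passport .
ex:person-passport a adms:Identifier ;
    skos:notation "AB1234567" ;                        # the identifier value
    adms:schemaAgency "Ministry of Foreign Affairs" ;  # who owns the scheme
    dct:issued "2018-05-04"^^xsd:date .                # creation moment
```

As noted above, the two options can also be combined on the same resource when both a bare lookup value and full identifier metadata are needed.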
Closing as managed in ADMS-AP
As introduced during the fifth webinar, two consolidated diagrams have been produced combining all core vocabularies: Core Person Vocabulary, Core Location Vocabulary, Core Business Vocabulary and Core Public Organization Vocabulary. These diagrams intend to give an overview of the classes and properties of the different vocabularies.
The Consolidated diagram covers all classes and properties in an exhaustive manner, while the Simplified version focuses on the main concepts of each vocabulary and their connections.
With this issue, we would like to invite you to provide feedback on these diagrams.