Ontology starting point #2
Comments
More thoughts on the subject:
I feel that a rigid tree hierarchy works well for the administrative regions. a The categories are optional (maybe apart from the
This model does not, however, work for at least 2 examples (feel free to add counterexamples, I'm sure there are more):
A non-administrative region that groups several administrative regions, e.g. Marne-la-Vallée, which is a group of cities in France. It has no administrative meaning, but is well known by locals.
A non-administrative region that intersects several administrative ones, e.g. Le Marais, a French tourist area that spans parts of 2 Paris districts. Neither the non-administrative zone nor the administrative zone contains the other; they just intersect.
Rough idea on how to handle those
I think it's nice for any zone (administrative or not) to have administrative parents, as it's helpful to know that
One idea would be, for any zone to have:
As a starting point I think we can use the same algorithm for both relationships:
What does that mean
Marne-la-Vallée
All the cities of
Le Marais
There is no soft link between
Implications for the use cases
Attaching zones to a point
To attach zones to a point (as we need to do in a geocoder), we'll search for all leaf zones that contain the point. As a first implementation, we can even just search for all zones that contain the point and keep only the leaves (so the lowest-level admin plus all unrelated non-administrative zones).
Finding the most meaningful zone for a point
I don't know 😉
Limitations
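The first implementation described for attaching zones to a point (find every zone containing the point, then keep only the leaves) could be sketched roughly as follows. This is only an illustration under strong assumptions: zones are reduced to bounding boxes with toy coordinates, and the `Zone` class, `leaf_zones_at` function and the sample data are all hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    id: str
    # Assumption: a bounding box (min_x, min_y, max_x, max_y) stands in
    # for the real zone geometry.
    bbox: tuple
    # Id of the administrative parent, if any.
    parent: Optional[str] = None

    def contains(self, x: float, y: float) -> bool:
        min_x, min_y, max_x, max_y = self.bbox
        return min_x <= x <= max_x and min_y <= y <= max_y


def leaf_zones_at(zones, x, y):
    """Return the zones containing (x, y) that are not the parent of
    another zone that also contains the point."""
    matched = [z for z in zones if z.contains(x, y)]
    parents = {z.parent for z in matched if z.parent is not None}
    return [z for z in matched if z.id not in parents]


# Toy data with made-up coordinates: two districts and a
# non-administrative zone that intersects both.
ZONES = [
    Zone("france", (0, 0, 100, 100)),
    Zone("paris", (40, 40, 60, 60), parent="france"),
    Zone("arr3", (45, 45, 50, 55), parent="paris"),
    Zone("arr4", (50, 45, 55, 55), parent="paris"),
    Zone("marais", (48, 48, 52, 52)),  # no administrative parent here
]

print(sorted(z.id for z in leaf_zones_at(ZONES, 49, 50)))
```

For the sample point, the lowest-level administrative zone and the unrelated non-administrative zone both survive the leaf filter, while the country and the city are dropped because they are parents of other matches.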
Soft links
I like the idea of having a strong representation that everybody can hang onto, and something less organized for locally specific things.
Post codes
A post code could be a soft zone, right? If we ever get shapes for those (that might be a problem in many places of the world) then we could be precise; otherwise a list should be sufficient.
Local knowledge
As you mentioned, Le Marais is better known by the locals than the administrative quarters. The problem here is that it has no specific border, but rather a fuzzy one. An attribute on the zone could help here, even if we don’t know how to use the data (or represent it on OpenStreetMap).
Broad international agreements
It would also be nice to be able to represent the Schengen zone.
External knowledge
The airport of Paris is not in Paris, and tourists think that the Château de Versailles is in Paris. Would this mean that the ontology has a notion of who’s asking?
Can it be a tree?
With strict inclusions, we will have a tree 🌴, which is nice.
The obvious
The lowest level must be variable. The
Useless trivia: Google believes there is a quarter 🍌 in Paris. Unknown by the inhabitants and not an official one either https://encrypted.google.com/search?hl=en&q=quartiers%20la%20banane%20paris
The easy
When going from the lowest to the highest, the system needs to have holes. For instance the
Useless trivia: this island is under the direct authority of a ministry, with no intermediate administration https://en.wikipedia.org/wiki/Clipperton_Island
The challenge
Åland belongs to Finland. Finland belongs to the European Union, yet Åland is outside the EU's VAT area (yay! cheap booze 🍾), so a zone does not always inherit everything from its parent. For the vast majority of situations, this can be ignored. Maybe it could be handled with an explicit exception once the big work is done.
There seem to be surprisingly few situations of that kind https://en.wikipedia.org/wiki/Dependent_territory
The pain
A territory can be under the sovereignty of two countries, like https://en.wikipedia.org/wiki/Pheasant_Island
OK. I think we can ignore this one.
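The "explicit exception" idea for cases like Åland could look something like the sketch below: ancestors are collected by walking the parent links, except for (zone, ancestor) pairs listed as exceptions. The `PARENTS` and `EXCEPTIONS` tables and the `ancestors` function are hypothetical, and the links are simplified for illustration only.

```python
# Direct parents only. Names and links are illustrative assumptions,
# not an authoritative description of these territories.
PARENTS = {
    "Åland": ["Finland"],
    "Helsinki": ["Finland"],
    "Finland": ["EU VAT area"],
}

# Explicit exceptions: (zone, ancestor) pairs that must not be inherited.
EXCEPTIONS = {("Åland", "EU VAT area")}


def ancestors(zone, parents=PARENTS, exceptions=EXCEPTIONS):
    """Return every ancestor of `zone`, skipping excluded ones.

    An excluded ancestor is not traversed either, so anything reachable
    only through it is skipped as well.
    """
    found = set()
    stack = list(parents.get(zone, []))
    while stack:
        current = stack.pop()
        if current in found or (zone, current) in exceptions:
            continue
        found.add(current)
        stack.extend(parents.get(current, []))
    return found


print(ancestors("Åland"))
print(ancestors("Helsinki"))
```

Keeping the exceptions in a separate table leaves the main hierarchy untouched, which matches the suggestion to handle these rare cases after the bulk of the work is done.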
If I might be so bold, the 🍌 area actually does exist and is known to at least some inhabitants. This could be a typical example of how different people view the same area differently.
Indeed, I should not take my ignorance as a general rule. I would be curious to know where Google's data comes from.
Regarding the tree structure, I'm not sure it works. It can be a DAG though, I think. Take postal codes for example. In France you will have, potentially, several
Hmm, for postal codes, don't you think soft links (i.e. outside the official hierarchy) would be enough? You're right, tree vs DAG is a really important question; we need to think about this carefully.
Some thoughts concerning Wikidata. It is a database closely linked to Wikipedia that defines semantic relations between objects. The licence is CC0, so that won’t be a problem. With OSM as the geographic leg and Wikidata as the semantic one, Cosmogony should be able to have all the needed information.
Stable ID
First obvious benefit: the ID will probably be much more stable than OSM elements or even Wikipedia pages. It handles the history of elements, meaning that an ID will not be recycled for a new object (e.g. two communes that merge). Paris will always be Q90. The Wikidata ID should be in the OSM object tags. We should do a batch to have the order of magnitude of
Higher confidence when building the hierarchy
There is already a hierarchy, through the property P131, which indicates that an item belongs to a larger zone. This could avoid some wrong hierarchies that would otherwise only be detected through geographical inclusion (simplified borders, weird enclaves…).
Contribute to a good database
Wikidata has a good chance of remaining maintained over time. Any hand-made fix will therefore stay there for good and will help to improve the commons. This will reduce the need for ad hoc databases.
Manipulating the data
The dump is 20 GB. This will be a problem for someone working on a small territory. My guess is that it will be very easy to generate a subset that focuses on the admin regions.
I also had a look at Wikidata as a potential source of information to build the hierarchy. I agree stable IDs would be useful. P131 is promising, but does not seem so easy to use. Its definition is unclear and I can easily find inconsistencies in the data. See Quimper (Q342):
A similar issue is visible with Marne-la-Vallée (Q1886380) (we really like that example ^^):
Note: here is the SPARQL query I used to find geographical entities with multiple P131 statements.
Anyway, we have two candidate approaches to build the hierarchy (geographical inclusion and Wikidata). We may choose one, and create some QA checks or additional tools to test our data against the other one.
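A QA check of one hierarchy against the other might be as simple as comparing the parents each source gives for every zone. The sketch below assumes both hierarchies have already been reduced to a mapping from zone id to a list of parent ids; the `compare_hierarchies` function and the placeholder ids are hypothetical, and no real Wikidata or OSM data is used.

```python
def compare_hierarchies(geo_parents, wikidata_parents):
    """Report zones whose parents differ between the two sources.

    Both arguments map a zone id to an iterable of parent ids.
    """
    report = {}
    for zone in sorted(set(geo_parents) | set(wikidata_parents)):
        geo = set(geo_parents.get(zone, ()))
        wd = set(wikidata_parents.get(zone, ()))
        if geo != wd:
            report[zone] = {
                "only_geo": sorted(geo - wd),
                "only_wikidata": sorted(wd - geo),
            }
    return report


# Placeholder ids, not real identifiers.
GEO = {"city_a": ["dept_1"], "city_b": ["dept_2"]}
WIKIDATA = {"city_a": ["dept_1"], "city_b": ["dept_2", "canton_9"]}

print(compare_hierarchies(GEO, WIKIDATA))
```

Zones that appear in the report are candidates for manual review, whichever source ends up being the primary one.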
Here's the issue to start the discussion about the schema of our zone "hierarchy".
The aim of this issue is to fill in the corresponding section of the README.
Here are my unstructured thoughts:
categories
I like the libpostal categories. libpostal is quite a reference in the address parsing world, and we can hope its categories can handle the specificities of countries all around the world, but I don't think it handles all the corner cases (and it's not the only categorization out there; Wof, for example, uses another).
libpostal does not handle non-administrative regions apart from the suburb (and maybe the country_region). So it would be difficult to represent Marne-la-Vallée or the Parc du Mercantour.
There is also the question of postal codes. I don't know whether we could/should have postal code zones in the hierarchy (should we create a separate issue for this?)
Pyramidal hierarchy or graph-based ?
Can a zone have at most one parent, or can it have several?
I feel that having a pyramidal hierarchy might be a failing of Wof. I don't think being able to have several parents will complicate cosmogony that much.
I don't think it's useful for purely administrative regions (but maybe there are countries where it's relevant), but for non-administrative regions I think a pyramidal hierarchy will be too restrictive.
E.g. what would we link Marne-la-Vallée to? Île-de-France? But then it would be difficult to link it back to the cities that are part of it.
The same applies to unofficial suburbs that can span several districts.
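To make the graph-based option more concrete, here is a minimal sketch where each zone lists several direct parents, so a city can point both to its administrative parent and to Marne-la-Vallée. The structure, the `children_index` helper and the sample links are assumptions for illustration; the list of cities is a small sample, not the full membership.

```python
from collections import defaultdict

# zone -> list of direct parents (administrative or not).
# Illustrative sample only, not a complete or authoritative dataset.
PARENTS = {
    "Noisy-le-Grand": ["Seine-Saint-Denis", "Marne-la-Vallée"],
    "Chessy": ["Seine-et-Marne", "Marne-la-Vallée"],
    "Seine-Saint-Denis": ["Île-de-France"],
    "Seine-et-Marne": ["Île-de-France"],
    "Île-de-France": ["France"],
}


def children_index(parents):
    """Invert the parent links to get the direct children of each zone."""
    index = defaultdict(set)
    for zone, zone_parents in parents.items():
        for parent in zone_parents:
            index[parent].add(zone)
    return index


CHILDREN = children_index(PARENTS)
print(sorted(CHILDREN["Marne-la-Vallée"]))
```

With several parents allowed, the cities that make up Marne-la-Vallée can be recovered by inverting the links, without forcing Marne-la-Vallée itself into a single slot of the administrative pyramid.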
links coherence
The Wof hierarchy is nice, but being linked to all parents brings incoherence (like the France empire that contains the France country, while the empire has fewer descendants than the country).
I feel that outputting only the first level of relationship forces the dataset to be coherent (even if it makes the dataset harder to use without tools).