Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of ISO19115-3:2016 Geospatial Standard #212

Open
camfindlay opened this issue Jan 29, 2019 · 20 comments
Open

Support of ISO19115-3:2016 Geospatial Standard #212

camfindlay opened this issue Jan 29, 2019 · 20 comments

Comments

@camfindlay
Copy link

What would be require to add support for this standard in this module?

We have agencies that are outputting CSW originating from this standard and the current CSW harvest in this module error when attempting to ingest this.

I note this is referenced over in this GH issue too GSA/data.gov#764

Keen to get a read on what's required so I can look at potentially tackling this soon.

Thanks all.

@kalxas
Copy link

kalxas commented Jan 30, 2019

A good starting point would be implementing support for ISO19115-3:2016 in OWSLib.

@camfindlay
Copy link
Author

@kalxas thanks, as in it's not supported in https://github.com/geopython/OWSLib ? and the spatial extension for CKAN is using this as a dependancy? If I look to get this solved, would be great to make sure peoples input, knowledge and approaches can be included :) Any other insight into chasing this would be appreciated (feel free to brain dump here).

@kalxas
Copy link

kalxas commented Feb 12, 2019

Actually there is an iso parser in https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py but OWSLib is also used in other places.

@fostermh
Copy link

It's a bit of a hack but as part of setting up cioos.ca we have forked ckanext-spatial and started modifying it to support harvesting iso19115-3. https://github.com/cioos-siooc/ckanext-spatial

Most of the changes to date have been in https://github.com/cioos-siooc/ckanext-spatial/blob/cioos-siooc/ckanext/spatial/model/harvested_metadata.py. I would love a figure out a way to do this so that we could maintain support for iso19139 as well and contribute back to the community.

@ccancellieri
Copy link
Contributor

Hi @fostermh,
what's the status of this fork?
is it implementing iso19115-3 completely?

If so I could try to provide a pull req based on your fork very soon.

@fostermh
Copy link

It's not 100% coverage, unfortunately. It does handle most of the required and recommended fields but some of the optional fields are not currently part of our standard and so have not yet been addressed. At a quick look, there appear to be around 25 fields that do not currently have an iso19115-3 xpath. It would probably take a day or two to finish that up. If there is interest I could work on finishing up the last few fields next week.

@ccancellieri
Copy link
Contributor

I'm checking changes, I see quite a lot of interesting stuff, maybe I can cherry pick some other than
https://github.com/ckan/ckanext-spatial/compare/master...ccancellieri:cioos-siooc?expand=1

Any suggestions? (f.e.: I see an interesting GeoNetworkHarvester, currently I was planning to use CSW with a full profile but I'm not sure it will ship all the iso fields)

Should I / would you like to create a raw pull req (only to share comments and improvements)

looking forward for comments / suggestions.

Note: I'd love to keep PR as small and focused as possible so it will be easy to merge and review.

@fostermh
Copy link

I think if the focus is ISO19115-3 support, then ckanext-spatial/ckanext/spatial/model/harvested_metadata.py is likely the only file of interest. Yes, we did quite a bit with harvesters. That was more related to how data flows between organizations in CIOOS so probably best to ignore it for the moment. No need to confuse the issue more than it already is. :-)

A raw pull request with just the minimal changes needed sounds like a great idea. I will put a PR together and tag this issue when it's ready.

@ccancellieri
Copy link
Contributor

Great, quick note, I'm testing a port of that file to my branch but looks like it requires (over Alpine-3.14):

apk add gdal
apk add gdal-dev
pip install gdal

due to osgeo import:

def infer_spatial(self, values):
...
            try:
                geom = ogr.CreateGeometryFromGML(xmlGeom)
            except Exception:
                try:
                    geom = ogr.CreateGeometryFromWkt(xmlGeom)
                except Exception:
                    try:
                        geom = ogr.CreateGeometryFromJson(xmlGeom)
...

Adding so many dependencies can be tricky (but can be discussed with @amercader and maintainers) do you think that feature is necessary or not reproducible withoud gdal?

Thanks

@fostermh
Copy link

fostermh commented Oct 20, 2021

It seems likely to me that spatial information will come in the form of GML, I added a couple of other options because I could but we are using GML internally I think. It seemed troublesome to store the GML in solr, and not very useful down the line, so I converted to GeoJSON. I used ogr for this. If we only support GML input then a custom function to parse it would be possible for sure. So.. yes I think it could be done without osgeo but seemed like a lot of work at the time.

@ccancellieri
Copy link
Contributor

ccancellieri commented Oct 22, 2021

Ciao,
hope this helps:

I'm porting all the necessary changes to the schema (heavily based on your work) here:
https://github.com/ckan/ckanext-spatial/compare/master...ccancellieri:my_master?expand=1

That is 'my_master' and is shipping also some minor improvement I'm providing as isolated pull request (mainly over csw), but necessary to go on supporting other types and ... in my opinion not bad to have.
Those will be wiped out from this branch once merged over master... @amercader I'm sorry about pushing :)

Focusing on the ckanext/spatial/model/harvested_metadata.py I've some question related to backward compatibility....

I'm seeing that most of them are compatible but some of them (please search for '#TODO' string) are not, f.e. change in multiplicity

I'm suspecting that a new ISO family should be created instead of enriching the existing one.

This implies a better iso 'guesser' which is doable but could be tricky and verbose!!! @amercader @fostermh what do you suggest?

Thank you!

@fostermh
Copy link

Unfortunately, the link you provided doesn't work for me.

As to your comments regarding backward compatibility, this was exactly my issue as well. The paths and multiplicity can likely be taken care of with relative ease. for example, if iso19115 expects multiple entries and iso19139 expected one we could adjust the value in an 'infer' function based on which standard was being harvested.

The primary issue I was having is the namespaces that need to change between standards. If we use the same standard for all datasets in a harvester (possibly across one ckan) then we should be able to adjust based on a harvester config setting or ckan.ini setting. If we want to support a mix of standards then it gets tricky. It has been a while since I looked at it, but I believe the namespaces are set during the first harvest creation. So if the first dataset being harvested is ISO19115 standard, the namespaces are set up to support this standard and all harvest objects after will use the same namespace. Obviously, that will not work if you are harvesting a mix of ISO19139 and ISO19115 xml files, for example.

@ccancellieri
Copy link
Contributor

ccancellieri commented Oct 22, 2021

Unfortunately, the link you provided doesn't work for me.
Strange, here you could find that file (with '#TODO' ) here also:
https://github.com/ccancellieri/ckanext-spatial/blob/my_master/ckanext/spatial/model/harvested_metadata.py
Also required some small change to base.py (most are [] replaced by .get()).

As to your comments regarding backward compatibility, this was exactly my issue as well. The paths and multiplicity can likely be taken care of with relative ease. for example, if iso19115 expects multiple entries and iso19139 expected one we could adjust the value in an 'infer' function based on which standard was being harvested.

Mmm this is tricky, I'm not sure I'm understanding.
What I see is that the function guessing the iso needs some improvement but I'm not sure in which way to contribute, if we touch too many parts it would be very difficult to merge not having a deep test coverage.

What I'm thinking is for sure to

  • properly guess the iso type
  • properly infer the right iso hierarchy
  • save into the metadata type (native ckan types) the iso type so I can them show an appropriate GUI/API (see below)

I would really appreciate to have support on any of these from anyone (also approvals or kind of blockers...).

The primary issue I was having is the namespaces that need to change between standards. If we use the same standard for all datasets in a harvester (possibly across one ckan) then we should be able to adjust based on a harvester config setting or ckan.ini setting.

Right I've also had some issue to download them with csw, now should be possible to configure an OutputSchema from the json which configure the harvester. (see below)

If we want to support a mix of standards then it gets tricky. It has been a while since I looked at it, but I believe the namespaces are set during the first harvest creation. So if the first dataset being harvested is ISO19115 standard, the namespaces are set up to support this standard and all harvest objects after will use the same namespace. Obviously, that will not work if you are harvesting a mix of ISO19139 and ISO19115 xml files, for example.

This is interesting, but can be handled (not easi but doable) with different harvesters configured as specified here: #259

Thank you for your help!

FYI: My short term goal is to implement a set of harvester to import the following formats:

  • iso19139
  • iso19115-2
  • iso19115-3.2018
  • iso19115-3.2018 MultiLanguage
    Then provide an interface and an API on each of them thanks to scheming and fluent, dcat. ;)

@fostermh
Copy link

ok I see TODO's and can lend a hand there. do you want me to do a PR to https://github.com/ccancellieri/ckanext-spatial or would you like to start a draft PR from https://github.com/ccancellieri/ckanext-spatial to https://github.com/ckan/ckanext-spatial and I can comment/push there? I think we talked about me starting a raw PR but it looks like you have most of it already and I would just need to remove/comment on a few bits.

@ccancellieri
Copy link
Contributor

ccancellieri commented Oct 22, 2021

I added line by line comments #iso19115-3 as you did. I did that with blocks (separated by a space to the end.
Then row by row, so it should be quite accurate and the conflicts can be commented/addressed hopefully one by one.
Thanks!

FYI:

this branch is rebased to master and isolated to those 2 motels files but still requires #259 for multiple formats
if you want to provide inline comments i can adjust them following your suggestions
if you want to work from your ide please fork my branch and provide fixes on it via pull req, so we can merge on top of this

@ccancellieri
Copy link
Contributor

The plugin looks far more complex than expected, I think:

We've to provide a new harvester implementation providing an ad-hoc ISOElement(s) hierarchy not touching base.py and harvested_metadata.py (which are dedicated to iso19139).

For this reason, please hold-on few days I'll try to produce a better integration and I'll give you the go to in a couple of days to pull if interested in contributing.
A dedicated plugin can also be provided but would love to integrate it here to reduce thousands of duplicated rows.

I'll try to give both options

@fostermh stay tuned! :)

@ccancellieri
Copy link
Contributor

ccancellieri commented Oct 24, 2021

Ok with the latest push (I apologize, I have had to rebase and squash to make it clear) we are ready to discuss:

There are several ticket/issues to open to be able to reuse code but I think most of them are summarized by:

https://github.com/ccancellieri/ckanext-spatial/blob/iso19115-3/ckanext/spatial/harvesters/iso19115/spatial_harvester.py
https://github.com/ccancellieri/ckanext-spatial/blob/iso19115-3/ckanext/spatial/harvesters/iso19115/harvester.py

Now why we have 2 harvester?

The explanation is quite complex and mainly depends on several tickets that can be opened discussed (look at TODO).

The the main harvester is the iso19115_csw_harvester (harvester.py - which must be selected as harvester type into the configuration type), it implements IHarvester delegating as much as possible to the parent the tasks (CSWHarvester).

The iso19115 should be loaded as well (spatial_harvester.py - must be present into the plugins list) but it is mainly used to implement ISpatialHarvester to override some other method from base.py which is not so fault tolerant (most of this code may go if master will accept contributions.

As you see now we have a separate model:
https://github.com/ccancellieri/ckanext-spatial/blob/iso19115-3/ckanext/spatial/harvesters/iso19115/model.py

Hope now we can remove all the 19139 parts from there, feel free to contribute on it, it should not be backward compatible anymore (but still may/should support all the 19115 versions).

Which is used if any of the validators validate the incoming metadata

otherwise other plugins will be used to test against the incoming metadata and the first match will be used.

This means that the plugin is able to harvest both is19115-(1,2,3) and iso19139 and the recognition is automatic.

@amercader I'm trying to develop following indications from ISpatialHarvester and IHarvester but looks like they are not so well integrated, should I open some tickets to fix them? I'll probably open a discussion since I feel there's still some missing but useful method to add to the ISpatialHarvester interface

@torfsen over to you on the model.py!

@torfsen
Copy link
Contributor

torfsen commented Oct 25, 2021

@ccancellieri: I'm not sure what you meant when mentioning me ("over to you on the model.py"). Unfortunately I'm currently not able to contribute much time to CKAN, but if you have a specific question for me I'll see what I can do.

@fostermh
Copy link

Thanks @ccancellieri, I have been sidetracked by a few issues but will get back to this soon.

@frafra
Copy link
Contributor

frafra commented Mar 1, 2022

Is there anything that can be done to support this effort?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants