Guidance Needed for Handling Datasets in Geoconnex #196

Open
ksonda opened this issue Jul 24, 2023 · 3 comments

ksonda commented Jul 24, 2023

There is an emerging use case, most prominently at CUAHSI but also among people running CKAN- or Socrata-based CMS/DMS and perhaps things like Sciencebase.gov: datasets that can be tagged as being schema:about geoconnex features but that are not themselves monitoring locations and thus do not have geoconnex PIDs.

A grey area would be datasets that are, for example, time series about an organizational monitoring location that could probably be a reference location if it is not one already (e.g. USBR RISE data catalog items for reservoir operations data, MonitorMyWatershed stuff).

We need to develop guidance on:

  1. How to submit data URLs that are not PIDs so that we can still crawl them
  2. When to submit PIDs vs when to submit data URLs
  3. When to submit PIDs vs when to submit PIDs and Reference PIDs to the appropriate reference repository

ksonda commented Jul 24, 2023

An undoubtedly non-exhaustive list of options:

Option 1: (I have been telling people to do this until now). Set up organizational monitoring location pages whether they are reference features or not, and set them up in such a way that they serve as sub-data catalogs for all datasets about them. These should have geoconnex PIDs.

e.g.

{"@id":"https://geoconnex.us/usgs/monitoring-location/{numbers}",
"schema:subjectOf":
["stuff about page/API call for parametercode 1",
"stuff about page/API call/data download for parametercode 2"]}
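
A minimal Python sketch of Option 1, in case it helps make the shape concrete. It assumes each dataset is listed under schema:subjectOf as an object with a schema:url; the function name `monitoring_location_doc`, the `@context`, and the `schema:Dataset` typing are illustrative assumptions, not agreed geoconnex guidance. The two URLs are the ones used in the Option 3 example in this comment.

```python
import json


def monitoring_location_doc(pid, dataset_urls):
    """Build a JSON-LD-style dict for a monitoring location PID.

    The location acts as a small data catalog: every dataset about it
    is listed under schema:subjectOf. (Hypothetical helper.)
    """
    return {
        "@context": {"schema": "https://schema.org/"},
        "@id": pid,
        "schema:subjectOf": [
            {"@type": "schema:Dataset", "schema:url": url}
            for url in dataset_urls
        ],
    }


doc = monitoring_location_doc(
    "https://geoconnex.us/usgs/monitoring-location/03451200",
    [
        "https://waterdata.usgs.gov/monitoring-location/03451200/"
        "#parameterCode=00010&period=P7D"
    ],
)
print(json.dumps(doc, indent=2))
```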

Option 2: Tag their dataset as schema:about (or some HY relationship to) a geoconnex feature, whether that is a reference location or just some kind of featureofinterest like a mainstem or a cataloging feature like a HUC. There is no geoconnex PID, so the provider must give us a list of dataset URLs to crawl.

{"@id":"a non permanent dataset URL", 
"schema:about":
["https://geoconnex.us/ref/gages/1000001", "https://geoconnex.us/ref/hu10/0102030405"]}
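
On the crawler side, Option 2 roughly means pulling the schema:about targets out of each crawled dataset document and keeping the ones that point at geoconnex. A rough Python sketch, assuming the document has already been fetched and parsed into a dict; `geoconnex_features` and the example dataset URL are hypothetical, and a real crawler would likely use a proper JSON-LD processor rather than plain key lookups.

```python
GEOCONNEX_PREFIX = "https://geoconnex.us/"


def geoconnex_features(dataset_doc):
    """Return the schema:about targets that are geoconnex URIs.

    Accepts a single value or a list, and either plain URI strings
    or {"@id": ...} objects. (Hypothetical helper.)
    """
    about = dataset_doc.get("schema:about", [])
    if isinstance(about, (str, dict)):
        about = [about]
    uris = []
    for item in about:
        uri = item.get("@id") if isinstance(item, dict) else item
        if isinstance(uri, str) and uri.startswith(GEOCONNEX_PREFIX):
            uris.append(uri)
    return uris


example = {
    "@id": "https://example.org/datasets/123",  # hypothetical non-permanent dataset URL
    "schema:about": [
        "https://geoconnex.us/ref/gages/1000001",
        {"@id": "https://geoconnex.us/ref/hu10/0102030405"},
        "https://example.org/not-a-geoconnex-feature",
    ],
}
features = geoconnex_features(example)
```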

It's probably a wash between Options 1 and 2 in terms of effort for the data provider. In terms of our own data management, Option 2 adds a layer of complexity to administer, and possibly an order of magnitude more crawling compute, but it is more consistent with the SELFIE architecture, I suppose. However, it does open us up to link rot.

Option 3: Allow both options, but require some sort of geoconnex PID for datasets, such as an additional special organizational sub-namespace, e.g.

id: https://geoconnex.us/usgs/datasets/{datasetid}
target: https://waterdata.usgs.gov/monitoring-location/03451200/#parameterCode=00010&period=P7D
description: parametercode 00010 for https://geoconnex.us/usgs/monitoring-location/03451200
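
A small Python sketch of how a provider might generate Option 3 records in bulk. It assumes a record is just the three fields shown above (id, target, description) and writes them as CSV; the helper name `dataset_pid_record`, the `/datasets/` sub-namespace, and the dataset id `03451200-00010` are hypothetical.

```python
import csv
import io

FIELDS = ["id", "target", "description"]


def dataset_pid_record(org, dataset_id, target, description):
    """Build one PID record for a dataset in an organizational sub-namespace."""
    return {
        "id": f"https://geoconnex.us/{org}/datasets/{dataset_id}",
        "target": target,
        "description": description,
    }


record = dataset_pid_record(
    "usgs",
    "03451200-00010",  # hypothetical dataset id
    "https://waterdata.usgs.gov/monitoring-location/03451200/"
    "#parameterCode=00010&period=P7D",
    "parametercode 00010 for "
    "https://geoconnex.us/usgs/monitoring-location/03451200",
)

# Serialize to CSV in memory; a real submission would write to a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
```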

@dblodgett-usgs

By my read, you are including a few specific use cases here, and it would be worthwhile to break them apart a bit more for the sake of clear recommendations.

I want to avoid geoconnex IDs for datasets and other abstract digital objects (reports). Those should really have DOIs.

The monitoring context adds a dimension to the use cases that needs to be split out -- so I'll just avoid that nuance in the rest of my response.

So there are two patterns we might have.

  1. a data provider's resource that is an "in band" semantic resource that is either about or the subject of a geoconnex feature.
  2. a data provider's resource that is an "out of band" object that is either about or the subject of a geoconnex feature.

If the data provider is decorating their resource, and it's a semantic resource, I think having the structured data be about itself makes the most sense.

e.g.

{
	"@id": "a url that returns semantic content",
	"schema:about": ["https://geoconnex.us/ref/hu10/0102030405"]
}

If the data provider is decorating their resource and it's a non-semantic resource, we should use the other structure.

e.g.

{
	"@id": "https://geoconnex.us/...",
	"schema:subjectOf": {
		"schema:url": "digital object that is out of band"
	}
}
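
The two patterns above could be sketched in Python as a single switch on whether the provider's resource is semantic. This is only an illustration of the two structures shown in this comment; the function name `link_to_feature` and the `is_semantic` flag are assumptions, and the example.org URLs are placeholders.

```python
def link_to_feature(resource_url, feature_uri, is_semantic):
    """Return structured data linking a provider resource to a geoconnex feature.

    In band (semantic resource): the resource describes itself and is
    schema:about the feature.
    Out of band (non-semantic object): the feature is the subject, and the
    resource is attached via schema:subjectOf.
    """
    if is_semantic:
        return {"@id": resource_url, "schema:about": [feature_uri]}
    return {
        "@id": feature_uri,
        "schema:subjectOf": {"schema:url": resource_url},
    }


feature = "https://geoconnex.us/ref/hu10/0102030405"
in_band = link_to_feature("https://example.org/dataset.jsonld", feature, True)
out_of_band = link_to_feature("https://example.org/report.pdf", feature, False)
```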


ksonda commented Aug 1, 2023

Discussed with @webb-ben in the midst of implementing #198 and #202. We can support adding arbitrary sitemaps, not just geoconnex.us ones, by adding them to the namespaces subdirectories. Then we just need to figure out and document guidance.
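
For providers who do not already publish a sitemap, a minimal one listing their dataset URLs can be generated with the Python standard library. This is a sketch under assumptions: `build_sitemap` and the example.org URLs are hypothetical, and where the file would go in the namespaces subdirectories is not specified here. The XML namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # Special characters such as "&" are escaped during serialization.
    return ET.tostring(urlset, encoding="unicode")


dataset_urls = [
    "https://example.org/datasets/1",
    "https://example.org/datasets/2?format=jsonld&lang=en",
]
sitemap_xml = build_sitemap(dataset_urls)
```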

Some cases here:

(image attached in the original issue)

webb-ben changed the title from "Data URLs" to "Guidance Needed for Handling Datasets in Geoconnex" on Jan 24, 2024