4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 #10

Closed
1 of 3 tasks
sync-by-unito bot opened this issue Oct 7, 2022 · 13 comments
Assignees
Labels
pm.GREI https://docs.google.com/document/d/1RdifpHJDFqx8Y8-Dsv_VnnTgezjNHKpSyRei4cw3C-k/edit?usp=sharing pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues

Comments

@sync-by-unito

sync-by-unito bot commented Oct 7, 2022

References:

Problem Statement

This first-year deliverable is described by its title: resolve OAI-PMH harvesting issues.

Proposed Solution

The focus of the first year is on the metadata.
We have an existing backlog of fixes.
The existing list may not capture all existing problems, but it is extensive.

Deliverables for the first year:

  • Create list of current OAI-PMH issues,
  • Fix the issues,
  • Test and document the fixes.

Last updated: Mon Dec 5 2022

Last updated: Thu Dec 15 2022 before I left for the holiday
Report: Dec 2022

Work has continued on the initial backlog created from the initial spike. The following issues have been addressed in recent sprints:

  • Expand the suite of automated tests of the Harvesting functionality #8843
  • invalid schema and metadataNamespace fields in OAI-PMH ListMetadataFormats response #3621
  • [feature request] stop an harvest job in progress #7940

70%


Ordered list of Issues that make up this deliverable:
updated: 2023_01_09

Full list is under construction in: #25


┆Issue is synchronized with this Smartsheet row by Unito

@mreekie mreekie self-assigned this Oct 7, 2022
@mreekie
Collaborator

mreekie commented Oct 7, 2022

This issue represents a deliverable funded by the NIH
This deliverable supports the NIH Initiative to Improve Access to NIH-funded Data

Aim 4: Improve harvesting and packaging standards to share metadata and data across repositories

Our proposed project will significantly improve the widely-used Harvard Dataverse repository to better support NIH-funded research.

A critical measure of the GREI program’s success is to standardize the discoverability across generalist repositories. To help with this, we propose to improve the existing harvesting functionality in the Dataverse software based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and coordinate with other repository packaging standards to share or move metadata and data.

Dataverse already supports the Bags as defined by the Research Data Alliance (RDA) Research Data Repository Interoperability Working Group.

Here we propose to improve the support for Bags, test it for NIH-funded datasets, and explore and define the appropriate standard for moving metadata and data across generalist repositories.

  • This will help with sustainability and succession planning.
    • If a repository can no longer support a specific dataset, the dataset can easily be moved to another repository without losing any information about it.
  • Additionally, we propose to implement Signposting in the Dataverse software.
    • By adding HTTP Link headers throughout the application, we can more easily support automated metadata and data discovery in the repository, and allow other applications and services to represent the content in the Harvard Dataverse repository more accurately and completely.
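To make the Signposting idea above concrete, here is a minimal sketch of how typed links could be serialized into a single HTTP Link header value. It is not the Dataverse implementation: the function name `build_link_header` and the example URLs are assumptions made for illustration, and only the `cite-as` and `describedby` relation types are shown.

```python
def build_link_header(links):
    """Format (target, rel, media_type) tuples as one HTTP Link header value."""
    parts = []
    for target, rel, media_type in links:
        part = f'<{target}>; rel="{rel}"'
        # The type attribute is optional; include it only when a media type is given.
        if media_type:
            part += f'; type="{media_type}"'
        parts.append(part)
    return ", ".join(parts)


# Hypothetical links for a dataset landing page; both URLs are placeholders.
example_links = [
    ("https://doi.org/10.5072/FK2/EXAMPLE", "cite-as", None),
    ("https://repo.example.org/metadata/123.jsonld", "describedby", "application/ld+json"),
]

print(build_link_header(example_links))
```

A crawler that reads this header from a landing page response can find the persistent identifier and a machine-readable metadata record without parsing the HTML.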
ID | Aim | Deliverable
1.4.1 | 4 | Resolve OAI-PMH harvesting issues
1.4.2 | 4 | Create working group on packaging standards to share metadata and data across repositories
2.4.1 | 4 | Implement packaging standards based on working group feedback
3.4.1 | 4 | Test packaging and harvesting with other generalist repositories
4.4.1 | 4 | Assess and improve packaging and harvesting across repositories

@mreekie
Collaborator

mreekie commented Oct 12, 2022

Who:

  • Leonid
  • Phil
  • Jim

@mreekie
Collaborator

mreekie commented Nov 4, 2022

September update:
(1.4.1, 1.4.2) A spike (Dataverse GitHub Issue IQSS/dataverse-pm#24) has been completed by the team to inventory existing issues with the current harvesting functionality to prepare for upcoming collaborative work on packaging standards. There are 20 GitHub Issues that were identified, and work has started on those Issues in priority order. The first two of these issues (#8139 and #8484) have been addressed and their fixes integrated into the code base, to be released in Dataverse 5.12. The Search & Metadata sub Working Group can serve as a forum to explore cross-repository metadata sharing.

@mreekie
Collaborator

mreekie commented Dec 5, 2022

Last updated: October 2022 (no change)

(1.4.1, 1.4.2) A spike (Dataverse GitHub Issue IQSS/dataverse-pm#24) has been completed by the team to inventory existing issues with the current harvesting functionality to prepare for upcoming collaborative work on packaging standards. There are 20+ GitHub Issues that were identified, and work has started on those Issues in priority order. The first two of these issues (#8139 and #8484) have been addressed and their fixes integrated into the code base, and have been released in Dataverse 5.12. The Search & Metadata sub Working Group can serve as a forum to explore cross-repository metadata sharing.

@mreekie
Collaborator

mreekie commented Dec 6, 2022

Last updated: Mon Dec 5 2022

(1.4.1, 1.4.2) Work has continued on the initial backlog created from the initial spike. The following issues have been addressed in recent sprints:

  • Trying to set up or complete a harvesting client through the API crashes Dataverse, OAI server: metadataPrefix unknown: Internal server error #37410
  • Feature Request/Idea: Documentation for the API to create and edit harvesting clients IQSS/dataverse#8267
  • OAI-PMH responses indicating errors should be processable by OAI-PMH clients IQSS/dataverse#3797

The Search & Metadata sub Working Group can serve as a forum to explore cross-repository metadata sharing.
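For context on the error-handling issue mentioned above: in OAI-PMH 2.0, a failed request is reported as `<error code="...">` elements inside the normal response envelope, so a client can parse it like any other response. The following is a rough sketch of the client side, assuming an illustrative response body; the function name `extract_oai_errors`, the sample XML, and the endpoint URL in it are made up for this example and are not taken from Dataverse.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative error response; the request URL is a placeholder.
SAMPLE_ERROR_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2022-12-05T00:00:00Z</responseDate>
  <request verb="ListRecords">https://repo.example.org/oai</request>
  <error code="cannotDisseminateFormat">metadataPrefix unknown</error>
</OAI-PMH>"""


def extract_oai_errors(xml_text):
    """Return a list of (code, message) pairs for each <error> element."""
    root = ET.fromstring(xml_text)
    return [
        (err.get("code"), (err.text or "").strip())
        for err in root.findall("oai:error", OAI_NS)
    ]


print(extract_oai_errors(SAMPLE_ERROR_RESPONSE))
```

A harvester can branch on the returned code (for example, treating `noRecordsMatch` as an empty result rather than a failure), which is not possible when the server answers with a generic internal server error page.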

70%

@mreekie
Collaborator

mreekie commented Dec 16, 2022

This needs grooming to determine if we have satisfied the deliverable scope.
Next Step:

  • Get help in grooming the issues

@mreekie
Collaborator

mreekie commented Dec 16, 2022

Last updated: Thu Dec 15 2022 before I left for the holiday
Report: Dec 2022

Work has continued on the initial backlog created from the initial spike. The following issues have been addressed in recent sprints:

  • Expand the suite of automated tests of the Harvesting functionality #8843
  • invalid schema and metadataNamespace fields in OAI-PMH ListMetadataFormats response #3621
  • [feature request] stop an harvest job in progress #7940
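As background for the ListMetadataFormats issue listed above: each `<metadataFormat>` entry in that response carries a `metadataPrefix`, a `schema` URL, and a `metadataNamespace` URI. Below is a small sketch of a check a test could run against such a response. The function name `find_invalid_formats`, the sample XML, and the `broken_fmt` prefix are hypothetical, and the "has a URI scheme" test is a deliberately loose assumption rather than full validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response: one well-formed entry and one with an empty schema.
SAMPLE_FORMATS_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>broken_fmt</metadataPrefix>
      <schema></schema>
      <metadataNamespace>http://repo.example.org/ns/broken/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def has_uri_scheme(value):
    """Loose check: the value is non-empty and starts with a URI scheme."""
    return bool(urlparse(value.strip()).scheme)


def find_invalid_formats(xml_text):
    """Return (metadataPrefix, field) pairs whose field is missing or not a URI."""
    root = ET.fromstring(xml_text)
    problems = []
    for fmt in root.findall("oai:ListMetadataFormats/oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS)
        for field in ("schema", "metadataNamespace"):
            value = fmt.findtext(f"oai:{field}", default="", namespaces=OAI_NS)
            if not has_uri_scheme(value):
                problems.append((prefix, field))
    return problems


print(find_invalid_formats(SAMPLE_FORMATS_RESPONSE))
```

A check like this could be folded into the automated harvesting tests so that a regression in either field is caught before a release.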

@mreekie
Collaborator

mreekie commented Jan 9, 2023

Met with Leonid.
The objective was to scope the rest of the deliverable.
Leonid proposed the following:

  1. Review all existing harvesting issues as a spike: Spike: Inventory and prioritize all existing Harvesting related issues #24
  2. From there, find a natural breakpoint to serve as the end point for 1.4.1.
  3. Document that sub-list in: Collection: Keep track of list of issues that we want to address as part of 1.4.1 #25

In addition:

  • I re-ordered the "NIH bklog items (Stefano)" stewarded backlog column. It is made up solely of issues from 1.4.1 right now. I made its order match the order dictated by Leonid in the NIH OTA 1.4.1 planning tab.
  • Leonid approves of doing the items in the "Ordered" column of the NIH OTA 1.4.1 planning tab, so those are already slated to be part of the definition of done.
  • The first 6 of 11 open items in the "ordered" list in the NIH OTA 1.4.1 planning tab are present in the global backlog after our work.
  • Replicated the ordered list in the description of this issue.

@mreekie
Collaborator

mreekie commented Feb 8, 2023

Next steps:

  • Get Leonid's issues prioritized and into the queue.

@mreekie
Collaborator

mreekie commented Feb 8, 2023

Monthly Update January

(1.4.1, 1.4.2) Continued work on the backlog and revised the remaining work. Work has continued on the initial backlog created from the initial spike #8574.

@mreekie
Collaborator

mreekie commented Mar 3, 2023

February Update

(1.4.1) Continued work on the backlog and revised the remaining work. Work has continued on the initial backlog created from the initial spike https://github.com/IQSS/dataverse/issues/8574. We are at 9 issues completed, with 2 queued for immediate work. This is a long-term activity: it affects many aspects of harvesting and will likely continue in year 2.

@mreekie mreekie transferred this issue from IQSS/dataverse Mar 3, 2023
@mreekie mreekie added the pm.GREI https://docs.google.com/document/d/1RdifpHJDFqx8Y8-Dsv_VnnTgezjNHKpSyRei4cw3C-k/edit?usp=sharing label Mar 3, 2023
@mreekie mreekie added pm.GREI-d-2.4.1 NIH, yr2, aim4, task1: Implement packaging standards based on working group feedback pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues and removed pm.GREI-d-2.4.1 NIH, yr2, aim4, task1: Implement packaging standards based on working group feedback labels Mar 18, 2023
@mreekie
Collaborator

mreekie commented Apr 10, 2023

March update

(1.4.1) This activity was 85% complete at the end of year 1 and has been transferred to year 2.

@mreekie
Collaborator

mreekie commented Apr 18, 2023

draft yr 1 report summary: FY1 Annual Summary

This activity was 85% complete at the end of year 1. The team created a prioritized list and started with the most critical items. At the end of the year, 18 of 27 items had been resolved. The completion estimate attempts to reflect the varying complexity of the items on the list. Year 2 work toward completion will be tracked as yr:2 aim:4 task:1a (2.4.1A), starting at 85% complete.
