What would Data INRA like Harvard Dataverse to harvest? #236
Comments
I sent @DS-INRA (Dimitri) a link to this issue.
Hello, I'm only replying today. Thanks for raising the issue!
Perfect. Thanks @DS-INRA! We'll use that set, then. I'll also rename the collection at https://dataverse.harvard.edu/dataverse/inra_harvested to "Recherche Data Gouv Harvested Dataverse", change the collection's URL to https://dataverse.harvard.edu/dataverse/recherchedatagouv, and remove its description. Let me know if you'd like any of those changes done differently.
I told Dataverse to delete the harvesting client "inra". The records were removed from https://dataverse.harvard.edu/dataverse/inra_harvested within minutes, but the client is still listed in the table at https://dataverse.harvard.edu/harvestclients.xhtml?dataverseId=1, with "DELETE IN PROGRESS" in the "Last Results" column. This sounds similar to the problem reported in IQSS/dataverse#7052, although I'm not sure whether Harvard Dataverse's server was rebooted while the records were being deleted. I'll check tomorrow to see if the client has been deleted. If it has, I'll create a new harvesting client to harvest the ALL set from https://entrepot.recherche.data.gouv.fr/oai. If it hasn't, I'll ask my developer colleagues whether the server was rebooted this afternoon, whether that's why the client wasn't deleted, and whether the client can be deleted another way so that I can create a new client and harvest the ALL set from https://entrepot.recherche.data.gouv.fr/oai.
Client was deleted 🎉, and records in the ALL set of https://entrepot.recherche.data.gouv.fr/oai are being harvested into https://dataverse.harvard.edu/dataverse/recherchedatagouv. I'll close this issue and open another if there are any problems. Thanks @DS-INRA!
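To illustrate what harvesting "the ALL set" involves at the protocol level, here is a minimal Python sketch of the OAI-PMH requests a harvester sends to https://entrepot.recherche.data.gouv.fr/oai. It is not how Dataverse's harvesting client is implemented. It assumes the oai_dc metadata format (which every OAI-PMH repository is required to support), and the function names and the injectable `fetch` callable are hypothetical. Each ListRecords page may end with a resumptionToken, and the harvester keeps requesting pages until that token is empty.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    sent, only the verb may accompany it, so metadataPrefix and set are omitted.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml):
    """Return the resumptionToken of a ListRecords page, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the end of the list.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest_pages(base_url, set_spec, fetch):
    """Yield every ListRecords page for one set, following resumption tokens.

    `fetch` is any callable that takes a URL and returns the response body
    as a string (an assumption made so the sketch can be run offline).
    """
    token = None
    while True:
        url = list_records_url(base_url, set_spec=set_spec,
                               resumption_token=token)
        page = fetch(url)
        yield page
        token = next_resumption_token(page)
        if token is None:
            break
```

For a live run, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`. Passing the fetcher in keeps the paging logic separate from network access, which makes it easy to test with canned responses.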
Great, many thanks!
Just an update here: 37 records were harvested into https://dataverse.harvard.edu/dataverse/recherchedatagouv, and Dataverse reports that it couldn't harvest 2,325 records. I'll update the issue at #92, where I've been writing about failing harvests.
Thanks, feel free to tag me there so that I don't miss it!
@DS-INRA, you can also click "Subscribe" on the issue.
@DS-INRA, #92 was closed a few months after I left that comment about updating it, so I don't think you need to subscribe to it. Harvard Dataverse has still harvested only 37 records from Recherche Data Gouv, and it has stopped harvesting from all of the repositories it used to harvest from so that we can address indexing issues that were affecting how well it harvests. Eventually we'll look into specific cases where Harvard Dataverse isn't harvesting well, or at all, from certain repositories, like Recherche Data Gouv, and we'll ping you at @DS-INRA if we create a new GitHub issue about it.
The harvesting job that used to harvest metadata into the collection at https://dataverse.harvard.edu/dataverse/inra_harvested has been failing for a while. It's one of several failing jobs. See #92.
As a result, clicking the title of each harvested dataset now leads to a 404 page instead of the dataset, and Harvard Dataverse hasn't been updating the metadata it previously harvested.
The installation's URL, https://data.inra.fr, now redirects to https://entrepot.recherche.data.gouv.fr/dataverse/inrae. Dimitri Szabo let us know about this and we updated the Dataverse map (IQSS/dataverse-installations#162), but we didn't adjust the harvesting job's settings, which still point to the OAI-PMH endpoint https://data.inra.fr/oai. That endpoint no longer works.
We might need to update the settings so that Harvard Dataverse is harvesting from https://entrepot.recherche.data.gouv.fr/oai instead, and possibly adjust the URL, name and description of the collection at https://dataverse.harvard.edu/dataverse/inra_harvested depending on what the folks from INRAE would like Harvard Dataverse to harvest.
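Before switching the harvesting settings to the new endpoint, one might confirm that it answers an OAI-PMH Identify request (for example, https://entrepot.recherche.data.gouv.fr/oai?verb=Identify) and reports the expected baseURL. The Python sketch below parses such a response with the standard library only; the function names are hypothetical, and fetching the response is left to the caller.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_summary(response_xml):
    """Extract key fields from an OAI-PMH Identify response (hypothetical helper)."""
    root = ET.fromstring(response_xml)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        # OAI-PMH error responses carry <error code="..."> instead of <Identify>.
        err = root.find("oai:error", OAI_NS)
        code = err.get("code") if err is not None else "unknown"
        raise ValueError(f"Not an Identify response (error code: {code})")
    return {
        field: ident.findtext(f"oai:{field}", default="", namespaces=OAI_NS)
        for field in ("repositoryName", "baseURL", "protocolVersion")
    }


def reports_base_url(response_xml, expected_base_url):
    """True if the endpoint reports the expected baseURL, ignoring a trailing slash."""
    reported = identify_summary(response_xml)["baseURL"]
    return reported.rstrip("/") == expected_base_url.rstrip("/")
```

A failing old endpoint such as https://data.inra.fr/oai would typically not return a well-formed Identify response at all, which is the kind of problem this check is meant to surface before a harvest is scheduled.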
Their harvesting sets are listed at https://entrepot.recherche.data.gouv.fr/oai?verb=ListSets, and the list includes a set called INRAE. Whoever works on this issue might ask Dimitri Szabo ([email protected]) what they'd like Harvard Dataverse to harvest, then adjust the settings to re-harvest their metadata and ensure that it stays up to date.
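As a sketch of how the available sets could be read from that ListSets response, the Python below extracts each set's setSpec and setName using only the standard library. The sample XML in the check is made up; the real response at the URL above may list different sets and, for long lists, may be split across pages with a resumptionToken, which this sketch does not follow.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_sets(response_xml):
    """Return (setSpec, setName) pairs from one page of a ListSets response.

    Hypothetical helper; it reads a single page and ignores resumptionToken.
    """
    root = ET.fromstring(response_xml)
    return [
        (s.findtext("oai:setSpec", default="", namespaces=OAI_NS),
         s.findtext("oai:setName", default="", namespaces=OAI_NS))
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]
```

The setSpec value (for example INRAE or ALL) is what a harvesting client is configured with, while setName is only a human-readable label.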