-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harvest: map publisher tag to distributorName #9013
base: develop
Are you sure you want to change the base?
Conversation
If you are still interested in this PR, can you please merge and resolve any merge conflicts with the latest from develop? If so, we can prioritize reviewing and QAing the changes. If we don’t hear from you by May 22, 2024, we’ll go ahead and close this PR (it can always be reopened after that date, if there is still interest). |
95d9c69
to
d92d048
Compare
The branch is now up to date with develop and the sql migration file was renamed to match the latest version - please tell me if there is any convention I'm missing. |
@plecor can you please add a release note snippet that describes what this pull request does? Please see https://guides.dataverse.org/en/6.2/developers/version-control.html#writing-release-note-snippets Also, it would be nice to add something to the guides but I'm not sure where. Maybe we should add a changelog to https://guides.dataverse.org/en/6.2/admin/harvestserver.html ? Would that make sense? Finally, tests are always nice but not strictly required. There are some in HarvestingServerIT.java. |
Thanks @pdurbin. I created a release note and added the relevant test to HarvestingClientsIT. |
@plecor I went ahead and added a new changelog for harvesting clients: https://dataverse-guide--9013.org.readthedocs.build/en/9013/admin/harvestclients.html#harvesting-client-changelog In review we'll see what others think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some feedback.
src/main/resources/db/migration/V6.2.0.2__8739-publisher-during-harvesting.sql
Outdated
Show resolved
Hide resolved
@@ -299,6 +299,11 @@ private void harvestingClientRun(boolean allowHarvestingMissingCVV) throws Inte | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's been so long that https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-9013/7/display/redirect shows a failure but that job is now a 404 so I can't see any details.
After you merge the latest from develop let's keep an eye on the new Jenkins run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still seeing failures. I just kicked off another run. Fingers crossed: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-9013/12/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last time all checks were green except for "continuous-integration/jenkins/pr-merge " that was pending. I'm not clear on what it is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Now that the app is being deployed following the SQL script renaming, I can see that a couple harvesting-related tests are failing:
- edu.harvard.iq.dataverse.api.HarvestingClientsIT.testHarvestingClientRun_AllowHarvestingMissingCVV_False(HarvestingClientsIT.java:187)
- edu.harvard.iq.dataverse.api.HarvestingClientsIT.testHarvestingClientRun_AllowHarvestingMissingCVV_True(HarvestingClientsIT.java:191)
@plecor can you please take a look? Do you need help with how to run these tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pdurbin ,
I've re-run the tests from scratch and I wonder if it could be a chicken and egg situation. The new assertion fails if I run the tests from this branch because the sql migration file it's adding hasn't been run. If I manually update the db then the tests run fine. Could it be something similar going on with Jenkins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the issue is that in a new install, the corresponding data is initialized by afterMigrate__1-7256-upsert-referenceData.sql
after the other migrations, so the data we're trying to update doesn't exist yet.
I updated this file with the same change as the sql migration file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Thanks.
@poikilotherm judging from https://github.com/IQSS/dataverse/blame/c8499ba9553ac46cf3adc56d1b9e56f0c781d30f/src/main/resources/db/migration/afterMigrate__1-7256-upsert-referenceData.sql you added that "after migrate" file. What do you think? Any risk in changing it? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers took a look and had this to say:
"that afterMigrate script appears to run after everything else, every time you restart, rather than being one of the scripts tracked by hash value in the flyway_schema_history table. So - it looks OK to me to update it. (Mostly from looking at https://documentation.red-gate.com/fd/callback-concept-184127466.html and related and verifying that I don't see it in the flyway_schema_history table)."
@@ -0,0 +1 @@ | |||
update foreignmetadatafieldmapping set datasetfieldname = 'distributorName' where foreignfieldxpath = ':publisher'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about datasets that are already harvested? The docs say, "This change affects all newly harvested datasets."
The plan is to have a mix?
Or should we add to the release note that all datasets should be re-harvested? Is that crazy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the fix isn't retroactive, unless all datasets are re-harvested. This is not great but I have no idea how we could do this another way.
Maybe we could only suggest re-harvesting only if people are affected by/care about the discrepancy it fixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plecor for now, can you please add something to the release note. "All datasets should be re-harvested to pick up this change." Or whatever makes sense to you. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. To reflect this, the full text of the release note is now:
The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. This improves the citation associated with these datasets, but the change only affects newly harvested datasets. All datasets should be re-harvested if you wish to pick up this change on already harvested datasets. For more information, see the guides, #8739, and #9013.
Co-authored-by: Philip Durbin <[email protected]>
Co-authored-by: Philip Durbin <[email protected]>
…' into 8739-publisher-during-harvesting
What this PR does / why we need it:
Which issue(s) this PR closes: