File description metadata of ingested files are not in the DDI exported metadata #5051
Comments
2024/05/06
2024/07/10
@amberleahey or @stevenmce might know off the top of their head where a file description should go in DDI.
The spec is here: https://ddialliance.org/Specification/DDI-Codebook/2.5/ Are we updating to DDI version 2.1 or 2.5? (2.6 is also on its way, but only just released for review: https://github.com/ddialliance/ddi-c_2)
Here's a sample XML from the DDI spec site:
Thanks for the links @stevenmce. I think Dataverse is using DDI 2.5 already, right? That's what we say in the Appendix page of the Dataverse Guides and I see references to that version in the DDI exports. And when I opened this GitHub issue, I was referring to how Dataverse uses that version of DDI Codebook. I think it'll be helpful to add what some folks from the Dataverse core team said about this GitHub issue during a planning meeting this week:
With other priorities I'm not able to focus on this issue, so I'm recommending we move it out of the sprint ready column of the IQSS Dataverse Project board. @sbarbosadataverse, do you agree?
I think exposing file descriptions via DDI is a great idea. I took a quick look at the links above but I wasn't able to quickly figure out which DDI field to use. 🤷
A few things are happening for file metadata and DDI Codebook exports:
Overall, I think the tabular data ingested files could remain in the fileDscr section, and we could add a TXT or NOTE tag to the set for the descriptions. We also noticed issues with mapping the new standard CC licenses: these do not get into the DDI, but custom licenses do, so we had to set this up for all of Odesi. There are other mapping issues with Codebook that could be tackled by the DDI community, and a new exporter could be built to support 2.5 and 2.6 with these improved mappings.
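To make the suggestion above concrete, here is a minimal Python sketch of a `fileDscr` entry that carries the depositor's description in a `notes` child. It is an illustration only, not Dataverse's exporter code: the function name `build_file_dscr` and the attribute values `EXAMPLE:FILEDESC` and `File description` are hypothetical placeholders, and the element layout is an assumption based on the DDI Codebook element names mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def build_file_dscr(file_id, file_name, description):
    """Build a fileDscr element; add a notes child only when a description exists."""
    file_dscr = ET.Element("fileDscr", {"ID": file_id})
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name
    if description:
        # Hypothetical attribute values, chosen only to mark this note
        # as a file description and keep it distinct from other notes.
        notes = ET.SubElement(
            file_dscr,
            "notes",
            {"level": "file", "type": "EXAMPLE:FILEDESC", "subject": "File description"},
        )
        notes.text = description
    return file_dscr


element = build_file_dscr("f1", "survey.tab", "Example description of the file")
print(ET.tostring(element, encoding="unicode"))
```

Skipping the `notes` element when the description is empty avoids emitting empty tags for files that have no description.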
@amberleahey thanks, that helped me find the writeFileDescription method that does indeed write to the DDI
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
2024/08/23: Reopening because issue was already sized and prioritized.
Just a quick note before I make a PR:
Tabular ("ingested") files do need to remain in
I'm seeing that this was estimated as a "3", which is what we use for most straightforward fixes - like the amount of effort it would take to implement what I just described above, so I'll try and stay within that. :)
So, it'll look like this:
Thanks @landreev! I opened this GitHub issue and merely described something that seemed inconsistent to me. But I think I should have also encouraged us to think about how we'll know that whatever resolution we choose is a good one. And I hope that we can discuss this now while considering your solution. I imagine this would help anyone who needs to export the DDI-Codebook metadata of data in their repository in order to preserve that metadata. Does that sound right? This change has no effect on how findable harvested datasets are, since I think Dataverse doesn't index any of the file-level metadata that it harvests from DDI-Codebook metadata.
This change may affect our Data Explorer and our other tool (odesi). We will need to test that.
I'll test it with your Data Explorer also. I can't imagine it actually causing a problem - since the new note has attributes clearly marking it as different from the other kinds of notes that can be found under
Was good to see you at Dagstuhl! 🙂
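As a rough illustration of why attribute-marked notes should be safe for downstream tools, the Python sketch below separates file-description notes from other notes by their `type` attribute. The sample XML, the `EXAMPLE:` type values, and the function `split_notes` are hypothetical and are not taken from an actual Dataverse export; only the `ddi:codebook:2_5` namespace is the one DDI Codebook 2.5 documents use.

```python
import xml.etree.ElementTree as ET

NS = {"ddi": "ddi:codebook:2_5"}
FILE_DESC_TYPE = "EXAMPLE:FILEDESC"  # hypothetical marker value

# Made-up sample for illustration; not a real export.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <fileDscr ID="f1">
    <fileTxt><fileName>survey.tab</fileName></fileTxt>
    <notes level="file" type="EXAMPLE:UNF" subject="Fingerprint">UNF:6:placeholder</notes>
    <notes level="file" type="EXAMPLE:FILEDESC" subject="File description">Survey responses</notes>
  </fileDscr>
</codeBook>"""


def split_notes(file_dscr):
    """Return (descriptions, other_notes) for one fileDscr element."""
    descriptions, others = [], []
    for note in file_dscr.findall("ddi:notes", NS):
        if note.get("type") == FILE_DESC_TYPE:
            descriptions.append(note.text)
        else:
            others.append(note.text)
    return descriptions, others


root = ET.fromstring(SAMPLE)
for file_dscr in root.findall("ddi:fileDscr", NS):
    print(file_dscr.get("ID"), split_notes(file_dscr))
```

A consumer that already selects notes by `type` would simply never match the new note, which is the reason no breakage is expected; a consumer that reads every `notes` element indiscriminately is the case worth testing.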
FWIW, the test in
@jggautier Yeah, it was just a weird inconsistency. Was worth fixing just for the sake of striving to export as much of the information about the data as possible. Whether it'll ever benefit anyone significantly in real life, idk.
Right, I was thinking of DataDscr, which uses additional note sections for curation. So yes, you are right: for data curation it should not matter, since it does not use fileDscr. Although I should talk to my colleague @nana-boateng, who uses the XML codebook for our search tool, odesi. I believe it should not matter, but we need to test it too.
@nana-boateng confirms that the change should not affect our odesi.
adding description info to the fileDsc section in DDI CodeBook. #5051
I've seen this omission/bug since at least Dataverse version 4.9 and verified it in Dataverse version 6.1:
For files that Dataverse ingests, the file description text (which depositors enter for uploaded files) is not included in the dataset's exported DDI.xml. For example, for the dataset at https://doi.org/10.7910/DVN/1ZPAKL, the ingested file's file description, "Data from vignette survey experiment conducted in Denmark in June 2023", is not in the exported DDI.
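One way to check for this omission in a saved export is to search every element's text for the description string. The Python sketch below does that; the two sample documents are made up for illustration (they are not the actual export of the dataset above), and the function name `elements_containing` is hypothetical.

```python
import xml.etree.ElementTree as ET

DESCRIPTION = "Data from vignette survey experiment conducted in Denmark in June 2023"

# Made-up samples: one without the description, one with it in a notes element.
SAMPLE_WITHOUT = """<codeBook xmlns="ddi:codebook:2_5">
  <fileDscr ID="f1"><fileTxt><fileName>survey.tab</fileName></fileTxt></fileDscr>
</codeBook>"""

SAMPLE_WITH = """<codeBook xmlns="ddi:codebook:2_5">
  <fileDscr ID="f1">
    <fileTxt><fileName>survey.tab</fileName></fileTxt>
    <notes level="file">Data from vignette survey experiment conducted in Denmark in June 2023</notes>
  </fileDscr>
</codeBook>"""


def elements_containing(ddi_xml, needle):
    """Return local names of elements whose own text contains the needle."""
    hits = []
    for element in ET.fromstring(ddi_xml).iter():
        if element.text and needle in element.text:
            # Drop the "{namespace}" prefix that ElementTree adds to tags.
            hits.append(element.tag.split("}")[-1])
    return hits


print(elements_containing(SAMPLE_WITHOUT, DESCRIPTION))
print(elements_containing(SAMPLE_WITH, DESCRIPTION))
```

An empty result for a dataset whose ingested file has a description would reproduce the behavior reported here.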
I think next steps would be to figure out if: