Use/Maintain Appropriate File Formats for Preservation and Reproducibility #6006
Comments
Not just for preservation. For simple things like opening the tab-separated file in Excel. Please see https://twitter.com/Ray_J__/status/1202296388618457089 and the screenshot below:
Review as part of "Add originalFileName field to json" (#2734) when that is picked up in development.
I'm not sure I understand what the change would be. We always maintain the original file.
I re-opened #2720 because I feel strongly that we should use .tsv instead of .tab. Given the above, is there any reason to keep this issue open? Vote to close.
About the first comment on reproducibility: the Dataverse software always maintains the original file, but the file and information about it are not always easily accessible. I think this has improved since this issue was opened, but I can think of at least one case where it could be handled better. The last time I ran the Binder integration on a dataset I uploaded, Binder ignored my dataset's .csv files and tried instead to use the .tab files that were created by the Dataverse software's ingest process. But my dataset's Python script was written to work with the .csv files and assumed they would be there. To work around this, I had to replace the .csv files in my dataset with .tab files and adjust my Python script to read .tab files instead. I would imagine that a researcher who wants to make their computational workflow reproducible by uploading it to a Dataverse repository and using something like Binder would not anticipate needing to use .tab files instead of .csv files.
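For anyone hitting the same thing before a fix lands, here is a rough sketch of how a script could accept either the original .csv or the ingested .tab file. This is not what my script actually looked like; the helper names `delimiter_for` and `read_rows` are made up for illustration, and it assumes the tabular file has a header row.

```python
import csv
import io
from pathlib import Path

# Map of file extensions to field delimiters. The .tab and .tsv entries
# both mean tab-separated; .csv means comma-separated.
DELIMITERS = {".csv": ",", ".tab": "\t", ".tsv": "\t"}


def delimiter_for(filename):
    """Pick a delimiter from the file extension (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    try:
        return DELIMITERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported tabular extension: {suffix!r}")


def read_rows(filename, stream):
    """Read all rows as dicts, assuming the first line is a header."""
    reader = csv.DictReader(stream, delimiter=delimiter_for(filename))
    return list(reader)


if __name__ == "__main__":
    # In-memory stand-in for an ingested .tab file.
    sample = io.StringIO("id\tvalue\n1\t3.5\n")
    print(read_rows("data.tab", sample))
```

With something like this the analysis code does not care which version of the file Binder ends up downloading, though it is still a workaround rather than a fix.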
@jggautier you're definitely right that there's something to fix for Binder. I just launched my dataset there and what I see is the .tab version, like you're saying. Binder uses repo2docker under the covers and here's where Dataverse support was added: jupyterhub/repo2docker#739. We could submit a PR to repo2docker to change the behavior so that original files rather than preservation (.tab) files are downloaded from Dataverse. I'd be worried about backward compatibility though. Anyway, we need a specific, actionable plan. I'm happy to talk about this whenever.
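To make the idea concrete, here is a minimal sketch of the URL difference, assuming the Data Access API's `format=original` query parameter is what a downloader would use to get the uploaded file instead of the ingested .tab version. The function name `datafile_url`, the base URL, and the file id are placeholders for illustration; this is not the code in repo2docker.

```python
from urllib.parse import urlencode


def datafile_url(base_url, file_id, original=True):
    """Build a Data Access API download URL for one file (illustrative only).

    With original=True the URL asks for the file as it was uploaded;
    otherwise it is the default download, which for ingested tabular
    files is the .tab version.
    """
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"
    if original:
        url += "?" + urlencode({"format": "original"})
    return url


if __name__ == "__main__":
    # Placeholder installation URL and file id.
    print(datafile_url("https://demo.dataverse.org", 42))
    print(datafile_url("https://demo.dataverse.org", 42, original=False))
```

Keeping the default download as an option (the `original=False` path here) is one way the backward compatibility concern could be handled, but that is a design question for the actual PR.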
This is exactly what I did:
Closing in favor of this issue:
We discussed #6002 and #2720 in sprint planning today and plan to work on both during a future sprint. I'm closing out both of those in favor of this one.
We should determine a way to meet both needs.