File upload: Tar file upload does not automatically unpack like zip as in v3.6. #2195

Closed
kcondon opened this issue May 22, 2015 · 16 comments

@kcondon
Contributor

kcondon commented May 22, 2015

Not sure whether this was intentionally dropped or missed in the dev process, but tar files currently do not automatically unpack as they did in v3.6.

@pdurbin
Member

pdurbin commented May 22, 2015

Unfortunately, the 4.0 User Guide is inaccurate. It says:

"Compressed files in tar and zip format are unpacked automatically." -- http://guides.dataverse.org/en/4.0/user/dataset-management.html#compressed-files-tar-zip

@posixeleni
Contributor

Can someone confirm that tar is expected to be supported in 4.0? If so, @pdurbin, the mention of tar should remain in the documentation and the bug should be fixed.

@kcondon
Contributor Author

kcondon commented May 22, 2015

I honestly don't know whether the omission was intentional or not. I tested it and it is not working. This may have been a case of testing by ticket: there was a ticket for zip files, so zip was tested; there was no ticket for tar files, so tar was not tested and was missed in the gap analysis of the feature list.



posixeleni pushed a commit that referenced this issue May 22, 2015
Per findings in #2195 removed tar and added a note that it is coming soon.
posixeleni pushed a commit that referenced this issue May 22, 2015
posixeleni pushed a commit that referenced this issue May 22, 2015
@posixeleni
Contributor

In our guides, I clarified that we do not currently support tar and referenced this ticket. @kcondon when you QA this before you close this ticket would you please also make sure that someone has added in the documentation that tar is fully supported? Thanks!

@scolapasta scolapasta modified the milestones: Dataverse 4.0: Post-Deployment Curation, Candidates for 4.0.3 Jun 1, 2015
@scolapasta scolapasta modified the milestones: In Review, Candidates for 4.2 Sep 17, 2015
@bencomp
Contributor

bencomp commented Jan 6, 2016

Possibly related: #1612

@mheppler
Contributor

In a Google Doc for ingest requirements, dated Nov '14 for the last edit, I found a requirement of "Automatically unpacks - mandatory (Delivered)" under the ZIP Files section. Whether or not that requirement was changed for one reason or another was not documented in that FRD however.

In a comment from @landreev on ticket #1175 dated Dec '14, he states:

If there are more files in the zip archive than the limit, it is ingested as a single zip file, and the warning message ("too many files; the limit is ... blah ... upload a zip archive with fewer files if you want them ingested as individual datafiles...") is shown to the user.

Then there are the tickets #2055 and #2017 which debate "to zip or not to zip", the latter of which is still open, which probably makes this ticket a duplicate.

@kcondon
Contributor Author

kcondon commented Jan 22, 2016

This is not a duplicate because it involves unpacking files of .tar format rather than unpacking files of .zip format.
In v3.6.x we did both and the assumption is we still wanted that but it was overlooked.



@pdurbin
Member

pdurbin commented Jan 22, 2016

Here's how the tar upload feature was documented in DVN 3.6:

"An alternative to selecting files individually is to first create an archive of files in .zip or .tar format and then select the appropriate “multiple files” Data Type when uploading your archive. The zip file or tarball will be unpacked so that the individual files will be added to the page." http://guides.dataverse.org/en/3.6.2/dataverse-user-main.html

@dfear0

dfear0 commented Dec 16, 2016

We need the option to KEEP .zip and .tar from unpacking automatically, which was there in the 3.x's. Often users should download single files that keep all the components together in a folder, which .zip does. Our only workaround is to create dummy files, add .zip files directly to the server, and rename them to bypass the interface. Downloading TAR is not as desirable because some users do not know how to unpack them.

@pdurbin
Member

pdurbin commented Dec 16, 2016

@dfear0 this issue having to do with the "double zip" workaround is highly related to what you want, I believe: #3439.

@pdurbin
Member

pdurbin commented Jun 30, 2017

Does anyone out there care if Dataverse supports upload of tarballs or not? We seem to be getting by fine right now with zip file upload. And I say this as a Unix hacker who loves me some tarballs.

@dfear0

dfear0 commented Jun 30, 2017 via email

@oscardssmith
Contributor

Why would we be closing this? Is it fixed?

@pdurbin
Member

pdurbin commented Jun 30, 2017

@dfear0 thanks for your feedback! I removed my "vote to close" label. 😄

I also added some new labels I'm experimenting with called Help Wanted: Code and Mentor: pdurbin. For more background on these, please see what I wrote at https://groups.google.com/d/msg/dataverse-dev/Pkces_MBqR8/4819N2tmBQAJ

@oscardssmith since there's a workaround of uploading using a zip file rather than a tarball, I changed this from "Type: Bug" to "Type: Suggestion". I think of this issue as implementing a suggestion rather than fixing a bug. If you'd like to work on this one, please clear it with @djbrooke and I'm happy to mentor you. This one adds value to @dfear0 and probably others.

Oh, by the way, there's a "3.6 & 4.0 Feature Comparison" spreadsheet here if anyone is interested: https://docs.google.com/spreadsheets/d/1ftHM6E4b9Dft_AvA6KTzeklJyLguItz2_0hNAFL8-VM/edit?usp=sharing

@mankoff
Contributor

mankoff commented Jul 30, 2020

I'm adding a vote to keep tar (and tar.gz?) files as an upload option with unpacking. The reason is that it is easier to generate tar files and rewrite filenames inside them than it is with zip files. For example:

tar -zvcf upload.tar.gz --transform 's/foo/bar/' ./files/*

This lets me rename files from foo to bar in the uploaded dataset without renaming them on my computer. I find this useful. I have folders of model output with complicated names, but want to upload them to different datasets with the same name. tar supports this behavior.
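
A self-contained sketch of that renaming behavior, for anyone who wants to try it. It assumes GNU tar (the `--transform` option is not available in every tar implementation), and the directory layout and file names (`foo_run1.txt`, `foo_run2.txt`) are made up for illustration:

```shell
#!/bin/sh
# Sketch only: assumes GNU tar; file names below are hypothetical.
set -eu

# Work in a throwaway directory so nothing on disk is touched.
workdir=$(mktemp -d)
cd "$workdir"
mkdir files
echo "model output 1" > files/foo_run1.txt
echo "model output 2" > files/foo_run2.txt

# Create the archive, rewriting the first "foo" in each member name to "bar".
# Only the names stored in the archive change.
tar -zcf upload.tar.gz --transform 's/foo/bar/' ./files/*

# List the member names as stored in the archive.
tar -ztf upload.tar.gz

# The files on disk keep their original names.
ls files
```

The archive listing should show the `bar_` names while `ls files` still shows the `foo_` names, which is the point: the rename happens only inside the tarball.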

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024