Preserving the original file type for ingested tabular files is broken as of 4.6.2 #3952

Closed
landreev opened this issue Jun 22, 2017 · 7 comments


@landreev
Contributor

landreev commented Jun 22, 2017

Tabular ingest was modified in 4.6.2 to handle error conditions better and to avoid leaving files in a "half-ingested" state. To achieve this, we had to change the order in which we modify the content of the file and the database entries. One unfortunate side effect of that change is that we now have this line:

dataFile.setContentType(FileUtil.MIME_TYPE_TAB);

before this one:

tabDataIngest.getDataTable().setOriginalFileFormat(dataFile.getContentType());

meaning the type has already been overwritten by the time we try to preserve it.
The fix is to swap the order of the two lines (trivial).
We should also provide a script or an API call that properly resets the original format entries for all the files that were ingested in 4.6.2; that should be fairly straightforward too.
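
To make the ordering problem concrete, here is a minimal, self-contained sketch. The `DataFile` and `DataTable` classes below are hypothetical, stripped-down stand-ins for the real entities, the value of `MIME_TYPE_TAB` is assumed, and `application/x-stata` is only an illustrative original type; the two `ingest*` methods exist purely for comparison and are not the actual ingest code.

```java
// Sketch only: hypothetical stand-ins, not the real Dataverse classes.
class OriginalFormatOrder {
    // Assumed value of FileUtil.MIME_TYPE_TAB.
    static final String MIME_TYPE_TAB = "text/tab-separated-values";

    static class DataTable {
        private String originalFileFormat;
        void setOriginalFileFormat(String format) { this.originalFileFormat = format; }
        String getOriginalFileFormat() { return originalFileFormat; }
    }

    static class DataFile {
        private String contentType;
        private final DataTable dataTable = new DataTable();
        DataFile(String contentType) { this.contentType = contentType; }
        void setContentType(String contentType) { this.contentType = contentType; }
        String getContentType() { return contentType; }
        DataTable getDataTable() { return dataTable; }
    }

    // Ordering described in the issue: the type is overwritten first,
    // so the "original" format that gets saved is already the tabular type.
    static String ingestBroken(DataFile dataFile) {
        dataFile.setContentType(MIME_TYPE_TAB);
        dataFile.getDataTable().setOriginalFileFormat(dataFile.getContentType());
        return dataFile.getDataTable().getOriginalFileFormat();
    }

    // Swapped ordering: save the original type, then overwrite it.
    static String ingestFixed(DataFile dataFile) {
        dataFile.getDataTable().setOriginalFileFormat(dataFile.getContentType());
        dataFile.setContentType(MIME_TYPE_TAB);
        return dataFile.getDataTable().getOriginalFileFormat();
    }

    public static void main(String[] args) {
        // prints "text/tab-separated-values"
        System.out.println(ingestBroken(new DataFile("application/x-stata")));
        // prints "application/x-stata"
        System.out.println(ingestFixed(new DataFile("application/x-stata")));
    }
}
```

In both variants the file ends up with the tabular content type; only the swapped ordering keeps the pre-ingest type available afterwards.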

@pdurbin
Member

pdurbin commented Jun 23, 2017

@landreev good catch. While we're in that part of the code I wonder if we could also address #2734 or at least come up with a game plan.

@pdurbin
Member

pdurbin commented Jun 23, 2017

Let's also at least do a little investigation of #2822

@landreev
Contributor Author

A direct effect of this for the user is that downloading the original file yields a file without any extension. They still get the correct file, and if they figure out what to open it with (Stata, SPSS, etc.) it can be used, but it is definitely a usability issue.
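
One plausible way to picture the missing extension is sketched below. This is an assumption about the mechanism, not the actual download code: the `extensionFor` and `originalFileName` helpers, the MIME type strings, and the extension table are all hypothetical. The idea is that if the download name is built from the stored original format, and that format was recorded as the tabular type, no original-format extension matches.

```java
// Sketch only: hypothetical helpers illustrating an assumed mechanism.
class OriginalDownloadName {
    // Hypothetical mapping from a stored original format to a file extension.
    // Null-safe: unknown or missing formats yield an empty extension.
    static String extensionFor(String originalFormat) {
        if ("application/x-stata".equals(originalFormat)) return ".dta";
        if ("application/x-spss-sav".equals(originalFormat)) return ".sav";
        if ("application/x-spss-por".equals(originalFormat)) return ".por";
        return "";
    }

    // Replace the ".tab" suffix of the ingested file name with the
    // extension derived from the stored original format.
    static String originalFileName(String tabFileName, String originalFormat) {
        String base = tabFileName.endsWith(".tab")
                ? tabFileName.substring(0, tabFileName.length() - ".tab".length())
                : tabFileName;
        return base + extensionFor(originalFormat);
    }

    public static void main(String[] args) {
        // Original format preserved correctly: prints "survey.dta"
        System.out.println(originalFileName("survey.tab", "application/x-stata"));
        // Original format overwritten with the tabular type: prints "survey"
        System.out.println(originalFileName("survey.tab", "text/tab-separated-values"));
    }
}
```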

@landreev
Contributor Author

(the above has been confirmed)

landreev added a commit that referenced this issue Jun 23, 2017
@landreev
Contributor Author

Checked in a quick fix in branch 3952-fix-original-type.
This fixes only the bug itself, with the plan to (potentially) release it in 4.7. The fix for the affected files already in the database will require a little more coding and will be addressed in 4.7.1.

@landreev
Contributor Author

@pdurbin #2822 is pretty straightforward. When a file is ingested, its type becomes "tabular", but the original type is preserved in DataTable.OriginalFileFormat.
It can be searched for in the database, but it appears that we are not indexing it, and that is what this user asked for.
It would be trivial to address; we somehow just missed the request and never addressed it.

@landreev
Contributor Author

pull request: #3954
