Fix the missing "original format type", left by the bug in 4.6.2 #4005

Closed
landreev opened this issue Jul 13, 2017 · 5 comments

@landreev
Contributor

A bug in 4.6.2 left ingested tabular files with "tab-separated-values" in the "original format" field instead of the actual original format.
The bug was addressed in #3952, so as of 4.7 the original type is being set correctly again.
However, the fix in #3952 addressed the bug only, so the tabular files ingested under 4.6.2 are still missing a valid type entry. This needs to be fixed both in our production database and in the databases of everybody else who has installed 4.6.2.

The fix is fairly straightforward. We just need to decide whether it should be a standalone script that we'll provide, or a special API call. (I'm leaning towards the latter).

@djbrooke djbrooke added this to the 4.8 - Large Data Upload Integration milestone Jul 20, 2017
landreev added a commit that referenced this issue Jul 25, 2017
@landreev
Contributor Author

Pull request: #4022

Moving into code review.

@landreev
Contributor Author

landreev commented Aug 1, 2017

For the record, all the files with missing original formats in our production database have been fixed. But we still need to QA this API fix and pass it on to all the other Dataverse installations, hopefully with the next release.

@landreev
Contributor Author

landreev commented Aug 2, 2017

The following should be added to the release notes for 4.7.2 (and can be used to test the fix, for example on vm5):

IMPORTANT: A bug introduced in v.4.6.2 (and fixed in 4.7) resulted in ingested tabular files missing the "original format type" label. The most obvious manifestation of the bug would be in the "Download" pulldown menu - you would see "Original File Format (Tabular Data)", instead of the real original format - such as Stata, CSV, SPSS etc.
This release comes with a fix that can be run once to restore all these missing original type labels.

To find out if your Dataverse has any tabular data files affected by the bug, you can run the following database query:

SELECT id FROM datatable WHERE originalfileformat='text/tab-separated-values'

Generally, if any tabular files were ingested in your Dataverse while it was running v. 4.6.2, they are affected by this issue.

To restore all the missing type labels, run the fix with the following API call:

http://localhost:8080/api/admin/datafiles/integrity/fixmissingoriginaltypes

This will produce a confirmation message, including the number of affected files on your system, and start the fixer job in the background. To monitor the progress, look for messages like this in your main Glassfish server.log file:

Original file type determined: application/x-stata (file id=..., datatable id=...; file path: ...
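For anyone who would rather tally these messages than scan the log by eye, here is a minimal Python sketch. It assumes only the message prefix shown above; the rest of the line (file id, datatable id, file path) is elided in the example, so the sketch does not try to parse it. The function name `count_restored_types` is hypothetical and not part of Dataverse.

```python
import re

# Matches the prefix of the log message quoted above. Everything after the
# MIME type is elided in the example, so it is deliberately left unparsed.
FIXED_LINE = re.compile(r"Original file type determined: (?P<mime>\S+)")


def count_restored_types(log_lines):
    """Tally restored original format types found in server.log lines."""
    counts = {}
    for line in log_lines:
        match = FIXED_LINE.search(line)
        if match:
            mime = match.group("mime")
            counts[mime] = counts.get(mime, 0) + 1
    return counts
```

It accepts any iterable of lines, so an open file handle on your server.log (wherever your installation keeps it) can be passed in directly.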

(end of the instruction).
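For scripted deployments, the API call from the instructions above could also be issued from Python. This is only a sketch under assumptions: the thread lists the endpoint as a bare URL, so a plain GET request is assumed, and the exact format of the confirmation message is not specified here. The helper names `fix_url` and `start_fix` are hypothetical.

```python
import urllib.request

# Endpoint path as given in the instructions above.
FIX_PATH = "/api/admin/datafiles/integrity/fixmissingoriginaltypes"


def fix_url(base_url="http://localhost:8080"):
    """Build the full URL of the fixer endpoint for a given server."""
    return base_url.rstrip("/") + FIX_PATH


def start_fix(base_url="http://localhost:8080", timeout=60):
    """Send the request (assumed to be a GET) and return the response text."""
    with urllib.request.urlopen(fix_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `start_fix()` on the server itself should return the confirmation message; the fixer job then continues in the background, so the return of this call does not mean the fix has finished.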

dvn-vm5 is now running a copy of the prod. db that has all the affected files ingested back in June, still not fixed, so it can be used to re-test the fix. Note that I've copied the saved originals for these files onto vm5 too, as the fix needs them to determine the type.

@kcondon
Contributor

kcondon commented Aug 3, 2017

OK, tested this and it works as described. A couple of notes/improvements:
Code:

  1. Add a "done" message in the logs when it finishes. Right now it just stops logging.
  2. For the release notes info, make the API endpoint a curl command and say to run it, rather than just listing the endpoint.
  3. Suggest they run the SQL query again after the API call to confirm no problems remain.

Otherwise looks great! I'm adding a note in my deployment doc to add this note to my release notes.
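The "run the query again" check could be sketched in Python roughly as follows. The condition is the one from the release-notes query, wrapped in COUNT(*). Everything else is an assumption for illustration: the helper names are hypothetical, and the demo uses an in-memory SQLite table with made-up rows as a stand-in for the real database, so that the sketch runs on its own. On a real installation you would pass a connection to your own Dataverse database instead.

```python
import sqlite3

# Same condition as the release-notes query, wrapped in COUNT(*).
CHECK_SQL = (
    "SELECT COUNT(*) FROM datatable "
    "WHERE originalfileformat = 'text/tab-separated-values'"
)


def count_affected(conn):
    """Return how many datatable rows still carry the placeholder type."""
    cur = conn.cursor()
    cur.execute(CHECK_SQL)
    return cur.fetchone()[0]


def demo():
    # In-memory stand-in for the real database; the rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datatable (id INTEGER, originalfileformat TEXT)")
    conn.executemany(
        "INSERT INTO datatable VALUES (?, ?)",
        [(1, "text/tab-separated-values"), (2, "application/x-stata")],
    )
    before = count_affected(conn)
    # Simulate the fixer restoring the real type for row 1.
    conn.execute(
        "UPDATE datatable SET originalfileformat = 'text/csv' WHERE id = 1"
    )
    after = count_affected(conn)
    conn.close()
    return before, after
```

A count of zero after the fixer job has finished is the "no problems remain" signal suggested in item 3.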

@landreev
Contributor Author

landreev commented Aug 4, 2017

Added the extra logging message; also synced the branch up with develop.

@kcondon kcondon closed this as completed Aug 4, 2017
@kcondon kcondon removed the Status: QA label Aug 4, 2017