Fix singlepart direct upload #8
@DonRichards, sorry for the delayed response. Yes, this is unfortunately a bottleneck. I tried increasing the maximum concurrency of registration tasks, but it failed: Dataverse likely struggles to process many simultaneous requests and simply errors out when there are too many. As a soft fix, failed requests can now be retried. This is not a guaranteed speed-up, but it may improve performance slightly. Would you mind trying it out to see whether it helps in your case? If this is still too slow, an option would be to divide your files into multiple
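The retry idea can be sketched as a small helper with exponential backoff. This is a hypothetical illustration, not the code in this PR: `retry_with_backoff`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as the retryable errors are assumptions (which exceptions count as transient depends on the HTTP client actually used).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request()` and retry it on transient errors with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus up to 50% jitter
            # so that concurrent registration tasks do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Errors outside `retry_on` (for example a rejected payload) propagate immediately, so only failures that plausibly stem from an overloaded server are retried. The `sleep` parameter exists so the waiting can be swapped out, e.g. for an async sleep or in tests.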
Any suggestions on how to trace why file registration has suddenly stopped working? Is there a way to see what is causing the registration to fail?
@DonRichards this is most likely caused by Dataverse closing the connection because of too many requests. I am still trying to find a sweet spot, but it varies greatly between instances. The actual error can only be traced in the logs of your Dataverse instance.
@DonRichards good news! I have talked to the Dataverse Dev Team, and there is a way to register files in bulk at Dataverse without requiring one request per file. Registration is therefore much faster and more stable. I have just pushed the changes to this PR and tested them beforehand with 10k small files locally without any issues. Do you mind testing the updated PR?
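The thread does not name the bulk mechanism the Dev Team suggested. One candidate, assumed here, is the `addFiles` endpoint described in the Dataverse direct-upload guide, which registers a JSON array of already-uploaded files in a single POST (form field `jsonData`). The sketch below only prepares the batched URLs and payloads; `file_entry`, `add_files_requests`, and the default batch size of 200 (borrowed from the batch size mentioned in this thread) are hypothetical, and the field names should be checked against your Dataverse version.

```python
import json
from typing import Iterator, List, Sequence, Tuple
from urllib.parse import urlencode


def batched(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def file_entry(storage_identifier: str, file_name: str, mime_type: str, sha1: str) -> dict:
    """Metadata for one file that was already uploaded directly to S3."""
    return {
        "storageIdentifier": storage_identifier,
        "fileName": file_name,
        "mimeType": mime_type,
        "checksum": {"@type": "SHA-1", "@value": sha1},
    }


def add_files_requests(
    server_url: str,
    persistent_id: str,
    entries: Sequence[dict],
    batch_size: int = 200,
) -> List[Tuple[str, dict]]:
    """Build one (url, form_fields) pair per batch instead of one request per file."""
    url = (
        f"{server_url.rstrip('/')}/api/datasets/:persistentId/addFiles?"
        + urlencode({"persistentId": persistent_id})
    )
    return [
        (url, {"jsonData": json.dumps(list(batch))})
        for batch in batched(entries, batch_size)
    ]
```

Each pair would then be sent as a multipart POST with the user's API token in the `X-Dataverse-key` header, so 10k files become 50 registration requests instead of 10k.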
Tested it with batches of 200 files at a time and it works as expected.
@DonRichards thanks for testing! Does this resolve your issue #7?
I do believe so. Thanks! I really appreciate the work.
@DonRichards perfect! I will merge this PR then to close issue #7.
Overview
In issue #7, it was reported and discussed that direct upload of a single file (not multipart) to S3 storage raises a `Not implemented` exception on the AWS side. The issue is related to streaming files when POSTing them to the S3 storage. To fix it, the `file_sender` function has been removed and replaced with a simple `open` call to upload the file. Additionally, this PR introduces some printing enhancements and adds an option to force native upload.
Changes
Use `open` instead of `file_sender` for file uploads.
Closes
Closes #7
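The core of the fix can be illustrated with the standard library. This is a hedged sketch, not the PR's code: `upload_single_part` and its `send` parameter are hypothetical, the presigned URL is assumed to come from Dataverse's direct-upload flow, and the request uses PUT, as presigned single-part S3 uploads normally do. It shows why an opened file helps: with a known size the client sends a `Content-Length` header, whereas a generator body has no known length, so HTTP clients typically fall back to `Transfer-Encoding: chunked`, which S3 rejects with a `NotImplemented` error for plain uploads (a likely explanation for the error in #7).

```python
import os
import urllib.request
from typing import Callable


def upload_single_part(
    presigned_url: str,
    path: str,
    send: Callable[[urllib.request.Request], object] = urllib.request.urlopen,
) -> int:
    """PUT the file at `path` to a presigned S3 URL in one request.

    Returns the HTTP status code of the response produced by `send`.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as body:
        request = urllib.request.Request(presigned_url, data=body, method="PUT")
        # A known length means Content-Length is sent instead of
        # Transfer-Encoding: chunked (what a generator body would trigger).
        request.add_header("Content-Length", str(size))
        # Override urllib's default form-encoded Content-Type for raw bytes.
        request.add_header("Content-Type", "application/octet-stream")
        with send(request) as response:
            return response.status
```

Depending on the Dataverse configuration, the presigned URL may also be signed for extra headers (for example `x-amz-tagging`), which would then have to be added to the request as well.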