Uploading files larger than 2GB does not work #137
Comments
Forgive me if this isn't relevant. For uploading really large files - in my case Lidar data - I use an S3 bucket set up for direct upload. That doesn't work with pyDataverse, but for uploading really large files individually a direct-upload bucket is helpful. |
I understand that this is not relevant for you. However, if the Dataverse installation in question does not use an S3 storage backend, then this becomes instantly relevant. |
The issue is, I am on parental leave right now (until May 2022), and we at AUSSDA do not use S3, so I cannot test this. The best way to move forward would be for you to resolve the issue yourselves. |
We also just ran into this. From looking at the Dataverse side, uploads of this size are accepted (uploading the same file with curl works, as noted in the bug report). For the sending side, it looks like "requests-toolbelt" has something we could use: https://toolbelt.readthedocs.io/en/latest/uploading-data.html Maybe it would be good to detect the file size and either go for a normal upload when <2 GB or a streamed multipart upload for larger files? See the sketch after this comment. (I don't have the capacity right now to look into this.) |
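A minimal sketch of that idea, assuming the native upload endpoint (`/api/datasets/:persistentId/add`) and field names from the Dataverse API guide, and using requests-toolbelt's MultipartEncoder to stream the body instead of building it in memory. The 2 GB threshold and the function wiring are assumptions for illustration, not pyDataverse internals:

```python
# Sketch only: streams the multipart body with requests-toolbelt so the
# request never materializes the whole file in memory. Endpoint and form
# field names follow the Dataverse native API docs; the 2 GB threshold is
# an assumption for when to switch strategies.
import os
import json
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

TWO_GB = 2 * 1024**3

def upload_datafile(base_url, api_token, persistent_id, filepath, json_data=None):
    url = f"{base_url}/api/datasets/:persistentId/add?persistentId={persistent_id}"
    headers = {"X-Dataverse-key": api_token}
    json_str = json.dumps(json_data or {})

    if os.path.getsize(filepath) < TWO_GB:
        # Small file: plain multipart upload, body built in memory.
        with open(filepath, "rb") as f:
            files = {"file": (os.path.basename(filepath), f)}
            return requests.post(url, headers=headers,
                                 data={"jsonData": json_str}, files=files)

    # Large file: stream the multipart body chunk by chunk.
    with open(filepath, "rb") as f:
        encoder = MultipartEncoder(fields={
            "jsonData": json_str,
            "file": (os.path.basename(filepath), f, "application/octet-stream"),
        })
        headers["Content-Type"] = encoder.content_type
        return requests.post(url, headers=headers, data=encoder)
```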
Can this bug be reproduced at https://demo.dataverse.org? Currently the file upload limit there is 2.5 GB, high enough for a proper test, it would seem. |
Also related to #136 |
Update: I left AUSSDA, so my funding for pyDataverse development has stopped. I want to get some basic funding to implement the most urgent updates (PRs, bug fixes, maintenance work). If you can support this, please reach out to me (www.stefankasberger.at); the same goes for feature requests. Another option would be that someone else helps with the development and/or maintenance. For this, also get in touch with me (or comment here). |
I know I should not expect movement here (unless someone else picks it up or we find funding). But to not let newly found insights slip away, and for what it's worth: how about exchanging the HTTP library for aiohttp? I know aiohttp is a much larger dependency, but it does support multipart uploads: https://docs.aiohttp.org/en/stable/multipart.html (see the sketch after this comment). |
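For comparison, a minimal sketch of the same upload done with aiohttp; the endpoint and field names are the same assumptions as in the sketch above. A file object added to aiohttp's FormData is streamed in chunks rather than read into memory first:

```python
# Sketch only: the same native-API upload with aiohttp, which streams file
# objects added to FormData. URL, PID, and file name below are placeholders.
import asyncio
import json
import os
import aiohttp

async def upload_datafile(base_url, api_token, persistent_id, filepath, json_data=None):
    url = f"{base_url}/api/datasets/:persistentId/add?persistentId={persistent_id}"
    headers = {"X-Dataverse-key": api_token}

    with open(filepath, "rb") as f:
        form = aiohttp.FormData()
        form.add_field("jsonData", json.dumps(json_data or {}))
        form.add_field("file", f,
                       filename=os.path.basename(filepath),
                       content_type="application/octet-stream")
        # Disable the default total timeout, since a multi-GB upload can
        # easily exceed aiohttp's 5-minute default.
        timeout = aiohttp.ClientTimeout(total=None)
        async with aiohttp.ClientSession(headers=headers, timeout=timeout) as session:
            async with session.post(url, data=form) as resp:
                return resp.status, await resp.text()

# Example usage (placeholders):
# asyncio.run(upload_datafile("https://demo.dataverse.org", "MY_API_TOKEN",
#                             "doi:10.xxxx/XXXX", "lidar_tiles.laz"))
```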
Not sure that helps out of the box, since our multipart direct upload involves contacting Dataverse to get signed URLs for the S3 parts, etc. FWIW, I think @landreev implemented our mechanism in Python; it just hasn't been integrated with pyDataverse. A rough outline of that flow is sketched below. |
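For completeness, a rough sketch of that multipart direct-upload flow as described in the Dataverse direct-upload guide. The endpoint (`/api/datasets/:persistentId/uploadurls`) and the response fields (`urls`, `partSize`, `complete`, `storageIdentifier`) are taken from that guide and should be double-checked; everything else here is an assumption, and error handling, the abort path, and the final registration call are omitted:

```python
# Rough sketch of S3 multipart direct upload: ask Dataverse for presigned
# part URLs, PUT each part straight to S3, then confirm completion.
# Response field names are assumptions based on the Dataverse guide.
import os
import requests

def direct_upload(base_url, api_token, persistent_id, filepath):
    headers = {"X-Dataverse-key": api_token}
    size = os.path.getsize(filepath)

    # 1. Ask Dataverse for presigned upload URLs for a file of this size.
    resp = requests.get(
        f"{base_url}/api/datasets/:persistentId/uploadurls",
        params={"persistentId": persistent_id, "size": size},
        headers=headers,
    )
    info = resp.json()["data"]
    part_size = info["partSize"]
    part_urls = info["urls"]          # assumed shape: {"1": presigned_url, ...}

    # 2. PUT each part directly to S3 and remember the returned ETags.
    etags = {}
    with open(filepath, "rb") as f:
        for part_number, url in sorted(part_urls.items(), key=lambda kv: int(kv[0])):
            chunk = f.read(part_size)
            put = requests.put(url, data=chunk)
            etags[part_number] = put.headers["ETag"].strip('"')

    # 3. Tell Dataverse the multipart upload is complete (assumed to be a
    #    relative API path in the response).
    requests.put(f"{base_url}{info['complete']}", json=etags, headers=headers)

    # The file still has to be registered with the dataset afterwards,
    # using the returned storageIdentifier and the /add API.
    return info["storageIdentifier"]
```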
@qqmyers you are right - direct upload needs more. Maybe one day we also extend pyDataverse for this. That said, this issue here is about uploading with a simple HTTP upload via the API. |
Bug report
1. Describe your environment
2. Actual behaviour:
Trying to upload a file larger than 2GB causes an error. Uploading the same file using curl works fine.
3. Expected behaviour:
The file should be uploaded, or at least a clear error should be raised stating that the file is too big.
4. Steps to reproduce
The program and stack trace are as follows:
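A minimal reproduction along these lines, assuming pyDataverse's NativeApi.upload_datafile call from the 0.3.x API and placeholder values for the server, token, PID, and file, fails once the request body exceeds 2 GB; the traceback ends in the "string longer than 2147483647 bytes" error discussed in the link under point 5:

```python
# Hypothetical reproduction: upload a >2 GB file through pyDataverse's
# native-API client. Server URL, token, PID, and file path are placeholders.
from pyDataverse.api import NativeApi

api = NativeApi("https://demo.dataverse.org", "MY_API_TOKEN")
resp = api.upload_datafile(
    "doi:10.xxxx/XXXX",          # dataset persistent identifier (placeholder)
    "lidar_tiles.laz",           # any file larger than 2 GB
    json_str='{"description": "Lidar data"}',
)
print(resp.json())
```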
5. Possible solution
Some possible solutions (a streaming upload or a chunk-encoded request) are described here; a sketch of both is included after this report:
https://stackoverflow.com/questions/53095132/how-to-upload-chunks-of-a-string-longer-than-2147483647-bytes
I am not very versed in Python, but I will try to fix this in the coming week and submit a pull request. If I fail, feel free to fix this bug!
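A sketch of the two techniques from that answer, using only requests: passing an open file object streams the body, and passing a generator makes requests send a chunk-encoded request. Note that the Dataverse native upload endpoint expects a multipart form, so for pyDataverse either technique would still need to be combined with a streaming multipart encoder such as the requests-toolbelt sketch earlier in this thread; the URL and file name below are placeholders:

```python
# Sketch of the generic techniques: neither builds the full body in memory.
import requests

UPLOAD_URL = "https://example.org/upload"   # placeholder, not a Dataverse endpoint

# 1. Streaming upload: requests reads the file object in chunks and sends a
#    body with a Content-Length taken from the file size.
with open("big_file.bin", "rb") as f:
    requests.post(UPLOAD_URL, data=f)

# 2. Chunk-encoded request: a generator as the body makes requests use
#    Transfer-Encoding: chunked, so no total size needs to be known upfront.
def read_in_chunks(path, chunk_size=8 * 1024 * 1024):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

requests.post(UPLOAD_URL, data=read_in_chunks("big_file.bin"))
```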