Message 'Not Implemented' #7

Closed
DonRichards opened this issue Feb 6, 2024 · 10 comments · Fixed by #8
Labels: bug (Something isn't working)

Comments


DonRichards commented Feb 6, 2024

Not sure what this error indicates.

I'm trying to upload FITS files to a DOI. I can use the UI and it uploads without an issue.

config.yml:
persistent_id: doi:10.7281/T1/HSYYY0
dataverse_url: https://dataverse-test.jhu.edu
api_token: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
files:
- filepath: /mnt/FitsFiles/Platinu_1005323342868820864.fits
  mimetype: image/fits
  description: Posterior distributions of the stellar parameters for the star with
    ID from the Gaia catalog Platinu_1005323342868820864.
- filepath: /mnt/FitsFiles/Platinu_1006486484435386496.fits
  mimetype: image/fits
  description: Posterior distributions of the stellar parameters for the star with
    ID from the Gaia catalog Platinu_1006486484435386496.

I'm not sure if this is a problem with the uploader, the way I'm uploading, the dataset in general, etc. Apologies if this turns out not to be associated with the Python dvuploader; I assume it is, since I can use the UI for the same files.
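For anyone reading along, here is a minimal sketch of how a config like the one above could be parsed before handing the entries to the uploader. This is illustrative plain Python, not dvuploader's own loader; the PyYAML dependency and the `load_upload_config` name are my own.

```python
# Illustrative sketch: parse a dvuploader-style config.yml into connection
# settings plus a list of per-file dicts. Not the library's actual code.
import yaml  # third-party: pip install pyyaml

def load_upload_config(path):
    """Return (connection settings, list of file entries) from a config.yml."""
    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    settings = {
        "persistent_id": cfg["persistent_id"],
        "dataverse_url": cfg["dataverse_url"],
        "api_token": cfg["api_token"],
    }
    return settings, cfg.get("files", [])
```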


JR-1991 commented Feb 7, 2024

@DonRichards, thank you for submitting the issue! Based on the error message you provided, it seems to originate from the AWS store. Unfortunately, the error message "Not implemented" is difficult to interpret. Could you please provide me with the version of Dataverse you are using?

I ran some local tests using Dataverse 6.0 and Localstack, which simulates AWS locally. However, I was unable to replicate the error: both direct uploads to the S3 store and the native upload path worked. I plan to conduct further testing on an actual AWS store and hopefully identify the bug causing the issue.

I assume it does since I can use the UI for the same files.

As far as I know, the UI does not support direct uploads to an S3 store, so those uploads go through the standard HTTP method available in DV's native API. This explains why the UI functions properly and suggests that the issue might lie with the AWS store.

JR-1991 self-assigned this Feb 7, 2024
JR-1991 added the bug label Feb 7, 2024
DonRichards commented:

I found something odd: when I changed a variable name in my code from DVUploader(files=files) to DVUploader(files=upload_files), I got a different error. Not sure what this indicates.

When I examined the files being passed to the uploader, they look like this. Do these values look correct? I would expect fileName and file_id to have values.

 File(
    filepath='/mnt/FitsFiles/Platinum_2416.fits',
    description='Posterior distributions of the stellar parameters for the star with ID from the Gaia DR3 catalog Platinum_2416.',
    directoryLabel='',
    mimeType='image/fits',
    categories=['DATA'],
    restrict=False,
    checksum_type=<ChecksumTypes.MD5: ('MD5',<built-in function openssl_md5>)>,
    storageIdentifier=None,
    fileName=None,
    checksum=None,
    to_replace=False,
    file_id=None
 ),


JR-1991 commented Feb 9, 2024

@DonRichards this is expected, since fileName and file_id are populated during upload, when the hashes are calculated. Do you think this is confusing? I am happy to change it to extract the filename at initialization instead.
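For what it's worth, extracting the name at initialization would just be the base name of the path. A plain-Python sketch, not the library's actual implementation:

```python
from pathlib import Path

def derive_filename(filepath: str) -> str:
    """Base name of a path, as a File object could compute at init time
    rather than waiting for upload (illustrative helper only)."""
    return Path(filepath).name

# derive_filename('/mnt/FitsFiles/Platinum_2416.fits') -> 'Platinum_2416.fits'
```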

Can you share the error message you have received upon changing variable names?

DonRichards commented:

It starts to upload but then throws this and exits:
An error occurred with uploading: Cannot write to closing transport

[Screenshot from 2024-02-09 14-14-34 attached]


JR-1991 commented Feb 11, 2024

I came across this issue on StackOverflow and found a solution provided by another user. I will implement the fix and create a pull request to see if it resolves the issue.

May I ask what your file sizes are, so I can test this on another server?

DonRichards commented:

Each of the 401,000 files I'm attempting to upload under a single DOI is approximately 1.6 MB in size. I have a script that breaks them up into batches of 20 at a time, so the uploader should only be given a list of 20 files. Any idea what I can do from here to get this to work? I'd create a PR if I could, but I don't know this app well enough.
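The batching described above can be sketched in a few lines; `chunked` is a hypothetical helper name, not part of dvuploader:

```python
def chunked(items, size=20):
    """Yield successive slices of at most `size` items, e.g. to feed the
    uploader 20 files at a time (illustrative helper, not dvuploader API)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 401,000 files in batches of 20 -> 20,050 batches of exactly 20 files each
```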


JR-1991 commented Feb 12, 2024

Great, thanks for the info! The PR is almost ready for submission. I'll run some tests on Demo Dataverse to check for any issues. Once I'm done, I'll let you know and you can test the updated version. Hope this will fix it 😊

DonRichards commented:

Great! Thanks for that!


JR-1991 commented Feb 12, 2024

@DonRichards, I have created a pull request #8 that fixes the issue. Unfortunately, the issue is related to streaming files to the S3 backend. AWS is not capable of handling async streams, which is a pity.

To test this, I downloaded a sample FITS file and replicated it 2000 times to simulate a case similar to yours. The error has not been raised on our test server, and the upload works. The upload to S3 itself is quite fast if you set n_parallel_uploads to 30, but the only downside is that registering the uploaded files at Dataverse takes considerable time. DVUploader has no influence on the time it takes, unfortunately.
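For reference, the parallelism setting would be passed roughly along these lines. This is a sketch only: it assumes the File/DVUploader API shown earlier in this thread and the n_parallel_uploads keyword named above, and it reuses the connection values from the original report. It requires the dvuploader package and a reachable Dataverse instance, so treat the exact field and parameter names as assumptions.

```python
# Sketch only -- requires the dvuploader package and a reachable Dataverse.
from dvuploader import DVUploader, File  # import path assumed

files = [
    File(
        filepath="/mnt/FitsFiles/Platinum_2416.fits",
        mimetype="image/fits",  # field name assumed from the config above
    )
]
uploader = DVUploader(files=files)
uploader.upload(
    persistent_id="doi:10.7281/T1/HSYYY0",
    dataverse_url="https://dataverse-test.jhu.edu",
    api_token="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    n_parallel_uploads=30,  # the setting discussed above
)
```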

Can you test and verify that it works on your side?

Regarding the bulk upload in general, would it be an option to use Dataverse's native upload instead? This library supports automatically zipping files into batches of at most 2 GB, which are unzipped on Dataverse's side when direct upload is not enabled. This way, you may avoid the additional time needed to register files with direct upload.


DonRichards commented Feb 12, 2024

Dumb question: how do I test the PR? Should I clone this repo and do something in my code to use the cloned repo instead of the installed Python library? Googled it.
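For readers with the same question: pip can install directly from a Git ref, or from a local clone in editable mode. The clone URL below is an assumption for this repo; substitute the actual one.

```shell
# Option 1: install straight from the PR head (clone URL assumed):
pip install "git+https://github.com/gdcc/python-dvuploader.git@refs/pull/8/head"

# Option 2: clone, check out the PR branch, and install in editable mode:
git clone https://github.com/gdcc/python-dvuploader.git
cd python-dvuploader
git fetch origin pull/8/head:pr-8
git checkout pr-8
pip install -e .
```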
