Message 'Not Implemented' #7

Closed
DonRichards opened this issue Feb 6, 2024 · 10 comments · Fixed by #8
Labels: bug (Something isn't working)

Comments


DonRichards commented Feb 6, 2024

Not sure what this error indicates.

I'm trying to upload FITS files to a DOI. I can use the UI and it uploads without an issue.

config.yml:
persistent_id: doi:10.7281/T1/HSYYY0
dataverse_url: https://dataverse-test.jhu.edu
api_token: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
files:
- filepath: /mnt/FitsFiles/Platinu_1005323342868820864.fits
  mimetype: image/fits
  description: Posterior distributions of the stellar parameters for the star with
    ID from the Gaia catalog Platinu_1005323342868820864.
- filepath: /mnt/FitsFiles/Platinu_1006486484435386496.fits
  mimetype: image/fits
  description: Posterior distributions of the stellar parameters for the star with
    ID from the Gaia catalog Platinu_1006486484435386496.

I'm not sure if this is a problem with the uploader, the way I'm uploading, the dataset in general, etc. Apologies if this turns out not to be associated with the Python dvuploader; I assume it is, since I can use the UI for the same files.
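For anyone reading along, here is a minimal sketch of how a config like the one above could be parsed before handing the entries to the uploader. This is illustrative plain Python, not dvuploader's own loader; the PyYAML dependency and the `load_upload_config` name are my own.

```python
# Illustrative sketch: parse a dvuploader-style config.yml into connection
# settings plus a list of per-file dicts. Not the library's actual code.
import yaml  # third-party: pip install pyyaml

def load_upload_config(path):
    """Return (connection settings, list of file entries) from a config.yml."""
    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    settings = {
        "persistent_id": cfg["persistent_id"],
        "dataverse_url": cfg["dataverse_url"],
        "api_token": cfg["api_token"],
    }
    return settings, cfg.get("files", [])
```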


JR-1991 commented Feb 7, 2024

@DonRichards, thank you for submitting the issue! Based on the error message you provided, it seems to originate from the AWS store. Unfortunately, the error message "Not implemented" is difficult to interpret. Could you please provide me with the version of Dataverse you are using?

I ran some local tests using Dataverse 6.0 and Localstack, which simulates AWS locally. However, I was unable to replicate the error: both direct uploads to the S3 store and the native upload path worked. I plan to conduct further testing on an actual AWS store and hopefully identify the bug causing the issue.

I assume it does since I can use the UI for the same files.

As far as I know, the UI does not support direct uploads to an S3 store, so those uploads go through the standard HTTP method available in DV's native API. This explains why the UI functions properly and suggests that the issue might lie with the AWS store.

JR-1991 self-assigned this Feb 7, 2024
JR-1991 added the bug label Feb 7, 2024
DonRichards commented:

I found something odd: when I changed a variable name in my code from DVUploader(files=files) to DVUploader(files=upload_files), I got a different error. Not sure what this indicates.

When I examined the files being passed to the uploader, they look like this. Do these values look correct? I would expect fileName and file_id to have values.

 File(
    filepath='/mnt/FitsFiles/Platinum_2416.fits',
    description='Posterior distributions of the stellar parameters for the star with ID from the Gaia DR3 catalog Platinum_2416.',
    directoryLabel='',
    mimeType='image/fits',
    categories=['DATA'],
    restrict=False,
    checksum_type=<ChecksumTypes.MD5: ('MD5',<built-in function openssl_md5>)>,
    storageIdentifier=None,
    fileName=None,
    checksum=None,
    to_replace=False,
    file_id=None
 ),


JR-1991 commented Feb 9, 2024

@DonRichards this is expected, since fileName and file_id are populated during upload, when the hashes are calculated. Do you think this is confusing? I am happy to change it to extract the filename at initialization instead.
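For what it's worth, extracting the name at initialization would just be the base name of the path. A plain-Python sketch, not the library's actual implementation:

```python
from pathlib import Path

def derive_filename(filepath: str) -> str:
    """Base name of a path, as a File object could compute at init time
    rather than waiting for upload (illustrative helper only)."""
    return Path(filepath).name

# derive_filename('/mnt/FitsFiles/Platinum_2416.fits') -> 'Platinum_2416.fits'
```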

Can you share the error message you have received upon changing variable names?

DonRichards commented:

It starts to upload but then throws this and exits:
An error occurred with uploading: Cannot write to closing transport

[Screenshot from 2024-02-09 14-14-34 attached]


JR-1991 commented Feb 11, 2024

I came across this issue on StackOverflow and found a solution provided by another user. I will implement the fix and create a pull request to see if it resolves the issue.

May I ask what your file sizes are, so I can test this on another server?

DonRichards commented:

Each of the 401,000 files I'm attempting to upload under a single DOI is approximately 1.6 MB in size. I have a script that breaks them up into batches of 20 at a time, so the uploader should only be given a list of 20 files. Any idea what I can do from here to get this to work? I'd create a PR if I could, but I don't know this app well enough.
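The batching described above can be sketched in a few lines; `chunked` is a hypothetical helper name, not part of dvuploader:

```python
def chunked(items, size=20):
    """Yield successive slices of at most `size` items, e.g. to feed the
    uploader 20 files at a time (illustrative helper, not dvuploader API)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 401,000 files in batches of 20 -> 20,050 batches of exactly 20 files each
```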


JR-1991 commented Feb 12, 2024

Great, thanks for the info! The PR is almost ready for submission. I'll run some tests on Demo Dataverse to check for any issues. Once I'm done, I'll let you know and you can test the updated version. Hope this will fix it 😊

DonRichards commented:

Great! Thanks for that!


JR-1991 commented Feb 12, 2024

@DonRichards, I have created a pull request #8 that fixes the issue. Unfortunately, the issue is related to streaming files to the S3 backend. AWS is not capable of handling async streams, which is a pity.

To test this, I downloaded a sample FITS file and replicated it 2000 times to simulate a case similar to yours. The error has not been raised on our test server, and the upload works. The upload to S3 itself is quite fast if you set n_parallel_uploads to 30, but the only downside is that registering the uploaded files at Dataverse takes considerable time. DVUploader has no influence on the time it takes, unfortunately.
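For reference, the parallelism setting would be passed roughly along these lines. This is a sketch only: it assumes the File/DVUploader API shown earlier in this thread and the n_parallel_uploads keyword named above, and it reuses the connection values from the original report. It requires the dvuploader package and a reachable Dataverse instance, so treat the exact field and parameter names as assumptions.

```python
# Sketch only -- requires the dvuploader package and a reachable Dataverse.
from dvuploader import DVUploader, File  # import path assumed

files = [
    File(
        filepath="/mnt/FitsFiles/Platinum_2416.fits",
        mimetype="image/fits",  # field name assumed from the config above
    )
]
uploader = DVUploader(files=files)
uploader.upload(
    persistent_id="doi:10.7281/T1/HSYYY0",
    dataverse_url="https://dataverse-test.jhu.edu",
    api_token="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    n_parallel_uploads=30,  # the setting discussed above
)
```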

Can you test and verify that it works on your side?

Regarding the bulk upload in general, would it be an option to use Dataverse's native upload instead? This library supports automatically zipping files into batches of at most 2 GB, which are unzipped on Dataverse's side when direct upload is not enabled. This way, you may avoid the additional time needed to register files with direct upload.


DonRichards commented Feb 12, 2024

Dumb question: how do I test the PR? Should I clone this repo and do something in my code to use the cloned repo instead of the installed Python library? Googled it.
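For readers with the same question: pip can install directly from a Git ref, or from a local clone in editable mode. The clone URL below is an assumption for this repo; substitute the actual one.

```shell
# Option 1: install straight from the PR head (clone URL assumed):
pip install "git+https://github.com/gdcc/python-dvuploader.git@refs/pull/8/head"

# Option 2: clone, check out the PR branch, and install in editable mode:
git clone https://github.com/gdcc/python-dvuploader.git
cd python-dvuploader
git fetch origin pull/8/head:pr-8
git checkout pr-8
pip install -e .
```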
