-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow file uploads for non-zip files through the API #1612
Comments
👍 |
👍 I think having an endpoint that would accept multipart forms containing files would make the most sense. |
I agree - the native upload API should allow both individuals files and zipped files (as it is the case through the web UI for data upload). |
@rliebz @samchrisinger @chrisseto in Dataverse you can fill in a description for a file when you upload it via the GUI. Do you have any ideas of how you would ideally fill in a description for a file uploaded via an API endpoint? Would it be two curl commands or one? I'm talking about a non-zipped file, a PNG for example. Maybe the description is "a picture of my cat". How do we get that description to the Dataverse server? |
@pdurbin I would suggest using multipart forms and then parsing them based on a the name attribute or just nesting them inside of each other. That way you could upload a any amount of files with metadata in a single request. Some relevant RFCs are here and here. Ideally a request would look like that from the bottom of the former link.
|
👍 |
1 similar comment
👍 |
@chrisseto I like this idea! However, rather than...
... we actually might want to take this one step further since we support more metadata at the file level than just "description". We also support categories so perhaps instead of
And I suppose we could support more metadata at the file level in the future this way. See also this related ticket that has more of an emphasize on the SWORD API than the Native API: Let file metadata (i.e. description) be specified during zip upload #723 |
This sounds great. |
@pdurbin from an API point of view allowing json file metadata would be a great idea! You may want to keep in mind that if you kept everything as form fields then this endpoint could become the canonical file upload endpoint. This way you can can have slightly DRYer code and only have to worry about testing a single endpoint. Either way is great IMO 👍 |
Since I was just looking at the SWORD spec again for @posixeleni I'll mention that while the Dataverse implementation of SWORDv2 only accepts zip files, this was an implementation decision on my part. Here's what the SWORD spec says:
It was for this reason that I had Dataverse support a simple zip format for upload via SWORD. I figured I had to if clients are supposed to assume that format if a format is not specified. All this is to say that we probably could support various formats in the SWORD API. My preference, however, is to develop a "native" API endpoint for uploading files so we won't be constrained by SWORD (but that spec if full of good ideas about format negotiation and such... good job @richard-jones). |
👍 |
Having this API available is a dependency for the Sloan grant and adding subsets to the Dataverse. |
@richarda23 hi! Perhaps you saw IQSS/dataverse-client-java#8 where I mentioned that the new "native" add and replace APIs are available for testing on https://dev1.dataverse.org Everyone should please check out the announcement I made at https://groups.google.com/d/msg/dataverse-community/8WTs3wYF6dc/AzPOxzRKFwAJ for details about how we'd love API users out there to try out the new APIs before we ship them. @richarda23 don't worry, the SWORD API will remain unchanged. We're adding new APIs based on feedback on this issue from @rliebz @samchrisinger @chrisseto @keflavich and others. As usual, the API endpoints are documented in the API Guide, so please look for "/add" and "/replace" under "Native API" on this page for now http://guides.dataverse.org/en/2290-file-replace/api/native-api.html . We'll make a pull request soon for #2290 and this issue (#1612) will be associated with that pull request. |
I had a look at the new file add/replace methods - this will be great if we can eventually drop the Sword dependency, as the Sword library we use in the Java client is moribund and not available on Maven Central. If we can use pure native Dataverse API, we can remove this dependency and we'll be able to more easily make the Java client available on Maven. Anyway.. several comments.. probably some of these are not unique to these 2 methods:
works nicely.
fails with output:
which I'm not quite sure what to do about - could the error message perhaps say something about what's needed to make the query correct? |
@otter606 thanks for the feedback and I see you're working away on IQSS/dataverse-client-java#8 ! Much appreciated! Yeah, the PUT vs. POST was a problem with the API Guide so I fixed it in a784a40. Thanks for catching that. Both "add" and "replace" use POST but now you've got me wondering if "replace" should use PUT instead. I need to refresh my REST thinking and maybe check with you, @michbarsinai @raprasad @landreev @scolapasta and others. As for the It is by design that a file with identical content (md5) is rejected by the "replace" API endpoint. I hear you on the lack of curl examples for "add" and "replace". I need to switch gears but I'll try to come back to this. Honestly, I think the API Guide could be much better organized but I don't want to bite off too much. Please keep the feedback coming! That goes for everyone! 😄 Thanks, @otter606 !! |
I tried the replace method again but got a new error saying I didn't have permission, any ideas?: Given an original successful command to add a file: which produces gives an error message: |
Hi Phil, From the REST point of view - I'm not super familiar with the issue itself - these are the semantics:
Generally, in the context of files, you'd expect new files to be POSTed to some dataset/files URL (so the server can generate a unique, legal, URL, if it decides to accept the files). File replaces could be done using PUT, since a replace action changes the content of a known, legal URL. But again - that's the "HTTP purist view", in some cases it may make sense to bend these rules. -- M * The Dataverse API returns 201 for some endpoints and 200 on others - we should improve on that but it's not very pressing. At least, I'm not aware of anyone asking for a proper 201 response. |
That makes sense - adding a new File and creating something with a new ID
should be a POST. Not sure about replace - if there is no new entity or URL
generated then I'd think a PUT would be more appropriate?
…On Sun, Jan 22, 2017 at 10:17 PM, Michael ***@***.***> wrote:
Hi Phil,
From the REST point of view - I'm not super familiar with the issue itself
- these are the semantics:
- POST $url => "dear server, store the content of this message
somewhere under $url. So if I take a resource and send it to a server in
POST a/b/c/, I can reasonably expect it to have a the URL a/b/c/12345, but
not a/x/7777 (it can have that URL additionally, but a URL under a/b/c/ is
"mandatory"). Ideally, the server should return a 201 CREATED
<https://tools.ietf.org/html/rfc2616#section-10.2.2> with the actual
URL, rather than 200 OK*.
- PUT $url => "dear server, store the content of this message at
$url". Subsequent GET $url should return the sent content.
Generally, in the context of files, you'd expect new files to be POSTed to
some dataset/files URL (so the server can generate a unique, legal, URL, if
it decides to accept the files). File replaces could be done using PUT,
since a replace action changes the content of a known, legal URL.
But again - that's the "HTTP purist view", in some cases it may make sense
to bend these rules.
-- M
* The Dataverse API returns 201 for some endpoints and 200 on others - we
should improve on that but it's not very pressing. At least, I'm not aware
of anyone asking for a proper 201 response.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1612 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AJA6nSIpWb_edkmwBNuZykId62phcRDbks5rU9VugaJpZM4Dqu4Y>
.
|
In general, yes (again, I'm nor fully familiar with this case).
The intuition is: if the server has to decide on the URL, use `POST` to the some prefix of the future URL. If you already know the URL (e.g. it was given or can be calculated) use `PUT` to that URL.
…-- Michael
On 23 Jan 2017, at 14:20, otter606 ***@***.***> wrote:
That makes sense - adding a new File and creating something with a new ID
should be a POST. Not sure about replace - if there is no new entity or URL
generated then I'd think a PUT would be more appropriate?
On Sun, Jan 22, 2017 at 10:17 PM, Michael ***@***.***> wrote:
> Hi Phil,
>
> From the REST point of view - I'm not super familiar with the issue itself
> - these are the semantics:
>
> - POST $url => "dear server, store the content of this message
> somewhere under $url. So if I take a resource and send it to a server in
> POST a/b/c/, I can reasonably expect it to have a the URL a/b/c/12345, but
> not a/x/7777 (it can have that URL additionally, but a URL under a/b/c/ is
> "mandatory"). Ideally, the server should return a 201 CREATED
> <https://tools.ietf.org/html/rfc2616#section-10.2.2> with the actual
> URL, rather than 200 OK*.
> - PUT $url => "dear server, store the content of this message at
> $url". Subsequent GET $url should return the sent content.
>
> Generally, in the context of files, you'd expect new files to be POSTed to
> some dataset/files URL (so the server can generate a unique, legal, URL, if
> it decides to accept the files). File replaces could be done using PUT,
> since a replace action changes the content of a known, legal URL.
>
> But again - that's the "HTTP purist view", in some cases it may make sense
> to bend these rules.
>
> -- M
>
> * The Dataverse API returns 201 for some endpoints and 200 on others - we
> should improve on that but it's not very pressing. At least, I'm not aware
> of anyone asking for a proper 201 response.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#1612 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AJA6nSIpWb_edkmwBNuZykId62phcRDbks5rU9VugaJpZM4Dqu4Y>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#1612 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB2UJIo8ZsUGUYo5BbLpQu98g2oofRjQks5rVJr_gaJpZM4Dqu4Y>.
|
Also "add" example to phoenix scripts.
@otter606 I added some curl examples for "add" and "replace" in cc1d614 which you can see at http://guides.dataverse.org/en/2290-file-replace/api/native-api.html . I left the Python examples in there. |
@michbarsinai linked to the HTTP 1.1 RFC (2616) which I think is a fine place for guidance on POST vs. PUT in the context of the new "replace" endpoint. (I think we all agree that POST is fine for the new "add" endpoint.) Here's how each section starts:
To me, PUT feels wrong for "replace" because what's in the URI simply identifies the database id of the file to be replaced but content of the file (md5) is not actually changed. What we're really doing is uploading a file with different content (different md5) and that newly uploaded file will get a new database id (the next id available). As @michbarsinai says, perhaps we should make sure we're returning 201 instead of 200 since something new is being created, a new database id of a file, for both "add" and "replace". Pull request #3579 is currently in QA, however, so we aren't touching the branch at this time. Also, as @michbarsinai mentioned, API users aren't exactly clamoring for 201 return codes. To me, the important thing is that we don't break people's code by changing the API. See the recent Spec-ulation Keynote by Rich Hickey for more on not breaking people's code. 😄 I'll end by showing the diagram @raprasad drew on my whiteboard after absorbing what @michbarsinai was saying about if the end URL is known or not.
|
Works, issue with not respecting file upload size limit but that is mentioned in #2290 and being worked on/ updated there. |
I tested and documented the forceReplace boolean in 2a56e58. |
Currently uploading files to a Dataset on Dataverse 4.0 is only possible using the Sword API described at http://guides.dataverse.org/en/latest/api/sword.html#add-files-to-a-dataset-with-a-zip-file. This has the disadvantage of requiring that files be zipped before upload.
I imagine allowing file uploads through the native API could allow files to be uploaded without first having to zip them.
The text was updated successfully, but these errors were encountered: