Using Previewers with download redirects from S3

qqmyers edited this page May 4, 2021 · 3 revisions

As noted in the README file, to use these previewers with the Dataverse option that supports direct download of files from S3, you must configure the S3 provider to allow CORS requests. For Amazon S3, this can be done with the aws command-line client:

aws s3api put-bucket-cors --bucket <bucket-name> --cors-configuration file://cors.json

where cors.json contains something like:

 {
   "CORSRules": [
           {
             "AllowedOrigins":["*"],
             "AllowedHeaders":["x-requested-with"],
             "AllowedMethods":["GET"]
           }
          ]
 }

These settings will work when using the Previewers from the https://globaldataversecommunityconsortium.github.io website.

With the additions shown below, this should also support S3 direct uploads:

 {
   "CORSRules": [
           {
             "AllowedOrigins":["*"],
             "AllowedHeaders":["x-requested-with"],
             "AllowedMethods":["GET", "PUT"],
             "ExposeHeaders": ["ETag"]
           }
          ]
 }
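
The two variants above differ only in the allowed methods and the exposed headers, so the file can be generated rather than edited by hand. A minimal Python sketch using only the standard library; the names `build_cors_config` and `write_cors_file` and their parameters are hypothetical helpers for illustration, not part of Dataverse or the aws client:

```python
import json


def build_cors_config(origins=("*",), allow_uploads=False):
    """Return a dict shaped like the cors.json examples above."""
    rule = {
        "AllowedOrigins": list(origins),
        "AllowedHeaders": ["x-requested-with"],
        "AllowedMethods": ["GET"],
    }
    if allow_uploads:
        # Direct upload additionally uses PUT and an exposed ETag header.
        rule["AllowedMethods"].append("PUT")
        rule["ExposeHeaders"] = ["ETag"]
    return {"CORSRules": [rule]}


def write_cors_file(path="cors.json", **kwargs):
    """Write the configuration to disk so it can be passed as file://cors.json."""
    config = build_cors_config(**kwargs)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return config
```

Calling `write_cors_file(allow_uploads=True)` would produce a file equivalent to the second example, which can then be applied with the `aws s3api put-bucket-cors` command shown earlier.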

If you are interested in restricting the AllowedOrigins and AllowedHeaders:

Allowing the "x-requested-with" header is required, and the examples above show the most restrictive AllowedHeaders setting that works. (You may have, or may see examples where, AllowedHeaders is set to "*" - that can be constrained to just the "x-requested-with" header without affecting the previewers or direct upload.) The jQuery Ajax queries used in some previewers to retrieve the data will fail if this header is not allowed.

Changing the AllowedOrigins from '*', which allows all servers to make cross-origin requests, is more complex and, as far as I know, will require you to host the previewers yourself (potentially in your own github/github.io repository). Nominally, one can specify a list of servers that should be allowed to make requests and, with the previewers hosted at https://globaldataversecommunityconsortium.github.io, it would at first seem that using this host in AllowedOrigins would be sufficient. However, there are further restrictions related to the Content-Security-Policy standard that prevent jQuery from making Ajax requests that include the Origin header, which is the value the S3 server would match against AllowedOrigins (in practice, Origin: null is sent).
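
To see why an Origin of null defeats a restricted policy, it can help to model the matching step. The sketch below is a simplified illustration, not S3's actual implementation: it assumes each AllowedOrigins entry is either an exact origin or contains a single '*' wildcard, and the function name `origin_allowed` is hypothetical.

```python
def origin_allowed(origin_header, allowed_origins):
    """Simplified model: exact match, or one '*' wildcard inside the pattern."""
    for pattern in allowed_origins:
        if "*" not in pattern:
            if origin_header == pattern:
                return True
            continue
        # Split around the wildcard and compare the fixed prefix and suffix.
        prefix, _, suffix = pattern.partition("*")
        if (len(origin_header) >= len(prefix) + len(suffix)
                and origin_header.startswith(prefix)
                and origin_header.endswith(suffix)):
            return True
    return False


previewer_host = "https://globaldataversecommunityconsortium.github.io"
print(origin_allowed("null", ["*"]))             # wildcard policy accepts any value
print(origin_allowed("null", [previewer_host]))  # restricted policy rejects "null"
```

Under this model, a request that carries Origin: null only passes while AllowedOrigins is '*', which is the behavior described above.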

To allow the Origin header to be sent, the web server from which the previewers are being served must be configured to add a Content-Security-Policy header that lists the S3 server and the Dataverse server as connect-src entries. This can be set in Apache or other web servers as outlined in https://content-security-policy.com/. It can also be set by adding a line to the previewer html files themselves, which would allow them to be hosted on a github.io site for your Dataverse instance. To do this, you would add a line in the <head> section of each previewer file, e.g. in TextPreview.html:

<meta http-equiv="Content-Security-Policy" content="connect-src <Dataverse Server URL> <S3 Server URL>">

If the Previewers are being served from the same host as Dataverse, the Dataverse URL can be replaced with 'self'. So, full examples would be:

<meta http-equiv="Content-Security-Policy" content="connect-src https://demo.dataverse.org https://abucket.s3.amazonaws.com">

and

<meta http-equiv="Content-Security-Policy" content="connect-src 'self' https://thedvbucket.s3.amazonaws.com">
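
If you host your own copy of the previewers, adding this line to every previewer file by hand is repetitive. A minimal Python sketch of one way to automate it, assuming each file has an ordinary opening head tag; `csp_meta_tag` and `add_csp_meta` are hypothetical helper names, not part of the previewers repository:

```python
import re


def csp_meta_tag(dataverse_url, s3_url):
    """Build the meta element shown above; pass "'self'" for a same-host Dataverse."""
    return ('<meta http-equiv="Content-Security-Policy" '
            f'content="connect-src {dataverse_url} {s3_url}">')


def add_csp_meta(html, dataverse_url, s3_url):
    """Insert the tag right after the opening head tag, at most once."""
    if "Content-Security-Policy" in html:
        return html  # a policy is already present; leave the markup unchanged
    tag = csp_meta_tag(dataverse_url, s3_url)
    new_html, count = re.subn(
        r"<head(?:\s[^>]*)?>",
        lambda match: match.group(0) + "\n" + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("no opening head tag found")
    return new_html
```

To apply it, read a file such as TextPreview.html into a string, pass it through `add_csp_meta`, and write the result back. Running it a second time leaves the file unchanged.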

With that change, or with the same header being set by the server, the CORS policy can be restricted to only allow the preview server host:

{
  "CORSRules": [
          {
            "AllowedOrigins":["https://yourpreviewerserver.example.com"],
            "AllowedHeaders":["x-requested-with"],
            "AllowedMethods":["GET"]
          }
         ]
}
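
After restricting the policy, one way to check it from outside a browser is to request a file with an explicit Origin header and look at the Access-Control-Allow-Origin header in the response. The sketch below uses only the Python standard library and rests on several assumptions: the file URL must be readable without extra authentication (for example, a presigned URL), and the function names are hypothetical. Treat a missing header as a hint to investigate rather than as proof of a misconfiguration.

```python
import urllib.error
import urllib.request


def build_cors_probe(file_url, origin):
    """Build a GET request carrying an Origin header, as a browser would send."""
    return urllib.request.Request(file_url, headers={"Origin": origin}, method="GET")


def allowed_origin_in_response(file_url, origin, timeout=10):
    """Return the Access-Control-Allow-Origin response header, or None if absent."""
    request = build_cors_probe(file_url, origin)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Access-Control-Allow-Origin")
    except urllib.error.HTTPError as error:
        # HTTP error responses also expose headers; report whatever was returned.
        return error.headers.get("Access-Control-Allow-Origin")
```

For example, `allowed_origin_in_response(url, "https://yourpreviewerserver.example.com")` would be expected to return that origin when the rule matches, while a different origin would be expected to return None.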