HTTP Streaming, Cross-Origin Resource Sharing (CORS), and the AWS CloudFront Cache
At Northwestern University Library (NUL), we have experienced a number of issues getting content to stream over HTTP from our AWS-hosted Audiovisual Repository (AVR) to Chrome and Firefox.
When a page served from one origin (scheme, host, and port) requests a resource, such as a stream, from a different origin, the request is called a cross-origin request. The W3C published a Recommendation covering CORS requests, client and server behavior, and recommended browser behavior; it has since been superseded by the WHATWG Fetch Standard.
When a browser makes a cross-origin request for a resource, it adds an `Origin` header identifying the origin of the page requesting the stream. If that origin is authorized to play the stream, the server responds with an `Access-Control-Allow-Origin` header. If that header is missing or doesn’t match the requesting origin, the browser won’t play back the stream, even though it successfully retrieved it. This is important: CORS governs client-side decisions about what content a page may execute or play back, not whether a resource can be retrieved in the first place.
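The client-side check described above can be modeled in a few lines. This is a simplified sketch, not any browser’s actual implementation: it covers only non-credentialed requests, and the page origin `https://avr.northwestern.edu` is a hypothetical example.

```python
def browser_allows(request_origin: str, response_headers: dict) -> bool:
    """Simplified browser-side CORS check for a non-credentialed request.

    By the time this runs, the response has already been retrieved; the check
    only decides whether the page may use (for example, play back) it.
    """
    # Header names are case-insensitive, so normalize them before the lookup.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        return False  # header missing: content was fetched but is not usable
    return allowed == "*" or allowed == request_origin


page = "https://avr.northwestern.edu"  # hypothetical page origin
print(browser_allows(page, {"Access-Control-Allow-Origin": page}))  # True
print(browser_allows(page, {}))  # False
print(browser_allows(page, {"Access-Control-Allow-Origin": "https://other.example"}))  # False
```

Note that the second and third cases both fail even though the bytes arrived; that is the "retrieved but not played" situation.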
AWS’s CORS request handling happens in S3, not in CloudFront. CloudFront passes the request to S3, along with the `Origin` header if present, and caches the result. But the decision about whether to include the `Access-Control-Allow-Origin` header in the response is S3’s to make.
NUL’s AVR uses the following CORS configuration on the S3 bucket hosting its derivatives:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*.northwestern.edu</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>Authorization</AllowedHeader>
    <AllowedHeader>Access-Control-Allow-Origin</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```
This does not affect any given client’s authorization to retrieve content from the bucket. (That’s handled by CloudFront via request signing.) All it does is make sure that any request containing an `Origin: https://some.server.at.northwestern.edu` header receives a response that includes `Access-Control-Allow-Origin: https://some.server.at.northwestern.edu`, and that any request that does not specify a `northwestern.edu` origin doesn’t get an `Access-Control-Allow-Origin` response header at all.
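To make the rule’s effect concrete, here is a rough Python model of how a server applying this configuration decides whether to add `Access-Control-Allow-Origin`. It is a sketch under stated assumptions, not S3’s code: it models only the `AllowedOrigin` check (S3 permits at most one `*` wildcard per `AllowedOrigin`), echoes the requesting origin as described above, and ignores methods, preflight, and the other CORS response headers. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def origin_matches(pattern: str, origin: str) -> bool:
    """Match an Origin value against an AllowedOrigin pattern with at most one '*'."""
    if "*" not in pattern:
        return pattern == origin
    prefix, suffix = pattern.split("*", 1)
    # The wildcard may stand for any run of characters, including "https://host".
    return (
        len(origin) >= len(prefix) + len(suffix)
        and origin.startswith(prefix)
        and origin.endswith(suffix)
    )


def cors_headers(origin: Optional[str], allowed_origins: List[str]) -> Dict[str, str]:
    """CORS response headers for a GET, modeling only Access-Control-Allow-Origin."""
    if origin is None:
        return {}  # no Origin header in the request: no CORS headers in the response
    for pattern in allowed_origins:
        if origin_matches(pattern, origin):
            return {"Access-Control-Allow-Origin": origin}  # echo the caller's origin
    return {}  # origin not allowed: header omitted entirely


ALLOWED = ["*.northwestern.edu"]  # AllowedOrigin from the bucket configuration

print(cors_headers("https://some.server.at.northwestern.edu", ALLOWED))
print(cors_headers("https://example.com", ALLOWED))  # {}
print(cors_headers(None, ALLOWED))  # {}
```

Because the leading `*` also absorbs the scheme, `*.northwestern.edu` matches both `http://` and `https://` origins under that domain, but not a look-alike such as `x.notnorthwestern.edu`.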
This is where things get dicey. Even with the correct CORS configuration on the S3 bucket, we found that some content simply failed to stream (or load) on Chrome and/or Firefox. The effect was inconsistent between browsers, different versions of the same browser, and even among different items being streamed to the same browser within the same session.
By default, CloudFront caches responses based on URL alone. That means that if the first request for a resource arrives without the proper `Origin` header, the response that gets cached, and then served for every later request to that URL, will not contain the `Access-Control-Allow-Origin` header the browser wants, regardless of whether those later requests contain the proper `Origin` header. A client can therefore send the correct `Origin` header and still get a response whose `Access-Control-Allow-Origin` header is missing (or, if the first request came from a different origin, names the wrong one), depending on what is in the CloudFront cache.
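The cache misfire can be reproduced with a toy cache keyed on URL alone. Everything here is hypothetical: `s3_response` stands in for the bucket’s CORS behavior and the playlist path is made up. It is a sketch of the keying problem, not of how CloudFront is built.

```python
from typing import Dict, Optional


def s3_response(url: str, origin_header: Optional[str]) -> dict:
    """Stand-in for the bucket: add ACAO only for northwestern.edu origins."""
    headers: Dict[str, str] = {}
    if origin_header is not None and origin_header.endswith(".northwestern.edu"):
        headers["Access-Control-Allow-Origin"] = origin_header
    return {"url": url, "headers": headers}


class UrlKeyedCache:
    """Toy edge cache whose cache key is the URL alone."""

    def __init__(self) -> None:
        self.entries: Dict[str, dict] = {}

    def get(self, url: str, origin_header: Optional[str]) -> dict:
        if url not in self.entries:
            # Cache miss: whatever the *first* requester sent decides what is stored.
            self.entries[url] = s3_response(url, origin_header)
        return self.entries[url]  # later requests get the stored copy; Origin is ignored


cache = UrlKeyedCache()
url = "/derivatives/item-1/playlist.m3u8"  # hypothetical path
cache.get(url, None)  # bare, Origin-less request arrives first
reply = cache.get(url, "https://avr.northwestern.edu")  # proper CORS request
print(reply["headers"])  # {}: no Access-Control-Allow-Origin for the browser to check
```

If the first request instead came from a hypothetical `https://a.northwestern.edu`, a later request from `https://b.northwestern.edu` would receive `Access-Control-Allow-Origin: https://a.northwestern.edu`, which the browser also rejects.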
The solution is to change CloudFront’s caching strategy. Instead of using only the URL as the cache key, CloudFront can use the URL plus the values of selected request headers. Adding (“whitelisting”) the `Origin` header to the cache key guards against the cache misfire above, at the expense of some caching efficiency: CloudFront now has to cache a separate copy of the same content at the same URL for every distinct `Origin` value it receives.
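Including the `Origin` header in the cache key can be sketched with the same kind of toy cache. Again, the helper names and path are hypothetical; the point is only that each distinct `Origin` value, including its absence, now gets its own cache entry.

```python
from typing import Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[str]]  # (URL, value of the Origin header or None)


def s3_response(url: str, origin_header: Optional[str]) -> dict:
    """Stand-in for the bucket: add ACAO only for northwestern.edu origins."""
    headers: Dict[str, str] = {}
    if origin_header is not None and origin_header.endswith(".northwestern.edu"):
        headers["Access-Control-Allow-Origin"] = origin_header
    return {"url": url, "headers": headers}


class UrlAndOriginCache:
    """Toy edge cache whose key is the URL plus the Origin request header."""

    def __init__(self) -> None:
        self.entries: Dict[CacheKey, dict] = {}

    def get(self, url: str, origin_header: Optional[str]) -> dict:
        key = (url, origin_header)
        if key not in self.entries:
            self.entries[key] = s3_response(url, origin_header)
        return self.entries[key]


cache = UrlAndOriginCache()
url = "/derivatives/item-1/playlist.m3u8"  # hypothetical path
cache.get(url, None)  # the bare request gets its own entry
reply = cache.get(url, "https://avr.northwestern.edu")
print(reply["headers"])  # {'Access-Control-Allow-Origin': 'https://avr.northwestern.edu'}
cache.get(url, "https://library.northwestern.edu")
print(len(cache.entries))  # 3
```

The final count is the efficiency cost: one playlist URL now occupies one cache slot per distinct `Origin` value, plus one for requests that send no `Origin` at all.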
The reason Chrome and Firefox were affected (in different ways) while Safari was not has to do with their request strategies. Firefox seems to make a bare (`Origin`-less) request for every `.m3u8` playlist file before making a proper CORS request, which can poison the cache right off the bat. Chrome does something similar. Because every browser shares the same CloudFront cache, testing multiple browsers at once and switching back and forth can produce unpredictable results, depending on which request for a given URL reaches CloudFront first.
Safari, on the other hand, sends an `OPTIONS` (preflight) request before its `GET` request in order to determine whether a resource is CORS-authorized. Since CloudFront by default caches responses only to `GET` and `HEAD` requests, never to `OPTIONS`, Safari’s CORS decision making is free from local or CloudFront cache interference.
This issue may crop up even outside the context of AWS and CloudFront. If a caching proxy (e.g., Squid) sits in front of the HTTP streaming server, it also needs to be configured to make `Origin`-based caching decisions, or to modify the `Access-Control-Allow-Origin` header on the fly.
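For a generic HTTP cache such as Squid, one standard way to get `Origin`-based caching is for the streaming server to send `Vary: Origin`, which tells any cache that honors `Vary` to store responses separately per `Origin` value. The following is a minimal WSGI sketch of that idea using only the standard library; the allowed-domain suffix, the placeholder playlist, and the `call` helper are illustrative assumptions, not part of NUL’s setup.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

ALLOWED_SUFFIX = ".northwestern.edu"  # assumption: mirrors the bucket's AllowedOrigin


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app serving a placeholder playlist with cache-aware CORS headers."""
    body = b"#EXTM3U\n"  # placeholder playlist, not real content
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "application/vnd.apple.mpegurl"),
        ("Content-Length", str(len(body))),
        # Sent on every response, including ones without ACAO, so that shared
        # caches store Origin-less and CORS responses as separate variants.
        ("Vary", "Origin"),
    ]
    origin = environ.get("HTTP_ORIGIN")
    if origin is not None and origin.endswith(ALLOWED_SUFFIX):
        headers.append(("Access-Control-Allow-Origin", origin))
    start_response("200 OK", headers)
    return [body]


def call(origin: Optional[str] = None) -> Dict[str, str]:
    """Invoke the app directly (no server) and return its response headers."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/playlist.m3u8"}
    if origin is not None:
        environ["HTTP_ORIGIN"] = origin
    captured: Dict[str, str] = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    b"".join(app(environ, start_response))
    return captured


print(call())  # includes Vary: Origin but no Access-Control-Allow-Origin
print(call("https://avr.northwestern.edu")["Access-Control-Allow-Origin"])
```

To try it behind a real proxy, the same `app` can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`. The alternative mentioned above, rewriting `Access-Control-Allow-Origin` in the proxy itself, avoids extra cache variants but moves the CORS policy out of the origin server.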