Overlap with Client Hints esp. client tracking considerations #33

Closed
LPardue opened this issue Mar 10, 2020 · 13 comments

Comments


LPardue commented Mar 10, 2020

One of the first things that strikes me about this proposal is the amount of overlap with HTTP Client Hints (see here for some examples of its usage). It is in the Working Group Last Call stage and so is fairly well reviewed.

Have you considered that CMCD might actually be a client hint?

Now, I think the CMCD use case differs somewhat from Client Hints, mainly because the information comes from application data and is sent via programmatic fetch. I also appreciate the desire to avoid preflight requests, which led to the use of query strings. However, I think there are considerations laid out in Client Hints that would be suitable to incorporate or reference in CMCD. In particular, CMCD's Security Considerations section focuses purely on the server. I find that a bit surprising given that the document's opening paragraph discusses session, content and device ID tracking. For example, Client Hints says:

Client Hints mitigate the performance concerns by assuring that
clients will only send the request headers when they're actually
going to be used, and the privacy concerns of passive fingerprinting
by requiring explicit opt-in and disclosure of required headers by
the server through the use of the Accept-CH response header.


mnot commented Mar 12, 2020

See also whatwg/fetch#1006

Contributor

wilaw commented Mar 26, 2020

We did think through a Client-Hint type of opt-in approach initially. The idea was that instead of sending CMCD automatically on all requests, the media player would first wait for the CDN to signal via a response header (possibly using Client Hints) that it understood CMCD headers, and would then start sending data on subsequent requests. This avoids sending data to CDNs that are not going to process it. However, this approach has a number of problems:

  1. The very first request would carry no CMCD data. This is a problem, as in many media playback scenarios, the first request is the playlist/manifest request, which is a critical component of the session to log and track.
  2. The client would need to track state across multiple CDNs and also across the multiple server IPs that it may be talking to within a single CDN. This client complexity is a barrier to adoption and leads to further data gaps on each new connection, for the reason given in item 1.
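
To make the two problems above concrete, here is a minimal sketch of what an opt-in client would have to do. The class name, the example URLs and the `server_opted_in` flag are all hypothetical; the thread does not define how the CDN would signal support, and this sketch tracks state per hostname only, not per server IP.

```python
from urllib.parse import urlsplit


class OptInClient:
    """Hypothetical player that sends CMCD only to hosts that have opted in."""

    def __init__(self):
        # Hosts that have signalled (by some response header) that they accept CMCD.
        self.opted_in_hosts = set()

    def should_send_cmcd(self, url):
        return urlsplit(url).hostname in self.opted_in_hosts

    def record_response(self, url, server_opted_in):
        if server_opted_in:
            self.opted_in_hosts.add(urlsplit(url).hostname)


client = OptInClient()
session_requests = [
    "https://cdn-a.example/master.m3u8",  # manifest request
    "https://cdn-a.example/seg1.m4s",
    "https://cdn-b.example/seg2.m4s",     # player switches to a second CDN
    "https://cdn-b.example/seg3.m4s",
]

cmcd_sent = []
for url in session_requests:
    cmcd_sent.append(client.should_send_cmcd(url))
    # Assume every server in this toy session opts in on its first response.
    client.record_response(url, server_opted_in=True)
```

In this toy session the manifest request and the first request to the second CDN both go out without CMCD data, which is the data gap described in the list.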

On the whole, we consider the burden of sending CMCD data to non-compliant CDNs to be light. CDNs already deal with many custom headers and arbitrary query args millions of times per second. Unknown headers and query args are robustly ignored. Additionally, the payload size is small relative to the media traffic being delivered. (The same would not be true if we were proposing this spec for all objects in a web page, for example.)

@lucas - we did think through the privacy and fingerprinting concerns behind sid, cid and did. Firstly, these are all optional fields. If the client does not want to send them, it should not. Session ID is randomly generated and ties together the media object requests in a playback session. It does not identify the client, as it is a GUID and is never repeated outside of that session. Device ID is intended to signal the player version and device type, as many instances of delivery problems are tied to specific device problems. The same is true for Content ID, which is a hash of the content being played. Both Device ID and Content ID could be used for fingerprinting, as they are invariant when the user plays the same content from the same device. To this end I have raised a new issue to add language to the document outlining this risk. See #45

@mark - re your comments on whatwg/fetch#1006 - "This doesn't seem like a good way to go; not only is the query string not intended for colonisation by standards documents, it's also going to make caching and other generic functions more difficult to interpose, and less efficient. Linking can become problematic too." Initially CMCD purely used headers for data transmission. However, as we looked into MSE usage, which is a key target application space, the overhead of browser-based clients having to double their request rate due to the CORS preflight request was untenable, especially for low-latency streaming applications such as LL-HLS, in which multiple requests are made per second. CMCD is not a core protocol standard, as much of the IETF work is. Rather, it is an application convention between adaptive segmented media players and the CDNs that deliver them data. To that extent, we felt that allowing the usage of a predefined query arg was the next-best alternative to enable widespread adoption among media players. All modern CDNs have the ability to ignore certain query args in their cache keys, and hence we feel the caching issue is not a significant obstacle. If we do not allow query-arg transmission, what would be a better solution for clients for which sending custom headers is expensive?

Contributor

wilaw commented May 14, 2020

Valid questions being asked here. Question for the group - do we get an OPTIONS request with every single object request from a CORS-restricted client with a custom header, or just with the first request to that host on that connection? We need to resolve this before discussing further.

Contributor

wilaw commented May 14, 2020

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Max-Age

Access-Control-Max-Age can be used to set the TTL of OPTIONS (preflight) responses. This may be a workaround if we enforce a header-only approach.

Contributor

wilaw commented May 15, 2020

I built a little test MSE player which added a custom header to each request. This header was advertised by the server in its Access-Control-Allow-Headers response header, so it replicates the use case we are considering here for CMCD. The player made an OPTIONS request with every single segment request, even though they were all against the same domain. Segment requests were 2s apart.

[Screenshot: Screen Shot 2020-05-14 at 11 13 50 PM]

The server was not responding with an Access-Control-Max-Age header, although Chrome is meant to apply a default age of 5s and the requests were 2s apart.

Contributor

wilaw commented May 15, 2020

I tested with Access-Control-Max-Age values varying from 24hrs to 5min. All were ignored by Chrome, Firefox and Safari. The reason is that "In order to reduce the number of preflight requests, CORS has the concept of a preflight cache. However the preflight information is cached for an origin/url pair. That means each unique url has its own preflight cache. An API with the same preflight response across all urls will still receive a preflight request for each unique request." So preflight responses are cached per URL and not per host. Since each media object request URL is unique, each segment request is preceded by an OPTIONS request.
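
The per-URL caching behaviour described above can be illustrated with a toy model. This is only a sketch of the idea in the quoted text, not browser code: the class, the function, the origin and the segment URLs are all made up for illustration, and real preflight caches track more than just origin and URL.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightCache:
    """Toy preflight cache keyed by (origin, URL), as in the quoted description."""

    max_age: float
    entries: dict = field(default_factory=dict)  # (origin, url) -> expiry time

    def needs_preflight(self, origin, url, now):
        key = (origin, url)
        expiry = self.entries.get(key)
        if expiry is not None and now < expiry:
            return False  # a cached result exists for this exact URL
        self.entries[key] = now + self.max_age
        return True


def count_preflights(urls, origin="https://player.example",
                     max_age=86400.0, interval=2.0):
    """Count preflights for a sequence of requests spaced `interval` seconds apart."""
    cache = PreflightCache(max_age=max_age)
    return sum(
        cache.needs_preflight(origin, url, now=i * interval)
        for i, url in enumerate(urls)
    )


# Ten distinct segment URLs on the same host, 2s apart, with a 24h max age.
segments = [f"https://cdn.example/video/seg{i}.m4s" for i in range(10)]
unique_url_preflights = count_preflights(segments)

# The same URL requested ten times, for comparison.
same_url_preflights = count_preflights(["https://cdn.example/video/seg0.m4s"] * 10)
```

With ten distinct segment URLs the model issues ten preflights regardless of the max age, whereas ten requests for the same URL need only one.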

Also beware that the latest Chrome network panel HIDES all OPTIONS requests and does not show them. This leads to the false belief that they are not being made. https://httptoolkit.tech/blog/chrome-79-doesnt-show-cors-preflight


mnot commented May 16, 2020

@annevk fyi


annevk commented May 16, 2020

I think you already pointed out the most relevant issue. I don't think there's opposition to having an origin-wide or connection-wide bypass-CORS-preflights flag, but getting it defined is another matter.

Contributor

wilaw commented May 21, 2020

Agreed that OPTIONS should have a response mode from the server authorizing a hostname to be valid for subsequent requests over the TTL defined by the MAX-AGE response.

We will initiate a long term action to request an origin-wide or connection-wide bypass-CORS-preflights flag.

In the meantime, we will persist with our query-arg option as the least-worst alternative to continual preflight requests. We will also add language to the spec indicating that the preferred mode of transmission for HTTP requests is to use custom headers. We will also consider placing the query-arg payload in the path, as is done in https://tools.ietf.org/html/draft-ietf-cdni-uri-signing-19.

Contributor

wilaw commented Jun 11, 2020

Improved the definition of query arg transmission to include

CMCD=<URL_encoded_concatenation_of_key-value_pairs><reserved_character>

The reserved character is defined by [RFC3986]. This reserved character is optional at the end of the URL.
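
As a rough illustration of the query-arg form above, the sketch below builds such a URL. The helper name, the host and the key values are hypothetical. Joining the pairs with commas, quoting string values and sorting the keys are assumptions on my part, since this thread does not spell out the concatenation rules. The optional trailing reserved character is omitted.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_cmcd_query_arg(url, pairs):
    """Append a CMCD query arg holding the URL-encoded key-value pairs."""
    # Assumption: pairs are comma-separated, string values are double-quoted,
    # and keys are emitted in sorted order.
    payload = ",".join(f'{key}="{value}"' for key, value in sorted(pairs.items()))
    encoded = quote(payload, safe="")  # percent-encode '=', ',' and '"' as well
    parts = urlsplit(url)
    query = f"{parts.query}&CMCD={encoded}" if parts.query else f"CMCD={encoded}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


example_url = add_cmcd_query_arg(
    "https://cdn.example/video/seg1.m4s",
    {"sid": "session-1234", "cid": "content-abcd"},  # placeholder values
)
```

Because the whole payload sits under a single predefined query arg, a CDN that does not want it in its cache key has only one parameter name to ignore.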

We will initiate a long term action to request an origin-wide or connection-wide bypass-CORS-preflights flag.

In the meantime, we will persist with our query-arg option as the least-worst alternative to continual preflight requests.

@wilaw wilaw closed this as completed Jun 11, 2020

annevk commented Jun 12, 2020

If by query-arg you mean the query part of the URL, why would that not generate a lot of CORS preflights?

Contributor

wilaw commented Jun 12, 2020

@annevk - by query-arg I mean to reference the query-arg mode of transmission within CMCD, which does not use custom-request headers. Adding a query arg to a GET/HEAD/POST request that is only using CORS-safelisted-request-headers to a CORS access controlled host will not result in pre-flight requests.
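
A simplified way to express that rule is sketched below. It is a deliberately reduced version of the real CORS checks: it looks only at the method and at header names, and ignores the extra restrictions browsers place on header values (for example, which Content-Type values are allowed). The header name `X-Example-Media-Data` is made up for illustration.

```python
# Methods and header names that do not, by themselves, force a preflight.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADER_NAMES = {
    "accept",
    "accept-language",
    "content-language",
    "content-type",
}


def triggers_preflight(method, header_names):
    """Return True if this simplified model says a preflight is required."""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    return any(name.lower() not in SAFELISTED_HEADER_NAMES for name in header_names)


# Header mode: a custom request header forces a preflight.
header_mode = triggers_preflight("GET", ["Accept", "X-Example-Media-Data"])

# Query-arg mode: the data rides in the URL, so only safelisted headers are sent.
query_arg_mode = triggers_preflight("GET", ["Accept"])
```

The URL itself never enters the decision, which is why moving the payload into the query string avoids the preflight.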


annevk commented Jun 13, 2020

Ah, that's right.
