This repository has been archived by the owner on May 23, 2023. It is now read-only.

Valid baggage keys #117

Open
james-callahan opened this issue May 2, 2018 · 5 comments

@james-callahan

Background

Baggage keys are currently specified to be "a string". However, baggage often needs to be transmitted in places that don't support a clean string namespace.
For example, when transmitted as an HTTP header (with uberctx-mykey), the key cannot contain a colon.
This can get more complex if baggage keys go through Unicode or case normalisation.
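
To make the header problem concrete, here is a minimal Python sketch. It assumes a key is mapped to a header name by plain concatenation with the uberctx- prefix, and it checks the result against the `token` grammar that RFC 7230 uses for HTTP field names. The helper names are made up for illustration and are not taken from any tracer implementation.

```python
import re

# Characters RFC 7230 allows in a header field name ("token").
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

PREFIX = "uberctx-"  # header prefix discussed in this issue


def header_name_for(key: str) -> str:
    """Hypothetical naive mapping: prefix + key, with no escaping step."""
    return PREFIX + key


def is_valid_header_name(name: str) -> bool:
    """True if the whole name consists of RFC 7230 token characters."""
    return _TOKEN.fullmatch(name) is not None


def collides_after_case_folding(a: str, b: str) -> bool:
    """HTTP field names are case-insensitive, so distinct keys can merge."""
    return a != b and header_name_for(a).lower() == header_name_for(b).lower()
```

With this sketch, `my:key` yields an invalid header name, and `UserId` and `userid` become indistinguishable once an intermediary lower-cases header names.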

Proposal

Either the domain of keys needs to be reduced (e.g. mandate lower-case keys) or the full string space needs to be called out explicitly, so that encodings (such as the uberctx HTTP header prefix) don't miss an encoding/escaping step.

@yurishkuro
Member

In retrospect, I consider the uberctx-{key} scheme a mistake. We could have easily achieved the same goals with a uberctx: key=value format, which is also being proposed for W3C Trace Context. At Uber we're stuck with the uberctx-{key} format now, since changing it requires upgrading client libs in 1000+ applications, which is ... well, you know. So our internal guideline is that keys can only be alphanum-snake-case (which in practice is perfectly acceptable).

That doesn't mean we cannot solve this problem in Jaeger: we could either introduce an encoding for keys that are not alphanum-snake-case, or implement different codecs for a format that's similar to the W3C one.
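
As a rough illustration of the second option, the sketch below packs all baggage into a single header value of comma-separated key=value pairs and percent-encodes both keys and values. The exact wire format, the comma separator, and the function names are assumptions made for this example; they are not an existing Jaeger codec.

```python
from urllib.parse import quote, unquote


def encode_baggage(items: dict) -> str:
    """Serialise baggage as 'k1=v1,k2=v2' with percent-encoded parts.

    safe="" makes quote() escape '=', ',' and '/' as well, so the
    separators can never appear unescaped inside a key or a value.
    """
    return ",".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in items.items()
    )


def decode_baggage(header_value: str) -> dict:
    """Inverse of encode_baggage(); skips empty list members."""
    result = {}
    for pair in header_value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        result[unquote(key)] = unquote(value)
    return result
```

Because the key no longer ends up in the header name, a key such as `my:key` travels as `my%3Akey` and is not affected by header-name case folding.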

@james-callahan
Author

james-callahan commented May 2, 2018

Looking at the w3c spec:

Name starts with the beginning of the string or separator , and ends with the equal sign =. The contents of the name are any url encoded string that does not contain an equal sign =. Names should intuitively identify a the tracing system even if multiple systems per vendor are present.

So the baggage key space is reduced to any string that can be encoded in URL encoding (which is all of them?)

  • I assume the language here is indicating that the equals sign itself should be encoded as %3D and a comma as %2C (or are they banned entirely)?
  • I'm not sure if the null byte is allowed?
    The spec doesn't say it isn't, but I wouldn't trust browsers/libraries to handle it well.
  • What happens to unpaired Unicode surrogates? (UTF-8 vs WTF-8)
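
One way to probe these edge cases is with Python's `urllib.parse.quote`. This only shows how one library behaves and says nothing about what the spec intends. The wrapper name `try_percent_encode` is hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def try_percent_encode(key: str, errors: str = "strict") -> Optional[str]:
    """Percent-encode a key as UTF-8, or return None if it cannot be encoded.

    safe="" means reserved characters such as '=' and ',' are escaped.
    """
    try:
        return quote(key, safe="", encoding="utf-8", errors=errors)
    except UnicodeEncodeError:
        return None


# The null byte is encodable like any other byte.
nul = try_percent_encode("\x00")  # "%00"

# A lone surrogate is not valid UTF-8, so strict encoding refuses it ...
strict = try_percent_encode("\ud800")  # None

# ... while the "surrogatepass" handler emits the WTF-8 style bytes.
lenient = try_percent_encode("\ud800", errors="surrogatepass")  # "%ED%A0%80"
```

So at least in this library, the null byte survives encoding, and the unpaired-surrogate case depends entirely on which error handler the encoder picks.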

@isaachier

I hope the null byte isn't valid, but I might have to handle that too: https://github.com/isaachier/jaeger-client-c/blob/master/src/jaegertracingc/key_value.h#L34-L37. Other than C, most languages handle that gracefully.

@daurnimator

@isaachier one incompatibility of treating them as 8-bit (minus null byte) C strings is that you would allow invalid UTF-8, while e.g. JavaScript would need valid Unicode (but allows unpaired surrogates).
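
The mismatch can be sketched with two predicates over the raw bytes of a key. Python's strict UTF-8 decoder stands in here for "a language that needs valid Unicode"; it is not a model of JavaScript, and both function names are made up for this example.

```python
def fits_c_string(raw: bytes) -> bool:
    """A NUL-terminated C string can hold any bytes except 0x00."""
    return b"\x00" not in raw


def is_strict_utf8(raw: bytes) -> bool:
    """True if the bytes decode as well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xFF never occurs in well-formed UTF-8, yet it is a fine C string.
invalid_bytes = b"key\xff"

# WTF-8 style bytes of the lone surrogate U+D800: also a fine C string,
# but rejected by a strict UTF-8 decoder.
lone_surrogate = b"\xed\xa0\x80"
```

Both sample keys pass `fits_c_string` and fail `is_strict_utf8`, so a C client could emit keys that a stricter client cannot represent.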

@isaachier

I have an encoding method in that code too, but this all assumes the null byte is guaranteed to terminate a string (i.e. no need to maintain length).
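
For readers unfamiliar with the pitfall, the effect of relying on the null terminator can be reproduced from Python with `ctypes`. This is only a demonstration of length-less C strings, not of what jaeger-client-c does. Percent-encoding the key first is shown as one possible workaround and is an assumption of this sketch.

```python
import ctypes
from urllib.parse import quote_from_bytes


def read_as_c_string(raw: bytes) -> bytes:
    """Store raw bytes in a char buffer and read them back the way C code
    without a separate length field would: up to the first NUL byte."""
    buf = ctypes.create_string_buffer(raw)
    return buf.value


key = b"user\x00id"

# Everything after the embedded NUL is silently lost.
truncated = read_as_c_string(key)  # b"user"

# Escaping before storage keeps the whole key recoverable.
escaped = quote_from_bytes(key, safe="").encode("ascii")  # b"user%00id"
preserved = read_as_c_string(escaped)
```

The truncation is silent, which is why an explicit rule on whether the null byte is a valid key character matters for C clients.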
