Cookie Extractor misses cookies behind JSON stringified cookie values #904

Open
rowolff opened this issue Oct 25, 2024 · 4 comments

rowolff commented Oct 25, 2024

Project:
Stream Enrich

Version:
5.0.0

Expected behavior:

  • Given the cookies AS_JSON={\"Key\":"Value"}; wanted_cookie=crucial_value;
  • When wanted_cookie is configured in the cookie extractor enrichment
  • Then the wanted cookie is available in derived_contexts

Actual behavior:

The extraction only works if there's no stringified JSON in front of the wanted cookie:

  • works: Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":"Value"};
  • breaks: Cookie: AS_JSON={\"Key\":"Value"}; wanted_cookie=crucial_value;

Steps to reproduce:

  1. Configure cookie extractor with at least one cookie name
  2. Create a request where the wanted cookie is behind a stringified JSON cookie value (example above)
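The two request variants from the steps above could be built roughly as follows. This is only a sketch: the collector URL (a local Snowplow Micro on port 9090 with the `/i` endpoint) and the helper name `build_request` are assumptions for illustration, not taken from the linked repository.

```python
import urllib.request

# Assumed endpoint of a locally running Snowplow Micro; adjust to your setup.
COLLECTOR_URL = "http://localhost:9090/i?e=pv"

# Cookie pairs as written in this issue (the JSON value contains backslashes and quotes).
JSON_COOKIE = 'AS_JSON={\\"Key\\":"Value"}'
WANTED_COOKIE = "wanted_cookie=crucial_value"


def build_request(cookies):
    """Build (but do not send) a request whose Cookie header lists `cookies` in order."""
    header_value = "; ".join(cookies) + ";"
    return urllib.request.Request(COLLECTOR_URL, headers={"Cookie": header_value})


working = build_request([WANTED_COOKIE, JSON_COOKIE])   # wanted cookie first
breaking = build_request([JSON_COOKIE, WANTED_COOKIE])  # wanted cookie behind the JSON

print(working.get_header("Cookie"))
print(breaking.get_header("Cookie"))
# To actually send one of them: urllib.request.urlopen(breaking)
```

Only the order of the two cookie pairs differs between the requests, which makes it easy to compare the resulting events side by side.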

Example:
I reproduced this with Snowplow Micro in this repository: https://github.com/rowolff/snowplow-micro-debugging/

Additional info:

  • This behaviour is observed with Stream Collector 3.2.0/Enrich 5.0.0
  • The bug does not happen with Stream Collector 2.9.1/Enrich 5.0.0

We noticed the bug while upgrading our components. We were running with Collector 2.9.1/Enrich 5.0.0 for a while and then jumped to Collector 3.2.0/Enrich 5.0.0 when we suddenly noticed the issue. Hope this helps.

Contributor

miike commented Oct 28, 2024

This is most likely caused by the move from akka-http to http4s between Collector 2.x and 3.x (http4s is far stricter than akka-http, which is relatively lax most of the time).

Unfortunately, although the behaviour is undesired, I think it is likely correct (it is unusual that changing the order makes it work, though): the backslash in your cookie value is forbidden by RFC 6265, which defines cookie-octet as

 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

As a result, the recommendation (for cross-browser compatibility) is to base64-encode any value in which you expect disallowed characters to occur.

   To maximize compatibility with user agents, servers that wish to
   store arbitrary data in a cookie-value SHOULD encode that data, for
   example, using Base64 [RFC4648].
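To make the rule concrete, here is a minimal Python sketch that checks a value against the cookie-octet grammar quoted above and base64-encodes values that would otherwise be rejected. The function names are hypothetical and this is not code from the collector or from Enrich.

```python
import base64
import re

# cookie-octet from RFC 6265: %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_OCTETS = re.compile(r'[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if `value` is a cookie-value allowed by RFC 6265."""
    # The RFC also allows the whole value to be wrapped in one pair of double quotes.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return COOKIE_OCTETS.fullmatch(value) is not None


def encode_cookie_value(raw: str) -> str:
    """Base64-encode arbitrary text so it only contains allowed characters."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cookie_value(encoded: str) -> str:
    """Reverse encode_cookie_value."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")


raw_json = '{\\"Key\\":"Value"}'  # the value from this issue, backslashes included
print(is_valid_cookie_value(raw_json))                       # False
print(is_valid_cookie_value(encode_cookie_value(raw_json)))  # True
```

The reader of the cookie has to decode the value again, so this only helps when both the writer and the reader of the cookie are under your control.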

Author

rowolff commented Oct 28, 2024

Hi @miike - thank you for the quick response and the awesome detective work. I'll check with my team if and what we can do about it. Some JSON strings come from 3rd party tools and we're not in control of how they are formatted, so it might take some time to resolve that.

Contributor

miike commented Oct 28, 2024

No worries. I can see how third party cookies could definitely be problematic and difficult to modify (or get encoded correctly).

There may be some good news: this is by no means the first time folks have run into this issue with http4s, and as a result there is a PR that adds a "RelaxedCookies" mode, and its tests seem to include some JSON. I haven't tested this, as I'm assuming the issue lies with the collector rather than Enrich, which seems a reasonable bet given that the same version of Enrich behaves differently with Collector 2.9.1 and 3.2.0.

I've raised this with the engineering team to have a closer look and see what we might be able to do - thank you for flagging this one!

Contributor

benjben commented Nov 7, 2024

Thanks a lot for providing all the details and the scripts, @rowolff!

I compared the outputs of Collector 3.2.0 and 2.10.0. Here are the results:

Collector 3.2.0

json after

Input : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};

Output : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};

json before

Input : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;

Output : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;

Collector 2.10.0

json after

Input : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};

Output : Cookie: wanted_cookie=crucial_value

json before

Input : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;

Output : Cookie: wanted_cookie=crucial_value

Conclusion

Collector 2.x was removing the JSON cookie from the header, probably because its value does not comply with the RFC. We'll discuss internally whether to restore this behavior in 3.x or to update Enrich to handle the JSON, and we'll let you know!
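For illustration, the two options can be sketched in Python. This is an assumption-laden sketch, not the actual Collector or Enrich code: `drop_non_compliant` mimics the Collector 2.x output observed above, while `extract` shows a lenient lookup that tolerates the JSON value. All function names are hypothetical.

```python
import re

# cookie-octet from RFC 6265: %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_OCTETS = re.compile(r'[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*')


def split_cookie_header(header_value):
    """Split a Cookie header value into (name, value) pairs without validating them."""
    pairs = []
    for part in header_value.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skips the empty piece after a trailing semicolon
        name, value = part.split("=", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs


def drop_non_compliant(pairs):
    """Option 1: keep only pairs whose value is made of valid cookie-octets."""
    return [(name, value) for name, value in pairs if COOKIE_OCTETS.fullmatch(value)]


def extract(pairs, wanted_names):
    """Option 2: pick the configured cookies and ignore how the others look."""
    return {name: value for name, value in pairs if name in wanted_names}


header = 'AS_JSON={\\"Key\\":\\"Value\\"}; wanted_cookie=crucial_value;'
pairs = split_cookie_header(header)
print(drop_non_compliant(pairs))          # [('wanted_cookie', 'crucial_value')]
print(extract(pairs, {"wanted_cookie"}))  # {'wanted_cookie': 'crucial_value'}
```

Splitting on `;` is enough for the values in this issue, but it would cut a JSON value that itself contains a semicolon, so a real fix would need a more careful parser.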
