Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDP-1526 Remove parts of the cookies that are not valid according to RFC 6265 #432

Merged
merged 1 commit into from
Nov 12, 2024

Conversation

benjben
Copy link
Contributor

@benjben benjben commented Nov 8, 2024

The issue is described in snowplow/enrich#904

Comparison between Collector 2.x and 3.x

Collector 3.2.0

json after

Input : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};

json before

Input : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;

Collector 2.10.0

json after

Input : Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output : Cookie: wanted_cookie=crucial_value

json before

Input : Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output : Cookie: wanted_cookie=crucial_value

Conclusion

Collector 2.x removes the invalid part while collector 3.x leaves the cookie as is.

These are the characters allowed in the RFC 6265:

scala> val allowedChars = Set(0x21.toChar) ++ Set(0x23.toChar to 0x2b.toChar: _*) ++ Set(0x2d.toChar to 0x3a.toChar: _*) ++ Set(0x3c.toChar to 0x5b.toChar: _*) ++ Set(0x5d.toChar to 0x7e.toChar: _*)
allowedChars: scala.collection.immutable.Set[Char] = Set(E, e, X, s, x, 8, *, %, 4, n, }, ., ], 9, N, j, y, T, =, Y, t, J, <, u, U, f, &, F, !, A, a, 5, m, |, M, `, ), I, i, -, @, v, G, 6, 1, V, q, Q, L, ', b, g, [, B, l, P, #, p, {, 0, ?, _, 2, C, H, +, c, W, h, (, 7, r, K, w, :, R, $, 3, k, ~, O, ^, /, D, >, Z, o, z, S, d)

The following characters of the above cookies are not part of this list: space, ", \.

Solutions

1. Update Enrich to deal with invalid cookies

As it is nearly impossible to foresee everything that could be wrong with the format of the cookies, it does not seem like a good idea.

2. Use strict parsing

At the moment, the collector doesn't parse the value of a cookie, it just leaves it as is.

If we use http4s' cookie parser, by default the relaxed parser is used, which leaves us in the same situation as we are now (I tried).

If instead we use RFC 6265 parser, then the parsing of the cookie fails:

Left(org.http4s.ParseFailure: Invalid Cookie header: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
         ^
expectation:
* must end the string)

FYI this is where the parsing of cookies happens in http4s.

3. Manually remove invalid parts

This consists in removing all the cookie's sub-parts (surrounded by ;) that contain at least a character that is not allowed by the RFC 6265.

I'm in favor of this solution. It seems safe (all cookies valid according to the RFC are left untouched) and it puts back 2.x's behavior.

Copy link

@adatzer adatzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@benjben benjben force-pushed the PDP-1526_cookie_parsing_rfc6265 branch from 6ed0a1a to 0980a5f Compare November 12, 2024 16:33
@benjben benjben merged commit e494f56 into develop Nov 12, 2024
2 of 3 checks passed
@benjben benjben deleted the PDP-1526_cookie_parsing_rfc6265 branch November 12, 2024 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants