PDP-1526 Remove parts of the cookies that are not valid according to RFC 6265 #432
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The issue is described in snowplow/enrich#904
Comparison between Collector 2.x and 3.x
Collector 3.2.0
json after
Input :
Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output :
Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
json before
Input :
Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output :
Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Collector 2.10.0
json after
Input :
Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output :
Cookie: wanted_cookie=crucial_value
json before
Input :
Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output :
Cookie: wanted_cookie=crucial_value
Conclusion
Collector 2.x removes the invalid part while collector 3.x leaves the cookie as is.
These are the characters allowed in the RFC 6265:
The following characters of the above cookies are not part of this list:
space
,"
,\
.Solutions
1. Update Enrich to deal with invalid cookies
As it is nearly impossible to foresee everything that could be wrong with the format of the cookies, it does not seem like a good idea.
2. Use strict parsing
At the moment, the collector doesn't parse the value of a cookie, it just leaves it as is.
If we use
http4s
' cookie parser, by default the relaxed parser is used, which leaves us in the same situation as we are now (I tried).If instead we use RFC 6265 parser, then the parsing of the cookie fails:
FYI this is where the parsing of cookies happens in
http4s
.3. Manually remove invalid parts
This consists in removing all the cookie's sub-parts (surrounded by
;
) that contain at least a character that is not allowed by the RFC 6265.I'm in favor of this solution. It seems safe (all cookies valid according to the RFC are left untouched) and it puts back
2.x
's behavior.