Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not decode %2F in path #14

Closed
wants to merge 3 commits into from
Closed

Do not decode %2F in path #14

wants to merge 3 commits into from

Conversation

trowski
Copy link
Member

@trowski trowski commented Jul 3, 2024

This reverts #4. I believe https://datatracker.ietf.org/doc/html/rfc3986.html#section-2.4 was mis-interpreted. We should not be decoding the path until after matching, otherwise "/" is interpreted as separating path segments.

See https://chat.stackoverflow.com/transcript/message/57474623#57474623.

@trowski trowski requested a review from kelunik July 3, 2024 22:50
@bwoebi
Copy link
Member

bwoebi commented Jul 3, 2024

I think both #4 and this PR are not right. You want to match against the decoded URI, except for the path delimiters (but just because the router has no capability to represent a non-path delimiting slash, but that's okay).

A route /foo?/bar should be matching /foo%3F/{something}. A route /foo~/bar should be matching both /foo~/{something}, just like a route /foo%7E/bar does too. A route /bööh should be matching /b%C3%B6%C3%B6h. You really don't want to have to put percents in your bare routes... That would be surprising.
A route /{path} should match /foo%2Fbar, with path containing foo/bar later.

In particular, quoting the very section you're linking:

For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.

Put it another way: a route should be canonical after splitting by delimiters and percent decoding.

Very roughly, pseudo-code:

split_path := path.split("/")
for ref part in split_path
   part := decode(part)
for route_parts in all_routes with as many parts as split_path has:
  for i..part.len()
    if not part_matches(route_parts[i], split_path)
      check next route
  return success(route_parts)
return no_match

I think the proper implementation would be replacing any %2F by %252F, then rawurldecode(), then afterwards replace any %2F in the matches by / (you can skip that step if no %2F was replaced in the initial step). [I guess, incidentally that would allow matching non-part delimiting slashes by putting a %2F into the route. But that's an implementation detail :-D]

@trowski
Copy link
Member Author

trowski commented Jul 3, 2024

Any iterative approach to path matching is pretty much a non-starter. That's why fast-route, etc. exist.

Perhaps instead we need to simply not decode %2F and decode all other characters in the path, since "/" is the only character which has meaning in a path.

EDIT: @bwoebi I see you suggested essentially the same thing in your edit. 😄
Replacing with \0 seems like a nice hack, though I suppose technically the URL could already have contained %00, so is maybe not a good idea.

@bwoebi
Copy link
Member

bwoebi commented Jul 3, 2024

@trowski Yeah, replacing by %252F is safer. (I edited while you were writing)

And yeah, for sure, we can't do an iterative approach for performance, it was just pseudocode :-)

@trowski trowski force-pushed the remove-decoding branch from 56f4ade to a9be25b Compare July 3, 2024 23:35
@trowski trowski changed the title Do not decode path before matching Do not decode %2F in path Jul 3, 2024
@trowski trowski force-pushed the remove-decoding branch from a9be25b to 5655ecb Compare July 3, 2024 23:36
@trowski
Copy link
Member Author

trowski commented Jul 3, 2024

@bwoebi Implemented with a simple call to str_ireplace. Does that look correct?

@bwoebi
Copy link
Member

bwoebi commented Jul 3, 2024

You should do another str_replace after matching on the actual matches if the str_ireplace changed the string size of $path.

@trowski
Copy link
Member Author

trowski commented Jul 4, 2024

Good call. Updated.

Copy link
Member

@bwoebi bwoebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me now :-)

@kelunik
Copy link
Member

kelunik commented Jul 4, 2024

The approach is still flawed, because %252F in the original URL is now decoded to / instead of %2F.

Copy link
Member

@kelunik kelunik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about design flaw.

@trowski
Copy link
Member Author

trowski commented Jul 4, 2024

I couldn't sleep so tried an idea of double-encoding any existing %252F in the path, which sort of works but then doesn't match "%2F" if provided in the route string.

Perhaps then we do need to go back to my original proposal to not decode before path matching. Unfortunately that would be a BC break, requiring a new major. Luckily that's not really a problem for this library if it's necessary.

@trowski
Copy link
Member Author

trowski commented Jul 4, 2024

Closing in favor of #15, as I think any string manipulation we attempt to do before matching is going to have potential collisions with the contents of the path before that manipulation.

@trowski trowski closed this Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants