Do not decode %2F in path #14

trowski · 2024-07-03T22:50:45Z

This reverts #4. I believe https://datatracker.ietf.org/doc/html/rfc3986.html#section-2.4 was mis-interpreted. We should not be decoding the path until after matching, otherwise "/" is interpreted as separating path segments.

See https://chat.stackoverflow.com/transcript/message/57474623#57474623.

bwoebi · 2024-07-03T23:10:38Z

I think both #4 and this PR are not right. You want to match against the decoded URI, except for the path delimiters (but just because the router has no capability to represent a non-path delimiting slash, but that's okay).

A route /foo?/bar should be matching /foo%3F/{something}. A route /foo~/bar should be matching both /foo~/{something}, just like a route /foo%7E/bar does too. A route /bööh should be matching /b%C3%B6%C3%B6h. You really don't want to have to put percents in your bare routes... That would be surprising.
A route /{path} should match /foo%2Fbar, with path containing foo/bar later.

In particular, quoting the very section you're linking:

For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.

Put it another way: a route should be canonical after splitting by delimiters and percent decoding.

Very roughly, pseudo-code:

split_path := path.split("/")
for ref part in split_path
   part := decode(part)
for route_parts in all_routes with as many parts as split_path has:
  for i..part.len()
    if not part_matches(route_parts[i], split_path)
      check next route
  return success(route_parts)
return no_match

I think the proper implementation would be replacing any %2F by %252F, then rawurldecode(), then afterwards replace any %2F in the matches by / (you can skip that step if no %2F was replaced in the initial step). [I guess, incidentally that would allow matching non-part delimiting slashes by putting a %2F into the route. But that's an implementation detail :-D]

trowski · 2024-07-03T23:26:10Z

Any iterative approach to path matching is pretty much a non-starter. That's why fast-route, etc. exist.

Perhaps instead we need to simply not decode %2F and decode all other characters in the path, since "/" is the only character which has meaning in a path.

EDIT: @bwoebi I see you suggested essentially the same thing in your edit. 😄
Replacing with \0 seems like a nice hack, though I suppose technically the URL could already have contained %00, so is maybe not a good idea.

bwoebi · 2024-07-03T23:29:20Z

@trowski Yeah, replacing by %252F is safer. (I edited while you were writing)

And yeah, for sure, we can't do an iterative approach for performance, it was just pseudocode :-)

trowski · 2024-07-03T23:37:05Z

@bwoebi Implemented with a simple call to str_ireplace. Does that look correct?

bwoebi · 2024-07-03T23:44:35Z

You should do another str_replace after matching on the actual matches if the str_ireplace changed the string size of $path.

trowski · 2024-07-04T00:06:28Z

Good call. Updated.

bwoebi

Looks right to me now :-)

kelunik · 2024-07-04T05:34:06Z

The approach is still flawed, because %252F in the original URL is now decoded to / instead of %2F.

kelunik

See comment about design flaw.

trowski · 2024-07-04T06:56:38Z

I couldn't sleep so tried an idea of double-encoding any existing %252F in the path, which sort of works but then doesn't match "%2F" if provided in the route string.

Perhaps then we do need to go back to my original proposal to not decode before path matching. Unfortunately that would be a BC break, requiring a new major. Luckily that's not really a problem for this library if it's necessary.

trowski · 2024-07-04T15:01:11Z

Closing in favor of #15, as I think any string manipulation we attempt to do before matching is going to have potential collisions with the contents of the path before that manipulation.

trowski requested a review from kelunik July 3, 2024 22:50

trowski force-pushed the remove-decoding branch from 56f4ade to a9be25b Compare July 3, 2024 23:35

trowski changed the title ~~Do not decode path before matching~~ Do not decode %2F in path Jul 3, 2024

Do not decode %2F in path

5655ecb

trowski force-pushed the remove-decoding branch from a9be25b to 5655ecb Compare July 3, 2024 23:36

Replace %2F in args with /

20e2cd0

bwoebi approved these changes Jul 4, 2024

View reviewed changes

kelunik requested changes Jul 4, 2024

View reviewed changes

Handle %252F in path

e810542

trowski closed this Jul 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not decode %2F in path #14

Do not decode %2F in path #14

trowski commented Jul 3, 2024

bwoebi commented Jul 3, 2024 •

edited

Loading

trowski commented Jul 3, 2024 •

edited

Loading

bwoebi commented Jul 3, 2024 •

edited

Loading

trowski commented Jul 3, 2024

bwoebi commented Jul 3, 2024

trowski commented Jul 4, 2024

bwoebi left a comment

kelunik commented Jul 4, 2024

kelunik left a comment

trowski commented Jul 4, 2024

trowski commented Jul 4, 2024

Do not decode %2F in path #14

Do not decode %2F in path #14

Conversation

trowski commented Jul 3, 2024

bwoebi commented Jul 3, 2024 • edited Loading

trowski commented Jul 3, 2024 • edited Loading

bwoebi commented Jul 3, 2024 • edited Loading

trowski commented Jul 3, 2024

bwoebi commented Jul 3, 2024

trowski commented Jul 4, 2024

bwoebi left a comment

Choose a reason for hiding this comment

kelunik commented Jul 4, 2024

kelunik left a comment

Choose a reason for hiding this comment

trowski commented Jul 4, 2024

trowski commented Jul 4, 2024

bwoebi commented Jul 3, 2024 •

edited

Loading

trowski commented Jul 3, 2024 •

edited

Loading

bwoebi commented Jul 3, 2024 •

edited

Loading