feat: add `join` method to `Url` class #1378

Meetesh-Saini · 2024-07-29T23:05:27Z

added support for URL path joining with optional trailing slashes and multiple arguments.

Change Summary

This PR implements a feature based on pydantic/pydantic#9794 to join a URL path onto a base URL. It uses the join method from the url crate.

Related issue number

fix pydantic/pydantic#9794

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers


codecov · 2024-07-29T23:10:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.10%. Comparing base (ab503cb) to head (6a4fa06).
Report is 207 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1378      +/-   ##
==========================================
- Coverage   90.21%   89.10%   -1.11%     
==========================================
  Files         106      112       +6     
  Lines       16339    17892    +1553     
  Branches       36       40       +4     
==========================================
+ Hits        14740    15943    +1203     
- Misses       1592     1929     +337     
- Partials        7       20      +13

Files with missing lines	Coverage Δ
src/url.rs	`98.43% <100.00%> (+0.11%)`	⬆️

... and 52 files with indirect coverage changes


codspeed-hq · 2024-07-29T23:12:43Z

CodSpeed Performance Report

Merging #1378 will not alter performance

Comparing Meetesh-Saini:dev-url-join (6a4fa06) with main (9b21b0f)

Summary

✅ 155 untouched benchmarks

davidhewitt

Thanks for the PR! The signature looks fine, though I think the implementation can be simpler.

src/url.rs

davidhewitt

I think this just needs test cases and then I would be happy to see this merged. Sorry for the long delay 😬

- Refactor URL join function for better handling of relative paths
- Add tests for joining URLs with and without trailing slashes
- Cover various edge cases in line with URL specification that the previous function would fail to handle

Meetesh-Saini · 2024-09-28T17:50:07Z

I've added tests and changed the implementation. Now, it only takes one argument instead of multiple.
I noticed that using multiple arguments can be confusing, especially when trailing_slash=False. Keeping it to one argument makes things clearer, similar to how servo-url and urllib.parse.urljoin work.
For example, the user may expect either of the following. I wasn't sure which one to implement.

            a.join("e", "?q=1", "g", trailing_slash=False) 
                    │     │      │                         
                    └──┬──┘      │                         
                       ▼         │                         
              ┌───► "e?q=1"      │                         
              │        │         │                         
    No trailing slash  └────┬────┘                         
                            ▼                              
                           "g"                             
                                                       
                                                                                       
         a.join("e", "?q=1", "g", trailing_slash=False)
                 │     │      │                        
                 └──┬──┘      │                        
                    ▼         │                        
                 "e/?q=1"     │                        
                    │         │                        
                    └────┬────┘                        
                         ▼                             
                       "e/g"

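The two readings in the diagrams above can be reproduced with a small sketch built on `urllib.parse.urljoin`. The helper names `ensure_trailing_slash`, `join_plain` and `join_as_directories` and the base URL are hypothetical, chosen only for illustration; this is not the pydantic-core implementation.

```python
from functools import reduce
from urllib.parse import urljoin, urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Return url with "/" appended to its path if the path lacks one."""
    parts = urlsplit(url)
    if parts.path.endswith("/"):
        return url
    return urlunsplit(parts._replace(path=parts.path + "/"))


def join_plain(base: str, *parts: str) -> str:
    """Reading 1: fold the arguments with plain urljoin semantics."""
    return reduce(urljoin, parts, base)


def join_as_directories(base: str, *parts: str) -> str:
    """Reading 2: treat every intermediate result as a directory."""
    url = base
    for part in parts:
        url = urljoin(ensure_trailing_slash(url), part)
    return url


base = "http://example.com/"  # assumed base URL for the example
print(join_plain(base, "e", "?q=1", "g"))           # http://example.com/g
print(join_as_directories(base, "e", "?q=1", "g"))  # http://example.com/e/g
```

Both results are defensible, which is the ambiguity that a single-argument join avoids.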
davidhewitt

Sorry for yet another long review cycle. I would like to just agree what the right default for trailing_slash is, and have a suggestion to avoid the cannot_be_a_base() restriction.

python/pydantic_core/_pydantic_core.pyi

src/url.rs

davidhewitt

Thanks for the many iterations here and slow reviews by me. I am happy with the design here now. I would like to see the __floordiv__ operator removed (see comment below), and then let's merge 👍

davidhewitt · 2024-10-29T11:57:37Z

src/url.rs

+    fn __truediv__(&self, other: &str) -> PyResult<Self> {
+        self.join(other, true)
+    }
+
+    fn __floordiv__(&self, other: &str) -> PyResult<Self> {
+        self.join(other, false)
+    }


Ok, sorry I missed these in the last round of review. I think the difference between the / and // operators here is subtle and hard to document.

I think it would be better to have just /, and make it match the default of append_trailing_slash=False. This will also simplify testing, I think.

Suggested change

-    fn __truediv__(&self, other: &str) -> PyResult<Self> {
-        self.join(other, true)
-    }
-
-    fn __floordiv__(&self, other: &str) -> PyResult<Self> {
-        self.join(other, false)
-    }
+    fn __truediv__(&self, other: &str) -> PyResult<Self> {
+        self.join(other, false)
+    }

Okay, __floordiv__ can be removed, but I feel that __truediv__ should use append_trailing_slash=True, because this overloaded operator would likely be used to join multiple paths in shorter code. This behaviour would feel familiar to Python users, as it resembles pathlib's path joining.
For example,

a = Url("http://a")
print(a / "b" / "c" / "d")  # http://a/b/c/d/

a = Url("file:///home/user/")
print(a / "music" / "pop")  # file:///home/user/music/pop/

With append_trailing_slash=False it would instead result in http://a/d and file:///home/user/pop which I think is not what the user would expect.
I chose to add __floordiv__ too because it would simplify adding files at the end.

print(a / "dir" / "dir" / "dir" // "file.txt") # file:///home/user/dir/dir/dir/file.txt
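The operator behaviour described in this comment can be approximated in pure Python. This is a rough sketch only: `SketchUrl` is a hypothetical stand-in built on `urllib.parse.urljoin`, and the way it appends the trailing slash is an assumption, not the Rust code from this PR.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


class SketchUrl:
    """Hypothetical stand-in for Url; not the pydantic-core implementation."""

    def __init__(self, url: str) -> None:
        self.url = url

    def join(self, path: str, append_trailing_slash: bool = False) -> "SketchUrl":
        joined = urljoin(self.url, path)
        if append_trailing_slash:
            parts = urlsplit(joined)
            if not parts.path.endswith("/"):
                # Assumption: the slash is added to the path component only.
                joined = urlunsplit(parts._replace(path=parts.path + "/"))
        return SketchUrl(joined)

    def __truediv__(self, other: str) -> "SketchUrl":
        # "/" keeps the result usable as a directory for the next join.
        return self.join(other, append_trailing_slash=True)

    def __floordiv__(self, other: str) -> "SketchUrl":
        # "//" leaves the result as-is, e.g. for a final file name.
        return self.join(other, append_trailing_slash=False)

    def __str__(self) -> str:
        return self.url


a = SketchUrl("http://a")
print(a / "b" / "c" / "d")     # http://a/b/c/d/
print(a // "b" // "c" // "d")  # http://a/d
```

Since `/` and `//` share precedence and associate left to right, a chain such as `a / "dir" // "file.txt"` evaluates in the written order.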

Oh, I see. Yikes, there are so many subtleties here!

It seems to me that our .join() method really works like urllib.parse.urljoin when it comes to semantics, e.g.

>>> urllib.parse.urljoin("https://foo.com/a", "b")
'https://foo.com/b'

versus pathlib's

>>> pathlib.Path("/foo/a").joinpath("b")
PosixPath('/foo/a/b')

Given these are inconsistent, I think we should perhaps back away from trying to have pathlib-like semantics at all.
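For reference, the inconsistency can be checked with the standard library alone. The variable names are illustrative, and `PurePosixPath` is used here instead of `Path` only so that the result does not depend on the host platform.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# urljoin replaces the last path segment unless the base ends with "/".
url_file_like = urljoin("https://foo.com/a", "b")
url_dir_like = urljoin("https://foo.com/a/", "b")

# pathlib always appends, whether or not a trailing slash was given.
path_joined = PurePosixPath("/foo/a").joinpath("b")

print(url_file_like)  # https://foo.com/b
print(url_dir_like)   # https://foo.com/a/b
print(path_joined)    # /foo/a/b
```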

Would you be open to the idea of dropping the operators from the PR completely, so we can get .join() merged? We could then open a pydantic issue to discuss the design of the operators and move forward with an implementation when there's consensus?

Alternatively, we could also have joinpath(), which works like pathlib and doesn't accept query strings or fragments as the whole input?

And then could have / operator work like joinpath? 🤔
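One possible shape for such a joinpath(), purely as a sketch for discussion: the function below is hypothetical, works on plain strings rather than Url objects, and makes assumptions that are not settled in this thread (it rejects any segment containing "?" or "#", and it keeps the base URL's query and fragment).

```python
from urllib.parse import urlsplit, urlunsplit


def joinpath(url: str, *segments: str) -> str:
    """Append path segments pathlib-style; hypothetical sketch only."""
    parts = urlsplit(url)
    path = parts.path
    for segment in segments:
        if "?" in segment or "#" in segment:
            # Assumption: queries and fragments are not valid path input.
            raise ValueError(f"not a plain path segment: {segment!r}")
        # Always append, regardless of a trailing slash on the base.
        path = path.rstrip("/") + "/" + segment.lstrip("/")
    return urlunsplit(parts._replace(path=path))


print(joinpath("https://foo.com/a", "b"))        # https://foo.com/a/b
print(joinpath("https://foo.com/a/", "b", "c"))  # https://foo.com/a/b/c
```

Unlike pathlib, this sketch does not let an absolute segment reset the path; whether it should is one of the open design questions.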

joinpath() would certainly make things cleaner. Should I implement joinpath() in this PR, or should we drop the operators for now and discuss it in the issues instead?

Great question. I think I'd prefer we just had .join() here and worried about .joinpath() and the operators later. That said, there's potentially a desire to agree a sketch of the follow ups here. @pydantic/oss - any ideas?

I think without comment from anyone else, let's just do .join() here and then follow-up with an issue in the main pydantic repo where we can discuss .joinpath() and operators.

I think it would be great to have more time to discuss the semantics (does it need to match urllib? What about other libraries like furl? Should we double-check against the current RFCs? We should also check what was said in this discussion).

sydney-runkle · 2024-11-21T15:20:38Z

I think it would be great to have more time to discuss the semantics (does it need to match urllib? What about other libraries like furl? Should we double-check against the current RFCs? We should also check what was said in this discussion).

Let's do this before moving forward

feat: add join method to Url class

a7d9351

- added support for URL path joining with optional trailing slashes and multiple arguments.

davidhewitt reviewed Jul 30, 2024

src/url.rs

davidhewitt reviewed Jul 31, 2024

src/url.rs (outdated)

davidhewitt reviewed Sep 20, 2024

Meetesh-Saini and others added 2 commits September 28, 2024 01:25

Merge branch 'pydantic:main' into dev-url-join

7ef57ba

Meetesh-Saini requested a review from davidhewitt October 10, 2024 19:37

davidhewitt marked this pull request as ready for review October 21, 2024 11:53

davidhewitt reviewed Oct 21, 2024

python/pydantic_core/_pydantic_core.pyi (outdated)

src/url.rs (outdated)

Meetesh-Saini and others added 2 commits October 25, 2024 17:24

refactor: update url join method implementation and function signature

8b70975

Merge branch 'pydantic:main' into dev-url-join

6a4fa06

davidhewitt approved these changes Oct 29, 2024


davidhewitt Oct 29, 2024

Meetesh-Saini Oct 29, 2024

davidhewitt Oct 29, 2024

davidhewitt Oct 29, 2024

Meetesh-Saini Oct 29, 2024

davidhewitt Oct 29, 2024

davidhewitt Oct 30, 2024

Viicos Oct 30, 2024
