Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: URL-encode document names in document index client requests #847

Merged
merged 1 commit into from
May 21, 2024

Conversation

Michael-JB
Copy link
Collaborator

@Michael-JB Michael-JB commented May 18, 2024

Description

Prior to this commit, we did not URL-encode document names when querying
the document index. This caused problems when names have special
characters, i.e., '/' or '?'. We do not need to URL-encode other parts
of the URL (e.g., namespace or collection) as these are already
constrained to URL-safe characters. In addition to this fix, this commit
includes the following changes:

  • update DocumentPath.from_slash_separated_str to correctly delimit
    paths whose document name contains a forward slash. Ideally we'd
    remove both this and DocumentPath.to_slash_separated_str in the
    future.
  • parametrize tests to include special characters and increase coverage
  • migrate to new document index collection to avoid CI conflicts

Before Merging

  • Review the code changes
    • Unused print / comments / TODOs
    • Missing docstrings for functions that should have them
    • Consistent variable names
    • ...
  • Update changelog.md if necessary
  • Commit messages should contain a semantic label and the ticket number
    • Consider squashing if this is not the case

Prior to this commit, we did not URL-encode document names when querying
the document index. This caused problems when names have special
characters, i.e., '/' or '?'. We do not need to URL-encode other parts
of the URL (e.g., namespace or collection) as these are already
constrained to URL-safe characters. In addition to this fix, this commit
includes the following changes:
- update `DocumentPath.from_slash_separated_str` to correctly delimit
  paths whose document name contains a forward slash. Ideally we'd
  remove both this and `DocumentPath.to_slash_separated_str` in the
  future.
- parametrize tests to include special characters and increase coverage
- migrate to new document index collection to avoid CI conflicts
@NiklasKoehneckeAA
Copy link
Contributor

If we want to remove the DocumentPath.from_slash_separated_str, we could raise a deprecated warning in this

@Michael-JB
Copy link
Collaborator Author

Michael-JB commented May 21, 2024

If we want to remove the DocumentPath.from_slash_separated_str, we could raise a deprecated warning in this

I only suggest removing them because a slash is a valid character in a document name, which makes unescaped slashes a bad choice for delimiting.

I'm not sure what the use-case for from_slash_separated_str is, and I imagineto_slash_separated_stris only really useful for logging purposes. We don't use them internally either, hence my suggestion to remove them. What do you think?

Edit: in either case, I think it would make sense to do this in a separate PR.

@Michael-JB Michael-JB merged commit 3223dd8 into main May 21, 2024
4 checks passed
@Michael-JB Michael-JB deleted the DI-71-fix-document-name-url-encoding branch May 21, 2024 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants