Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_url: allow urls with parameters #92

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tmorrell
Copy link
Contributor

@tmorrell tmorrell commented Oct 24, 2023

❤️ Thank you for your contribution!

Description

Software heritage uses urls with parameters, like https://archive.softwareheritage.org/swh:1:dir:44ac666e75004dd2a27ca0e09e73aecc0e8b426f;origin=https://github.com/amykwebster/MIPseq_2021;visit=swh:1:snp:46bcbe75339295c3d68fcf1257022aae8e6bee40;anchor=swh:1:rev:27839dcc9ef1587086be195349310fb70fbfcaf1 . However the idutils is_url function will reject these as not being a valid url.

This PR makes the is_url function more permissive. Software heritage ids and related urls are already in idutils, so it's a bit odd for idutlils to say that they aren't urls.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Third-party code

If you've added third-party code (copy/pasted or new dependencies), please reach out to an architect.

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

Copy link
Member

@slint slint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One test case since the given example could be ambiguous in later iterations.

def test_url():
"""Test URL validation."""
assert idutils.is_url(
"https://archive.softwareheritage.org/swh:1:dir:44ac666e75004dd2a27ca0e09e73aecc0e8b426f;origin=https://github.com/amykwebster/MIPseq_2021;visit=swh:1:snp:46bcbe75339295c3d68fcf1257022aae8e6bee40;anchor=swh:1:rev:27839dcc9ef1587086be195349310fb70fbfcaf1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an example above in the big list of sample identifiers variable? I have a feeling that this should eventually also be identified as a swh identifier.

We can do this later of course by including the http(s)://archive.softwareheritage.org part in the regex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swh was actually already in the identifiers list, but with a url that would have failed the validation check. I expanded the test so all the urls in the identifiers variable are now checked.

@moranegg
Copy link

moranegg commented Jun 3, 2024

It is useful to link to software directly with a SWHID.
SWHID with parameters are more useful from the core SWHID without parameters.

#OR2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Triage 📥
Development

Successfully merging this pull request may close these issues.

3 participants