Refine validation rule for HTAN filenames #294

Closed
adamjtaylor opened this issue Sep 14, 2023 · 3 comments · Fixed by #314
@adamjtaylor
Contributor

From Regex 101

/^(([a-zA-Z0-9_\-.+(') ]+\/)*[a-zA-Z0-9_\-.+(')]+)$/gm

^ asserts position at start of a line
1st Capturing Group (([a-zA-Z0-9_\-.+(') ]+\/)*[a-zA-Z0-9_\-.+(')]+)
  2nd Capturing Group ([a-zA-Z0-9_\-.+(') ]+\/)*
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
    Match a single character present in the list below [a-zA-Z0-9_\-.+(') ]
      + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
      a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
      A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
      0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
      _ matches the character _ with index 95₁₀ (5F₁₆ or 137₈) literally (case sensitive)
      \- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
      . matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case sensitive)
      + matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case sensitive)
      ( matches the character ( with index 40₁₀ (28₁₆ or 50₈) literally (case sensitive)
      ' matches the character ' with index 39₁₀ (27₁₆ or 47₈) literally (case sensitive)
      ) matches the character ) with index 41₁₀ (29₁₆ or 51₈) literally (case sensitive)
      (space) matches the space character with index 32₁₀ (20₁₆ or 40₈) literally (case sensitive)
    \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case sensitive)
  Match a single character present in the list below [a-zA-Z0-9_\-.+(')]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
    A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
    0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
    _ matches the character _ with index 95₁₀ (5F₁₆ or 137₈) literally (case sensitive)
    \- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
    . matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case sensitive)
    + matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case sensitive)
    ( matches the character ( with index 40₁₀ (28₁₆ or 50₈) literally (case sensitive)
    ') matches a single character in the list ') (case sensitive)
$ asserts position at the end of a line
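For anyone who wants to try the rule outside regex101, here is a minimal Python sketch. It assumes the pattern is the one explained above with the hyphen escaped inside each character class and a `*` after the directory group; the helper name `is_valid_filename` is hypothetical and not part of the HTAN data model or its validator.

```python
import re

# Assumed reconstruction of the pattern from this issue.
# Directory components may contain a space; the final component may not.
FILENAME_PATTERN = re.compile(
    r"^(([a-zA-Z0-9_\-.+(') ]+/)*[a-zA-Z0-9_\-.+(')]+)$"
)


def is_valid_filename(path: str) -> bool:
    """Return True if the whole path matches the proposed rule (hypothetical helper)."""
    return FILENAME_PATTERN.fullmatch(path) is not None
```

With the pattern as written, `is_valid_filename("dir one/sub-dir/file_1.ome.tiff")` passes, while `"dir/my file.txt"` (space in the final component), `"file#1.txt"` (character outside the class), `"a//b"` (empty directory component) and `"dir/"` (no final component) are rejected.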
@adamjtaylor
Contributor Author

Need to confirm the valid characters for Synapse and S3 and list them here.

@aclayton555 added the effort-mid label on Oct 2, 2023
@adamjtaylor linked a pull request on Oct 10, 2023 that will close this issue
@adamjtaylor removed a link to a pull request on Oct 24, 2023
@adamjtaylor
Contributor Author

From S3 docs

The following character sets are generally safe for use in key names.

Alphanumeric characters: 0-9, a-z, A-Z
Special characters:
  Exclamation point (!)
  Hyphen (-)
  Underscore (_)
  Period (.)
  Asterisk (*)
  Single quote (')
  Open parenthesis (()
  Close parenthesis ())
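A quick way to see how this list lines up with the regex from the first comment is to diff the two character sets. This is a rough sketch: the variable names are made up for illustration, and the regex set is taken from the final (filename) character class only, so the space and the `/` separator allowed in directory components are left out.

```python
import string

ALNUM = string.ascii_letters + string.digits

# Characters the S3 docs list as generally safe for key names.
S3_SAFE = set(ALNUM + "!-_.*'()")

# Characters accepted by the filename character class in the proposed regex.
REGEX_FILENAME_CHARS = set(ALNUM + "_-.+(')")

only_in_s3 = S3_SAFE - REGEX_FILENAME_CHARS
only_in_regex = REGEX_FILENAME_CHARS - S3_SAFE

print("S3-safe but rejected by the regex:", sorted(only_in_s3))
print("Accepted by the regex but not on the S3 safe list:", sorted(only_in_regex))
```

Under these assumptions the difference is small: `!` and `*` are on the S3 list but not in the regex, and `+` is in the regex but not on the S3 list.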

@adamjtaylor
Contributor Author

Some minimal testing with regex101

[Screenshot of regex101 test results, 2023-10-24 11:57 AM]
