Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict regexes #40

Merged
merged 6 commits into from
Jun 28, 2022
Merged

Strict regexes #40

merged 6 commits into from
Jun 28, 2022

Conversation

jgonggrijp
Copy link
Contributor

This fixes #29 by making the regular expressions in lib/util.js mutually exclusive and more accurate. By "more accurate", I mean that the patterns represent the HTML syntax better, allowing more in some places and less in other places.

I have double-checked that all tests still pass. Nevertheless, the changes to the regular expressions should be scrutinized closely. For this purpose, it is easiest to look at 338868b in isolation.

The changes in the package-lock.json can be ignored, I only upgraded it to version 2.

Optionally I could add another commit to enforce that the closing quote of the URL is identical to the opening quote (i.e., both double or both single). This would require a named capturing group: (?<quote>["']) at opening and \k<quote> elsewhere in the pattern.

I have not looked into how this change might interact with #20.

At first sight, the failing test seems to suggest that every odd
<script> has a quote stripped while every even <script> does not. This
order change reveals that it is not the order, but the content of the
src= attribute that influences the behavior. The URLs from cdnjs have
a quote stripped, the URLs from jsdelivr do not.
@OverZealous
Copy link
Owner

Thank you for your PR!

I'll be honest, I don't use this library, and don't really have any intention of maintaining it. I'm not really sure I'm comfortable accepting these changes, because I don't know for sure what the side effects will be for existing users.

I think this would actually break previous fixes. [\s\S] was chosen specifically because it matches linebreaks, while matchers like [^'"] won't.

I'm also not sure I like adding all the extra whitespace into the regular expressions. This library is really intended to be run after a minifier, so there should be no reason to have whitespace around the attributes or closing element. Adding that opens the door to overmatching, and I'd prefer the library to remain simpler.

As a final note, before I would accept the PR there are a couple things would need to be done:

  1. Combine the various exploration commits into a single commit.
  2. Remove package.json from the commits, since they are not part of the changes.
  3. Instead of changing the tests, add new ones that also pass. Otherwise, this could be introducing incompatibilities.

(To be totally honest, this library should use an in-memory DOM parser and walk the tree, using regular expressions is a hack that only really works because this is intended for a fairly simplified use case.)

@jgonggrijp
Copy link
Contributor Author

I'll be honest, I don't use this library, and don't really have any intention of maintaining it. I'm not really sure I'm comfortable accepting these changes, because I don't know for sure what the side effects will be for existing users.

In that case, let's discuss the matter before I put more work into the branch.

I think this would actually break previous fixes. [\s\S] was chosen specifically because it matches linebreaks, while matchers like [^'"] won't.

[^"'] matches everything, including linebreaks, except for quotes. So the only change here is that the URL part is no longer matching quotes, which is the reason why it can now be eager instead of lazy.

I'm also not sure I like adding all the extra whitespace into the regular expressions. This library is really intended to be run after a minifier, so there should be no reason to have whitespace around the attributes or closing element.

Well, I'm using it without a minifier.

Adding that opens the door to overmatching, and I'd prefer the library to remain simpler.

Please explain to me how this could lead to overmatching.

As a final note, before I would accept the PR there are a couple things would need to be done:

  1. Combine the various exploration commits into a single commit.

By my count, there is only one exploration commit: "Experimentally change order in #29 test fixture" (c863688). I don't think squashing that commit into any of the other commits would be a sensible combination, but if you insist I will comply with whatever rebase you have in mind. I can also drop the commit, because its change is not essential. I just left it in because it is informative.

  1. Remove package.json from the commits, since they are not part of the changes.

That's package-lock.json, which I'm just upgrading to a newer version (the version that comes with npm v7 and later). I can drop that commit as well, but the first next contributor who runs npm install will upgrade it as a side effect again.

  1. Instead of changing the tests, add new ones that also pass. Otherwise, this could be introducing incompatibilities.

May I remind you that this is a test that you wrote because of #29, based on my report of the bug, and that the version originally written by you did not actually reproduce the bug, which you were aware of at the time? I only removed the leading slashes in order to reproduce the bug, which I could afford because I was going to fix it.

That said, if you want to retain a copy of the original test with leading slashes, I will comply.

(To be totally honest, this library should use an in-memory DOM parser and walk the tree, using regular expressions is a hack that only really works because this is intended for a fairly simplified use case.)

Granted. We already discussed this in #29.

@OverZealous
Copy link
Owner

It's been 4 years since #29, so please forgive me if I don't remember the details—there's been a few things that have happened in since then. I didn't remember discussing this at all.

I can just accept your PR as-is, I guess I don't really care that much.

As I said, this isn't a library I care to maintain too much, since I don't use it at all. Please understand that accepting a PR means that I'm implicitly taking responsibility for those code changes. If they break something, then it comes back on me to fix it or reverse the changes. So I tend to be hesitant to accept PRs in general.

@OverZealous OverZealous merged commit 36dbce5 into OverZealous:master Jun 28, 2022
@OverZealous
Copy link
Owner

Published as v3.3.0

@jgonggrijp jgonggrijp deleted the strict-regexes branch June 28, 2022 15:40
@jgonggrijp
Copy link
Contributor Author

It's been 4 years since #29, so please forgive me if I don't remember the details—there's been a few things that have happened in since then. I didn't remember discussing this at all.

I realized that, that's why I reminded you of it.

I can just accept your PR as-is, I guess I don't really care that much.

As I said, this isn't a library I care to maintain too much, since I don't use it at all.

Out of curiosity: why don't you use it anymore? What has changed in your workflow?

If you no longer wish to maintain cdnizer, perhaps consider transferring the burden to somebody else.

Please understand that accepting a PR means that I'm implicitly taking responsibility for those code changes. If they break something, then it comes back on me to fix it or reverse the changes. So I tend to be hesitant to accept PRs in general.

Being the maintainer of a couple of libraries and tools myself, I understand the responsibility. Thanks for merging and releasing the changes anyway.

If complaints come in because of this change, I am willing to contribute a fix again. Please feel free to mention me.

@OverZealous
Copy link
Owner

Out of curiosity: why don't you use it anymore? What has changed in your workflow?

I don't like relying on externally served files in production. It's too easy to have something outside of your control go wrong, breaking the app. As long as you can get to my server, everything you need will load. I prefer to include all necessary scripts in my own bundles hosted on my own server. They are hashed and served with a very long cache, so in general the penalty is very low.

Plus with JS modules these days, it's less often I want to serve a full library—better to only import the exact code necessary.

If you no longer wish to maintain cdnizer, perhaps consider transferring the burden to somebody else.

I thought of this. Or at least adding additional maintainers. Maybe at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cdnizer 2.0.0 appears to apply quotes randomly
2 participants