-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Elide comments in bundles #2420
Conversation
fa48cac
to
fc1eb2f
Compare
packages/bundle-source/README.md
Outdated
@@ -42,6 +42,14 @@ entry instead of `main` in `package.json`, if not overridden by explicit | |||
The `development` condition additionally implies that the bundle may import | |||
`devDependencies` from the package containing the entry module. | |||
|
|||
## Comment Elision | |||
|
|||
The `--no-comment` (`-X`) flag with `--format` `endoScript` or `endoZipBase64` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a cute name but "no" sounds like opting out of behavior, not into one.
The name should convey that it's opting in. E.g. strip-comments
as is the common name for this behavior or elide-comments
as is the term used throughout the implementation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally worth the gag, even if lost in a fit of responsibility. I want to reserve --strip-comments
for a feature that removes the comment brackets entirely. Please push back if that is undue pedantry and I’ll thread stripComments
throughout.
* @param {import('./location-unmapper.js').LocationUnmapper} [unmapLoc] | ||
*/ | ||
export const elideComment = (node, unmapLoc) => { | ||
if (node.type === 'CommentBlock') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will also remove copyright notices. I think that should be a requirement but IANAL.
Whichever way we land the tests and docs should make it patently clear.
I think we could get it by checking for some values in the node.value before trying to replace: https://terser.org/docs/miscellaneous/
18da963
to
432cd82
Compare
}; | ||
|
||
/** | ||
* Rewrites comments by replacing their interior with an ellipsis, deleting all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect most readers will think ellipsis=…
Consider,
Rewrites comments by blanking their interior, deleting all non-newlines before the last line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the catch. I removed the ellipsis and did not update the comment.
432cd82
to
7ef5f0f
Compare
7ef5f0f
to
0a59732
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM module regular expression efficiency.
if (comment.startsWith('*')) { | ||
// Detect jsdoc style @preserve, @copyright, @license, @cc_on (IE | ||
// cconditional comments) | ||
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is unnecessarily vulnerable to ReDoS.
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test( | |
return /(?:^|\n)\s*(?:\*\s*)?@(?:preserve|copyright|license|cc_on)\b/.test( |
or (relaxing as suggested by Claude below)
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test( | |
return /(?:^|\n)[\s*]*@(?:preserve|copyright|license|cc_on)\b/.test( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't consider regexp efficiency.
Some other ideas:
https://claude.site/artifacts/badb117f-f454-4e8c-bcf9-8037160acb97
https://chatgpt.com/share/e/da63d95f-f9e6-4f28-86b5-96d80be0fe37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For increased availability, the claude suggestion is:
const optimizedRegex = /^[\s*]*@(?:preserve|copyright|license|cc_on)\b/;
I can't seem to access the chatgpt suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Claude suggestion is more liberal in what comments it accepts. Claude would consider this a valid and preservation-worthy JSDoc:
/**
** @copyright
**/
Maybe this is an acceptable relaxation. The existing regex is already more liberal than strictly necessary. That is, it accepts weird jsdoc comment styles.
Can the Claude LLM be prompted that the behavior should be equivalent and do the same work in fewer cycles? For what comments would the proposed regex process the same characters more than once?
(?:^|\n)
matches the beginnings of lines, initial and non-initial\s*
consumes all leading space\*?
consumes exactly one asterisk if present. If absent the character under the cursor is neither a space nor an asterisk. This seems like the most likely matcher to cause an iterative walk-back, but I’m not sure how or why.\s*
consumes any additional space after an asterisk but should be harmless if there was no asterisk.@
leads every pragma- At some loss of legibility, the alternation of matched pragma names could avoid some walkback with a common prefix tree,
(?:c(?:c_on|opyright)|preserve|license)
but the legibility is probably a bigger win for that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude Haiku is free to use at a rate limit. Fwiw, I think it's reasonable that @copyright
anywhere in the comment should be considered an intent to preserve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the relaxed suggestion above, but FWIW @kriskowal's intuition is pretty much correct. The most problematic input for the merged regular expression is lots of whitespace followed by an asterisk and then not an at sign, and I think the backtracking is exponential:
$ esbench.mjs -b1 -h 'V8' --arg i~3 \
-s '
const S = " ".repeat(10**(i + 1));
const sMatch = `${S}*@preserve`;
const sNoMatch = `${S}*X`;
const r = /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/;
' \
accept:'if(!r.test(sMatch)) throw Error("unexpected rejection");' \
reject:'if(r.test(sNoMatch)) throw Error("unexpected acceptance")'
#### V8
accept (0) 7491.209439528024 ops/ms after 31 245760-count observations
reject (0) 3480 ops/ms after 29 122880-count observations
accept (1) 4924.580152671756 ops/ms after 21 245760-count observations
reject (1) 89.9080919080919 ops/ms after 34 2647-count observations
accept (2) 1298.868 ops/ms after 34 38202-count observations
reject (2) 1.2203389830508475 ops/ms after 34 36-count observations
accept (3) 154.3412228796844 ops/ms after 34 4603-count observations
reject (3) 0.0125 ops/ms after 13 1-count observations
Closes: #2413
Description
Adds an option to the bundler to blank the interior of comments, reducing bundle sizes.
This change does not attempt to apply the elideComments behavior if the user selects noTransforms, since it is piggybacking on the censorship evasion transform. These features could be decoupled, elideComments works with endoZipBase64 and a narrower interpretation of noTransforms (no precompiled module transforms).
Security Considerations
Some care has been taken to ensure that the resulting comments produce programs with the same behavior in the event the comment must be interpreted as a newline for automatic semicolon insertion (ASI).
Scaling Considerations
None.
Documentation Considerations
Testing Considerations
Uncovered:
Compatibility Considerations
Maintains support for caches from prior versions.
Upgrade Considerations
None