-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incorrect coloring for html comment across multiple lines #63
Comments
This is a shortcoming of the syntax colouring library rather than VS Code Printing. If you can give me a sample that demonstrates the problem I'll be happy to log a bug with the syntax colouring project. |
Below is an html sample together with the PDF printout that demonstrate the problem. https://drive.google.com/drive/folders/1SuTVZryAV9MVRucQ3zmbocVW_Od7WVet?usp=sharing |
Sorry, my fault, I have overlooked that you meant "over multiple lines", which also triggers the same no-syntax problem as meantioned above (the wrong HTML comment). I've made a JSFiddle to first see what happens when using "highlight.js" and if this is a bug related to highlight.js: The output there seems fine to me - except the fact that it won't show HTML comments - even when HTML is set as class. But when used through the plugin, it shows this (which looks like a bug inbetween somewhere): VSC Input VS Extension Output: |
There are two modes, manual and auto. I use manual whenever VSCode has a definite langId, and for an HTML file that is the case. Manual mode basically means I don't let highlight.js guess, I explicitly tell it "this is HTML" But that doesn't explain the behaviour you're seeing so I suppose I'll have to spin it up in the debugger and seen exactly what's being sent to highlight.js |
Peter I've also tried to do it with Prism, but I've ran into similar troubles. I've already opened a ticket: You can see my tryouts in this JSFiddle here (now it works, but it is a quirky workaroud, since comments in HTML will require a plugin, which in my opinion is not the best way of having to deal with): Again, see my comments in here: At least the multi-line comments worked for Javascript and CSS. |
Digging into the rabbithole ... Even this nice looking approach won't work as expected: Issue has been submitted, but I guess he might not answer, seems like a dead git repo to me: ... to be continued ... |
Without wishing to rain on your parade, highlight.js is probably as good as it gets. Stackoverflow just switched to using it. This is why, despite otherwise being a bit perfectionist about vscode-printing, I think we should let the open-source division of problems thing work for us. Gather your evidence and just log a case with the highlight.js people. We can't do everything and they know their codebase. |
Finally I got some feedback for rainbow.js: This works PERFECTLY NOW: If you want to include all you can use this command here for a build: That might also address potential performance issues ... |
To bundle Craig told me this:
|
So Peter, as thought, don't expect 'help' from the guys from highlight.js, as seen in the comment here: It seems to me they have the very same attitude than we're well aware of the superior VSC forum feedback in our case. Since I've made that research for various reasons so comprehensive, I'll personally stick with rainbow.js since it not only seems the most reliable code syntax highlighter on the market right now, it also has the nicest and most helpful developer feedback I've encountered so far! Hope you can find a solution which finally works - give it a try! |
The answer came promptly after telling them that it obviously works in rainbow.js, so this is how it "only-works" in highlight.js: That said, I still prefer the rainbow.js solution, since you do not need to "mask" anything to make it work, since I don't get the whole "I am concerned about security issue" in this specific regard: @edit: since I've had a really good chat with josh about this whole security issue, I see the problem now which might occur when you're using unmasked code. |
@Neohiro79 FYI: This is the attitude I picked up on that very slightly miffed me. :-) Just to explain here we do not highlight unescaped HTML code because it is a HUGE security risk and a very bad idea. This is a major feature, not a bug. If you want HTML code to be treated as "text" and highlighted, it must be properly escaped. Otherwise google "html injection", "javascript injection", etc... IE, correct: <code>
<code>this is a code tag to be highlithed</code>
</code> Incorrect: <code>
<code>hey code here!!!</code>
</code> I personally believe Rainbow is buggy in this regard or at the very least encouraging/enabling people to make some very poor security choices. I've explained this as clearly as I can to the author and we'll see what they decide. :-) With a very small change to the library they could close this security hole and prevent people from shooting themselves in their own foot. |
@PeterWone I'm not sure if this affects your plugin or not... if you are asking us [Highlight.js] to highlight code inside a block you need that code to be properly HTML escaped (which should be pretty trivial to do with JS). The block should not include RAW unescaped HTML - which we are NOT going to highlight. |
@josh :-) big hug for all your effort !!!
Yeah, I got the point, that's what I thought ... but you also opened up for us 'beginners' now, which makes me happy :-) |
So in essence if I escape HTML the issue vanishes and we could consider using your library.
Bring back Windows Phone
…________________________________
From: Josh Goebel <[email protected]>
Sent: Sunday, November 22, 2020 4:08:48 PM
To: PeterWone/vsc-print <[email protected]>
Cc: Peter Wone <[email protected]>; Mention <[email protected]>
Subject: Re: [PeterWone/vsc-print] incorrect coloring for html comment across multiple lines (#63)
@PeterWone<https://github.com/PeterWone> I'm not sure if this affects your plugin or not... if you are asking us to highlight code inside a block you need that code to be properly HTML escaped (which should be pretty trivial to do with JS). The block should not include RAW unescaped HTML - which we are not going to highlight.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#63 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABJ6QOCPIAOYZJPDIFCY52DSRCTHBANCNFSM4QPKYFVA>.
|
That is my current understanding, yes. Highlight.js (currently) preserves HTML - it's passed thru silently - while the actual TEXT content is highlighted... and the HTML remains, doing whatever it would do... this is legacy behavior and in the future we'll likely drop it completely but we can't do that until version 11. If you want all the content to be treated as text (and highlighted) then just escape it. There is no reason for your block to include anything other than text (ie, properly escaped HTML which is "text" to the browser/runtime) unless you purposely WANT to use HTML to do something:
For example you can do tricks like this... and we'll highlight the text but preserve the HTML so line 1 will always be RED. Some people use this to highlight/emphasize parts of their code, etc... |
Oh, that explains why some of them won't show the |
Yes, or |
So the issue with rainbow.js is that they use And the fact that they change the HTML during the process won't change the fact they read from Because in that case it makes sense to me to definately warn the user once any input of code is not masked properly in a very clear way (and not in the console) that you can see the mistake and directly put a link to one of those converters to finally get the result someone might expect, just as you proposed ... just my two cents... |
Yes. You want "the text content in a code block" not "all the HTML code in a code block".
Well, no.. :-) It's quite likely Rainbow would actually "fix it"... by turning it into harmless highlighted code (albeit in weird ways)... but it's too late... the HTML is loaded BEFORE Rainbow sees it. The JS runs BEFORE Rainbow sees it. Plus you can't guarantee Rainbow will run anyways - perhaps the server (or internet) has a glitch and your page loads without Rainbow. <pre><code>
<script>
// this code runs BEFORE Rainbow
// this code could even hack Rainbow and prevent it from loading if it wanted to
</script>
</code></pre> |
Yeah, got it. But this also means (to me) that weather Rainbow.js nor Highlight.js can ever prevent anyone from injecting "bad javascript" (of course) which will anyway beeing executed BEFORE any of those methods through the browser engine, weather innerHTML nor textContent nor any masking techniques beeing applied by any javascript library afterwards, will ever come into play. Meaning that the main problem I still see is that it can only "warn" a user AFTER potential malicious code was injected already, or in other terms, it is much more important to warn users BEFOREHAND that they shall never use untested or unknown php code which might be used in a contact form or a comment form of any plugin since they do not know weather or not this code is beeing properly masked BEFORE it even hits the html page - which in turn will execute the script BEFORE EVEN ANY highlighter script can detect something via innerHTML or innerText - or am I wrong? |
@Neohiro79 Right, we can't STOP an injection attack like this "in the wild"... but if someone is building their project and testing things out and they do this accidentally we can yell at them very loudly then - so they fix it early - hopefully before their code every gets out "in the wild"... or even if it's in the wild we can still warm them if they are using raw HTML even innocently. Perhaps they can fix it before they are exploited maliciously. And we can use |
Yeah, got you. |
@josh I'll work on a html warning page and provide it to you once I finished which can be used to be injected into the document or as a popup for all to be aware of once they use HTML code in plain form - since I really think this is a problem since not everyone is aware of this shit. You can sent me the text you might think or come up with, I'll see what I can do, to make it strikingly clear for others. |
We don't need anything from you. We have an issue to track this, but we can't do anything about it until 2021 anyways - because we can't break the existing behavior in v10 (which passes HTML thru). If you wanted to drop some text on that issue you could, but we'll likely think about this carefully when we get around to doing it. So you can save the time. |
Sure, if there's no help needed I won't help of course. It was just a proposal with a good intension after all. |
OK so looking at the code (it's been a while) I have a function I give let sourceCode = await getSourceCode();
let renderedCode = "";
try {
renderedCode = hljs.highlight(sourceCode[0], sourceCode[1]).value;
}
catch (err) {
renderedCode = hljs.highlightAuto(sourceCode[1]).value;
} and as you can see if I cannot map the language to something This does have the vulnerability we discussed, both for accidentally toxic code and for handling the work of others. @joshgoebel do I understand correctly that for HTML you want me to escape all angle brackets to to I can't do this for unsaved editor buffers because I don't know what they are, these will always be processed in auto mode. |
I was never asking you to do anything magic, just upgrade to v10, which fixes many of the issues. :-) If our library still has regex issues, then yeah it's possible you'll find them - but that's on us, not you. :-)
That's only if your HTML is already on HTML page inside a block, etc... if you're using the API functions directly (highlight, highlightAuto, etc) then you pass them regular text/code whatever... there is zero need to do any type of escaping inside JS. And we escape the output. Escaping matters if you are using |
@ josh I really don't want to be pedantic here, but when I've searched in the CDN for I mean this way one can find it when searching for it. Who looks into a language description file when searching for the language - and even more - at the But now you've mentioned it, yes, sure, I know where to look when I am not sure about the usecase of a specific language file. |
@ peter from @ josh
|
This is why we have: https://github.com/highlightjs/highlight.js/blob/master/SUPPORTED_LANGUAGES.md :-)
That isn't something we could do until v11 in any case since it would break things for existing users. There are perhaps some other things worth discussing around HTML vs XML though - could you file an issue with that suggestion?
That was more to show you how it works. :) Then I went and pasted the list for you instead. |
And this list would of course IMHO make sense to be put in the same folder here: https://github.com/highlightjs/highlight.js/tree/master/src/languages since I didn't know of that list before that it even exists. |
Just as an idea, having for example three basic subsets, one that is optimised for and for those Just my two cents. |
I don't think the uses cases are that clear cut at all - that's why we make it so easy for people to download what they need instead of making that choice for them: https://highlightjs.org/download/ And if you want to common on the 'common' list especially there is an issue open for any changes we might make there in 2021. |
But it wouldn't hurt either to promote at least a carefully chosen You can keep the page as it is - just 'help' someone to find the right 'choice' - beginners love easy answers ;-) |
It's like HTML in the beginning - "strict" "transitional" and "frameset", three basic choices for example for web for example the most used ones (for programming): and the full package AND THE INDIVIDUAL CHOICES / BUILDS |
The existing "common" build is small enough to serve for "both" the cases you just list at just 36kb. |
Oh I see. I haven't recognised those "headlines". Yeah, but it's kinda huge package for "just the web" ... It for sure serves well for the programming fraction like you - sure - but not for "just the simple web package" (seems like a huge language overkill to me). I mean I don't want to spend hours of finding out which language packages to use to just highlight my basic code examples in my webpage and even then I later figure out, that I might used the wrong ones or could've used better ones - I hope you get what I mean or want to tell you here. It's kinda like this first 'contact' feeling: You want to start quickly and have some experience of success?
USE PROPER HTML MASKING TO YOUR OWN SAFETY !!! (link see here and here) Enjoy. |
For example, right now I use this in my first testcases, because I didn't even realised
and it took me more than a minute to figure it out and find the right |
There might be a docfix here (to better explain the CDN and default built-ins), but right now I'm not convinced more and more variants is the answer. :) In that case you only need the first line. |
The fun-fact is, I didn't even know that there are variants. I just thought you need the basic lib and then the specific language packs named after their language. |
By the way, see, even after 15 minutes research of how to implement Please consider at least a web-package. You will see in the stats, how often it will be picked and used. Or just include this basic web-package in the |
@ josh Am I understanding this right: highlight.js already packages this 'common' bundle within itself???
At least it seems to work here right now:
So all this is already bundled within the highlight.js file itself? |
Yes, they are all bundled by webpack. APL is the name of a programming language. It stands for A Programming Language. Really. https://en.wikipedia.org/wiki/APL_(programming_language) Back in the day you needed a special keyboard and a terminal that supported user defined characters. Nowadays editors like VS Code convert chords to the appropriate character codes and use an APL compatible font. APL is extremely obscure and I was basically making a joke about the extraordinary number of languages highlight.js supports. It is probably the most exotic, illegible and unmaintainable programming language ever genuinely intended to be used. Perl doesn't count because it isn't Turing complete. |
Well, than I guess, my whole effort was for nothing to convince @josh here to make a webpack-bundle ;-) ... was worth a try when not knowing that it already includes all that languages from start. But what if I want only those three languages (like a |
Did you find the problem @peter ? It might be a setting of the correct "classname" - maybe? Can you show the code to @josh that he might find out what the problem is? |
Your efforts to improve the quality of the product are never a bad idea. To understand why I bundled all of it you must consider the fact that VS Code Printing is designed to work offline. To install a user selection would require either a VSIX that supports some mechanism for choosing during the installation process (I don't think this is possible) or a fancy download page that builds a custom VSIX according to selections on a web page. Due to the overheads of handling loads of small files it's actually faster to load an 835KB containing all of highlight.js than it is to demand load parts of it and all its dependencies. No I haven't time today. It's my first day at a new job. |
Sorry, might has been a bit chaotic, the proposal for the bundling was originally meant for @josh (the whole webbundle thing). Yeah sure, no hurry, good luck on your new job! |
I have a strong suspicion that this is actually caused by the problem in #78 which is about to be resolved. |
We combine "hanging indent" for line numbers with PRE for code indent by using a table and putting the PRE in a TD. This assumes that the code is strictly line based and that any markup tags introduced by highlight.js will both open and close on the same line, a line being defined by \n rather than wrapping. Mutli-line comments and string literals break this assumption and to fix it in a language-agnostic way we have to parse the rendered HTML looking for unbalanced tags, close them at the end of the line and re-open them at the start of the next. |
@joshgoebel note the above comment -- this is not a |
Yep. You got the idea. If you do it abstractly enough you should make a plugin for it. :) It shouldn't be a lot of code, it's just never been a priority for me. |
@alan7872 thanks for opening this can of worms. We finally got to the bottom of things. |
I am printing an html file in which there are comments for the form
<!--comment-->
/*comment*/
But only the first line of the comment is correctly colored as comment.
The text was updated successfully, but these errors were encountered: