-
Notifications
You must be signed in to change notification settings - Fork 464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Your library encourages unescaped HTML by supporting it in many cases. #249
Comments
While I really appreciate the way of trying to fix security vulnerabilities which might potentially coming up, at least this solution is a solution which works in an understandable way (for me). If I am hosting this on my very own, nothing should happen as far as I am aware of. Of course, if you fire up the console and put some additional code in it or use some weird ads on your website, you always "have potential security risks". But this is true for every website out there and has nothing to do with this "masking" of "<" ">" elements. This is not a form field which will get pushed via PHP to a server: I personally don't get the drama here right now - but I might have not the whole picture... At least Craig's solution is by far the most reliable one when it comes to "highlighting" html code the way I expect it to be highlighted from a highlighter - and additionally he provided the best feedback so far from all "highlighers" out there. |
So yes, Javascript works (in every modern browser nowadays, as far as I know): |
Yes, but this is potentially very bad:
And this does nothing (because it's just text, NOT Javascript).
I didn't really open this issue for you more for the library author.
You don't. The key: Not all users are you. This is very dangerous behavior for a highlighting library to be encouraging. There are very real risks when someone DOES use this library with PHP and a server and then highlight some code (just like you did)... and then their website is defaced (or much much worse).
HTML is complicated... sometimes just because something is easy doesn't mean there aren't serious risks. :) |
@Neohiro79 Just to be clear there are also technical issues with trying to highlighting HTML like that:
For example you can't highlight this because Or:
This will come out as |
@Neohiro79 Don't feel bad though, this is a very common confusion. But it's the way it is for good reason. :) And lots of security nightmares have occurred on the web due to people not understanding this who perhaps should have. :) You might say:
And you might be speaking the truth. :-) But that's also terrible advice for a LOT of people in say urban cities. :-) |
Josh, I am really thankful that you at least gave me a proper answer to my question and also provided me with a workaround. But I don't get the point of your security concerns. Javascript is as far as I am aware of activated by default in all modern browsers nowadays. So basically EVERY script, even on CDN's might be "compromised" - hence "loaded" into the browser without ANY option to "stop the browser from doing so" - except "denying all javascripts" - which noone really does nowadays. If I host all my scripts and html files on my server having only potentially some "input-fields" which might get submitted through php itself, I only run into troubles (as far as I am aware of) when you do not use htmlentities to proper ensure that no code which might beeing submitted is potentially beeing executed on the server itself. That would harm your files on the server, yes. That said, since every client and browser on this planet actually executes javascript without any security measures other than maybe a sandboxed environment, I don't get your point of the implied security concerns in case someone "highlights" the code the browser anyway doesn't execute since it is in a @edit: Which was proofed wrong thanx to joshs ongoing resitance, as can be seen here:
I might be missing something here - if so, please give me a hint since I want to know what I might need to look out for when using rainbow.js. If you're talking about a node.js environment which is executed on the server - well, yes, that might be a complete other topic, but who is using node.js when beeing concerned about security issues, right? ;-) |
I'm talking about the HTML environment on the client. I think I covered it with my analogy earlier. Just because your gold is safe on your front porch doesn't mean putting a huge stack of gold on front porches is a good idea for most people - and perhaps even you will get your gold stolen one day. :-) Same reason you don't run with knives... no one means to trip and impale themselves and die, yet every year it happens to someone. :-) What if I want to highlight this code:
You can't UNLESS you properly escape the HTML. That is the correct way to do it. |
Thankfully there are no examples in the README showing this risky behavior. Hopefully the author will come around eventually and dissuade you. :-) |
Here the author is saying the same ting I'm telling you: #244 :-) And more issues: It's like half the issues are about people doing this wrong and the pain as a result. |
What if I want to show someone this snippet: I just use the
and hightlight it properly with a "hightligher-script".
And since I've edited it in here in git, I needed to use those `backticks ` for the Then, just a personal guess, github is using the `backticks` to properly address those chars with htmlentities / php on the backend once the "comment" button has been pressed to properly "recode" those That's how it should be done. And the "highlighter script" - in my humble opinion - has nothing to do with this. |
That is not safe either:
|
Yes because if they didn't they would have major security problems. |
The gold on your porch might be safe... for now... but good luck. :) Because a lot of peoples gold is getting stolen off porches... I'll keep my gold elsewhere. |
FYI: Unrelated You use three backticks for multi-line code blocks on GitHub.
|
I think I found out what you're talking about: This is the concern you have, right? The browser executes Javascript-Code even when it is written in a code-element, that's the point? |
Yes, that's something I wasn't aware of yet, I really thought javascript code inside a codeblock would NOT be executed by default. What a fuckup. But even then, if I "use" javascript-code to be highlighted, I only need to address this javascript " Even more, why is it then, that close to all highlighters I tested, only HTML comments made "troubles" - not "CSS" and not "Javascript"? I will look into this topic further, I hope we can keep up this conversation once I figured it out. |
I've given 2-3 examples of where this breaks in very weird ways. See many of the open issues here for tons examples. The problem in there is NO way to get the ACTUAL "text" in-between tags (when there is HTML or comments in the middle), because the browser throws that text it away... and store a "representation" of it. Such as my example with the browser turning all your
There is no way to get whatever [stuff that includes HTML] is verbatim. And that's a problem if you're actually trying to highlight exactly what is in-between the blocks. Which is kind of important if say you're wanting to highlight specific HTML to ask a question or show someone a problem.
I do not know what you are asking. And I'd be carefully to make this a numbers game. It's possible for a lot of people to do something wrong and it still be a bad idea. :-) |
What you want is something like:
But the problem is what if we wanted to show how to use it?
What the heck is going on now? How do we know which XML actually has this concept with "CDATA":
And the rule is "anything goes" UNTIL |
Original problem I found astounding was this: I thought basically, that most highlighters I've tested at that point with that code, that it just "eats" the html comments for whatever reason. What I haven't been aware of or seen at that time, were the missing html tags itself Once you made your statement about the missing encapsulation or encoding and got around my doubts, I went to html-entities, changed the code - and all of the sudden the magic happened, I saw everything as I would've expected. See here: What might seem basically logic to you once knowing about this very fact of encoding practise, it wasn't for me. I quickly implemented all of them in a jsfiddle as written in their helpfile just to see how they worked (it was originally about the topic of having more than one line comments which have been highlighted false) and found that close to none of them were showing up the html comments properly as expected - but all others (CSS, JS comments and content) showed up.
I would say since we're all using proper html5 based on proper XHTML 1.0 which was basically used before my guess would be that's clear since the encapsulation is always inside another.
So basically if you grab an ID or class in the DOM you will get to the "contents" of any given tag I wasn't aware of the fact that the browser is mixing things up internally to much to avoid javascript execution within code blocks, but for me it seems like a huge fuckup, to allow code execution within code tags itself, since they are basically there to "show code" as I understand them, not to tell the browser, "hey execute that, it's code". How f* dumb is that. It's like you're going to jump into a car, close the door, and during your drive the door will just suddenly open up because it wasn't locked in the first place. What happend? Well, while you might think the door was closed when you closed the door, fun-fact it wasn't, since the engeneers found that too primitive, hence they implemented a second nice looking but good hidden lock-button, which will lock the door once pressed. but still ...basically my main question, why the need to mask ALL I mean right now the practise seems more like a |
Just wanted to point out that rainbow does escape tags for you when it can. Lines 66 to 74 in d606601
The issue is that when you put something like this the browser executes the tags as they were written on the page before the javascript from rainbow runs to rewrite them: <pre><code>
<script>var whatever = true;</script>
</code></pre> So it is specifically when you are highlighting html code in this manner that you need to use html entities <pre><code>
<script>var whatever = true;</script>
</code></pre> Also, as an aside, you should set the language to <pre><code class="lang-js">if (foo > 25) {
console.log('good!');
}</code></pre> Rainbow works from node.js too so if you are doing the rendering on the server side, I believe it will already output proper html entities. |
Well as you are learning this all depends on various behavior of innerText, innerHTML, textContent... HTML isn't content, so I personally think the "correct" behavior is to silently drop it all (which leads to a teaching moment when people go "wait where did my code go?????") - which is exactly what HLJS does (for the most part). Obviously Rainbow has made a different choice - though one I don't understand at all since it seems half of the support issues could be avoided with a small change to the library to help educate people if they are doing this wrong.
Oh no that was clear. :-) I'm just a bit more used to "appeal to authority" working. :-) I maintain Highlight.js. I run a website where people can paste code (http://pastie.org/). Though I guess I didn't exactly present credentials. I've been doing this quite a long time. I'm happy to explain things, but when people seem not to want to listen it can get frustrating fast. I'm glad to see you're coming around. :-)
That's a good question. I'd say it's a bit of an "beginner" question (and our README assumes more than that level of knowledge), but I might even be wrong on that... but even if so there are lots of beginners I suppose... so it's probably worth some documentation. Highlight.js has a small problem though in that our handling of HTML is a bit more "complex" than I've led you to believe. We actually preserve HTML and pass it thru, which allows some neat tricks:
We will highlighting this like as such (on the browser):
And this is one advantage of MIXING code and html... Someone might define Yes, it's complex and long-term I want to kill this (or move it to a plugin) since this is a feature not everyone needs. So in the future out of the box it's far more likely we'd silently drop the HTML and just render:
I assume they are all using variants of
HTML5 is typically not valid XHTML nor is it required to be. You are not required to close or pair tags. The real question was how do you how do you know which verbatim is which. What if you add more:
Is that a bunch of nested verbatims or a single verbatim with a bunch of verbatim code examples? You could invent some complex rules to decide, but it's not clear at all. Hence, escaping (or something like CDATA). XML (with no ambiguity):
I've been doing this too long to comment other than to say your expectation is just wrong. <code>
formatWholeComputer() <span style="color:red"> // never do this</span>
</code> How do you think highlighters work? We put HTML inside code tags... HTML that colors the text different colors, just like the last example. We don't change the code tag, it's still a code tag - with HTML inside it.
I already gave an example Highlight.js also encodes If you are doing this escaping by hand, you should stop and use a blogging platform or something. This is a job the computer should be doing for you. :-) |
I am also going to close this issue as I don’t encourage unescaped HTML code. It won’t work most of the time, and of course, can introduce xss vulnerabilities. |
Also even in XHTML land the point I was making is if "verbatim" magically means 'this is not code"... then the first "" isn't an opening tag.. it's just text, it could be booger... I'm asking "how do you put text on the page that looks like HTML" So if I really want the TEXT/output to be:
You'd start with:
But this has a BIG problem because the browser is probably only going to see:
And who knows what it will do with the rest of my code probably hide it and not display it. So you escape it: <verbatim>
boogers-boogers-boogers-boogers
</verbatim></verbatim></verbatim></verbatim>
</verbatim> It's ugly, but now the browser isn't confusde. |
It actually works a lot of the time (other than all the weird side issues) - which led to this whole thread... :-) is there a reason you don't just use @ccampbell FYI: I wasn't accusing but early on I got the impression that this person had spoken to the author of the library and the author had encouraged this or marketed it as a feature. :-) I'm glad I misunderstood. |
I guess it depends on what you are talking about. Rainbow doesn’t do any dynamic markup injection or anything like that. It takes what is already rendered on the page. If someone wants to dynamically insert markup to be highlighted that is entirely outside the scope of this library. I see I have one examples in the README using |
I'm not sure you are following. Your library current supports (perhaps encourages this). From looking at ALL your support issues its obvious people are very confused - because for simple cases raw HTML actually works (despite being a huge security risk)... but then for other cases it flops: <pre><code>
<html>...</html> <!-- this will disappear of course since it's not valid HTML -->
<!-- is this HTML or code to be highlighted? -->
<span>Is this HTML or code to be highlighted</span>
<script>stealUserData()</script>
<STRONG>bold</STRONG> <!-- will turn into <strong> -->
</code></pre> What is the purpose of allowing the unescaped HTML at all?It encourages users in the sample cases to think this is how your library can and should be used. Since you'll actually highlight MOST of that. So you could do one of two things:
See the proposal I linked to just above. highlightjs/highlight.js#2886 Your library itself could easily detect that the user (mistakenly) include RAW HTML and then show them a warning or error with a link to a section of the documentation (or an issue) that explains proper HTML escaping once and for all - vs you having to answer the question over and over. And it would make the web a more secure place. If your library allows this (and people are doing it often judging from issues) then it seems very obvious that some of your users are potentially opening themselves up to a serious security vulnerability by using your library - one that you could so easily be saving them from. |
To be clear I mean it's your library encouraging bad behavior, not your documentation or README. Say someone uses your library on their blog: <p>Here is an example of Bold:</p>
<pre><code class="language-html">
<strong>This will be bold</strong>
</code></pre> And they go "Oh neat, that's so easy. I love this library!" I could give you a long list of support issues where your users have done exactly this... except you only see the users who got tripped up on minor glitches... you don't see all the happy users doing it wrong. BUT then later they build a more complex project: <pre><code class="language-html">
<?php echo $USER_INPUT ?>
</code></ore> And boom, their website is exploited, user data is stolen, etc... this isn't directly your fault... but you did nothing to prevent them from getting the wrong idea about how to use your library... so then they get the wrong idea, and cut their hand wide open - with the knife you handed them - because you assumed they knew knives were dangerous... but they didn't... It'd be so easy to say "Hey knives are sharp, be careful." :) |
I completely felt it was my fault that josh stepped over to you and was accusing you of something you didn't even know what he was talking about and why and I didn't even know that it existed before. I am always pissed off when someone tried to accuse someone out of the blue - regardless the matter. Since my lack of knowledge in this case triggered this whole issue, I am deeply sorry if someone of you two got brain-troubles in the process. Josh potentially was just pissed of the fact that I mentioned that your library is much more intuitive and working from the startup - so I'm sorry for that - that might've triggered the whole anger of Josh here firsthand. @josh As you guessed correctly, no, I didn't checked 'who you are' (who cares ;-)), I just thought some guy who is for whatever reason I don't get right now just hugely pissed off :-) I really think that we all could achieve so much more when we would just keeping all our "proudness" away and just keep discussing topics as what they are (and doing so right now as far as I see) and trying to explain things to each other in a manner and language that we all can "enhance" ourselves. And let me tell you, Craig is really a role model in this regard. I was happy to finally get a useful answer from "someone" of those "libraries" (who in most cases don't work as expected) ... So I'm happy to see I finally can learn and ask something here which might be important to know. |
And that was exactly my point from the beginning. The question, how someone might potentially "insert" a malicious script on your site will only "come up" when someone is able to "push" code through an input field to a site which shows this exact content in a "comment-field" or whatever WITHOUT PROPER ENCODING ON PHP/SERVER-LEVEL BEFOREHAND like php-htmlentities BEFORE putting that code "into place". So basically that has been the whole issue - or am I wrong? If someone is not aware of the fact of proper masking before putting input to a server this is or becomes a potential issue, since javascript code can be executed inside a Which basically leads me to this question: If this is a main concern - wouldn't it be more concerning not to be concerned about the broad usage of node.js and all it's javascript-dependencies getting javascript-dependencies getting javascript-dependencies of javascript-dependencies noone really can say what is loaded when and where from INTO or ONTO your computer (say VSC for example)? |
I was not angry, maybe very slightly miffed. :-) The way you said it did read a certain way... :-) "your stupid library is doing it wrong and broken" Could be I was reading a lot into it. :-) And I tend to write a lot, it doesn't mean I'm "pissed off". :-) I just over communicate. Don't think I'm being prideful, just trying to make the internet a bit more secure place and save some people a lot of potential pain later. Who is Craig again LOL? :-) It's also possible you misunderstood someone's answer to be suggesting something they weren't suggesting. :-) I don't think you'll find many highlighting library authors encouraging unescaped HTML (so my topic choice here was probably quite poor)... :-) |
No. You can shoot yourself in the foot too. A lot of people have automated systems for publishing their sites... So one day you just add:
And run |
@Neohiro79 But again my point was always larger than you and your personal website. Teach one person to do this wrong and they teach other people and then you have a bunch of people doing it wrong. :-) You should do it right so that others learn from your good example instead of learning from your your bad behavior - because they MIGHT be dealing with random user input. |
He never did Josh ... :-) See here: Be assured, after this whole conversation and all the in- and for sure upcoming useful comments here, that I would hug both of you :-) |
Yeah, you're right. That's why I really like this conversation, since 'some guys' who finally 'know something' can 'tell someone' something which might help others too in the future. That's how it should be! |
And that my friend is the beginning of wisdom. This is a VERY real attack vector - and has been used for attacks in the past. (with Ruby Gems off the top of my head) Compromise the library server, upload a "bad" library instead of a good one, infect everyone who installs it. https://news.ycombinator.com/item?id=5139583 I'm not sure what Related: PrismJS/prism#1680 (auto-loading 'infected'/bad languages from a CDN is a risk) |
@ccampbell Sorry for turning your issue into a huge chat room. ;-) |
Meaning that a lot of php tools or plugins out there do NOT use proper htmlentities encapsulation and hence this is or shall be a "second" security wall? To sum it up for me:
then and only then I shall be 'save, happy and super-secure' ? |
I'm asking because I found a javascript library which basically can reverse the encoded parts into real html code again, so basically the only security measure which will work is see here: I was likely to use this as a "convenience-tool" for me not always having to change my code I want to "present" online. |
Let's keep this focused on HTML escaping. Proper website security is a HUGE issue and gets into many, many things. I haven't the time or desire to get into all that. :) The general idea: If not's not HTML code meant to be rendered as HTML code, then escape it - so it's not accidentally confused by the browser as HTML code.
Or make sure the CDN assets are what they claim to be. https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity |
Most big CDNs go to great trouble to not be hacked, but yes, any code you don't control is always a risk. A lot of people use Highlight.js from several CDNs we trust, and that's reasonable for many people. Security is about layers and sometimes knowing "how much do you need".
Perhaps you need a safe, but you may not need guard dogs and armed guards - but some people do. |
@josh Could you please do me a favour and get in contact with @Peter here, since he tries to get a VSC plugin to work (it's about the complete unusual usecase of someone wanting to finally print his code through a printer to control what he might have written or mark some things which might need to be rewritten). It seems there is a can of worms once trying to find the reason for the problem of not having multi-line-comments syntax-highlighted as intended - and even more, your inside-knowledge might be really helpful for him I guess: Issue about Issue about @that issues with VSC - yes, that scares me, that's why I asked. |
I have no idea why that issue is relevant to this discussion. |
This issue might not be relevant to this discussion, but he uses your highlight.js in his plugin. But yes, I might ask for too much. |
I see now, wasn't obvious at a glance. I commented. |
Yeah, sorry, I realised it looked completely like falling out of line ... Too impulsive myself sometimes too ;-) |
Thanks so much josh! I will let all these comments sink in, since I guess you might've something else to do today and Craig will have no more follow-up emails from us in his mailbox, at least for today :-) |
Reference:
highlightjs/highlight.js#2884
Some users get entirely the wrong idea when you support this behavior (highlighting unescaped code). This allows all sorts of potential JS injection attacks for users who don't understand why proper escaping is necessary.
Are you aware of the security implications of encouraging this?
The text was updated successfully, but these errors were encountered: