Sanitize doesn't appear to like the colon inside the href attribute of <a href="#fn:1">1</a>. It simply outputs <a>1</a> instead, dropping the href attribute altogether.
My sanitize code looks like this:
# Setup whitelist of html elements, attributes, and protocols that are allowed.
allowed_elements = ['h2', 'a', 'img', 'p', 'ul', 'ol', 'li', 'strong', 'em', 'cite',
                    'blockquote', 'code', 'pre', 'dl', 'dt', 'dd', 'br', 'hr', 'sup', 'div']
allowed_attributes = {'a' => ['href', 'rel', 'rev'], 'img' => ['src', 'alt'],
                      'sup' => ['id'], 'div' => ['class'], 'li' => ['id']}
allowed_protocols = {'a' => {'href' => ['http', 'https', 'mailto', :relative]}}

# Clean text of any unwanted html tags.
html = Sanitize.clean(html, :elements => allowed_elements, :attributes => allowed_attributes,
                      :protocols => allowed_protocols)
Is there a way to get Sanitize to accept a colon in the href attribute?
Answered on Stack Overflow. Repeating here for posterity.
This is Sanitize doing the safest thing by default. It assumes that the portion of the URL before the : is a protocol (or a scheme in the terminology of RFC 1738), and since #fn isn't in the protocol whitelist, the entire href attribute is removed.
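To make the behavior described above concrete, here is a rough model of that check in plain Ruby. It is only a sketch of the rule as stated in this answer, not Sanitize's actual implementation; the helper name protocol_allowed? and the example URLs are made up for illustration.

```ruby
# Simplified model of the check described above. This is NOT Sanitize's
# actual code; `protocol_allowed?` is a hypothetical helper.
def protocol_allowed?(href, allowed)
  # Treat everything before the first colon as the protocol (scheme).
  match = href.match(/\A([^:]*):/)
  if match
    allowed.include?(match[1].downcase)
  else
    # No colon at all, so the URL counts as relative.
    allowed.include?(:relative)
  end
end

# The href protocol list from the question.
allowed = ['http', 'https', 'mailto', :relative]

protocol_allowed?('http://example.com', allowed) # true:  "http" is listed
protocol_allowed?('/about', allowed)             # true:  relative URL
protocol_allowed?('#fn:1', allowed)              # false: "#fn" is not listed
```

Under this model, "#fn:1" is rejected because the text before the colon, "#fn", is compared against the protocol list like any other scheme.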
You can allow URLs like this by adding #fn to the protocol whitelist:
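A minimal sketch of that change, reusing the variable names from the question. The Sanitize.clean call is left as a comment so the snippet runs without the gem installed; it is copied from the question and assumes the Sanitize version in use there.

```ruby
# Same whitelist as in the question, with '#fn' added so that the text
# before the colon in "#fn:1" is accepted as if it were a protocol.
allowed_protocols = {
  'a' => { 'href' => ['#fn', 'http', 'https', 'mailto', :relative] }
}

# Passed to Sanitize exactly as in the question (requires the sanitize gem):
#   html = Sanitize.clean(html, :elements => allowed_elements,
#                         :attributes => allowed_attributes,
#                         :protocols => allowed_protocols)
```

Note that this only covers hrefs starting with #fn: any other fragment prefix that is followed by a colon (for example a footnote back-link, if it uses a prefix such as #fnref) would need its own entry in the list.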
I found this while troubleshooting an issue where Sanitize strips out an href attribute that contains the : character. I have a document with a bookmark whose id contains the : character and an href that points to it (e.g. href="#my:id"). Seeing as : is a valid character for an id in HTML5, would it be safe for Sanitize to leave the : in place for links that begin with a # character?
rgrove changed the title from "Sanitize doesn't like colon inside href attribute" to "URL fragment identifiers containing colons are stripped even when relative URLs are allowed" on Nov 1, 2017.
Turns out this was fixed in #87 (released in v2.1.0) way back in 2013, but this issue wasn't mentioned in that PR so no link was established. When a new comment was added here in 2017, I must not have remembered that PR, and I reopened the issue. But as far as I can tell everything's working fine!
If anyone's still having problems with URL fragments that contain colons, please share some code that reproduces the problem (be sure to mention what version of Sanitize you're using), and I'll try to find time to investigate before another 7 years go by. 😄
Using the Sanitize gem, I'm cleaning some HTML. In the href attribute of my anchor tags, I wish to parse the following:
<a href="#fn:1">1</a>
This is required for implementing footnotes using the Kramdown gem.
This issue is a duplicate of this Stack Overflow question.