Removes from HTML all tags and attributes that are not on the whitelist. The content of removed tags is kept. Signature of the whitelist:
{'enabled tag name' : ['list of enabled tag\'s attributes']}
You can use the symbol * to allow all tags and/or attributes.
Note that the script and style tags are removed together with their content.
This module is based on the HTMLParser class from the Python standard library. There are no other dependencies, which can sometimes be a plus.
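To illustrate the idea, here is a minimal sketch of whitelist filtering built on the standard-library HTMLParser. It is not html-purifier's actual code: the class name WhitelistSketch and its purify method are hypothetical, it targets Python 3 (html.parser), and it does not treat void elements such as br specially.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSketch(HTMLParser):
    """Hypothetical illustration; not html-purifier's implementation."""

    DROP_WITH_CONTENT = ("script", "style")

    def __init__(self, whitelist):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _allowed_attrs(self, tag):
        # Returns None when the tag itself is not allowed.
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # drop the tag; its text still reaches handle_data
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)

    def purify(self, html):
        # Reset state so one instance can be reused for several inputs.
        self.out = []
        self.skip_depth = 0
        self.reset()
        self.feed(html)
        self.close()
        return "".join(self.out)
```

In this sketch a tag is kept when it appears in the whitelist or when the whitelist has a '*' key, and an attribute is kept when the tag's list contains it or contains '*'. Text inside script and style is discarded by tracking a skip counter.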
$ pip install html-purifier
>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
    'div': ['*'],  # all attributes are allowed for div
    'span': ['attr-2'],  # only the "attr-2" attribute is allowed for span
    # all other tags and attributes are removed, but their content is kept
})
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>
It is typically used in Django models and forms.
There are purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.