JensDiemer edited this page Apr 5, 2015 · 1 revision

How to handle unknown html tags

Introduction

We offer four modes for handling unknown HTML tags when converting HTML to creole:

  • Raise NotImplementedError on unknown tags.
  • Use <<html>> macro to mask unknown tags.
  • Escape all unknown tags.
  • Remove all unknown tags.

By default, the last mode is used: all unknown HTML tags are removed, while their content is kept.

You can change this behaviour by passing a callable as the unknown_emit argument, either to the html2creole() function or to the CreoleEmitter class.
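The four modes are easiest to compare side by side. The following is a self-contained sketch, not python-creole code: Node and SketchEmitter are hypothetical stand-ins, and the handler bodies are assumptions written only to reproduce the results shown in the examples on this page. The handler names are borrowed from creole.shared.unknown_tags so they line up with those examples; the idea that a handler receives the emitter plus the unknown node is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


# Hypothetical, simplified tree node: NOT python-creole's real node class.
@dataclass
class Node:
    kind: str                                   # tag name, or "text"
    children: List["Node"] = field(default_factory=list)
    content: str = ""                           # only used by "text" nodes


class SketchEmitter:
    """Hypothetical toy emitter that only knows <strong> and text nodes."""

    def __init__(self, unknown_emit=None):
        # Mirrors the documented default: drop unknown tags, keep content.
        self.unknown_emit = unknown_emit or transparent_unknown_nodes

    def emit_children(self, node):
        return "".join(self.emit_node(child) for child in node.children)

    def emit_node(self, node):
        if node.kind == "text":
            return node.content
        if node.kind == "strong":
            return "**%s**" % self.emit_children(node)
        # Any other tag is handed to the pluggable callable.
        return self.unknown_emit(self, node)


# Assumed handler bodies, written to match the documented output.
def raise_unknown_node(emitter, node):
    raise NotImplementedError(
        "Node from type '%s' is not implemented!" % node.kind)


def use_html_macro(emitter, node):
    return "<<html>><%s><</html>>%s<<html>></%s><</html>>" % (
        node.kind, emitter.emit_children(node), node.kind)


def escape_unknown_nodes(emitter, node):
    return (escape("<%s>" % node.kind) + emitter.emit_children(node)
            + escape("</%s>" % node.kind))


def transparent_unknown_nodes(emitter, node):
    return emitter.emit_children(node)


# <unknown><strong>foo</strong></unknown> as a tree
tree = Node("unknown", [Node("strong", [Node("text", content="foo")])])
for handler in (use_html_macro, escape_unknown_nodes, None):
    print(SketchEmitter(unknown_emit=handler).emit_node(tree))
```

In this sketch, switching modes only means passing a different function; the emitter itself never changes.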

Examples

Raise NotImplementedError on unknown tags.

from creole import html2creole
from creole.shared.unknown_tags import raise_unknown_node

print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=raise_unknown_node))

result:

Traceback (most recent call last):
  ...
NotImplementedError: Node from type 'unknown' is not implemented!

Use <<html>> macro to mask unknown tags.

from creole import html2creole
from creole.shared.unknown_tags import use_html_macro

print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=use_html_macro))

result:

<<html>><unknown><</html>>**foo**<<html>></unknown><</html>>

Escape all unknown tags.

from creole import html2creole
from creole.shared.unknown_tags import escape_unknown_nodes

print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=escape_unknown_nodes))

result:

&lt;unknown&gt;**foo**&lt;/unknown&gt;

Remove all unknown tags.

from creole import html2creole
from creole.shared.unknown_tags import transparent_unknown_nodes

print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=transparent_unknown_nodes))

result:

**foo**

Complex example

You can also run both steps yourself, parsing the HTML with HtmlParser and passing the callable to CreoleEmitter:

from creole.html_parser.parser import HtmlParser
from creole.html2creole.emitter import CreoleEmitter
from creole.shared.unknown_tags import escape_unknown_nodes

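# Step 1: parse the HTML string into a document tree.
h2c = HtmlParser(debug=False)
document_tree = h2c.feed(u"<unknown><strong>foo</strong></unknown>")

# Step 2: emit creole markup, escaping any unknown tags.
emitter = CreoleEmitter(document_tree, debug=False, unknown_emit=escape_unknown_nodes)

print(emitter.emit())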

result:

&lt;unknown&gt;**foo**&lt;/unknown&gt;
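The complex example splits the conversion into two stages: build a document tree, then emit creole from it. As a rough, self-contained analogue, the sketch below builds the tree with Python's standard html.parser.HTMLParser instead of python-creole's HtmlParser. TreeBuilder, Node, parse and Emitter are hypothetical names, only <strong> is treated as a known tag, and the handler body is an assumption that reproduces the result above.

```python
from html import escape
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: a tag name (or "text") plus children."""

    def __init__(self, kind, content=""):
        self.kind = kind
        self.content = content
        self.children = []


class TreeBuilder(HTMLParser):
    """Hypothetical stage 1: turn an HTML string into a Node tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].kind == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(Node("text", data))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered text
    return builder.root


class Emitter:
    """Hypothetical stage 2: emit creole, delegating unknown tags."""

    def __init__(self, document_tree, unknown_emit):
        self.root = document_tree
        self.unknown_emit = unknown_emit

    def emit(self):
        return self.emit_children(self.root)

    def emit_children(self, node):
        return "".join(self.emit_node(child) for child in node.children)

    def emit_node(self, node):
        if node.kind == "text":
            return node.content
        if node.kind == "strong":
            return "**" + self.emit_children(node) + "**"
        return self.unknown_emit(self, node)


def escape_unknown_nodes(emitter, node):
    # Assumed behaviour: escape the tag itself, still convert its content.
    return (escape("<%s>" % node.kind) + emitter.emit_children(node)
            + escape("</%s>" % node.kind))


document_tree = parse("<unknown><strong>foo</strong></unknown>")
print(Emitter(document_tree, unknown_emit=escape_unknown_nodes).emit())
```

Keeping parsing and emitting separate means the same tree can be emitted several times with different unknown-tag handlers.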