Improved document model, parsing of borderline cases & HTML annotation support
AlbertWeichselbraun released this 30 Jun 09:51
HTML parsing:
- new: new model for handling blocks and lines
- chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
- chg: improved whitespace handling
- add: cover more borderline cases with unit tests
Inscriptis core:
- new: support for annotation rules and annotation output
- new: annotation post-processors (html, xml, surface form)
- new: type hints
- chg: extended and improved documentation
Inscript command line client:
- chg: apply --encoding to Web URLs as well
- chg: apply