Improved document model, parsing of borderline cases & HTML annotation support

@AlbertWeichselbraun AlbertWeichselbraun released this 30 Jun 09:51
· 326 commits to master since this release
84ec720
  1. HTML parsing:

    • new: model for handling blocks and lines
    • chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
    • chg: improved whitespace handling
    • add: cover more borderline cases with unit tests
  2. Inscriptis core:

    • new: support for annotation rules and annotation output
    • new: annotation post-processors (html, xml, surface form)
    • new: type hints
    • chg: extended and improved documentation
  3. Inscript command line client:

    • chg: apply --encoding to Web URLs as well
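Note on the --encoding change: the idea is that an encoding given on the command line should win over whatever encoding a Web server declares, just as it already does for local files. The following is a minimal sketch of that precedence only, not the client's actual code; `decode_payload` and its parameters are hypothetical names, and the UTF-8 fallback is an assumption made for the example.

```python
from typing import Optional


def decode_payload(raw: bytes,
                   encoding: Optional[str] = None,
                   declared: Optional[str] = None) -> str:
    """Decode fetched bytes (hypothetical helper, not part of Inscriptis).

    Precedence: the explicit `encoding` (e.g. from an --encoding option),
    then the encoding `declared` by the source (e.g. an HTTP header),
    then UTF-8 as an assumed default.
    """
    chosen = encoding or declared or "utf-8"
    # errors="replace" keeps the conversion going on undecodable bytes.
    return raw.decode(chosen, errors="replace")


# A page served as Latin-1 bytes.
page = "<p>Größe</p>".encode("latin-1")

# The explicit encoding overrides a wrong declaration.
print(decode_payload(page, encoding="latin-1", declared="utf-8"))

# Without any hint, the UTF-8 fallback yields replacement characters.
print(decode_payload(page))
```

With this ordering, the same option behaves identically for files and URLs, which is the behaviour the changelog entry describes.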