Releases: weblyzard/inscriptis

Fixed annotations for borderline cases

10 Jul 15:21

Please refer to https://github.com/weblyzard/inscriptis/releases/tag/2.0rc1 for a list of all new features. This release candidate fixes the following issues in rc1:

  • fixed annotations for some borderline cases
  • improved documentation compared to 2.0rc2

Improved document model, parsing of borderline cases & HTML annotation support

30 Jun 09:51
84ec720
  1. HTML parsing:

    • new: new model for handling blocks and lines
    • chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
    • chg: improved whitespace handling
    • add: cover more borderline cases with unit tests
  2. Inscriptis core:

    • new: support for annotation rules and annotation output
    • new: annotation post-processors (html, xml, surface form)
    • new: type hints
    • chg: extended and improved documentation
  3. Inscript command line client:

    • chg: apply --encoding to Web URLs as well
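
The annotation output introduced above can be post-processed with a few lines of code. The following is a minimal sketch, assuming the annotated result is a dictionary with a `text` string and a `label` list of `(start, end, label)` character spans; that shape, the helper name `surface_forms` and the sample data are illustrative assumptions, not output produced by inscriptis.

```python
from collections import defaultdict


def surface_forms(annotated):
    """Group the annotated text snippets by their label."""
    text = annotated["text"]
    forms = defaultdict(list)
    for start, end, label in annotated["label"]:
        # each annotation is assumed to mark the half-open span [start, end)
        forms[label].append(text[start:end])
    return dict(forms)


# Hand-written sample in the assumed output shape (hypothetical data).
sample = {
    "text": "Chur\n\nChur is the capital of Grisons.",
    "label": [(0, 4, "heading"), (18, 25, "emphasis")],
}
print(surface_forms(sample))
```

Working on character offsets keeps the extracted text itself unchanged, so several labels may cover the same span without interfering with each other.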

1.2

14 May 09:40
  • tables: add support for vertical (valign, CSS: vertical-align) and horizontal (align) cell alignment (fixes: #33)
  • improved handling of HTML attributes and styles
  • code cleanup
  • migrated build from travis to github actions
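
To illustrate what horizontal and vertical cell alignment mean for plain-text output, here is a minimal sketch that pads the lines of one cell to a fixed width and height. The function `align_cell` and its parameters are hypothetical; it shows the general idea and does not reproduce the inscriptis implementation.

```python
def align_cell(lines, width, height, align="left", valign="top"):
    """Pad a cell's text lines to a block of `height` lines of `width` chars."""
    # horizontal alignment: pad every line to the cell width
    pad = {"left": str.ljust, "right": str.rjust, "center": str.center}[align]
    block = [pad(line, width) for line in lines]

    # vertical alignment: distribute the missing lines above and below
    missing = max(height - len(block), 0)
    above = {"top": 0, "middle": missing // 2, "bottom": missing}[valign]
    blank = " " * width
    return [blank] * above + block + [blank] * (missing - above)


for line in align_cell(["42"], width=6, height=3, align="right", valign="bottom"):
    print(repr(line))
```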

Improved margin handling & more liberal licensing

04 Jan 12:51
d6e275d
  • ignore top margins at the beginning of a document.
  • more liberal licensing:
    • the license change has been triggered by another project that created a Java port of inscriptis.
    • to facilitate the free sharing of code and ideas between our two projects, we have (i) obtained the permission of all contributors for a license change, and (ii) changed the inscriptis license to the "Apache License 2.0".

Improved testing and Python 3.9 support

08 Dec 06:33
  • minor performance improvements and code optimizations
  • added Python 3.9 test environment
  • improved test coverage
  • updated package metadata
  • improved tox configuration

Improved HTML rendering, command line client and Web service

20 May 19:10
  1. added support for rendering tags with the white-space: pre CSS attribute (e.g. <pre> which is often used for formatting code).
  2. API change: A ParserConfig object replaces the parameters display_images, deduplicate_captions, display_links and indentation in get_text() and for initializing the Inscriptis class.
      from lxml.html import fromstring
      from inscriptis.html_engine import Inscriptis
      from inscriptis.model.config import ParserConfig

      # html: the HTML document to convert, as a string
      html_tree = fromstring(html)
      # optional parser configuration fine tuning
      config = ParserConfig(display_links=True, display_anchors=True)
      parser = Inscriptis(html_tree, config)
      text = parser.get_text()
  3. command line client:
    • added option for displaying anchor links
    • --encoding now sets the HTML and output encoding
    • new --version option
  4. Web service
    • use the relaxed CSS profile by default
    • added version call
  5. Documentation fixes and improvements
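
The effect of the white-space: pre attribute mentioned in the first item of this release can be sketched as follows. This is an illustration under simplifying assumptions (only the values `normal` and `pre` are handled, and `render_text` is a hypothetical helper), not the rendering code used by inscriptis.

```python
import re


def render_text(text, white_space="normal"):
    """Return text as it would appear under the given white-space value."""
    if white_space == "pre":
        # preformatted: keep spaces and line breaks exactly as written
        return text
    # normal: collapse every run of whitespace into a single space
    return re.sub(r"\s+", " ", text).strip()


snippet = "def f():\n    return 1"
print(render_text(snippet))
print(render_text(snippet, white_space="pre"))
```

Without the pre handling, the indentation and line breaks of code listings would be collapsed into a single line, which is why the distinction matters for tags such as <pre>.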

Improved performance and code structure, documentation and unit testing

20 Dec 17:16
  • improved performance and code structure.
  • use metadata published in ./inscriptis/__init__.py for versioning and in setup.py.
  • improved test coverage
  • created Sphinx API, usage and testing documentation, which is published at https://inscriptis.readthedocs.org
  • requires Python 3.5+ (dropped support for Python 2.7)
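
One common way to reuse the metadata of a package's `__init__.py` in `setup.py` is to parse the file instead of importing it, which avoids requiring the package's dependencies at build time. The sketch below assumes the metadata is stored in string assignments such as `__version__ = '...'`; the field names and values shown are hypothetical, and the approach actually used by inscriptis may differ.

```python
import re

# matches assignments such as: __version__ = '1.0'
METADATA = re.compile(r"^__(\w+)__\s*=\s*['\"]([^'\"]*)['\"]", re.MULTILINE)


def read_metadata(source):
    """Extract dunder string assignments from module source code."""
    return dict(METADATA.findall(source))


# Hypothetical module content; the real file's fields may differ.
init_py = "__version__ = '1.0'\n__author__ = 'Jane Doe'\n"
meta = read_metadata(init_py)
print(meta["version"])
```

In a real `setup.py`, `source` would be the content read from the package's `__init__.py`, and the resulting values would be passed to `setup()`.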

Correct inscript.py default indentation strategy.

25 Sep 13:20

Use the extended indentation strategy by default, as outlined in the README.md.

Improved indentation and custom rendering styles

25 Sep 13:09
b064737
  • improved indentation, if span and div tags are used
  • support for custom rendering styles
  • improved documentation
  • use travis for auto CI
  • requires Python 2.7+ or Python 3.5+ since lxml does not support Python 3 versions <3.5

Improved table rendering (nested tables and line breaks in tables)

26 Feb 09:33
d45c687
  • Correctly handle nested tables and line breaks (e.g. due to enumerations, list or paragraph breaks) in tables.
  • Improved content stripping.

Please take a look at the Rendering document for an overview of how Inscriptis renders different tables.
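
Why line breaks and nested tables need special care becomes clear when a table row is rendered as text: every cell may span several lines, so the row has to be assembled line by line. The following minimal sketch shows one possible approach; `render_row` and its column separator are assumptions made for illustration and do not reflect the actual inscriptis layout rules.

```python
from itertools import zip_longest


def render_row(cells, separator="  "):
    """Render table cells side by side; cells may contain line breaks."""
    columns = [cell.split("\n") for cell in cells]
    widths = [max(len(line) for line in column) for column in columns]
    rows = []
    # zip_longest pads shorter cells with empty lines up to the row height
    for parts in zip_longest(*columns, fillvalue=""):
        line = separator.join(p.ljust(w) for p, w in zip(parts, widths))
        rows.append(line.rstrip())
    return "\n".join(rows)


# a cell holding a two-item enumeration next to a single-line cell
print(render_row(["* one\n* two", "total"]))
```

Since the function returns a plain string, a rendered inner table can be passed in as the content of an outer cell, which is the essence of handling nested tables.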