Releases: weblyzard/inscriptis
Custom HTML Handling and HTML engine improvements
- add working support for specifying custom HTML tags (fixes #81)
- improved html_engine.py
- improved typing across all modules
- added unit tests for inscript and inscriptis-api
- documentation update
Fix documentation build and update publish script.
- fix building documentation on readthedocs.org
- update publish script
Code cleanup, improved Web service and distribution
- added official Python 3.12 support
- Inscriptis command line client:
  - renamed inscript.py to inscript and install the client via pip
  - added --timeout argument
- Inscriptis Web service:
  - migrate the Web service to FastAPI and uvicorn
  - enable install as an extra using pip install inscriptis[web-service]
- code cleanup:
  - migrate to pyproject.toml and poetry for package distribution
  - use black for code formatting
  - improved tox config and code checks
Official Python 3.11 support
Maintenance release adding Python 3.11 to the build pipeline.
Fixed handling of invalid length specifications
This is a bugfix release correcting the handling of invalid length specifications (bug #63).
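As an illustration of the kind of input this fix concerns, here is a minimal sketch of tolerant length parsing. It is not the Inscriptis implementation: the helper name `parse_length`, the accepted units, and the fallback value are assumptions made for this example only.

```python
import re

# Hypothetical helper for illustration; not part of the Inscriptis API.
# Accepts an optional sign, a number, and an optional unit (assumed set).
_LENGTH = re.compile(r"(-?\d+(?:\.\d+)?)\s*(px|em|rem|pt|%)?")


def parse_length(value, default=0):
    """Return the integer part of a CSS-like length, or `default` if invalid."""
    match = _LENGTH.fullmatch((value or "").strip())
    if match is None:
        # Invalid specification (e.g. "auto" or "1.2.3em"): fall back
        # instead of raising, so the conversion can continue.
        return default
    return int(float(match.group(1)))


print(parse_length("12px"))     # valid length
print(parse_length("1.2.3em"))  # invalid length, falls back to the default
```

Falling back to a default rather than raising is one possible design choice; it keeps a single malformed attribute from aborting the whole conversion.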
Correct handling of tail text in HTML comments
- fix: correctly handle HTML comments, which used to confuse the HTML-to-text conversion (fixes #45).
- fix: updated unit tests to work correctly with lxml on Ubuntu 22.04.
- add: updated and extended flake8 testing.
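The "tail text" in the release title refers to text that directly follows a node, which element-tree parsers attach to that node rather than to its parent. The sketch below shows the concept with the standard library's `xml.etree.ElementTree` as a stand-in for lxml; the function `text_without_comments` is hypothetical and is not the fix applied in Inscriptis.

```python
import xml.etree.ElementTree as ET


def text_without_comments(markup):
    """Concatenate all text in `markup`, dropping comment bodies but keeping their tails."""
    # insert_comments=True keeps comment nodes in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(markup, parser=parser)
    parts = []

    def walk(element):
        is_comment = element.tag is ET.Comment
        # A comment's own text is its body and must not be emitted.
        if not is_comment and element.text:
            parts.append(element.text)
        for child in element:
            walk(child)
        # The tail follows the node and is regular document text,
        # even when the node itself is a comment.
        if element.tail:
            parts.append(element.tail)

    walk(root)
    return "".join(parts)


print(text_without_comments("<p>before<!-- hidden -->after</p>"))
```

A converter that skips comment nodes entirely, tail included, would lose the word "after" in this input.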
Support for custom HTML table separators and Python 3.10
- support custom HTML table separators (addresses #29).
- extended documentation on the command line client and added a link to the JOSS paper on inscriptis.
- officially support Python 3.10 and add it to the build pipeline.
- fixed dependency resolution for tox builds.
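To make the idea of a custom table separator concrete, the following sketch renders rows of cells as aligned text with a configurable separator between columns. It is a simplified stand-in and not Inscriptis code: `render_table` and its `cell_separator` parameter are hypothetical names, and the input is assumed to be rectangular.

```python
def render_table(rows, cell_separator="  "):
    """Render `rows` (lists of strings of equal length) as aligned text lines."""
    # Column width is the length of the longest cell in that column.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        # Join with the chosen separator and drop trailing padding.
        lines.append(cell_separator.join(padded).rstrip())
    return "\n".join(lines)


rows = [["Name", "Qty"], ["apple", "3"]]
print(render_table(rows))
print(render_table(rows, cell_separator=" | "))
```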
Zenodo DOI and integrated feedback obtained through the Journal of Open Source Software review process
- improved documentation based on feedback provided by @reality, @rlskoeser and @sbenthall as part of the Journal of Open Source Software review process.
- the Inscriptis web service has been included in the Python package and can now be started with
  export FLASK_APP="inscriptis.service.web"
  python3 -m flask run
Integrated feedback obtained through the Journal of Open Source Software review process
- improved documentation based on feedback provided by @reality, @rlskoeser and @sbenthall as part of the Journal of Open Source Software review process.
- the Inscriptis web service has been included in the Python package and can now be started with
  export FLASK_APP="inscriptis.service.web"
  python3 -m flask run
Improved document model, parsing of borderline cases & HTML annotation support
Changes
HTML parsing:
- new: improved model for handling text blocks and lines
- chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
- chg: improved whitespace handling
- add: cover more borderline cases with unit tests
Inscriptis core:
- new: annotation support
- new: processing of annotation rules and annotation output
- new: type hints
- add: extended and improved documentation
Inscript command line client:
- new: added --annotation-rules option for annotation support.
- new: added --post-processor option to export and visualize annotations (HTML, XML and surface form export)
- chg: apply --encoding to Web URLs as well
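The --annotation-rules option expects a file describing which HTML elements should be annotated. As a rough sketch, and assuming the JSON format in which keys select tags or attributes and values list the labels to assign, such a file could be produced as follows. The helper `write_rules`, the file name, and the example rules are illustrative assumptions; the project documentation is the authority on the exact format.

```python
import json
from pathlib import Path

# Example rules (assumed format): selector -> list of annotation labels.
rules = {
    "h1": ["heading"],
    "b": ["emphasis"],
    "div#class=toc": ["table-of-contents"],
}


def write_rules(path, rules):
    """Serialize `rules` as JSON to `path` and return the path."""
    target = Path(path)
    target.write_text(json.dumps(rules, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical file name; pass the resulting file to --annotation-rules.
    print(write_rules("annotation-rules.json", rules))
```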
Misc:
- chg: migrated to the semantic versioning scheme described at https://semver.org/.
Note
In terms of functionality, this release corresponds to Inscriptis 2.0rc2.