Releases: weblyzard/inscriptis
Use the requests library for URL fetching
- use requests for URL fetching (this addresses #17 and prevents 403 responses from some web servers).
Fixed handling of negative margins.
- correctly parse negative margins in CSS definitions.
- this fixes a bug that, for some pages, produced a very large number (>1000) of newlines between content.
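
As a rough illustration of why sign handling matters here, the sketch below converts a CSS margin length into a number of blank lines and clamps negative values to zero. The names margin_to_lines and px_per_line, the supported units, and the clamping rule are assumptions made for this example; they are not taken from the inscriptis source.

```python
import re

# Optional sign, a number, and an optional unit (only px and em in this sketch).
_LENGTH = re.compile(r"^\s*([+-]?\d*\.?\d+)\s*(px|em)?\s*$")


def margin_to_lines(value, px_per_line=16):
    """Convert a CSS margin length to a non-negative count of blank lines.

    Hypothetical helper: unparsable values and negative margins yield 0.
    """
    match = _LENGTH.match(value)
    if match is None:
        return 0
    number = float(match.group(1))
    unit = match.group(2) or "px"
    # Assumption: 1em corresponds to one text line, px_per_line pixels likewise.
    lines = number if unit == "em" else number / px_per_line
    # A negative margin must never turn into extra newlines.
    return max(0, round(lines))
```

With this rule, a value such as "-1000px" contributes no blank lines instead of being misread as a large positive margin.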
Use server encoding, if available, in the inscript.py client.
This prevents encoding errors when using inscript.py for converting HTML pages to text.
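
The idea of preferring the server-declared encoding can be sketched with the standard library alone. This is not the inscript.py implementation; the helper names charset_from_content_type and decode_body, and the UTF-8 fallback, are assumptions chosen for illustration.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset named in a Content-Type header value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode a response body, preferring the encoding announced by the server."""
    charset = charset_from_content_type(content_type) or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back without raising.
        return body.decode(default, errors="replace")
```

For example, decode_body(raw_bytes, "text/html; charset=ISO-8859-1") decodes the bytes as Latin-1, while a missing header falls back to the default.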
Decode HTML entities
Decode HTML entities such as &Auml;, &Ouml;, and &Uuml; prior to returning the plain text version of the HTML page.
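
Entity decoding of this kind can be demonstrated with html.unescape from the Python standard library. Whether inscriptis uses this function internally is not stated in the release notes, so treat the snippet as an illustration of the behaviour rather than of the implementation.

```python
from html import unescape

# Named entities for German umlauts are replaced by the characters they denote.
decoded = unescape("&Auml;pfel, &Ouml;l, &Uuml;bung")
print(decoded)  # Äpfel, Öl, Übung
```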
Improved parsing and PyPI metadata
- improved handling of highly nested tables
- more comprehensive PyPI metadata
Flask web service and more reliable parsing
Changelog
- optional Flask web service for converting HTML to text
- bug fixes
  - allow infinitely nested lists
  - fix a CSS parsing bug
  - correctly handle empty documents