Add conversion for HTML to markdown #932
Thank you for the PR; I'll try to review it soon.
@hrodz: this is a pretty massive undertaking, and there is a lot of missing code coverage. But as you get your code more stable and to your liking, I can try to pitch in and help here. On a second note:
Or tags that don't align:
Another scenario might be deep nesting with missing tags, etc. Just to be clear, I'm not saying your solution has to be perfect 🙂. But there are so many
@caronc Thanks for taking the time to look at this and provide feedback. I will work on the code coverage; I just wanted to know whether you had any thoughts about this before I commit more time to it. Regarding invalid HTML, the two examples provided seem to be already covered by the current html_to_text conversion, and this code will produce similar results. I understand that I may have to do more testing on how invalid tags are parsed and converted to Markdown, but if possible I would like more insight into the extent of this validation. Would it be acceptable to handle invalid HTML the same way the html_to_text conversion does, or should we expect it to handle more cases? Thanks.
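To make the invalid-HTML question concrete, here is a minimal sketch, assuming the converter is built on the standard library's `html.parser.HTMLParser`. It only records the callbacks the parser fires; `EventRecorder` and `record_events` are hypothetical names for illustration and are not part of this PR. It suggests that the parser reports tags in the order it sees them and does not enforce nesting, so any recovery from mismatched or missing tags would be up to the converter itself.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records parser callbacks in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record_events(html):
    """Feed a fragment to the parser and return the recorded events."""
    recorder = EventRecorder()
    recorder.feed(html)
    recorder.close()  # flush any text still buffered by the parser
    return recorder.events


# <i> is never closed and </b> arrives while <i> is still open;
# the parser reports both facts and raises no error.
for event in record_events("<b>bold<i>both</b>"):
    print(event)
```

A converter that keeps its own stack of open tags could use these events to decide what to do when an end tag does not match the most recent start tag.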
I reviewed your code; I had to make some changes, but I think I got the test coverage done. The only thing is that what you're doing is very tricky... it's hard to track indentation, so some formatting will get lost (hence if a code block should have been indented over because we're currently in a list (or But overall it seems to work. I updated the test cases a bit too... the previous code was escaping characters that shouldn't have been (like periods Have a look at what I've done and let me know your thoughts.
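For illustration only, the indentation problem described above could be approached roughly as follows. This is a sketch and not the code in this PR: `fenced_block_in_list`, `list_depth`, and `indent_width` are hypothetical names, and the four-space default is an assumption about what the target Markdown renderer expects for list-item content.

```python
import textwrap

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def fenced_block_in_list(code, list_depth, indent_width=4):
    """Wrap code in a fence and indent it to sit inside a list item.

    list_depth is how many lists are currently open (0 means top level).
    indent_width is an assumed number of spaces per nesting level.
    """
    block = f"{FENCE}\n{code.rstrip()}\n{FENCE}"
    prefix = " " * (indent_width * list_depth)
    # textwrap.indent prefixes every non-blank line of the block
    return textwrap.indent(block, prefix)


print(fenced_block_in_list("print('hi')", list_depth=1))
```

The converter would need to maintain `list_depth` itself, for example by incrementing it on each opening `ul` or `ol` tag and decrementing it on the matching end tag.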
Description:
Related issue (if applicable): #527
I use Mailrise to get email notifications for various services. The HTML to Text conversion works great for regular notifications, but it would be nice to be able to receive and click on links. This is my attempt to implement an HTML to Markdown converter so that notifications can contain links.
This code is heavily based on the current HTML to Text converter. I'm not a developer, but I'm willing to learn and work on the code to make this feature available. Let me know if this is something we can work on. I have been testing with Gotify notifications and it seems to work fine.
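As a rough illustration of the idea, and not the code submitted in this PR, a link-preserving conversion can be sketched on top of the standard library's `html.parser.HTMLParser`. The names `LinkPreservingConverter` and `html_to_markdown_sketch` are hypothetical, and the sketch only handles `a`, `p`, and `br`; everything else is passed through as plain text.

```python
from html.parser import HTMLParser


class LinkPreservingConverter(HTMLParser):
    """Hypothetical minimal converter: keeps text, turns <a href> into [text](url)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._out = []        # finished output fragments
        self._href = None     # URL of the <a> tag currently open, if any
        self._link_text = []  # text collected inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag == "br":
            self._out.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._link_text).strip()
            self._out.append(f"[{text}]({self._href})")
            self._href = None
        elif tag == "p":
            self._out.append("\n\n")  # blank line between paragraphs

    def handle_data(self, data):
        # Text inside a link is held back until the closing </a>
        if self._href is not None:
            self._link_text.append(data)
        else:
            self._out.append(data)

    def result(self):
        return "".join(self._out).strip()


def html_to_markdown_sketch(html):
    parser = LinkPreservingConverter()
    parser.feed(html)
    parser.close()
    return parser.result()


print(html_to_markdown_sketch(
    '<p>See <a href="https://example.com">the docs</a> now.</p>'))
```

Note that this sketch drops the text of a link whose closing tag is missing and does not escape Markdown special characters; a real converter would need to handle both.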
The only thing I have not been able to get working is line breaks. I have tried different characters (double space, \, ) without success on Gotify. Thanks.