-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Markdown.AST #1196
Introduce Markdown.AST #1196
Conversation
414359c
to
401ac0f
Compare
Thank you @eksperimental, this looks amazing but I am afraid it is nudging us in the wrong direction. Earmark is already capable of parsing HTML, so why do we need to reimplement it here and rely on Floki? In particular, I would expect |
The use of Floki was a last resort. Actually this PR was done when I added more HTML tests to realize that we were still not covering them.
Boths PRs stemmed from reports I have made upstream by working in this PR. |
Right, so I would rather push those concerns to Earmark rather than in ExDoc. It is not our job to solve them. Perhaps one solution you can adopt in Earmark is to have a copy of |
@RobertDober What do you think can be done about this? |
@eksperimental has had a tough time discovering how basic and also broken HTML parsing is in Earmark. Now I believe that I can come up with a pretty good HTML parser in 1.5 but I admit that
All that said I am working hard on the 1.4.6 release with the new AST format and most of the gross errors fixed (ETA Sunday 😨 ) do we agree that this is the first priority? If not I can rearrange my time. Please note too that pragdave/earmark#358 is downwards incompatible and therefore scheduled for 1.5 (ETA????) |
My understanding is that the issues described here are "old issues", so I would keep your current priority. @eksperimental if we close this, we should still merge #1190, right? |
no, never. That PR will render the Epub unusable whenever a |
My understanding is in this issue we're discussing and trying to tackle:
does that sound about right? @eksperimental could we just tackle 2. for now in this PR and leave 3. for the future? |
We could just leave out the Floki call which is one or two lines, and leave what Earmark gives as is, and merge this PR. |
Right, it is not our job to parse html and adding markdown specific logic is going to be counter intuitive to direction we are taking ExDoc which is to support multiple input formats. :) |
I know, but we have no other options, and we still don't know whether this will be fixed in Earkmark or not |
If we have no other options, then we will stick with no other options. We have lived without those so far and I really don't want to be responsible for handling this complexity. :) |
- Introduce ExDoc.Markdown.AST and t:Markdown.AST.html/0. - Replaces ExDoc.Formatter.HTML.ast_to_html/1 with ExDoc.Markdown.AST.to_html/2 which is a port of Earmark.Transform - It uses a more recent git version of Earkmark since due to a bug in the library. - It adds Floki to deal with HTML code in Markdown docs. Closes elixir-lang#1168, elixir-lang#1189 Supersedes elixir-lang#1190
401ac0f
to
46fd405
Compare
I have rebased to master, meaning now we work with EarmarkParser. I have added some temporary fixes until EarkmarkParser always works with 4-element tuples in the AST. |
For the record, this limitation recently came in with the introduction of AST if I'm not mistaken. I think the easiest solution would be to store the original HTML in the metadata, as I have suggested to @RobertDober and we can inject it back in on our side. |
But why do we need the original HTML? Doesn’t markdown say we should parse any markdown inside the HTML too? So returning the original HTML could override the nested parsing. |
Btw, thanks for the clarification! :D |
Earmark is returning a crippled version of the original HTML or something like that (I can't recall exactly). As of when I wrote this PR we couldn't fix this limitation, as we were not able to reconstruct the HTML that we passed to it. I will look later if something changed in EarmarkParser |
Sorry, the crippled version it was a bug that has been fixed. We are getting the HTML string. The issue is that we cannot escape the text within the HTML elements. This PR and also the current version of ExDoc escape the special characters of the whole string |
@josevalim Almost no Markdown parser parses content inside HTML https://babelmark.github.io/?text=%3Cdiv%3E+%0A++-+a%0A++-+b%0A%3C%2Fdiv%3E , nor does EaramrkParser I try to orient Earmark as much as GFM as possible, GFM does not parse Markdown inside HTML. |
Ah, apologies for the confusion then. So storing the original HTML should work, if that’s what we want. |
Sorry, since Earkmark was parsing the first level of the HTML I thought the intention was to build the whole tree. |
Why do we need to escape them? Maybe they shouldn’t be unescaped in the first place? |
@eksperimental parsing the whole tree is planned for 1.5, scheduled for 2020-08-30 and to be clear parsing the whole tree, leaving all text nodes untransformed using Floki as standard
|
Regardless the parsing of the HTML code which is just two lines and I will remove them later today., I would like to know what is your opinion on this PR |
I think it is too early for us to formalize the AST. We also don't really care about Markdown AST per se as we will likely introduce our own so it works with both Markdown and the Erlang docs AST. So I would rather focus on fixing bugs instead of trying to formalize something that will definitely change. |
Closes #1168, #1189
Supersedes #1190