Possibly better verse block handling? #48
Comments
Thanks for digging in! Replacing |
Perhaps the verse blocks you're looking at are not using whitespace? Given the following org file:
I got this output (sans the inner style content, as it's verbose and irrelevant):
Normally my org-export function uses the HTML-encoded version of the no-break space, but here it's using the hex value. Not sure if that matters one way or the other; it might be htmlize or something else playing with things...
Oh, I see! So the actual behavior is that verse blocks are first stripped of the common whitespace prefix, then newlines are converted to <br />. You can see that from this sample:
that converts to:
<p class="verse">
hello<br />
  this is a verse triple whitespace<br />
</p>
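The first step described above, stripping the whitespace prefix shared by every line, can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not code from Org or uniorg; the helper name stripCommonIndent is hypothetical, and blank lines are assumed not to count toward the common prefix.

```javascript
// Hypothetical helper (not part of Org or uniorg): remove the indentation
// that all non-blank lines have in common, keeping any extra indentation.
function stripCommonIndent(text) {
  const lines = text.split("\n");
  // Measure leading spaces/tabs on every non-blank line.
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  // Drop the shared prefix; shorter (blank) lines simply become empty.
  return lines.map((line) => line.slice(common)).join("\n");
}
```

With a two-space common prefix, a line indented by five spaces keeps three, which is the "triple whitespace" that later becomes three non-breaking spaces.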
From the Org HTML exporter source:
;;;; Verse Block
(defun org-html-verse-block (_verse-block contents info)
  "Transcode a VERSE-BLOCK element from Org to HTML.
CONTENTS is verse block contents.  INFO is a plist holding
contextual information."
  (format "<p class=\"verse\">\n%s</p>"
          ;; Replace leading white spaces with non-breaking spaces.
          (replace-regexp-in-string
           "^[ \t]+" (lambda (m) (org-html--make-string (length m) "&#xa0;"))
           ;; Replace each newline character with line break.  Also
           ;; remove any trailing "br" close-tag so as to avoid
           ;; duplicates.
           (let* ((br (org-html-close-tag "br" nil info))
                  (re (format "\\(?:%s\\)?[ \t]*\n" (regexp-quote br))))
             (replace-regexp-in-string re (concat br "\n") contents)))))
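For readers more at home in JavaScript, here is a rough string-level port of the Elisp function above. It is a sketch under assumptions: the function name verseBlockToHtml is made up, the br tag defaults to "<br />", and leading whitespace is emitted as the &#xa0; entity. It works on an already-exported string, so it does not address the tree-shaped input that uniorg deals with.

```javascript
// Rough port of org-html-verse-block to plain string operations.
// Assumptions: `contents` is already HTML, and `br` is the closing form of
// the line-break tag (Org derives it from the export settings).
function verseBlockToHtml(contents, br = "<br />") {
  // Equivalent of regexp-quote: escape regex metacharacters in the br tag.
  const escapedBr = br.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // An optional existing br, trailing blanks, then the newline itself.
  const newline = new RegExp(`(?:${escapedBr})?[ \\t]*\\n`, "g");
  const withBreaks = contents.replace(newline, () => br + "\n");
  // Leading blanks on each line become the same number of non-breaking spaces.
  const withNbsp = withBreaks.replace(/^[ \t]+/gm, (m) => "&#xa0;".repeat(m.length));
  return `<p class="verse">\n${withNbsp}</p>`;
}
```

Because the newline pattern swallows an existing br before the line end, running the replacement on lines that already end in a br does not produce duplicates.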
This'll do the trick
A little naive, as it replaces all spaces with nbsp's, but that's because I couldn't come up with a better solution due to
EDIT: Needs some more work. I just realized there are instances of verse blocks that have n+1 children (i.e. verse blocks with superscript).
This one properly interacts with all children of a verse block:
case "verse-block":
  const interleave = (a, e) => a.flatMap((x) => [x, e]).slice(0, -1);
  const verses = org.children.flatMap((n) =>
    n.type != "text"
      ? n
      : interleave(
          n.value
            .split("\n")
            .map((v) => {
              if (v != "") {
                const value = v.replaceAll(" ", "\u00A0");
                return { type: "text", value };
              }
              return null;
            })
            .filter((v) => v != null),
          h("", "br", {}, [])
        )
  );
  return h(org, "p.verse", {}, toHast(verses));
Just looked into this again and it seems to be even more complicated than that. @ispringle your last solution fails on nested children with newlines:
This should produce 2 nbsp's before "bold":
<p class="verse">
some text <b>and<br />
  bold</b> overflowing on the next line<br />
</p>
In uniorg, that example currently parses as:
- type: "verse-block"
  children:
    - type: "text"
      value: " some text "
    - type: "bold"
      children:
        - type: "text"
          value: "and\\n bold"
    - type: "text"
      value: " overflowing on the next line\\n"
So the processing probably needs to happen in two passes:
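One possible reading of a two-pass approach, sketched in JavaScript on plain objects shaped like the tree above. This is not the maintainer's design: the function names, the { type: "br" } marker node, and the assumption that common indentation has already been stripped are all illustrative. Pass 1 splits text at newlines anywhere in the tree; pass 2 walks the result in document order and pads only text that starts a line.

```javascript
const NBSP = "\u00A0";

// Pass 1: split text nodes at newlines, at any nesting depth, inserting
// hypothetical { type: "br" } markers where the newlines were.
function splitNewlines(node) {
  if (node.type === "text") {
    const out = [];
    node.value.split("\n").forEach((part, i) => {
      if (i > 0) out.push({ type: "br" });
      if (part !== "") out.push({ type: "text", value: part });
    });
    return out;
  }
  if (node.children) {
    return [{ ...node, children: node.children.flatMap(splitNewlines) }];
  }
  return [node];
}

// Pass 2: document-order walk. `state` is shared across nesting levels so a
// line break inside <b> still affects the text that follows it.
function padLineStarts(nodes, state = { atLineStart: true }) {
  return nodes.map((node) => {
    if (node.type === "br") {
      state.atLineStart = true;
      return node;
    }
    if (node.type === "text") {
      const value = state.atLineStart
        ? node.value.replace(/^[ \t]+/, (m) => NBSP.repeat(m.length))
        : node.value;
      state.atLineStart = false;
      return { ...node, value };
    }
    if (node.children) {
      return { ...node, children: padLineStarts(node.children, state) };
    }
    state.atLineStart = false; // any other leaf counts as line content
    return node;
  });
}

function processVerse(children) {
  return padLineStarts(children.flatMap(splitNewlines));
}
```

Applied to the example above (with two spaces before "bold" in the nested text), the text inside the bold node is split around a br marker and only "bold" receives two non-breaking spaces; the space before "overflowing" is left alone because it is not at the start of a line.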
I was a bit surprised when my verse blocks were rendering as <pre>. I looked into the source and saw your annotations as to why:
uniorg/packages/uniorg-rehype/src/org-to-hast.ts
Line 302 in c6fdf0d
This could be /mostly/ resolved with CSS, but there are still going to be issues. For example, superscript doesn't render correctly but instead results in ^superscript.
I looked into the rehype-minify that you mentioned, and it's doing a fairly naive (arguably on purpose) replace using a fairly basic definition of whitespace (https://github.com/rehypejs/rehype-minify/blob/1dc9280c341087a40dfaa332792c095f96d41686/packages/rehype-minify-whitespace/index.js#L286). In this case it's looking for literal spaces, tabs, newlines, and carriage returns. With regard to spaces, arguably the only whitespace that /really/ matters for our purposes, it's only looking for the literal space. Additionally, the definition of "whitespace" it searches for in the HAST is equally naive (again, likely purposefully so) and is only looking for the regular expression when it comes before or after an element, i.e. /[ \t\n\f\r]/g (https://github.com/syntax-tree/hast-util-whitespace/blob/3c765ef9b3fc561976649b97543498cfa7068760/index.js#L16).
Thus, I think we can do exactly what org-publish does: wrap each verse block in its own <p>, replace spaces with the non-breaking space (&nbsp;), and replace newlines with <br>. I examined the rehype-minify plugin, and it also will not remove the space in a case like "This space at the end of this string " <br> "is preserved because it comes before an element".
I believe this means we can perfectly replicate Org's own output for the verse block without fear of rehype-minify changing the output. Thoughts? If you agree with my understanding of these two rehype plugins, I would be willing to start working on a PR.
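The key observation, that the quoted character class does not include the non-breaking space, is easy to check directly. The snippet below only tests the regular expression quoted above; it does not run rehype-minify-whitespace itself, and the helper name is made up for the illustration.

```javascript
// Character class quoted above from hast-util-whitespace, used here without
// the `g` flag so that repeated .test() calls are stateless.
const htmlWhitespace = /[ \t\n\f\r]/;
const NO_BREAK_SPACE = "\u00A0";

// Hypothetical helper: does the value contain anything this class matches?
function containsMatchedWhitespace(value) {
  return htmlWhitespace.test(value);
}

console.log(containsMatchedWhitespace("a b")); // regular space
console.log(containsMatchedWhitespace(`a${NO_BREAK_SPACE}b`)); // no-break space
```

A regular space matches the class while U+00A0 does not, which is the basis for expecting nbsp-padded verse lines to pass through a minifier that relies on this definition.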