Implement HTML autogeneration #580

Open
wants to merge 19 commits into
base: master

windymilla
Collaborator

This does not do everything that a PPer needs to consider when creating the HTML file, but it attempts the basic autogeneration of the HTML in the same way as GG1. There will usually be further work to do, especially on the front matter, tables, illustrations, etc., which really require human input. Under most circumstances the output should be valid HTML, but if two chapters have the same number/title (e.g. if a project contains two books in one document), it is possible to get duplicate IDs, which again will need human intervention.
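A quick way to spot the duplicate-ID case after generation is to scan the output for repeated `id` values. The following is only a sketch and not part of this PR: `find_duplicate_ids` is a hypothetical helper, and it uses a simple regex scan rather than a real HTML parser, which is assumed to be good enough for autogenerated output.

```python
import re
from collections import Counter

# Matches id="..." attributes; the lookbehind avoids matching data-id="...".
# Assumption: generated IDs are always double-quoted.
ID_ATTR = re.compile(r'(?<![\w-])id="([^"]+)"')


def find_duplicate_ids(html: str) -> list[str]:
    """Return id values that occur more than once, in first-seen order."""
    counts = Counter(ID_ATTR.findall(html))
    return [value for value, n in counts.items() if n > 1]


sample = (
    '<h2 id="CHAPTER_I">Chapter I</h2>\n'
    '<h2 id="CHAPTER_II">Chapter II</h2>\n'
    '<h2 id="CHAPTER_I">Chapter I</h2>\n'  # second book restarts numbering
)
print(find_duplicate_ids(sample))
```

Any IDs reported would still need renaming by hand, together with the links that point at them.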

Tasks done by autogeneration:

  1. Trim trailing spaces
  2. Adjust pagemark positions so they don't come mid-word
  3. Convert entities such as ampersands
  4. Convert the main body, spotting headings, block markup, poetry indents, etc.
  5. Convert inline markup (e.g. using the dialog, you can have <i> converted to <em class="italic">)
  6. Convert smallcaps/allcaps
  7. Convert footnotes & sidenotes
  8. Add page anchors/page numbers
  9. Add chapter divs around headings to support ebookmaker conversion
  10. Wrap very long HTML lines
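To give a feel for the simplest steps in the list above (1 and 3), here is a minimal sketch. It is not the code in this PR: the function names are hypothetical, and only bare ampersands are handled, on the assumption that angle brackets cannot be escaped blindly because proofer markup such as `<i>` has to survive for the later steps.

```python
import re


def trim_trailing_spaces(text: str) -> str:
    """Remove spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def escape_ampersands(text: str) -> str:
    """Escape bare '&' but leave existing entities (&amp;, &#169;) alone."""
    return re.sub(r"&(?!#?\w+;)", "&amp;", text)


text = "Fish & chips   \nR&D &amp; more\t\n"
print(escape_ampersands(trim_trailing_spaces(text)))
```

Running the trim first means later steps never have to worry about whitespace hiding at line ends.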

Unlike GG1, I think you can use Undo fairly safely after HTML generation.

@windymilla windymilla requested a review from srjfoo December 18, 2024 14:12
@windymilla windymilla linked an issue Dec 18, 2024 that may be closed by this pull request
@rtonsing

Autogenerated TOC needs '#' before IDs.

@rtonsing

This is current behavior in GG1, and just my opinion, but the /* */ markup insertion of
<span style="margin-left: x.xem;">
is useless because they are usually tables that I will reformat anyway, so I delete them.

@rtonsing

rtonsing commented Dec 18, 2024

Overall very nice, passed W3C check, and pphtml only reported "missing h1 element".

Is there a way to designate the h1 text? Using h1 markup doesn't work.

@windymilla
Collaborator Author

> Autogenerated TOC needs '#' before IDs.

Thanks - I'll take a look at that
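For reference, the point about the `#` is that an `href` without it is treated as a relative URL rather than a link to an ID in the same file. A minimal sketch of building one TOC entry follows; `toc_entry` is a hypothetical helper, not a function from this PR.

```python
from html import escape


def toc_entry(heading_id: str, title: str) -> str:
    """Build a same-page link; the leading '#' makes it a fragment reference."""
    return f'<a href="#{escape(heading_id, quote=True)}">{escape(title)}</a>'


print(toc_entry("CHAPTER_I", "Chapter I"))
```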

@windymilla
Collaborator Author

windymilla commented Dec 18, 2024

> This is current behavior in GG1, and just my opinion, but the /* */ markup insertion of <span style="margin-left: x.xem;"> is useless because they are usually tables that I will reformat anyway, so I delete them.

I agree, at least as far as my own PPing is concerned. However, I would prefer for now to mimic the GG1 behavior, and only consider removing it in future if there is substantial support for such a move. In the same way that the PPer can change /& to /P to get poetry markup instead of the default /* behavior, we might consider whether a /T (for Table) markup would be useful. I'm not sure how much help, if any, it would be, but it could be worth investigating.

For tables, it sounds like you'd probably prefer to change the /*...*/ to /X...X/, which just puts a <pre> at the start/end of the table; that is easily deleted before you do the manual work of coding the table.
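As an illustration of the /X...X/ behavior described here, the sketch below swaps the block markers for `<pre>` tags and leaves everything in between untouched. It is an assumption-laden sketch, not the PR's code: `convert_pre_blocks` is a hypothetical name, and the markers are assumed to sit alone on their own lines and to be matched exactly.

```python
def convert_pre_blocks(lines: list[str]) -> list[str]:
    """Replace /X and X/ marker lines with <pre> and </pre>."""
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped == "/X":
            out.append("<pre>")
        elif stripped == "X/":
            out.append("</pre>")
        else:
            out.append(line)  # table text is kept exactly as typed
    return out


table = ["/X", "Apples    3", "Pears    12", "X/"]
print("\n".join(convert_pre_blocks(table)))
```

Because only two lines are added, deleting them before hand-coding the table is a quick job.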

@windymilla
Collaborator Author

> Is there a way to designate the h1 text? Using h1 markup doesn't work.

Good catch - I've neglected to code that. I think GG1 assumes the first bit of text that isn't an illo, etc. is the title. I'm extracting that in GG2 to display in the dialog "title" field and the HTML header, but forgot to add the h1 markup in. Thanks.
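The "first bit of text that isn't an illo" heuristic could look roughly like the sketch below. This is a guess at the idea, not the GG1 or GG2 implementation: `guess_title` and `h1_for` are hypothetical names, and only `[Illustration ...]` blocks are skipped, whereas the "etc." above suggests the real rule skips other things too.

```python
from html import escape


def guess_title(lines: list[str]) -> str:
    """Return the first non-blank line that is not part of an illustration block."""
    in_illo = False
    for line in lines:
        text = line.strip()
        if in_illo:
            # Still inside a multi-line [Illustration ...] block.
            in_illo = not text.endswith("]")
            continue
        if not text:
            continue
        if text.startswith("[Illustration"):
            in_illo = not text.endswith("]")
            continue
        return text
    return ""


def h1_for(title: str) -> str:
    """Wrap the guessed title in h1 markup."""
    return f"<h1>{escape(title)}</h1>"


book = ["", "[Illustration: Frontispiece,", "second line]", "", "THE GREAT BOOK", "by A. N. Author"]
print(h1_for(guess_title(book)))
```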

@windymilla
Collaborator Author

Pushed commit to fix points raised in @rtonsing's review

1. Convert body - handles block markup
If user has copy of header.txt in GGprefs, then if it is a full header, just use it, but if it's not, then insert it at the end of the default header.
Replace with em or span, or keep as marked up, based on radio buttons in dialog.
Replace TITLE in header with the best guess at the title of the book.
Replace BOOKLANG with the main language code.
If not, then spans/anchors are still inserted in HTML, but no text is shown.
If pagenum span is not already within a paragraph, then enclose it in paragraph markup.
Improve pagemark positions early so they don't get caught just inside the end of a para.
Then the chapter div can enclose them if appropriate.
Especially in indexes with lots of linked page numbers.

Wrap at a space after a close tag, not after an open tag.
Also add `#` before auto-toc link IDs.

Discovered in @rtonsing's review
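The wrapping rule in the commit message (break at a space after a close tag, not after an open tag) can be sketched as below. This is only an illustration under stated assumptions, not the committed code: `wrap_html_line` and the `width` parameter are hypothetical, and a line with no suitable break point is simply left long.

```python
import re

# A space that directly follows a closing tag such as </a> or </span>.
BREAK = re.compile(r"</[^<>]+>( )")


def wrap_html_line(line: str, width: int) -> list[str]:
    """Split a long line only at spaces that follow a closing tag."""
    pieces = []
    while len(line) > width:
        cuts = [m.start(1) for m in BREAK.finditer(line) if m.start(1) <= width]
        if not cuts:
            break  # no safe break point within the width; leave the rest as is
        cut = cuts[-1]  # the latest safe space that still fits
        pieces.append(line[:cut])
        line = line[cut + 1:]  # drop the space that was replaced by the break
    pieces.append(line)
    return pieces


index_line = '<a href="#Page_1">1</a> <a href="#Page_2">2</a> <a href="#Page_3">3</a>'
for piece in wrap_html_line(index_line, 50):
    print(piece)
```

The space inside `<a href=...>` is never a candidate, so an open tag is never split from its attributes, which matters most in index lines full of linked page numbers.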
Successfully merging this pull request may close these issues.

Add HTML autogeneration