Collection build: only presentation XML; incorrect documents-inline directive & filerefs #200

Open
strogonoff opened this issue Oct 11, 2024 · 4 comments

@strogonoff
Contributor

strogonoff commented Oct 11, 2024

  • I see that collection.presentation.xml specifies the documents-inline directive (three times!), but the documents are not inline. For now I prefer them not inline, but either way it seems to mean the directive is wrong.

    [Screenshot, 2024-10-11 at 23 37 47]
  • Those non-inlined documents are missing. I see entry elements with fileref. However, each fileref points to a nonexistent file (e.g., sources/001-v4/document.xml, but there are no document XML files being generated).

    Is the intent that the user would combine artifacts from different builds and then filerefs would be correct?

  • It would also be nice to output both semantic & presentation collection & document XML files in one build, if possible 🤔

@opoudjis opoudjis self-assigned this Oct 14, 2024
@opoudjis opoudjis added the bug Something isn't working label Oct 14, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Oct 14, 2024
@strogonoff
Contributor Author

strogonoff commented Oct 14, 2024

This just shows that, whatever MN XML does, the integrity of refs still has to be verified afterwards.
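A minimal sketch of such a post-build check, in Python, assuming only what is described in this issue: the collection XML contains entry elements carrying a fileref attribute, and each fileref is a path relative to some base directory. The function name missing_filerefs and the base_dir parameter are hypothetical, and the namespace of the collection XML is not assumed (the check compares local element names only).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_filerefs(collection_xml: str, base_dir: str) -> list[str]:
    """Return fileref values whose target file does not exist under base_dir."""
    root = ET.parse(collection_xml).getroot()
    missing = []
    for elem in root.iter():
        # Compare local names so the check works whatever namespace is in use.
        local_name = elem.tag.rsplit("}", 1)[-1]
        fileref = elem.get("fileref")
        if local_name != "entry" or not fileref:
            continue
        if not (Path(base_dir) / fileref).is_file():
            missing.append(fileref)
    return missing
```

Called as missing_filerefs("collection.presentation.xml", "."), it would list every fileref that does not resolve to a file on disk, which is the situation reported in the second bullet of the original report.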

@opoudjis
Contributor

I've just compiled this.

Those non-inlined documents are missing. I see entry elements with fileref. However, each fileref points to a nonexistent file (e.g., sources/001-v4/document.xml, but there are no document XML files being generated).

Quite literally impossible. I recently added an optimisation: if the sources/001-v4/document.xml file is already there, it is not recompiled. And if the file reference in the manifest is to sources/001-v4/document.adoc (which in fact it IS, you are wrong about the reference being to XML), the Asciidoc source file is likewise not recompiled to XML. To force recompilation, you will need to remove the sources/001-v4/document.xml file, or else insert the directive - recompile-xml into the manifest.
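The rule just described can be sketched in Python as follows. This is an illustration of the stated behaviour only, not metanorma's actual code: the function name needs_recompile and the representation of manifest directives as a set of strings are assumptions made for the example.

```python
from pathlib import Path


def needs_recompile(source: str, directives: set[str]) -> bool:
    """Decide whether an Asciidoc source should be recompiled to XML.

    Illustrative only: mirrors the rule described in this comment.
    """
    xml_path = Path(source).with_suffix(".xml")
    if "recompile-xml" in directives:
        return True  # the directive forces recompilation
    # Otherwise an existing XML file is reused and the source is skipped.
    return not xml_path.exists()
```

Under this reading, a stale sources/001-v4/document.xml keeps being reused until it is deleted or the directive is added.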

Is the intent that the user would combine artifacts from different builds and then filerefs would be correct?

No, the intent is that the metanorma gem does combine them, and it does.

I'm clearly going to have to make you screenshare your directory, because there is no way those files are not there. The collection is compiled out of XML files.


We have:

  • sources/001-v4: compiles
    • sources/001-v4/document.xml, and sources/001-v4/document.xml.*.xml (sectionsplit)
    • tmp_document.xml and tmp_document.presentation.xml (resolving references)
    • document.xml.*.{xml,presentation.xml,html} -- moved to _site after compilation is complete
  • sources/002-v4: compiles
    • sources/002-v4/document.xml, and sources/002-v4/document.1.xml.*.xml (sectionsplit)
    • tmp_document.1.xml and tmp_document.1.presentation.xml (resolving references)
    • document.1.xml.*.{xml,presentation.xml,html} -- moved to _site after compilation is complete

So the compilation is:

  • Individual Semantic XML documents >
  • Individual Presentation XML documents (containing embedded Semantic XML) >
  • Sectionsplit Presentation XML documents, one per clause (each containing its corresponding embedded Semantic XML)
  • Final concatenation

Multiple iterations of the XML file are generated, and some of them end up being generated in /tmp , as you can see from the PDF compilations:

java -Xss10m -Xmx3g -Djava.awt.headless=true -Dapple.awt.UIElement=true -Duser.home=/Users/nickn -jar /Users/nickn/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/mn2pdf-1.99/bin/mn2pdf.jar --xml-file "/var/folders/st/02jjnpwd1bz15gp_7m7fm1lr0000gs/T/document.1.xml.020241015-80598-33bxak.xml" --xsl-file "/Users/nickn/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/metanorma-plateau-0.1.8/lib/isodoc/plateau/plateau.international-standard.xsl" --pdf-file "/Users/nickn/Documents/Arbeit/upwork/ribose/mn-samples-plateau/document.1.xml.0.pdf" --param baseassetpath="/Users/nickn/Documents/Arbeit/upwork/ribose/mn-samples-plateau" --syntax-highlight  --font-manifest "/var/folders/st/02jjnpwd1bz15gp_7m7fm1lr0000gs/T/fontist_locations20241015-80598-bbb3id.yml"

There is no concatenated Semantic XML collection.xml file, and there is no point in there being one. The concatenation concerns Presentation XML, and that is what gets concatenated—in order to generate PDF and DOC, and (in memory) in order to generate the HTML.

I continue to maintain that if you want to parse the monolithic collection.presentation.xml instead of the individual document.*.presentation.xml files I am passing to isodoc, you are begging for disaster. I process and break down 250 MB of XML (of which half is Semantic and half is Presentation) into 45 documents. There is no good reason for you to replicate that breaking up, apart from Not-Invented-Here syndrome.


Now, the documents-inline directive is indeed being ignored for the collection.presentation.xml. As a result, the collection.pdf is failing to be generated.

I have no idea why, and I will have to investigate now. But if anything, I would be disinclined to fix this at all. A collection.pdf based on a 250 MB XML document, when the individual sectionsplit documents are already being generated in HTML and PDF, would be completely useless, and a criminal waste of compilation time and resources, which should absolutely not be done unless explicitly requested by the user. (In fact, I wouldn't be surprised if sectionsplit is preempting the generation of the monolithic Collection XML.)

@opoudjis
Contributor

Confirmed in a toy example: I am declining to populate the full concatenated document if all documents in the collection are sectionsplit, but not if one of them isn't. I am investigating why; it isn't apparent from the code, which means it may be a bug (although, as I've said, one I'll consider keeping as default behaviour).

@opoudjis
Contributor

... Yes: only files that we know have generated output are included in concatenation, but the list of files being iterated through to generate the concatenation is the old, pre-sectionsplit list, not the new, post-sectionsplit list. The intact documents do not have output, because of sectionsplit, so they are ignored in concatenation.
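The mechanism described above can be illustrated with a small Python sketch. The file names are hypothetical, patterned on the listing earlier in this thread, and files_to_concatenate is an invented helper, not a metanorma function.

```python
def files_to_concatenate(candidates, generated):
    """Keep only the candidate files that actually produced output."""
    return [name for name in candidates if name in generated]


# Hypothetical file names, patterned on the listing earlier in this thread.
pre_split = ["document.xml", "document.1.xml"]
post_split = ["document.xml.0.xml", "document.xml.1.xml", "document.1.xml.0.xml"]

# After sectionsplit, only the per-clause files have generated output.
generated = set(post_split)

stale = files_to_concatenate(pre_split, generated)   # iterating the old list
fresh = files_to_concatenate(post_split, generated)  # iterating the new list
print(stale)  # []
print(fresh)
```

Iterating the pre-sectionsplit list yields nothing to concatenate, because none of the intact documents has output of its own.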

That is in fact a feature not a bug in my book, because sectionsplit defeats the purpose of generating a single monolithic XML and PDF document.

The correct behaviour is that, if any of the documents have been sectionsplit, concatenate all documents (including the sectionsplit document) into a single monolithic document, only if explicitly requested by a directive concatenate-sectionsplit. Otherwise we're compiling a huge monolith that may well break, for no good reason, and we're protracting overall generation time when it takes far too long already.
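As a rough Python sketch of the proposed rule: the concatenate-sectionsplit directive is the one proposed in this comment and is not assumed to exist yet, while the function name and the set-of-strings representation of directives are hypothetical.

```python
def should_concatenate(any_sectionsplit: bool, directives: set[str]) -> bool:
    """Decide whether to build the monolithic concatenated document.

    Illustrative only: encodes the behaviour proposed in this comment.
    """
    if not any_sectionsplit:
        return True  # no sectionsplit: concatenate as usual
    # With sectionsplit in play, the monolith must be requested explicitly.
    return "concatenate-sectionsplit" in directives
```

The default for a sectionsplit collection is therefore to skip the monolith, which keeps compilation time down unless the user opts in.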

The correct behaviour for YOU, @strogonoff, if you want to get a huge collection.presentation.xml document containing all of Plateau (and you're going to need to find out the hard way why you don't), is to remove the sectionsplit: true directives from collection.yml and rerun compilation of the collection.

I'm moving this to the bottom of the high priority queue.

@opoudjis opoudjis moved this from 🏗 In progress to 🏔 High priority in Metanorma Oct 15, 2024
@opoudjis opoudjis moved this from 🏗 In progress to 🏔 High priority in Metanorma Oct 28, 2024