Text objects in example article are not tagged #12

Open
juh2600 opened this issue Aug 19, 2019 · 12 comments
Assignees
Labels
strucPDF transferred from old `strucPDF` package. Needs to be checked against accessibility

Comments

@juh2600

juh2600 commented Aug 19, 2019

Describe the bug
Text objects in the example article are not tagged, resulting in a failed accessibility check.

To Reproduce
Steps to reproduce the behaviour:

  1. Obtain the free PDF Accessibility Checker from Zugang für alle. This issue was checked with PAC 3.
  2. Open the example article PDF in the checker.
  3. Below the listing of checkpoints, select "Results in Detail" to see a breakdown of the untagged objects.

Expected behavior
Text objects in a PDF compiled with this package should probably be tagged; this feels to me within the scope of the project.

Log messages
article_PAC_Report.pdf
TextObjectNotTagged

Additional notes
This is not isolated to the example article; I discovered this while working on my own document.

I appreciate the work you've put into this project! If there's anything I can do to help, I'd be glad to. I'm by no means an expert on LaTeX or PDF, but I've been studying the incantations lately for my accessibility projects, and I'm quite eager to see this work as well as it can. I'd rather spend a week in vi writing LaTeX than an hour in Acrobat.

@juh2600 juh2600 changed the title from "Example article's" to "Text objects in example article are not tagged" Aug 19, 2019
@AndyClifton AndyClifton transferred this issue from another repository Sep 8, 2019
@AndyClifton AndyClifton added the strucPDF transferred from old `strucPDF` package. Needs to be checked against accessibility label Oct 13, 2019
@AndyClifton
Owner

@josephreed2600 - thanks for the bug report and for offering to help. I'm just getting back into this project after some distractions and will start putting together a roadmap soon. I'll get back to you once I see how this fits in and what might be required to ship something useful.

@AndyClifton AndyClifton self-assigned this Nov 3, 2019
@viktoriasee
Collaborator

viktoriasee commented Feb 10, 2020

I do not see any tags generated on a simple dummy file:

\documentclass{scrreprt}
\usepackage{accessibility}

\begin{document}
text.
\end{document}

It does compile without errors or warnings in pdfTeX, though.

So this is not just an issue with the example file: tagging is not working at all.

@AndyClifton
Owner

AndyClifton commented Feb 10, 2020 via email

@viktoriasee
Collaborator

I hadn't because I read the manual p. 5/6:

"If no options are specified, a PDF is generated with the default options, i.e. a Tagged PDF with a nested structure is produced."

Indeed, when I use \usepackage[tagged]{accessibility} I get a PDF with a tag. I think the documentation should win here, i.e. tagging should be on by default. But even this minimal example does not produce an accessible PDF:
pac3-latex-accessibility-minimal
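
For reference, the variant that does produce a tag differs from the earlier dummy file only in the package option. A minimal sketch, assuming the same `scrreprt` class as in that dummy file:

```latex
\documentclass{scrreprt}
% Pass `tagged` explicitly: as reported in this thread, loading the
% package without options does not generate any tags, contrary to
% what the manual describes as the default.
\usepackage[tagged]{accessibility}

\begin{document}
text.
\end{document}
```

Compiling this with pdfTeX and opening the result in PAC 3 should reproduce the report above; the exact checker output may differ between PAC versions.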

@viktoriasee
Collaborator

viktoriasee commented Feb 11, 2020

Even when tagging is on and the tags are visible in Acrobat, PAC reports a "Tagged content and artifacts" error for the very content that is tagged.

@AndyClifton
Owner

Hm. So with a comparable document from another source (e.g. MS Word), does the error still occur / get flagged by PAC?

I'm interested in whether this is a problem with LaTeX or with something else.

@viktoriasee
Collaborator

viktoriasee commented Feb 11, 2020

In short: no.
A PDF created with MS Word, with the content "Text." and a non-empty title file property, validates in PAC apart from the PDF/UA metadata. No other errors.
Text.pdf
Text.docx

@AndyClifton
Owner

Ok, thanks. Could you upload the word document (attach to the comment) for comparison, please? Thanks!

@viktoriasee
Collaborator

I think I have one more hint on this. When I open my minimal example in Acrobat, open the Tags tab, and click on a content container, the correct paragraph text is highlighted with a blue frame, but the container itself shows no content:
missing content
In a normal document you would see the content there. Here is the same PDF after I disable accessibility and add the tags automatically in Acrobat:
content there
Does that ring a bell?

@AndyClifton
Owner

So... looking at the MWE from @viktoriasee, I see two things:

  1. In the MWE generated using latex there is a highest-level "Document" branch in the PDF that shouldn't be there.
  2. In the MWE generated using latex there is no content in the <p> container.

This gives us some places to look.

@viktoriasee
Collaborator

viktoriasee commented Jun 5, 2020

I agree with 2. But the «Document» master tag is fine. It's one of the few things where accessibility does a better job than Acrobat.
PDF/UA checker PAC3 complains if there is no such master tag.

@radinamatic

Yes, please keep the top-level Document tag for PDF/UA checking.
