Problem converting docx to pdf format A-1a #59

Open
hajjoujti opened this issue Mar 3, 2023 · 10 comments

@hajjoujti

hajjoujti commented Mar 3, 2023

Thank you for this amazing library.

I have a NestJS application with a method that talks to a Gotenberg 7 Docker container through this JS client.
I need to convert docx files to PDF in the PDF/A-1a format.
Here is what I wrote:

private toPDF = pipe(
  gotenberg(''),
  convert,
  office,
  adjust({
    url: `${this.configService.get('GOTENBERG_URL')}/forms/libreoffice/convert`,
    fields: {
      pdfFormat: 'PDF/A-1a',
    } as any,
  }),
  please,
);

and the method is

async convertToPdf(
  fileName: string,
  uint8Array: Uint8Array,
): Promise<NodeJS.ReadableStream> {
  return await this.toPDF({ 'fileName.pdf': Buffer.from(uint8Array) });
}

This method is called multiple times, because I am converting several docx files that will be zipped later on.
Without the pdfFormat field, the conversion works with no errors. But when I add pdfFormat: 'PDF/A-1a', not all files are converted correctly.
I am testing with two files for now, and one of the two comes out damaged and cannot be read. Each time it is a different PDF that is damaged, and sometimes both are.
I don't understand why this happens, since one of the PDFs is converted correctly and I am sending the docx twice.
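
To tell a truncated or otherwise incomplete response apart from a valid one without opening each file by hand, the returned stream can be collected and checked for the standard PDF header and end-of-file marker. This is a minimal sketch: the helper names streamToBuffer and looksLikeCompletePdf are hypothetical, and the check only catches obviously incomplete files. It does not validate PDF/A-1a conformance.

```typescript
// Collect a Node.js readable stream (such as the one returned by
// convertToPdf) into a single Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Rough integrity check: a PDF starts with "%PDF-" and carries an
// "%%EOF" marker near its end. A file cut off mid-transfer usually
// fails the second test.
function looksLikeCompletePdf(data: Buffer): boolean {
  const hasHeader = data.subarray(0, 5).toString('latin1') === '%PDF-';
  // The marker may be followed by a newline, so search the last 1 KiB.
  const tail = data.subarray(Math.max(0, data.length - 1024)).toString('latin1');
  return hasHeader && tail.includes('%%EOF');
}
```

Calling looksLikeCompletePdf(await streamToBuffer(stream)) right after each conversion would show whether the damage is already present before the files are zipped.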

@yumauri
Owner

yumauri commented Mar 4, 2023

Hello!
You didn't mention it, but I assume you tried sending both files to the Gotenberg service directly, and both were converted correctly, with no errors?

curl \
  --request POST 'http://localhost:3000/forms/libreoffice/convert' \
  --form 'files=@"/path/to/file.docx"' \
  --form 'pdfFormat="PDF/A-1a"' \
  -o my.pdf

@yumauri
Owner

yumauri commented Mar 4, 2023

If the problem only occurs with the PDF/A-1a format, maybe this is an issue with Gotenberg itself 🤔

Did you try the nativePdfFormat field instead of pdfFormat?
See https://gotenberg.dev/docs/modules/libreoffice

Also, there are four different PDF engines in Gotenberg. Did you try a different one? I'm not sure about it, though: the documentation says only unoconv supports converting...
See https://gotenberg.dev/docs/modules/pdf-engines

@yumauri
Owner

yumauri commented Mar 4, 2023

Also, looking through the Gotenberg issues, I see a few that mention LibreOffice and docx conversion. You could try a different version of Gotenberg, let's say 7.4.2.

@hajjoujti
Author

I did try nativePdfFormat, but it always returns corrupt files.
I didn't try sending to the service directly, though. I will try that tomorrow to see if it works.
I will also try Gotenberg 7.4.2, though I had some issues creating a Docker container on a port other than 3000. The default port 80 does not seem to connect with this library; I was always getting a 404 Not Found error.
Thanks for your fast reply. I will keep you updated as soon as possible.

@yumauri
Owner

yumauri commented Mar 5, 2023

The default port 80 does not seem to connect with this library

I guess it depends on your environment. Maybe you already have another service listening on port 80, for example nginx or your NestJS application. You can use any other free port between 1025 and 65535, just make sure Gotenberg listens on that port.

@hajjoujti
Author

So I tested with the curl command and it works perfectly: none of the PDFs are corrupted.
I tried changing the version, and even tried the thecodingmachine Gotenberg image, but the problem is still the same.
So I used axios to send form-data containing the docx with the desired pdfFormat, and now it's working.
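
The axios code itself is not shown in this thread. As a rough illustration of the same direct request, here is a sketch that swaps axios for the fetch, FormData and Blob globals built into Node 18+. The field names files and pdfFormat and the /forms/libreoffice/convert route are taken from the curl command above; the function names buildConvertForm and convertDocx and the baseUrl parameter are hypothetical.

```typescript
// Build the multipart form for Gotenberg's LibreOffice route, mirroring
// the curl command: one `files` part plus a `pdfFormat` field.
function buildConvertForm(
  fileName: string,
  docx: Uint8Array,
  pdfFormat = 'PDF/A-1a',
): FormData {
  const form = new FormData();
  // Copy the bytes so the Blob does not share memory with the caller.
  form.append('files', new Blob([new Uint8Array(docx)]), fileName);
  form.append('pdfFormat', pdfFormat);
  return form;
}

// Send the docx straight to Gotenberg and return the PDF bytes.
// Assumption: baseUrl looks like 'http://localhost:3000'.
async function convertDocx(
  baseUrl: string,
  fileName: string,
  docx: Uint8Array,
): Promise<Uint8Array> {
  const response = await fetch(`${baseUrl}/forms/libreoffice/convert`, {
    method: 'POST',
    body: buildConvertForm(fileName, docx),
  });
  if (!response.ok) {
    throw new Error(`Gotenberg responded with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Reading the whole response with arrayBuffer() before returning means the caller never sees a partially received PDF, at the cost of holding each file in memory.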

@yumauri
Owner

yumauri commented Mar 6, 2023

Got it. Then I have to ask for more details, please :)

First of all, can you create a repository with a minimal reproducible example, including the docx files?
If those files are not private, of course.

Do you run the convertToPdf method sequentially or in parallel for multiple files?
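
To make the difference concrete, the two call patterns can be sketched as below. The names Convert, Input, convertSequentially and convertInParallel are hypothetical; convert stands in for a method like convertToPdf. If the corruption only shows up with the parallel variant, that would point at concurrent requests rather than at the files.

```typescript
type Input = { fileName: string; data: Uint8Array };
type Convert<T> = (fileName: string, data: Uint8Array) => Promise<T>;

// Sequential: each conversion starts only after the previous one finished.
async function convertSequentially<T>(
  inputs: Input[],
  convert: Convert<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const { fileName, data } of inputs) {
    results.push(await convert(fileName, data));
  }
  return results;
}

// Parallel: all conversions are started at once and awaited together.
function convertInParallel<T>(
  inputs: Input[],
  convert: Convert<T>,
): Promise<T[]> {
  return Promise.all(inputs.map(({ fileName, data }) => convert(fileName, data)));
}
```

Both variants return the results in input order, so they can be swapped without changing the zipping step.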

Can you try replacing the built-in HTTP client with axios and check whether it works? Something like this (in toPDF):

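// requires: import axios from 'axios'
gotenberg('', {
  post(url, data, headers) {
    return axios({
      method: 'post',
      url,
      data,
      headers: { ...headers },
      // Assumption: the client expects a readable stream back,
      // so ask axios for one instead of a parsed body.
      responseType: 'stream',
    }).then((response) => response.data)
  }
})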

(I'm not sure it will work: the axios documentation says it accepts FormData only in the browser, but I can only check that tomorrow)

@hajjoujti
Author

I am not sure I will have time for that, because this is my last week on my current assignment and I am wrapping things up at the moment.
If I get some extra time, I will certainly try it.

@yumauri
Owner

yumauri commented Mar 7, 2023

Maybe you can give me the .docx files at least 😅 Then I'll try to play with them.

@hajjoujti
Author

There is no specific docx file; any docx file will do. My tests used sample docx files I made with Microsoft Word.
