PyMuPDF on Discord #1766

jamie-lemon · 2022-06-20T16:55:17Z

jamie-lemon
Jun 20, 2022
Maintainer

We now have a dedicated public Discord channel for discussing PyMuPDF!

Please find it here: #pymupdf

We look forward to supporting your questions! (And who knows, perhaps you can give us a few answers too 😉)

oliviaharman · 2023-09-28T16:29:34Z

oliviaharman
Sep 28, 2023

Hello (from the Discord server), the PDF is still too large to upload, and I am unsure how to upload the script since ipynb files are not supported. Any suggestions?
Thanks.

1 reply

JorjMcKie Sep 28, 2023
Maintainer

Hi, I am the same guy, just using a pseudonym.
You could zip the ipynb and then attach it. The same restriction applies to every executable file.
Otherwise ...
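
If zipping by hand is inconvenient, the notebook can also be zipped from Python itself. The following is only a minimal sketch using the standard library; the helper name `zip_for_upload` and the file name in the usage note are hypothetical and not part of this thread.

```python
import zipfile
from pathlib import Path


def zip_for_upload(source: Path) -> Path:
    """Write `source` into a .zip archive next to it and return the archive path."""
    archive = source.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so no local directories leak into the archive
        zf.write(source, arcname=source.name)
    return archive
```

Calling `zip_for_upload(Path("my_script.ipynb"))` would produce `my_script.zip` in the same folder, which can then be attached.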

JorjMcKie · 2023-09-28T16:32:58Z

JorjMcKie
Sep 28, 2023
Maintainer

Please simply attach the PDF here, or share a URL or even send it to my private e-mail: [email protected].
More than from your code, I am expecting insights from the PDF itself ...

5 replies

oliviaharman Sep 28, 2023

I will send it to your email since I am still having some trouble.
Thank you for being so patient with me.

JorjMcKie Sep 28, 2023
Maintainer

You are welcome - no problem at all.

JorjMcKie Sep 28, 2023
Maintainer

Received the link etc. ... downloading now

JorjMcKie Sep 28, 2023
Maintainer

Half a TB - I understand your comment on the size now 😉 - are the problems with a specific image on some page, or with all or many of them?

oliviaharman Sep 28, 2023

Glad you got to take a look at it! The problem with the inverted colors is on all pages, except for the cover page.

JorjMcKie · 2023-09-28T18:06:18Z

JorjMcKie
Sep 28, 2023
Maintainer

Here is a working script:

import fitz

xrefs = set()  # prevent multiple processing for same image
doc = fitz.open("test.pdf")
for i in range(doc.page_count):  # iterate over pages
    if i > 0 and i % 50 == 0:  # some entertainment messages
        print(f"Processing page {i}")
    for item in doc.get_page_images(i):  # iterate over images on this page
        # Note we do not load any pages - as we would when using page.get_images()
        xref = item[0]  # the image xref
        if xref in xrefs:  # skip if done earlier already
            continue
        xrefs.add(xref)
        pix = fitz.Pixmap(doc, xref)
        if pix.colorspace is None:  # skip "mask" images
            continue
        if pix.n not in (1, 3):  # if neither Gray nor RGB, convert
            pix = fitz.Pixmap(fitz.csRGB, pix)
        try:  # things may still go wrong
            pix.save(f"page-{i}-{xref}.png")
        except RuntimeError:  # problems may occur for special Gray pixmaps
            if pix.n == 1:  # convert one more time
                pix = fitz.Pixmap(fitz.csGRAY, pix)
                pix.save(f"page-{i}-{xref}.png")

Comments:
The original images are mostly CMYK JPEGs. As you wrote, storing them as-is shows inverted colors. To avoid a lengthy analysis, I simply convert them all to either Gray or RGB and then save them as PNG (smaller anyway).
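
As a variation on the script, the colorspace decision could be isolated in a small helper, so it can be checked without a PDF at hand. This is only a sketch: `choose_target` and `prepare_for_png` are hypothetical names, and unlike the script it subtracts the alpha channel before testing the component count (`pix.n - pix.alpha`), which is an assumption about how images with transparency should be treated.

```python
def choose_target(n: int, alpha: int, has_colorspace: bool) -> str:
    """Decide what to do with a pixmap before saving it as PNG.

    n: bytes per pixel, including alpha (Pixmap.n)
    alpha: 1 if the pixmap has an alpha channel, else 0 (Pixmap.alpha)
    has_colorspace: False for "mask" images (Pixmap.colorspace is None)
    """
    if not has_colorspace:
        return "skip"  # mask image, nothing to save
    if n - alpha in (1, 3):
        return "keep"  # already Gray or RGB
    return "rgb"  # e.g. CMYK: convert to RGB first


def prepare_for_png(pix):
    """Apply choose_target to a PyMuPDF Pixmap; returns None for mask images."""
    import fitz  # imported lazily so choose_target is usable without PyMuPDF

    action = choose_target(pix.n, pix.alpha, pix.colorspace is not None)
    if action == "skip":
        return None
    if action == "rgb":
        return fitz.Pixmap(fitz.csRGB, pix)
    return pix
```

In the loop, `pix = prepare_for_png(fitz.Pixmap(doc, xref))` followed by a `None` check would then replace the two `if` statements before the `try` block.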

3 replies

JorjMcKie Sep 28, 2023
Maintainer

@oliviaharman For future questions or contributions, please do not hesitate to open your own Discussion item (or issue where applicable), instead of editing this post.

BTW: very nice pictures and PDF!

oliviaharman Sep 28, 2023

This is great - even more than I expected. Thank you so much for your help. I will not hesitate to reach out. Thanks again

JorjMcKie Sep 28, 2023
Maintainer

You are welcome!
I am glad to see you enjoy working with PyMuPDF.
