PyMuPDF on Discord #1766
-
Hello (from the Discord server). The PDF is still too large to upload, and I am unsure how to upload the script since .ipynb files are not supported. Any suggestions?
-
Please simply attach the PDF here, share a URL, or send it to my private e-mail: [email protected].
-
Here is a working script:

import fitz

xrefs = set()  # prevent multiple processing of the same image
doc = fitz.open("test.pdf")
for i in range(doc.page_count):  # iterate over pages
    if i > 0 and i % 50 == 0:  # some entertainment messages
        print(f"Processing page {i}")
    for item in doc.get_page_images(i):  # iterate over images on this page
        # Note we do not load any pages - as we would when using page.get_images()
        xref = item[0]  # the image xref
        if xref in xrefs:  # skip if done earlier already
            continue
        xrefs.add(xref)
        pix = fitz.Pixmap(doc, xref)
        if pix.colorspace is None:  # skip "mask" images
            continue
        if pix.n not in (1, 3):  # if neither Gray nor RGB, convert
            pix = fitz.Pixmap(fitz.csRGB, pix)
        try:  # things may still go wrong
            pix.save(f"page-{i}-{xref}.png")
        except RuntimeError:  # problems may occur for special Gray pixmaps
            if pix.n == 1:  # convert one more time
                pix = fitz.Pixmap(fitz.csGRAY, pix)
                pix.save(f"page-{i}-{xref}.png")
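The skip-if-seen step in the script is what keeps an image shared by many pages from being written more than once. As a minimal sketch of just that step, the snippet below runs without PyMuPDF: the helper name `first_occurrences` and the sample tuples are hypothetical stand-ins, and the only thing assumed about each image entry is what the script itself relies on, namely that its first element is the xref.

```python
def first_occurrences(images_per_page):
    """Return (page_index, xref) pairs, keeping each xref only once.

    images_per_page is a list with one entry per page; each entry is a
    list of tuples whose first element is the image xref.
    """
    seen = set()  # xrefs already handled on an earlier page
    result = []
    for page_index, items in enumerate(images_per_page):
        for item in items:
            xref = item[0]
            if xref in seen:  # same image referenced again: skip it
                continue
            seen.add(xref)
            result.append((page_index, xref))
    return result


# Hypothetical stand-in data: xref 12 appears on pages 0 and 1,
# xref 15 appears on pages 0 and 2.
pages = [
    [(12, 0, 640, 480), (15, 0, 100, 100)],
    [(12, 0, 640, 480)],
    [(20, 0, 50, 50), (15, 0, 100, 100)],
]

unique = first_occurrences(pages)
print(unique)  # [(0, 12), (0, 15), (2, 20)]
```

Each image is attributed to the first page that references it, which is why the output file names in the script carry the page number of the first occurrence only.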
-
We now have a dedicated public Discord channel for discussing PyMuPDF!
Please find it here: #pymupdf
We look forward to answering your questions! (And who knows, perhaps you can give us a few answers too 😉)