Is it currently possible to render PDF pages concurrently? Something like:

    pdf_pages = pdfium.PdfDocument("file.pdf")
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        pool.map(myRenderFunc, pdf_pages)
Answered by mara004 on May 26, 2024
Answer selected by
mara004
Is something like this okay?

    import multiprocessing

    import pypdfium2 as pdfium

    def render_page(page_index, input_file, render_kwargs):
        # Each call opens the document inside the worker process
        pdf = pdfium.PdfDocument(input_file)
        page = pdf.get_page(page_index)
        bitmap = page.render(**render_kwargs)
        # do whatever you want with the image

    def worker_initializer(input_file, render_kwargs):
        # Runs once per worker and stores the job parameters
        global worker_data
        worker_data = {
            "input_file": input_file,
            "render_kwargs": render_kwargs,
        }

    def worker_task(page_index):
        input_file = worker_data["input_file"]
        render_kwargs = worker_data["render_kwargs"]
        render_page(page_index, input_file, render_kwargs)

    def main():
        input_file = "file.pdf"
        pdf = pdfium.PdfDocument(input_file)
        num_pages = len(pdf)
        render_kwargs = {
            "scale": 1,
            "grayscale": True,
            # ...
        }
        with multiprocessing.Pool(processes=4, initializer=worker_initializer, initargs=(input_file, render_kwargs)) as pool:
            pool.map(worker_task, range(num_pages))

    if __name__ == "__main__":
        main()
Yeah it is. Take a look at src/pypdfium2/_cli/render.py for a sample implementation. Key notes:
- [...] (PdfDocument or PdfPage objects)
- Do not use the fork multiprocessing strategy; it has unpredictable stability issues. Use spawn or maybe forkserver instead.