Remove PyMuPDF
dagardner-nv committed Aug 23, 2024
Parent: 79bf91d, commit: 3363d82
Showing 5 changed files with 1 addition and 12 deletions.
conda/environments/all_cuda-121_arch-x86_64.yaml (1 change: 0 additions, 1 deletion)
@@ -120,7 +120,6 @@ dependencies:
 - pip:
 - --find-links https://data.dgl.ai/wheels-test/repo.html
 - --find-links https://data.dgl.ai/wheels/cu121/repo.html
-- PyMuPDF==1.23.*
 - databricks-cli < 0.100
 - databricks-connect
 - dgl==2.0.0

conda/environments/dev_cuda-121_arch-x86_64.yaml (1 change: 0 additions, 1 deletion)
@@ -98,7 +98,6 @@ dependencies:
 - yapf=0.40.1
 - zlib=1.2.13
 - pip:
-- PyMuPDF==1.23.*
 - databricks-cli < 0.100
 - databricks-connect
 - milvus==2.3.5

conda/environments/examples_cuda-121_arch-x86_64.yaml (1 change: 0 additions, 1 deletion)
@@ -67,7 +67,6 @@ dependencies:
 - pip:
 - --find-links https://data.dgl.ai/wheels-test/repo.html
 - --find-links https://data.dgl.ai/wheels/cu121/repo.html
-- PyMuPDF==1.23.*
 - databricks-cli < 0.100
 - databricks-connect
 - dgl==2.0.0

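All three conda environment files drop the same `PyMuPDF==1.23.*` pip pin. One way to confirm nothing else still pulls the package in is to scan the YAML files for list items that name it. This is a minimal stdlib sketch; the helper names `find_package_refs` and `scan_env_files` are hypothetical, and the regex only recognizes one-requirement-per-line list items like those in these files (plain, `&anchor`-prefixed, or a bare `*alias` named after the package).

```python
import re
from pathlib import Path
from typing import Dict, List


def find_package_refs(text: str, package: str) -> List[int]:
    """Return 1-based line numbers of YAML list items that reference `package`.

    Recognizes plain items ("- PyMuPDF==1.23.*"), anchored items
    ("- &PyMuPDF PyMuPDF==1.23.*") and bare aliases ("- *PyMuPDF"), the last
    only when the alias is named after the package, as in dependencies.yaml.
    """
    pattern = re.compile(
        rf"^\s*-\s+(?:&\S+\s+|\*)?{re.escape(package)}(?:\s*[<>=!~]|\s*$)",
        re.IGNORECASE,  # pip treats project names case-insensitively
    )
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.match(line)]


def scan_env_files(root: Path, package: str) -> Dict[str, List[int]]:
    """Map each *.yaml file under `root` to the lines that still reference `package`."""
    hits: Dict[str, List[int]] = {}
    for path in sorted(root.rglob("*.yaml")):
        lines = find_package_refs(path.read_text(encoding="utf-8"), package)
        if lines:
            hits[str(path)] = lines
    return hits
```

Called as `scan_env_files(Path("conda/environments"), "PyMuPDF")`, it should return an empty dict if no remaining environment file pins the package.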
dependencies.yaml (2 changes: 0 additions, 2 deletions)
@@ -371,7 +371,6 @@ dependencies:
 - &python-docx python-docx==1.1.0
 - pip
 - pip:
-- &PyMuPDF PyMuPDF==1.23.*
 - pytest-kafka==0.6.0
 
 example-dfp-prod:
@@ -420,7 +419,6 @@ dependencies:
 - faiss-gpu==1.7.*
 - google-search-results==2.4
 - nemollm==0.3.5
-- *PyMuPDF
 
 model-training-tuning:
 common:

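`dependencies.yaml` declares the pin once under the anchor `&PyMuPDF` and reuses it through the alias `*PyMuPDF`. YAML only allows an alias to refer to an anchor defined earlier in the same document, so removing the anchor also requires removing every alias that points at it; otherwise the file fails to load. A rough stdlib-only lint for that failure mode follows. `undefined_aliases` is a hypothetical helper, and its regexes only understand the one-item-per-line layout visible in this hunk, so it is a heuristic rather than a YAML parser.

```python
import re
from typing import List

# Heuristic patterns, not a YAML parser:
# an anchor is "&name" at the start of a value, an alias is a list item "- *name".
ANCHOR_RE = re.compile(r"(?:^|\s)&([A-Za-z0-9_.-]+)")
ALIAS_RE = re.compile(r"^\s*-\s+\*([A-Za-z0-9_.-]+)\s*$")


def undefined_aliases(text: str) -> List[str]:
    """Return aliases that appear before (or without) a matching anchor, in order."""
    defined = set()
    missing = []
    for line in text.splitlines():
        alias = ALIAS_RE.match(line)
        if alias and alias.group(1) not in defined:
            missing.append(alias.group(1))
        # Record anchors after checking aliases, since an alias cannot
        # refer to an anchor that has not been defined yet.
        defined.update(ANCHOR_RE.findall(line))
    return missing


before = "- &PyMuPDF PyMuPDF==1.23.*\n- pytest-kafka==0.6.0\n- *PyMuPDF\n"
anchor_only_removed = "- pytest-kafka==0.6.0\n- nemollm==0.3.5\n- *PyMuPDF\n"
print(undefined_aliases(before))               # []
print(undefined_aliases(anchor_only_removed))  # ['PyMuPDF']
```

Deleting only the anchor line would leave a dangling `*PyMuPDF` that this check flags; deleting both lines, as this commit does, leaves nothing to report.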
examples/llm/vdb_upload/module/content_extractor_module.py (8 changes: 1 addition, 7 deletions)
@@ -22,7 +22,6 @@
 from typing import Dict
 from typing import List
 
-import fitz
 import fsspec
 import mrc
 import mrc.core.operators as ops
@@ -171,12 +170,7 @@ def wrapper(input_info: ConverterInputInfo, *args, **kwargs):
 
 @_converter_error_handler
 def _pdf_to_text_converter(input_info: ConverterInputInfo) -> str:
-    text = ""
-    pdf_document = fitz.open(stream=input_info.io_bytes, filetype="pdf")
-    for page_num in range(pdf_document.page_count):
-        page = pdf_document[page_num]
-        text += page.get_text()
-    return text
+    raise NotImplementedError("PDF to text conversion is not implemented.")
 
 
 @_converter_error_handler
