Skip to content

A Python script that compares two PDFs by converting to images and using OpenCV - generates a neat HTML file to view differences side-by-side.

License

Notifications You must be signed in to change notification settings

acornelissen/pdfdiffer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PDFdiffer

A small proof of concept script that takes two PDF files as input, converts them to images, then uses openCV to find and highlight the differences using combined bounding boxes in red.

It then generates image(s) to show the differences, and an HTML files that shows a side-by-side view.

Note: a difference in number of pages is handled by inserting a blank page on the relevant side of the diff.

Happy to review and accept improvements. Ideally, instead of changes being highlighted all in red, you would want removals in red, additions in green, and any other changes in blue. I might get to that in future, I might not.

Dependencies

  • Poppler
  • pdf2image
  • openCV
  • numpy
  • Pillow
  • BeautifulSoup4

About

A Python script that compares two PDFs by converting to images and using OpenCV - generates a neat HTML file to view differences side-by-side.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published