A PDF-to-text wrapper that extracts text from PDF files. It works with both searchable PDFs and non-searchable (image-based, e.g. scanned) PDFs.
Install the package from npm:
npm install text-from-pdf
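The wrapper relies on Poppler, which must be available on your system:
- macOS (Homebrew): brew install poppler
- Linux (Debian/Ubuntu): sudo apt-get update && sudo apt-get install poppler-utils
- Windows: no installation required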
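On platforms where Poppler has to be installed separately, it can help to confirm it is on your PATH before calling the wrapper. The sketch below is not part of text-from-pdf: `isPopplerInstalled` is a hypothetical helper, and it assumes that probing the `pdftotext` binary (shipped with both Homebrew's `poppler` and Debian's `poppler-utils`) is a reasonable proxy for "Poppler is installed".

```javascript
// Hypothetical helper, not part of text-from-pdf.
// Assumption: if `pdftotext` can be spawned, Poppler's CLI tools are installed.
const { spawnSync } = require('child_process');

function isPopplerInstalled(binary = 'pdftotext') {
  // `-v` asks pdftotext to print its version and exit.
  // We ignore the output and exit code; we only care whether the process spawned.
  const result = spawnSync(binary, ['-v'], { encoding: 'utf8' });
  // spawnSync does not throw for a missing binary; it sets result.error (e.g. ENOENT).
  return !result.error;
}

console.log(isPopplerInstalled() ? 'Poppler found' : 'Poppler not found on PATH');
```

The exit code is deliberately ignored, because the check is only about whether the binary exists, not about how a particular Poppler version reports `-v`.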
- Standard input PDF with horizontally aligned text:
  // Run inside an async function (or an ES module with top-level await).
  const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>');
  console.log(text);
- Input PDFs with vertically aligned text:
  const options = {
    rotationDegree: -90,
  };
  const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
  console.log(text);
- Text from the first and second pages:
  const options = {
    firstPageToConvert: 1,
    lastPageToConvert: 2,
  };
  const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
  console.log(text);
- Text from the third to fifth pages:
  const options = {
    firstPageToConvert: 3,
    lastPageToConvert: 5,
  };
  const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
  console.log(text);
- Enable progress bar logging:
  const options = {
    firstPageToConvert: 1,
    lastPageToConvert: 1,
    enableProgressBarLogging: true,
  };
  const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
  console.log(text);
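The page-range examples above treat pages as 1-based and inclusive (`firstPageToConvert: 1, lastPageToConvert: 2` covers the first and second pages). If you build ranges dynamically, a small helper can catch bad ranges before they reach the wrapper. This is a sketch only: `pageRangeOptions` is a hypothetical helper, not part of text-from-pdf, and it uses only the option names shown in the examples.

```javascript
// Hypothetical helper, not part of text-from-pdf.
// Builds the options object from a 1-based, inclusive page range.
function pageRangeOptions(firstPage, lastPage = firstPage, extra = {}) {
  if (!Number.isInteger(firstPage) || !Number.isInteger(lastPage)) {
    throw new TypeError('Page numbers must be integers');
  }
  if (firstPage < 1 || lastPage < firstPage) {
    throw new RangeError(`Invalid page range ${firstPage}-${lastPage}`);
  }
  // Spread `extra` first so the validated range always takes precedence.
  return { ...extra, firstPageToConvert: firstPage, lastPageToConvert: lastPage };
}

// Third to fifth pages, with progress bar logging enabled.
const options = pageRangeOptions(3, 5, { enableProgressBarLogging: true });
console.log(options);
// Then, inside an async context:
// const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
```

Passing a single page number (e.g. `pageRangeOptions(1)`) yields the same options as a one-page range. Extra settings such as `rotationDegree` can go in the third argument.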
To contribute, fork the repository, add your changes, and open a pull request.