<img src="https://codeclimate.com/github/CV-Gate/search_for_text_into_pdfs.png" />
This application is a simple example of how to search for text inside PDF documents. It uses Sphinx and the pdf-reader gem.
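The ts:index and ts:start rake tasks used below presumably come from the Thinking Sphinx gem. The following is a sketch of how the Sphinx side could be wired up. It assumes Thinking Sphinx 3 or later, plus a hypothetical Document model with a content column that holds the text pdf-reader extracted from each upload. The model, column, and file names are illustrative, not taken from this app:

```ruby
# app/indices/document_index.rb (hypothetical file, model and column names)
# SQL-backed index: Sphinx reads the rows straight from the database
# whenever `rake ts:index` runs.
ThinkingSphinx::Index.define :document, with: :active_record do
  # Full-text field containing the text extracted from the PDF
  indexes content
end
```

After the index has been built and the daemon started, `Document.search "some words"` returns the matching records.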
- Install Sphinx.
- Configure the app's database connection (Sphinx only works with MySQL or PostgreSQL).
- Run rails s and upload some PDFs.
- Run rake ts:index and rake ts:start.
- Run whenever --update-crontab pdf_index to start the cron job that reindexes the records.
- You can also configure the cron job within the Rails app; it currently runs every minute for testing purposes.
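The whenever command reads its schedule from config/schedule.rb. Here is a minimal sketch of a schedule that rebuilds the Sphinx index every minute. It assumes whenever's built-in rake job type, and the log path is an assumption. The file in this repository may differ:

```ruby
# config/schedule.rb
# Where cron writes the job output (optional; path is an assumption).
set :output, "log/cron.log"

# Rebuild the Sphinx index every minute. One minute is cron's finest
# granularity and only makes sense for testing; use a longer interval in production.
every 1.minute do
  rake "ts:index"
end
```

After editing the file, run whenever --update-crontab pdf_index again. This rewrites the crontab entries tagged with the pdf_index identifier.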

The app stores the extracted text in the database. MySQL's limit is 4294967295 bytes, so the text of very large PDFs will be truncated when it is stored. The database server may also time out while saving it.
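If silent truncation is not acceptable, the text can be cut explicitly before saving. The following is a minimal plain-Ruby sketch. It assumes UTF-8 text and a limit of 4294967295 bytes (MySQL's LONGTEXT maximum). `truncate_to_bytes` and `MAX_TEXT_BYTES` are hypothetical names and are not part of this app:

```ruby
# Hypothetical constant: MySQL's LONGTEXT maximum, 2**32 - 1 bytes.
MAX_TEXT_BYTES = 4_294_967_295

# Hypothetical helper: returns text unchanged if it fits in max_bytes,
# otherwise keeps at most max_bytes bytes. byteslice can cut a multibyte
# UTF-8 character in half; scrub("") drops the resulting invalid bytes.
def truncate_to_bytes(text, max_bytes = MAX_TEXT_BYTES)
  return text if text.bytesize <= max_bytes

  text.byteslice(0, max_bytes).scrub("")
end

sample = "h\u00e9llo" # "héllo": 5 characters, 6 bytes in UTF-8
puts truncate_to_bytes(sample, 2) # prints "h" (the half of "é" is dropped)
puts truncate_to_bytes(sample, 3) # prints "hé"
```

The helper's first step compares `text.bytesize` against the limit. That comparison is also a cheap way to implement the size validation item below, because `bytesize` does not scan the string.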

TODO:
- Validate text size (possibly too expensive)
- Write some tests
- Some refactoring