Commit

update README.md
RobinDavid committed Feb 28, 2014
1 parent 2b0d214 commit d055249
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions README.md
@@ -2,16 +2,19 @@ Pytesser
========

Python wrapper for the tesseract OCR engine. The module is based on OpenCV.
Article : http://robindavid.comli.com/pytesser-python-wrapper-for-the-tesseract-ocr-engine/


Information
-----------

There are already multiple modules called pytesser, but this one differs slightly on the following points:
There are already multiple tesseract Python modules, but none of them satisfied me. This one differs on the following points:

* It implements all the features of the tesseract engine, including the choice of the language and the page segmentation mode.
* The whole module is contained in one file (the other modules I have tried are quite messy).
* It supports OpenCV, so you can directly provide an IplImage to the module.
* All the classes are put in the same file and all inessential classes are removed.
* Uses OpenCV instead of PIL (not really an advantage, because PIL is far more widespread, but it better fits my needs ;))
* Uses `subprocess.Popen.communicate` instead of `subprocess.Popen.wait` to avoid any output in the shell or in the programs that use the module.
* Handles the different languages via the '-l' option. The original pytesser uses the default language, which is English, so the detection of French, for instance, is totally inaccurate.
* Handles the page segmentation mode (pagesegmode), which modifies the behavior of tesseract, for instance to detect only one character, a word, or a line.
* The code is far more straightforward (in my opinion).
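
The points about `communicate` and the language and page segmentation options can be illustrated with a short sketch. This is not the module's actual code: the helper names `build_tesseract_command` and `run_quietly` are hypothetical, and the `-psm` spelling assumes a Tesseract 3.x command line (newer releases spell it `--psm`).

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", psm=3):
    """Build the argument list for a tesseract call (hypothetical helper).

    Assumption: Tesseract 3.x command line, where the page segmentation
    mode flag is written -psm.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "-psm", str(psm)]


def run_quietly(command):
    """Run a command and capture its output instead of letting it reach the shell."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() reads both pipes until the process exits, so nothing is
    # printed in the caller's terminal and the pipes cannot fill up and block.
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

For example, `run_quietly(build_tesseract_command("myimage.jpg", "out", lang="fra", psm=7))` would ask tesseract to read the image as a single line of French text, provided tesseract and the French language data are installed.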

How to use it?
--------------
@@ -28,4 +31,4 @@ There are two ways to use it. Either you give it a filename, or directly an Ip
Or you can directly give it an IplImage like this:

    image = cv.LoadImage("myimage.jpg")
    txt = pytesser.iplimage_to_string(image)
