Change the num_words_per_image without training again #13

Open
vamsiadari95 opened this issue May 26, 2020 · 1 comment

Comments

@vamsiadari95

Can we predict multiple words from a single image by changing num_words_per_image?
I tried changing it in recognizer_class in the evaluate.py file, but I get this error:

InvalidType: 
Invalid operation is performed in: Reshape (Forward)

Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0

Also, may I ask why there is no space character in char_map? (Adding one might make it possible to predict multiple words in an image.)

@Bartzi
Owner

Bartzi commented May 27, 2020

No, you cannot predict the content of multiple words per image without retraining.
The code can be used to do exactly that, but the text recognition model we provide works a little differently than you might think.

It is configured to predict the content of one word with a maximum of 23 characters.
But it actually works the other way round: we predict the locations of 23 words (each with one character) and then assume that each of these one-character words belongs to a single word (this is the conceptual level!). We can then feed our one word with 23 characters into the transformer and predict the textual content.
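To make the regrouping idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: `regroup_as_words`, `MAX_CHARS`, and the string stand-ins for localized regions are all hypothetical names chosen for illustration. It only shows that a fixed number of localized regions fits one particular combination of word count and characters per word.

```python
# Illustrative only: names and data are assumptions, not the repository's code.
MAX_CHARS = 23  # the released model localizes 23 single-character regions per image


def regroup_as_words(regions, num_words_per_image, max_chars=MAX_CHARS):
    """Group a flat list of character regions into words of `max_chars` each."""
    expected = num_words_per_image * max_chars
    if len(regions) != expected:
        raise ValueError(
            f"got {len(regions)} regions, but {num_words_per_image} word(s) "
            f"of {max_chars} characters need {expected}"
        )
    return [
        regions[i * max_chars:(i + 1) * max_chars]
        for i in range(num_words_per_image)
    ]


# Stand-ins for the 23 localized regions of one image.
regions = [f"region_{i}" for i in range(MAX_CHARS)]

words = regroup_as_words(regions, num_words_per_image=1)
print(len(words), len(words[0]))  # prints "1 23"
```

Asking for `num_words_per_image=2` with the same 23 regions raises the `ValueError`, because the number of regions was fixed when the model was trained.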

You can, however, predict x words with a maximum of 23 characters each, but you'll need to retrain the model for this, since the current model was not trained for that.
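The Reshape error quoted in the question states its own condition: `prod(x.shape) % known_size` has to be 0. The sketch below only reproduces that arithmetic. The value 3072 comes from the error message; the input shape is a made-up example that happens to hold 1536 elements, and `reshape_remainder` is a hypothetical helper, not a library function.

```python
from math import prod


def reshape_remainder(input_shape, known_size):
    """Return prod(input_shape) % known_size.

    A reshape with one inferred dimension is only possible when this is 0.
    """
    return prod(input_shape) % known_size


known_size = 3072        # taken from the error message
input_shape = (1, 1536)  # hypothetical shape, chosen to hold 1536 elements

print(reshape_remainder(input_shape, known_size))  # prints 1536
```

A non-zero remainder means the tensor cannot be split into whole chunks of the size the model expects, which is what happens when only the word count is changed at evaluation time.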

We are not using spaces because none of the words in the benchmark datasets include spaces. You could add a space character to the char_map, but you would need to retrain the model with enough data that also contains spaces. I'm sorry, but this is one of the flaws of deep learning :/
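As a rough illustration of why adding a space is more than a config change, the sketch below extends a character map by one entry. The layout used here (class index mapped to character) and the `add_character` helper are assumptions for illustration; the real char_map file may be organized differently.

```python
# Hypothetical character map: class index -> character.
# The real char_map file in the repository may use a different layout.
char_map = {0: "a", 1: "b", 2: "c"}


def add_character(char_map, character):
    """Return a copy of char_map with `character` appended as a new class."""
    if character in char_map.values():
        return dict(char_map)
    extended = dict(char_map)
    extended[max(extended) + 1] = character
    return extended


extended = add_character(char_map, " ")
print(len(char_map), len(extended))  # prints "3 4"
```

The extended map has one more class than the original, so a classifier trained on the original map has no output for the new entry and has never seen examples of it.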
