Can we predict multiple words from a single image by changing num_words_per_image?
I tried changing it in recognizer_class in the evaluate.py file, but I get this error:
InvalidType:
Invalid operation is performed in: Reshape (Forward)
Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0
Also, why are there no spaces in char_map? (Adding them might make it possible to predict multiple words in an image.)
No, you cannot predict the content of multiple words per image without retraining.
The code could be used to do exactly that, but the text recognition model we provide works a little differently than you might think.
It is configured to predict the content of one word with a maximum of 23 characters.
But it actually does it the other way round. We predict the locations of 23 words (each with one character) and then assume that all of these single-character words belong to one and the same word (this is the conceptual level!). We can then put our one word with 23 characters into the transformer and predict the textual content.
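To make the grouping idea concrete, here is a minimal NumPy sketch. It is not the repository's code: the function name `group_characters`, the array layout, and the values `batch = 4` and `feature_dim = 8` are assumptions chosen for illustration. It only shows how 23 localized single-character regions per image can be read as one word of 23 characters.

```python
import numpy as np


def group_characters(char_features, num_words, chars_per_word):
    """Group per-region feature vectors into words (hypothetical helper).

    char_features: shape (batch * num_words * chars_per_word, feature_dim)
    returns:       shape (batch, num_words, chars_per_word, feature_dim)
    """
    feature_dim = char_features.shape[-1]
    return char_features.reshape(-1, num_words, chars_per_word, feature_dim)


# Assumed sizes, for illustration only.
batch, feature_dim = 4, 8

# 23 localized regions per image, each holding one character.
regions = np.zeros((batch * 23, feature_dim))

# Read the 23 regions of each image as a single word of 23 characters.
words = group_characters(regions, num_words=1, chars_per_word=23)
print(words.shape)
```

Under these assumptions each image ends up with exactly one word slot, which is why the released weights only fit the one-word setting.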
You can, however, predict x words with max 23 characters, but you'll need to retrain the model for this, since the current model is not made for something like that.
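The Reshape error quoted in the question is a divisibility check: the total number of elements must be a multiple of the product of the known dimensions. The sketch below reproduces that check with NumPy, using the two numbers from the error message (1536 and 3072). The actual tensor shapes inside the model are not given in this thread, so the one-dimensional arrays here are only an assumption; `can_reshape` is a hypothetical helper.

```python
import numpy as np


def can_reshape(total_elements, known_dims):
    """Return True if total_elements can fill a shape with one inferred (-1) dimension."""
    known_size = int(np.prod(known_dims))
    return total_elements % known_size == 0


# Numbers taken from the error message: 1536 elements, known size 3072.
print(can_reshape(1536, (3072,)))
print(can_reshape(3072, (3072,)))

# NumPy rejects the same reshape for the same reason.
try:
    np.zeros(1536).reshape(-1, 3072)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```

Changing num_words_per_image at evaluation time changes the size the reshape expects, while the trained network still produces the amount of data it was trained to produce, so the check fails.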
We are not using spaces, since there are no words in the benchmark datasets that include spaces. You could add a space character to the char_map, but you'll need to retrain the model with enough data that also contains spaces. I'm sorry but this is one of the flaws of deep learning :/
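As a rough illustration of why adding a space requires retraining, the sketch below builds a character map with and without a space. The real char_map format is not shown in this thread, so the index-to-character dictionary, the alphabet, and the helper `build_char_map` are all assumptions.

```python
import string


def build_char_map(alphabet):
    """Map class index -> character (hypothetical layout, not the repository's format)."""
    return {index: char for index, char in enumerate(alphabet)}


# Assumed alphabet for illustration: lowercase letters and digits.
base_alphabet = string.ascii_lowercase + string.digits

char_map = build_char_map(base_alphabet)
char_map_with_space = build_char_map(base_alphabet + " ")

# The classifier needs one output class per entry in the map.
print(len(char_map), len(char_map_with_space))
```

The extra entry means the classifier needs one more output class, and the existing weights have never seen a training example for it.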