Need the ability to blacklist/whitelist characters #988
Comments
Thanks for opening the issue. Let's first focus the discussion in #888 to avoid duplicates :)
That's a great one too. Again, the callback idea might shine here, since you can boost or deboost some characters rather than blacklist them completely. For example, I often see 1 recognized as l, Q as 0/O, or : as 0. I'd like to avoid a complete blacklist, but if I could, for example, boost and prioritize digits over letters, or : over ., that would solve the problem.
A short example (with master) of how we could achieve this:
This would take the next "char" with the highest probability. @SlappyAUS, is this what you had in mind? For example:
What if we make it a bit broader and instead provide an option to multiply the logits by some weight?
@dchaplinsky Hmm, would this need some mapping to "close" characters? 🤔
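To make the weighting idea concrete, here is a minimal sketch in plain Python. It is not docTR code: `reweight_logits` and `char_weights` are hypothetical names, and it assumes the weight is meant to scale a character's probability, which corresponds to adding `log(weight)` to its logit. Multiplying a raw logit directly would push a negative logit in the wrong direction, so the log-space form is used here.

```python
import math


def reweight_logits(logits, vocab, char_weights):
    """Scale each character's probability by a weight (hypothetical helper).

    A weight > 1 boosts a character, a weight between 0 and 1 deboosts it,
    and a weight of 0 removes it entirely (hard blacklist).
    Characters missing from char_weights keep a weight of 1.
    """
    adjusted = []
    for char, logit in zip(vocab, logits):
        weight = char_weights.get(char, 1.0)
        if weight > 0:
            # Multiplying a probability by w equals adding log(w) to its logit
            adjusted.append(logit + math.log(weight))
        else:
            adjusted.append(float("-inf"))
    return adjusted


def argmax_char(logits, vocab):
    """Return the character with the highest score."""
    return vocab[max(range(len(vocab)), key=logits.__getitem__)]


# Toy single-timestep example: the model slightly prefers "l" over "1"
vocab = "1l0O"
logits = [1.9, 2.0, 0.5, 0.4]
boost_digits = {"1": 2.0, "0": 2.0}

before = argmax_char(logits, vocab)
after = argmax_char(reweight_logits(logits, vocab, boost_digits), vocab)
```

With per-character weights like these, no mapping between "close" characters is needed: boosting digits as a group is enough to flip a near-tie such as l versus 1, while a confident prediction of a letter still wins.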
Hey everyone 👋 The use case here is to help out the text recognition part when you have more info about a subvocab, so now we need to assess whether that's worth addressing (I think it would be useful), what would be the API, and at which step this should take place. My two cents:
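Here is my suggested design for a blacklist:

    import torch
    from doctr.datasets import VOCABS
    from doctr.models import crnn_vgg16_bn

    # Assumption: this must be the vocab the pretrained model was trained on
    vocab = VOCABS["french"]
    blacklisted_chars = {str(num) for num in range(10)}
    # Set the mask: 0 for blacklisted characters, 1 otherwise
    # (torch.tensor needs a list here, not a generator)
    vocab_mask = torch.tensor([0 if char in blacklisted_chars else 1 for char in vocab], dtype=torch.float32)
    # vocab_mask is the proposed argument, it is not part of the current API
    model = crnn_vgg16_bn(pretrained=True, vocab_mask=vocab_mask)
    input_tensor = torch.rand(1, 3, 32, 128)
    out = model(input_tensor)

and for a whitelist:

    whitelisted_chars = {str(num) for num in range(10)}
    vocab_mask = torch.tensor([1 if char in whitelisted_chars else 0 for char in vocab], dtype=torch.float32)

What do you think?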
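For illustration, here is a rough sketch of how such a mask could act at decoding time. It is plain Python rather than docTR code: `masked_ctc_greedy_decode` is a hypothetical helper, and it assumes a CTC-style output with `len(vocab) + 1` scores per timestep where the last index is the blank. The blank is always left allowed, otherwise repeated characters could no longer be separated.

```python
def masked_ctc_greedy_decode(logits, vocab, allowed_chars):
    """Greedy CTC decoding restricted to allowed_chars (hypothetical helper).

    logits: list of timesteps, each a list of len(vocab) + 1 scores.
    Assumption: the last index of each timestep is the CTC blank.
    """
    blank = len(vocab)
    # One flag per vocab character, plus the blank which is always allowed
    allowed = [char in allowed_chars for char in vocab] + [True]

    decoded = []
    previous = blank
    for step in logits:
        # Disallowed characters get -inf, so the next-best allowed one wins
        masked = [s if ok else float("-inf") for s, ok in zip(step, allowed)]
        best = max(range(len(masked)), key=masked.__getitem__)
        # Standard CTC collapse: drop blanks and consecutive repeats
        if best != blank and best != previous:
            decoded.append(vocab[best])
        previous = best
    return "".join(decoded)


vocab = "a1b2"
logits = [
    [3.0, 2.0, 0.1, 0.0, 0.5],  # "a" wins, "1" is the runner-up
    [0.1, 0.2, 0.0, 0.1, 4.0],  # blank
    [0.0, 0.3, 2.5, 2.0, 0.1],  # "b" wins, "2" is the runner-up
    [0.0, 0.3, 2.5, 2.0, 0.1],  # same prediction again, collapsed by CTC
]

unrestricted = masked_ctc_greedy_decode(logits, vocab, set(vocab))
digits_only = masked_ctc_greedy_decode(logits, vocab, set("0123456789"))
```

A blacklist is the same call with `allowed_chars = set(vocab) - blacklisted_chars`, so a single mask covers both cases, as in the design above.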
Discussed in #888
Originally posted by Xargonus April 7, 2022
Hello, is there currently a way to blacklist or whitelist characters used by the text recognition model?