Is candidate generation important during inference? #37

Open
vishal-nayak1 opened this issue Nov 28, 2022 · 7 comments

Comments

@vishal-nayak1

vishal-nayak1 commented Nov 28, 2022

Hi @Praneet9,
Is candidate generation important during inference? For some fields, such as address, company name, and registration number, it is difficult to extract text using regex because their patterns change across templates.
Also, if I do not provide candidates for fields like address, will the model still be able to predict the address field?

@Praneet9
Owner

Think of candidates as a mixture of positive and negative samples. If the model doesn't see any negative examples, it is difficult for it to tell right from wrong. This is why I personally feel candidates are required for training.
For inference, it should not matter if you have candidates or not.
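To make the positive/negative idea concrete, here is a minimal sketch of how candidates might be labelled for training. It is not code from this repository: `label_candidates`, the `"text"` key, and the exact-match rule are all assumptions made for illustration.

```python
# Hypothetical helper (not part of the repository): mark a candidate as a
# positive sample (1) when its text equals the annotated ground truth for
# the field, and as a negative sample (0) otherwise.
def label_candidates(candidates, ground_truth):
    target = ground_truth.strip()
    return [1 if c["text"].strip() == target else 0 for c in candidates]


# Three candidates that all "look like" an invoice number; only one is right.
candidates = [{"text": "INV-1042"}, {"text": "PO-7781"}, {"text": "2022-11-28"}]
labels = label_candidates(candidates, "INV-1042")
print(labels)
```

The two wrong candidates are what give the model negative examples to learn from.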

@vishal-nayak1
Author

vishal-nayak1 commented Nov 28, 2022

@Praneet9 thanks for sharing the details. I have one doubt: in the inference file, I can see that you generate candidates, which are then fed to the model as input:

candidates = extract_candidates.get_candidates(ocr_results)
candidates_with_neighbours = attach_neighbour_candidates(width, height, ocr_results, candidates)
annotation = normalize_coordinates(candidates_with_neighbours, width, height)

Model input:
with torch.no_grad():
    rlie.eval()
    val_outputs = rlie(field_ids, candidate_cords, neighbours, neighbour_cords)

Please clarify it.

Thanks

@Praneet9
Owner

I'm passing all the possible candidates that can be the classes I want. The model picks the most relevant one from them.
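Roughly, the selection step could look like the sketch below. It assumes the model produces one score per candidate for a given field; `pick_best` and the example scores are hypothetical and not taken from the repository.

```python
# Hypothetical post-processing step: given one score per candidate,
# return the highest-scoring candidate for the field.
def pick_best(candidates, scores):
    if not candidates:
        # With no candidates there is nothing for the model to choose from.
        return None
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]


best = pick_best(["INV-1042", "PO-7781", "2022-11-28"], [0.91, 0.12, 0.05])
print(best)
```

Note that the result can only ever be one of the candidates that were passed in.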

@vishal-nayak1
Author

Yeah, but is it necessary? For fields like address, company name, registration name, etc., we cannot easily extract possible candidates using regex. So if I do not pass any candidates for such fields, will the model still be able to predict the address field?

@Praneet9
Owner

Here, in inference, we don't know what the actual invoice number is, which is why we send everything that looks like one.
In your case, you can just send whatever looks like the address, even if it's the only candidate, and it should work fine.
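As an illustration of "everything that looks like one", a candidate generator can be as simple as a permissive regex over the OCR tokens. The pattern and the `invoice_number_candidates` helper below are assumptions for this sketch, not the repository's `extract_candidates` code.

```python
import re

# Assumed pattern: a token of at least four letters, digits, or dashes that
# contains at least one digit. It is deliberately loose, so it also matches
# things that are not invoice numbers.
INVOICE_LIKE = re.compile(r"^(?=.*\d)[A-Za-z0-9-]{4,}$")


def invoice_number_candidates(ocr_tokens):
    """Return every OCR token that looks like an invoice number."""
    return [token for token in ocr_tokens if INVOICE_LIKE.match(token)]


tokens = ["Invoice", "No.", "INV-1042", "Total", "2022-11-28", "Acme"]
found = invoice_number_candidates(tokens)
print(found)
```

The date slips through as a false candidate, and rejecting it is the model's job. For a field like address, where a regex is impractical, the candidate list could come from any other heuristic and may contain just one entry.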

@vishal-nayak1
Author

Okay, but in that case the model is not actually extracting fields, the way one would extract an address from a paragraph of text; it is just ranking the candidates we pass in. I think generating possible candidates for some fields, like address and registration_number, is itself challenging.

@Praneet9
Owner

This is a binary model that just returns True or False for each candidate you pass in; it is not meant to do what you are asking for.
