Is candidate generation important during inference? #37

vishal-nayak1 · 2022-11-28T04:33:32Z

Hi @Praneet9,
Is candidate generation important during inference? For some fields, such as address, company name, and registration number, it is difficult to extract text with regex because their patterns change from template to template.
Also, what if I do not provide candidates for fields like address? Will the model still be able to predict the address field?

Praneet9 · 2022-11-28T10:21:07Z

Think of candidates as a mixture of positive and negative samples. If the model doesn't see any negative examples, it is difficult to differentiate between right and wrong. This is why I personally feel candidates are required for training.
For inference, it should not matter if you have candidates or not.
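To make the positive/negative idea concrete, here is a minimal sketch of how candidates could be labeled for training. The `Candidate` class, the `label_candidates` helper, and the sample values are hypothetical and are not taken from the repository; the only assumption is that each candidate carries its text and the field it was generated for.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str   # OCR text of the candidate span
    field: str  # field this candidate was generated for


def label_candidates(candidates, ground_truth):
    """Pair each candidate with 1 if it matches the annotated value, else 0."""
    return [(c, int(ground_truth.get(c.field) == c.text)) for c in candidates]


# Hypothetical candidates produced by some candidate generator.
candidates = [
    Candidate("INV-001", "invoice_number"),
    Candidate("PO-778", "invoice_number"),   # looks similar, but is wrong
    Candidate("2022-11-28", "invoice_date"),
]
ground_truth = {"invoice_number": "INV-001", "invoice_date": "2022-11-28"}

labeled = label_candidates(candidates, ground_truth)
labels = [label for _, label in labeled]
```

The wrong-but-plausible candidate (`PO-778`) is what gives the model a negative example to learn from.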

vishal-nayak1 · 2022-11-28T10:31:25Z

@Praneet9 thanks for sharing the details. I have one doubt: in the inference file, I can see you use code to generate candidates, which are then fed to the model as input.
Link:

Representation-Learning-for-Information-Extraction/inference.py

Line 130 in b268463

candidates = extract_candidates.get_candidates(ocr_results)
candidates_with_neighbours = attach_neighbour_candidates(width, height, ocr_results, candidates)
annotation = normalize_coordinates(candidates_with_neighbours, width, height)

Model input-
with torch.no_grad():
    rlie.eval()
    val_outputs = rlie(field_ids, candidate_cords, neighbours, neighbour_cords)

Please clarify it.

Thanks

Praneet9 · 2022-11-28T10:39:30Z

I'm passing in all the possible candidates for the fields I want. The model picks the most relevant one from them.
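As a rough illustration of "picks the most relevant one", the sketch below keeps the highest-scoring candidate per field. It assumes the model output has already been turned into one score per candidate; `pick_best` and the sample data are hypothetical and are not part of inference.py.

```python
def pick_best(candidates, scores):
    """Return {field: text} for the highest-scoring candidate of each field.

    candidates: list of (field, text) pairs
    scores:     one float per candidate, higher means more relevant
    """
    best = {}
    for (field, text), score in zip(candidates, scores):
        if field not in best or score > best[field][1]:
            best[field] = (text, score)
    return {field: text for field, (text, _) in best.items()}


# Hypothetical candidates and scores.
candidates = [
    ("invoice_number", "INV-001"),
    ("invoice_number", "PO-778"),
    ("total", "120.00"),
]
scores = [0.91, 0.12, 0.77]

result = pick_best(candidates, scores)
```

A field with no candidates never appears in the result, because there is nothing to score for it.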

vishal-nayak1 · 2022-11-28T10:45:10Z

Yeah, but is it necessary? For fields like address, company name, and registration number, we cannot easily extract possible candidates using regex. So if I do not pass any candidates for such fields, will the model still be able to predict the address field?

Praneet9 · 2022-11-28T10:52:56Z

Here, in inference, we don't know what the actual invoice number is, which is why we send everything that looks like one.
In your case, you can just send whatever looks like the address, even if it's the only candidate, and it should work fine.

vishal-nayak1 · 2022-11-28T11:19:28Z

Okay, but in that case the model is not actually extracting fields (for example, pulling an address out of a paragraph of text); it is just ranking the candidates we pass in. I think generating possible candidates for fields like address and registration_number is itself challenging.

Praneet9 · 2022-11-28T11:28:09Z

This is a binary model that can just return True or False to the candidates you pass in and is not meant to do what you are asking for.
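A small sketch of what a binary decision per candidate could look like in practice. The 0.5 threshold, the `accept_candidates` helper, and the sample scores are assumptions made for illustration and are not taken from the repository.

```python
def accept_candidates(candidates, scores, threshold=0.5):
    """Return the candidate texts whose score is at or above the threshold."""
    return [text for text, score in zip(candidates, scores) if score >= threshold]


# Hypothetical address candidates with model scores.
accepted = accept_candidates(
    ["12 Main Street, Pune", "Thank you for your business"],
    [0.83, 0.04],
)

# With no candidates there is nothing to say True or False to.
nothing = accept_candidates([], [])
```

The second call shows the limitation discussed above: the model only judges the candidates it is given and does not locate fields in raw text by itself.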
