Avoid ValueError: substring not found #65
In my case, the substring was not found because the answers were padded (like Ans Entity). Strangely, I only hit this error when running in Jupyter; when I ran the same code from the terminal, no such error occurred.
Yes please! I have found the same error but hadn't fully worked out why just yet. Here is a minimal example:
from pipelines import pipeline
# load in the multi task qa qg
MODEL = pipeline("multitask-qa-qg")
# problem text
text = 'The herb is generally safe to use. There is limited research to suggest that stinging nettle is an effective remedy. Researchers need to do more studies before they can confirm the health benefits of stinging nettle.'
MODEL(text)
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-59-1ab007d28390> in <module>()
7 text = 'The herb is generally safe to use. There is limited research to suggest that stinging nettle is an effective remedy. Researchers need to do more studies before they can confirm the health benefits of stinging nettle.'
8
----> 9 MODEL(text)
2 frames
/content/question_generation/pipelines.py in _prepare_inputs_for_qg_from_answers_hl(self, sents, answers)
140 answer_text = answer_text.strip()
141
--> 142 ans_start_idx = sent.index(answer_text)
143
144 sent = f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text): ]}"
ValueError: substring not found
In this specific case, I found that the error occurs because, for the sentence "Researchers need to do more studies before they can confirm the health benefits of stinging nettle.", the generated answer is "Do more studies" instead of "do more studies". In ans_start_idx = sent.index(answer_text) (line 142), the index function is case-sensitive, so looking up "Do more studies" raises this ValueError. Since the T5 model is uncased anyway, a simple solution would be replacing line 137 and line 140 in pipelines.py respectively with:
This should solve your problem :)
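For anyone reading along, here is a minimal sketch of the case-insensitive lookup idea described above. It is not the exact replacement for lines 137 and 140; the helper name highlight_answer is hypothetical, and only the sent.index(answer_text) call and the <hl> formatting come from the traceback.

```python
def highlight_answer(sent: str, answer_text: str) -> str:
    """Wrap the first case-insensitive match of answer_text in <hl> markers.

    Hypothetical helper for illustration; not part of pipelines.py.
    """
    answer_text = answer_text.strip()
    # Compare lowercased copies so "Do more studies" matches "do more studies".
    # Assumption: lowercasing does not change the string length, which holds
    # for ASCII text but not for every Unicode character.
    ans_start_idx = sent.lower().index(answer_text.lower())
    ans_end_idx = ans_start_idx + len(answer_text)
    # Slice the original sentence so its own casing is kept in the output.
    matched = sent[ans_start_idx:ans_end_idx]
    return f"{sent[:ans_start_idx]} <hl> {matched} <hl> {sent[ans_end_idx:]}"


sent = ("Researchers need to do more studies before they can confirm "
        "the health benefits of stinging nettle.")
result = highlight_answer(sent, "Do more studies")
```

This still raises ValueError when the answer does not occur in the sentence at all, so it only covers the casing mismatch.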
The error was mainly caused by the "<pad>" token appearing at the beginning of some answers, which meant the answer could not be found in "sent". So I added the following line at 141 to remove the token from the answer:
With this addition, the code has been working on all the examples I've seen so far. Cheers!
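A rough sketch of what stripping the token could look like, in case it helps. This is an assumption about the approach, not the exact line added at 141; the helper name clean_answer is hypothetical, and it removes every occurrence of the token rather than only a leading one.

```python
def clean_answer(answer_text: str, pad_token: str = "<pad>") -> str:
    """Drop pad tokens and surrounding whitespace from a generated answer.

    Hypothetical helper for illustration; not part of pipelines.py.
    """
    # Remove the literal token text, then trim the spaces it leaves behind.
    return answer_text.replace(pad_token, "").strip()


cleaned = clean_answer("<pad> stinging nettle")
sent = "There is limited research to suggest that stinging nettle is an effective remedy."
start = sent.index(cleaned)  # succeeds once the token is gone
```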
In some cases, answers can't be found in the input text and a ValueError is raised. This adds a try/except to avoid such errors.
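The idea can be sketched roughly as follows. This is a simplified stand-in, not the actual diff: the function name highlight_answers is hypothetical, and skipping the unmatched answer is one possible way to handle the exception.

```python
def highlight_answers(sent, answers):
    """Return one highlighted copy of sent per answer found in it.

    Hypothetical, simplified version of the loop around line 142.
    """
    inputs = []
    for answer_text in answers:
        answer_text = answer_text.strip()
        try:
            ans_start_idx = sent.index(answer_text)
        except ValueError:
            # The answer does not appear verbatim in the sentence; skip it
            # instead of aborting the whole pipeline call.
            continue
        ans_end_idx = ans_start_idx + len(answer_text)
        inputs.append(
            f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_end_idx:]}"
        )
    return inputs


results = highlight_answers("The herb is generally safe to use.", ["herb", "Herb tea"])
```

Note that skipping silently means fewer questions are generated for that sentence, so logging the skipped answer may be worth considering.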