IndexError for some texts using t5-small #22
Comments
I'm having trouble reproducing the exact error, but I suspect it's related to T5's 512-token limit, which is shared between the input text and the command passed to T5. In general, it's not a good idea to put more than a single sentence at a time into the
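For anyone who wants to sanity-check the shared budget described above, here is a minimal sketch. It assumes the task prompt and the input text are tokenized separately and their counts added, which may not match how the library builds its inputs. `fits_in_budget`, `whitespace_tokenize`, and the example prompt are hypothetical names made up for illustration, and whitespace splitting is only a stand-in for the real T5 tokenizer.

```python
from typing import Callable, List

T5_MAX_TOKENS = 512  # the input limit mentioned in this thread


def fits_in_budget(
    task_prompt: str,
    text: str,
    tokenize: Callable[[str], List[str]],
    limit: int = T5_MAX_TOKENS,
) -> bool:
    """Return True if the prompt plus the text stays within the limit."""
    total = len(tokenize(task_prompt)) + len(tokenize(text))
    return total <= limit


def whitespace_tokenize(s: str) -> List[str]:
    # Stand-in only: a real subword tokenizer usually yields more
    # tokens than whitespace splitting for the same string.
    return s.split()


# Hypothetical prompt and inputs, just to exercise the check.
short_ok = fits_in_budget("hypothetical task prompt:", "A short sentence.", whitespace_tokenize)
long_ok = fits_in_budget("hypothetical task prompt:", "word " * 600, whitespace_tokenize)
print(short_ok, long_ok)
```

Swapping in the actual tokenizer for `tokenize` would give a more realistic count, since the prompt can be long even when the text is short.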
Your point about distribution shift aside, I don't think it can be text length. Firstly because:
This is way under the T5 input limit. Secondly because:
In other words,
It may be helpful to know that I get the same error in the same line of code when I pass an empty string to detect_frames:
It works as expected with the
It is strange that this happens on an empty string as well. I still can't reproduce the error for some reason. It seems like the small model must be outputting some strange string that breaks the processing of subtasks; I wish I could see what that string is. I'm not sure if it's related, but the library currently only supports Python 3.8+, so I'm surprised it was possible to install it on Python 3.7. As for input length, the reason it could spill over the 512-token limit is that the subtask prompts can also be pretty long. The library tries to include any information it thinks might help T5 make the classification as part of the task definition. So, for example, one of the subtask inputs for text_2 looks like the following when I try it locally:
Here, it's trying to include all the possible argument names that the
That being said, if this is happening on an empty string input then something is definitely wrong. Are you able to see what strings are being passed to the model and what it's returning? It might require hacking print statements into the library for debugging. Or maybe we can add a
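One low-effort way to do the print-statement debugging suggested above is to wrap whatever function sends strings to the model. The sketch below is generic and does not use the library's real internals: `log_calls` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in for the actual model call, which would have to be located in the library's source first.

```python
import functools
from typing import Callable, List, Optional, Tuple

Call = Tuple[str, Optional[str]]


def log_calls(fn: Callable[[str], str], sink: List[Call]) -> Callable[[str], str]:
    """Wrap fn so every input and output string is printed and recorded."""

    @functools.wraps(fn)
    def wrapper(prompt: str) -> str:
        # Print the input first so it is visible even if fn raises.
        print(f"IN  {prompt!r}")
        output = fn(prompt)
        print(f"OUT {output!r}")
        sink.append((prompt, output))
        return output

    return wrapper


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for the model call; not the library's API.
    return "" if not prompt.strip() else prompt.upper()


calls: List[Call] = []
traced = log_calls(fake_generate, calls)
traced("")
traced("some subtask prompt")
```

Using `repr` in the output makes empty strings and stray whitespace visible, which matters here because the failure also shows up on an empty input.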
Actually, it's possible that what's going on is that you're on an old version of the library, since old versions did support Python 3.7. Can you confirm what version of the library you have installed?
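A quick way to answer the version question is to report both the interpreter version and the installed package version. This is a minimal sketch: `installed_version` is a hypothetical helper, and `"name-of-the-library"` is a placeholder because the distribution name is not spelled out in this thread. Note that `importlib.metadata` is only in the standard library from Python 3.8, so on 3.7 the helper returns `None` and `pip show` is the fallback.

```python
import sys
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if unavailable."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        return None  # e.g. Python 3.7: use `pip show <name>` instead
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python", ".".join(map(str, sys.version_info[:3])))
# Placeholder name: replace with the actual distribution name.
print("library", installed_version("name-of-the-library"))
```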
For some texts, when using t5-small, detect_frames raises a cryptic IndexError.
MRE:
Seems like a bug to me!