Using pre-tokenized queries / documents does not work at the moment #50
Alright, regarding pre-tokenized indexes: @heinrichreimer, do you have any preferences on how we could solve this? E.g., so that it is usable but still compatible with the previous behaviour?
@heinrichreimer @mam10eks So I assume the default pipeline is stopword removal followed by the Porter stemmer. Since this is always recorded in data.properties, it shouldn't be an issue in the default case.
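To illustrate the point about data.properties, here is a minimal sketch that reads the configured term pipeline from a properties file. It assumes the pipeline is stored under the key termpipelines as a comma-separated list (e.g. Stopwords,PorterStemmer). parse_properties and term_pipeline are hypothetical helpers, and the parser only handles simple key=value lines, not the full Java properties syntax.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Hypothetical, minimal parser for simple key=value lines of a
    Java-style .properties file (no ':' separators or line continuations)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def term_pipeline(props: Dict[str, str]) -> List[str]:
    """Return the pipeline stage names, assuming they live under the
    'termpipelines' key as a comma-separated list."""
    value = props.get("termpipelines", "")
    return [stage.strip() for stage in value.split(",") if stage.strip()]


example = """
# hypothetical excerpt of data.properties
termpipelines=Stopwords,PorterStemmer
"""
print(term_pipeline(parse_properties(example)))
# ['Stopwords', 'PorterStemmer']
```

An empty or missing termpipelines entry yields an empty list, i.e. no processing at all.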
One possible suggestion could also be that we introduce a new
I'd say it would be best to fix this in the PyTerrier backend, in ir_axioms/ir_axioms/backend/pyterrier/__init__.py, lines 229 to 234 (at commit 4212946).
Is there a PyTerrier API to access the pre-tokenized terms given the document ID?
This commit adds some failing unit tests: 4a747d4
This should be simple to resolve. We load the term pipeline from the Terrier index, which we implemented before the pre-tokenized feature was available in PyTerrier, so we likely apply the wrong pipeline when pre-tokenized is specified.
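A rough sketch of the fix described above: apply the index's term pipeline only when the input was not pre-tokenized. It assumes the backend already knows whether the index or query is pre-tokenized (how to detect that is not settled in this thread). analyze, STAGES and STOPWORDS are hypothetical names, and the "PorterStemmer" stage is a toy suffix-stripper, not Porter's algorithm.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins for the stages named in a Terrier term pipeline.
# The "PorterStemmer" entry is a toy that only strips a trailing "s";
# it is NOT Porter's algorithm.
STOPWORDS = {"a", "and", "of", "the"}

STAGES: Dict[str, Callable[[List[str]], List[str]]] = {
    "Stopwords": lambda toks: [t for t in toks if t not in STOPWORDS],
    "PorterStemmer": lambda toks: [
        t[:-1] if len(t) > 3 and t.endswith("s") else t for t in toks
    ],
}


def analyze(terms: Sequence[str], pipeline: List[str], pretokenized: bool) -> List[str]:
    """Hypothetical helper: run terms through the index's term pipeline,
    unless they were pre-tokenized, in which case use them verbatim."""
    if pretokenized:
        # Pre-tokenized terms were supplied already processed; running the
        # pipeline again could drop or alter them.
        return list(terms)
    tokens = [t.lower() for t in terms]
    for name in pipeline:
        tokens = STAGES[name](tokens)  # raises KeyError for unknown stages
    return tokens


pipeline = ["Stopwords", "PorterStemmer"]
print(analyze(["The", "axioms", "of", "retrieval"], pipeline, pretokenized=False))
# ['axiom', 'retrieval']
print(analyze(["news", "the"], pipeline, pretokenized=True))
# ['news', 'the']
```

Without the pretokenized check, the pre-tokenized terms ["news", "the"] would come out as ["new"] and could no longer match what a pre-tokenized index stores.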