Repairs for readers and processors #183
Comments
You can already evaluate post-processed sequences. The evaluation description can be a three-tuple: produced series, dataset series, metric. (The original series is lost, though.)
OK, so we must fix the third point. Note that the processor function is used in two places (the normal and the lazy dataset).
The third bullet is part of #179, isn't it? But the second one isn't done, although it's checked, right?
Yes, it is not. I checked it earlier by mistake.
For now, the behavior in #179 is that the dataset calls However, lazy datasets are never preprocessed, is that correct? You always split them into batches, and batches are not lazy, so I think it's solved. I'll have another look at the code.
Unfortunately, the lazy dataset treats preprocessing differently from the non-lazy dataset. Point three must be fixed, preferably before decoder_refactor (#179) is merged, so that it can use the fix.
Things that went wrong with the readers pull request (#181):
- Config loader converts tuples to lists. (Can be addressed by explicitly casting the `build_object` output to tuple when `value` is a tuple, in config_loader, line 45.)
- If a preprocessor creates a new data series, the postprocessor should do the same thing the other way around, otherwise the whole world falls apart, starting with evaluation and the strings around it. (E.g. we are not able to measure BLEU on postprocessed sequences.)
- Preprocessors were Callables applied on sentences, not data series. Either do a map and make a list of it (or yield, if lazy), or reform the preprocessor format (which I'd prefer to do). To avoid future misinterpretations, Callables should be further parameterized in the type hints, e.g. `Callable[[List[List[str]]], List[List[str]]]` for series-level preprocessors, or just `Callable[[List[str]], List[str]]` for sentence-level ones.
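To make the last point concrete, here is a minimal sketch of the "map and make a list (or yield, if lazy)" option with fully parameterized type hints. All names in it (`SentencePreprocessor`, `SeriesPreprocessor`, `lift_to_series`, `lift_to_lazy_series`, `lowercase`) are hypothetical and only illustrate the idea; they are not taken from the project's code.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical aliases, introduced only for this illustration.
Sentence = List[str]
SentencePreprocessor = Callable[[Sentence], Sentence]
SeriesPreprocessor = Callable[[List[Sentence]], List[Sentence]]
LazySeriesPreprocessor = Callable[[Iterable[Sentence]], Iterator[Sentence]]


def lift_to_series(preprocess: SentencePreprocessor) -> SeriesPreprocessor:
    """Wrap a sentence-level preprocessor so it accepts a whole data series."""
    def series_preprocessor(series: List[Sentence]) -> List[Sentence]:
        # Eager variant: map over the series and build a list.
        return [preprocess(sentence) for sentence in series]
    return series_preprocessor


def lift_to_lazy_series(preprocess: SentencePreprocessor) -> LazySeriesPreprocessor:
    """Lazy variant: yield preprocessed sentences one at a time."""
    def lazy_preprocessor(series: Iterable[Sentence]) -> Iterator[Sentence]:
        for sentence in series:
            yield preprocess(sentence)
    return lazy_preprocessor


def lowercase(sentence: Sentence) -> Sentence:
    """Example sentence-level preprocessor: lowercase every token."""
    return [token.lower() for token in sentence]


series = [["Hello", "World"], ["Foo"]]
eager = lift_to_series(lowercase)(series)
lazy = list(lift_to_lazy_series(lowercase)(iter(series)))
```

With the inner argument list written out, a type checker can tell a sentence-level callable from a series-level one, which is the misinterpretation described above.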