Repairs for readers and processors #183

Open · 2 of 3 tasks
jindrahelcl opened this issue Dec 8, 2016 · 6 comments

@jindrahelcl (Member) commented Dec 8, 2016

Things that went wrong with the readers pull request (#181):

  • Config loader converts tuples to lists. (This can be addressed by explicitly casting the build_object output to tuple when the value is a tuple, in config_loader, line 45; see the first sketch after this list.)

  • If a preprocessor creates a new data series, the postprocessor should do the same thing the other way around; otherwise the whole world falls apart, starting with evaluation and everything around it. (E.g. we are not able to measure BLEU on postprocessed sequences.)

  • Preprocessors were Callables applied to sentences, not to data series. (Either do a map over the series and make a list of it, or yield if lazy, or reform the preprocessor format, which I'd prefer.) To avoid future misinterpretation, the Callables should be further parameterized in the type hints, e.g. Callable[[List[List[str]]], List[List[str]]] for series-level preprocessors, or just Callable[[List[str]], List[str]] for sentence-level ones; see the second sketch after this list.
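
A minimal sketch of the tuple cast from the first bullet, assuming a hypothetical build_value helper standing in for the build_object call in config_loader (the real code may differ):

```python
from typing import Any

def build_value(value: Any) -> Any:
    """Hypothetical recursive builder standing in for build_object."""
    if isinstance(value, tuple):
        # Build each element, then cast back to tuple so the loader
        # does not silently turn tuples from the config into lists.
        return tuple(build_value(item) for item in value)
    if isinstance(value, list):
        return [build_value(item) for item in value]
    return value
```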
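And a sketch of the stricter type hints proposed in the third bullet; the alias names and the to_series_processor helper are hypothetical, not identifiers from the codebase:

```python
from typing import Callable, List

# A sentence-level processor maps one tokenized sentence to another.
SentenceProcessor = Callable[[List[str]], List[str]]

# A series-level processor maps a whole data series to a new series.
SeriesProcessor = Callable[[List[List[str]]], List[List[str]]]

def to_series_processor(func: SentenceProcessor) -> SeriesProcessor:
    """Lift a sentence-level callable to a series-level one."""
    def process(series: List[List[str]]) -> List[List[str]]:
        return [func(sentence) for sentence in series]
    return process
```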

@jlibovicky (Contributor)

You can evaluate post-processed sequences even now. The evaluation description can be a three-tuple: produced series, dataset series, metric. (The original series is lost, though.)
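
A hedged illustration of such a three-tuple; the series names and the placeholder metric below are made up for the example, not identifiers from an actual experiment:

```python
from typing import Callable, List, Tuple

# A metric takes hypotheses and references and returns a score.
Metric = Callable[[List[List[str]], List[List[str]]], float]

def dummy_exact_match(hyps: List[List[str]], refs: List[List[str]]) -> float:
    """Placeholder metric standing in for BLEU."""
    return sum(h == r for h, r in zip(hyps, refs)) / max(len(refs), 1)

# (produced series, dataset series, metric)
evaluation: List[Tuple[str, str, Metric]] = [
    ("target_postprocessed", "target", dummy_exact_match),
]
```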

@jindrahelcl (Member, Author)

OK, so we must fix the third point. Note that there are two places where the processor function is used (the normal and the lazy dataset).

@jlibovicky (Contributor)

The third bullet is part of #179, isn't it? But the second one isn't done, although it's checked, right?

@jindrahelcl (Member, Author)

Yes, it is not done. I checked it earlier by mistake.

@jindrahelcl (Member, Author)

For now, the behavior in #179 is that the dataset calls list(map(...)) on the preprocessor, but that won't work with lazy datasets.

However, lazy datasets are never preprocessed, are they? You always split them into batches, and batches are not lazy, so I think it's solved. I'll have another look at the code.
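
One possible way to reconcile a series-level preprocessor with both cases, sketched with hypothetical helpers rather than the actual Dataset / LazyDataset code:

```python
from typing import Callable, Iterable, Iterator, List

Sentence = List[str]
SeriesProcessor = Callable[[List[Sentence]], List[Sentence]]

def preprocess_eager(series: List[Sentence],
                     func: SeriesProcessor) -> List[Sentence]:
    """Non-lazy dataset: materialize the whole preprocessed series at once."""
    return func(series)

def preprocess_lazy(series: Iterable[Sentence],
                    func: SeriesProcessor,
                    batch_size: int = 32) -> Iterator[Sentence]:
    """Lazy dataset: feed the series-level processor one batch at a time,
    so the whole series never has to sit in memory at once."""
    batch: List[Sentence] = []
    for sentence in series:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield from func(batch)
            batch = []
    if batch:
        yield from func(batch)
```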

@jindrahelcl (Member, Author)

Unfortunately, the lazy dataset treats preprocessing differently than the non-lazy dataset. Point three must be fixed, preferably before the decoder_refactor (#179) is merged, so that it can use this.

jindrahelcl removed the bug label on May 30, 2017