Repairs for readers and processors #183

Open · 2 of 3 tasks
jindrahelcl opened this issue Dec 8, 2016 · 6 comments

@jindrahelcl (Member) commented Dec 8, 2016

Things that went wrong with the readers pull request (#181):

  • Config loader converts tuples to lists. (This can be addressed by explicitly casting the build_object output to tuple when the value is a tuple, in config_loader, line 45; see the first sketch after this list.)

  • If a preprocessor creates a new data series, the postprocessor should do the same thing the other way around; otherwise the whole world falls apart, starting with evaluation and everything around it. (E.g. we are not able to measure BLEU on postprocessed sequences.)

  • Preprocessors were Callables applied to sentences, not to data series. (Either do a map over the series and make a list of it, or yield if lazy, or reform the preprocessor format, which I'd prefer.) To avoid future misinterpretation, the Callables should be further parameterized in the type hints, e.g. Callable[[List[List[str]]], List[List[str]]] for series-level preprocessors, or just Callable[[List[str]], List[str]] for sentence-level ones; see the second sketch after this list.
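
A minimal sketch of the tuple cast from the first bullet, assuming a hypothetical build_value helper standing in for the build_object call in config_loader (the real code may differ):

```python
from typing import Any

def build_value(value: Any) -> Any:
    """Hypothetical recursive builder standing in for build_object."""
    if isinstance(value, tuple):
        # Build each element, then cast back to tuple so the loader
        # does not silently turn tuples from the config into lists.
        return tuple(build_value(item) for item in value)
    if isinstance(value, list):
        return [build_value(item) for item in value]
    return value
```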
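And a sketch of the stricter type hints proposed in the third bullet; the alias names and the to_series_processor helper are hypothetical, not identifiers from the codebase:

```python
from typing import Callable, List

# A sentence-level processor maps one tokenized sentence to another.
SentenceProcessor = Callable[[List[str]], List[str]]

# A series-level processor maps a whole data series to a new series.
SeriesProcessor = Callable[[List[List[str]]], List[List[str]]]

def to_series_processor(func: SentenceProcessor) -> SeriesProcessor:
    """Lift a sentence-level callable to a series-level one."""
    def process(series: List[List[str]]) -> List[List[str]]:
        return [func(sentence) for sentence in series]
    return process
```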

@jlibovicky (Contributor)

You can evaluate post-processed sequences even now. The evaluation description can be a three-tuple: produced series, dataset series, metric. (The original series is lost, though.)
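
A hedged illustration of such a three-tuple; the series names and the placeholder metric below are made up for the example, not identifiers from an actual experiment:

```python
from typing import Callable, List, Tuple

# A metric takes hypotheses and references and returns a score.
Metric = Callable[[List[List[str]], List[List[str]]], float]

def dummy_exact_match(hyps: List[List[str]], refs: List[List[str]]) -> float:
    """Placeholder metric standing in for BLEU."""
    return sum(h == r for h, r in zip(hyps, refs)) / max(len(refs), 1)

# (produced series, dataset series, metric)
evaluation: List[Tuple[str, str, Metric]] = [
    ("target_postprocessed", "target", dummy_exact_match),
]
```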

@jindrahelcl (Member, Author)

OK, so we must fix the third point. Note that there are two places where the processor function is used (the normal and the lazy dataset).

@jlibovicky (Contributor)

The third bullet is part of #179, isn't it? But the second one isn't done, although it's checked, right?

@jindrahelcl (Member, Author)

Yes, it is not done. I checked it earlier by mistake.

@jindrahelcl (Member, Author)

For now, the behavior in #179 is that the dataset calls list(map(...)) on the preprocessor, but that won't work with lazy datasets.

However, lazy datasets are never preprocessed, are they? You always split them into batches, and batches are not lazy, so I think it's solved. I'll have another look at the code.
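
One possible way to reconcile a series-level preprocessor with both cases, sketched with hypothetical helpers rather than the actual Dataset / LazyDataset code:

```python
from typing import Callable, Iterable, Iterator, List

Sentence = List[str]
SeriesProcessor = Callable[[List[Sentence]], List[Sentence]]

def preprocess_eager(series: List[Sentence],
                     func: SeriesProcessor) -> List[Sentence]:
    """Non-lazy dataset: materialize the whole preprocessed series at once."""
    return func(series)

def preprocess_lazy(series: Iterable[Sentence],
                    func: SeriesProcessor,
                    batch_size: int = 32) -> Iterator[Sentence]:
    """Lazy dataset: feed the series-level processor one batch at a time,
    so the whole series never has to sit in memory at once."""
    batch: List[Sentence] = []
    for sentence in series:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield from func(batch)
            batch = []
    if batch:
        yield from func(batch)
```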

@jindrahelcl (Member, Author)

Unfortunately, the lazy dataset treats preprocessing differently than the non-lazy dataset. Point three must be fixed, preferably before the decoder_refactor (#179) is merged, so that it can use this.

jindrahelcl removed the bug label on May 30, 2017