Chaining of dataset series preprocessor can fail. #767

Open
varisd opened this issue Oct 18, 2018 · 2 comments

@varisd
Member

varisd commented Oct 18, 2018

When chaining multiple dataset series preprocessor steps, e.g.:
preprocessors=[("source", "source_wp", <wp_prep>), ("source_wp", "source_wp_other", <other_prep>)]

dataset.load can then fail because the preprocessors list is not processed in any guaranteed order, so "source_wp_other" may be built before its source series "source_wp" exists.

The part of the code that should be fixed:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/dataset.py#L325

@jindrahelcl
Member

Take a look at the comment two lines above the one you are linking to. The correct approach is to use a pipeline processor. Some kind of tree-based processing could be added here, but that did not work in the old dataset either.
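For illustration, the pipeline approach amounts to composing the per-item steps into one callable, so that only a single preprocessed series has to be declared and no ordering between series is involved. The sketch below is an assumption about what such a helper could look like; the name `pipeline` and its signature are hypothetical and are not taken from the Neural Monkey code base.

```python
def pipeline(*steps):
    """Compose per-item preprocessors into one callable (hypothetical helper).

    The steps are applied left to right, so pipeline(f, g)(x) == g(f(x)).
    """
    def run(item):
        for step in steps:
            item = step(item)
        return item
    return run


# Stand-ins for <wp_prep> and <other_prep> from the issue description.
strip_then_upper = pipeline(str.strip, str.upper)
```

Under that assumption, the configuration from the issue would shrink to a single entry such as ("source", "source_wp_other", pipeline(<wp_prep>, <other_prep>)), at the cost of no longer exposing the intermediate "source_wp" series.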

@varisd
Member Author

varisd commented Oct 18, 2018

Here is a suggestion:

   def _add_preprocessed_series(iterators, s_name, prep_sl):
       # Already available (raw series or resolved earlier): nothing to do.
       if s_name in iterators:
           return
       preprocessor, source = prep_sl[s_name]
       # If the source is itself a preprocessed series, build it first.
       if source in prep_sl:
           _add_preprocessed_series(iterators, source, prep_sl)
       if source not in iterators:
           raise ValueError(
               "Source series for series-level preprocessor nonexistent: "
               "Preprocessed series '{}', source series '{}'".format(
                   s_name, source))
       iterators[s_name] = _make_sl_iterator(source, preprocessor)
[...]
   for s_name in prep_sl:
       _add_preprocessed_series(iterators, s_name, prep_sl)
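
One thing the suggestion does not cover is a cyclic configuration (series A built from B and B from A), which would recurse without end. Below is a self-contained sketch of the same recursive idea with a cycle check added. It uses plain lists in place of the real iterators, and `resolve_series` is a hypothetical name used only for this illustration, not a function from dataset.py.

```python
def resolve_series(raw_series, prep_sl):
    """Build preprocessed series in dependency order.

    raw_series: dict mapping series name -> list of items
                (a stand-in for the real iterators).
    prep_sl:    dict mapping target name -> (preprocessor, source name).
    """
    resolved = dict(raw_series)
    in_progress = set()  # names currently on the recursion stack

    def add(name, requested_by=None):
        if name in resolved:
            return
        if name in in_progress:
            raise ValueError(
                "Cyclic preprocessor dependency at series '{}'".format(name))
        if name not in prep_sl:
            raise ValueError(
                "Source series '{}' for preprocessed series '{}' "
                "does not exist".format(name, requested_by))
        in_progress.add(name)
        preprocessor, source = prep_sl[name]
        add(source, requested_by=name)  # build the dependency first
        resolved[name] = [preprocessor(item) for item in resolved[source]]
        in_progress.discard(name)

    for name in prep_sl:
        add(name)
    return resolved


# The dependent series is listed first on purpose: order must not matter.
example = resolve_series(
    {"source": ["a b", "c"]},
    {
        "source_wp_other": (str.upper, "source_wp"),
        "source_wp": (lambda s: s + "!", "source"),
    },
)
```

The `in_progress` set is the only addition over the suggestion above; everything else follows the same "build the source before the target" recursion.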
