-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jobs for Search Errors #548
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not verify the RASR config, but else looks good (only typing/documentation missing)
@@ -664,3 +665,52 @@ def calc_wer(self): | |||
os.chdir(old_dir) | |||
|
|||
return wer | |||
|
|||
|
|||
class IntersectStmCtm(Job): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a docstring what this job does?
recognition/advanced_tree_search.py
Outdated
use_gpu: bool = False, | ||
extra_config=None, | ||
extra_post_config=None, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as typical in @curufinwes Jobs, a docstring is missing xP
type hints would also be appreciated
recognition/advanced_tree_search.py
Outdated
self.config, | ||
self.post_config, | ||
) = RescoreLatticeCacheJob.create_config(**kwargs) | ||
self.input_lattice_cache = lattice_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Job assumes a single lattice cache as input but produces multiple caches as output.
Would the application then be to provide the bundle file for the parameter lattice_cache
?
And this is implemented because lattice processing would be done with fewer nodes than lattice generation?
for path, lines in [(self.out_stm, stm_lines), (self.out_ctm, ctm_lines)]: | ||
with uopen(path, "wt") as out: | ||
out.write("".join(lines[None])) | ||
del lines[None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why del lines[None]
? I assume because None
would be in the set common_recordings
and in the for-loop below we would also iterate again over it and write the header-lines a second time.
But now we still do iterate over it and when None
shows up, the defaultdict nature of lines
prevents a crash and leads to out.write("".join([])
which if fine I guess.
Maybe semantically more nice would be to remove None
from the set of common_recordings
e.g.
common_recordings = set(stm_lines.keys()).intersection(set(ctm_lines.keys())) - {None}
I leave this open.
|
||
|
||
class IntersectStmCtm(Job): | ||
def __init__(self, stm_path: tk.Path, ctm_path: tk.Path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc str pls
RASR config looks good to me. The job rescores a lattice with a different LM and applies additional pruning as far as I can see. |
Co-authored-by: Moritz Gunz <[email protected]>
These two jobs are implemented by @curufinwe for calculating search errors, once the
rasr
has this commit.The merging should be straight-forward.