Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobs for Search Errors #548

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Jobs for Search Errors #548

wants to merge 5 commits into from

Conversation

Marvin84
Copy link
Contributor

These two jobs are implemented by @curufinwe for calculating search errors, once the rasr has this commit.

The merging should be straight-forward.

Copy link
Contributor

@Atticus1806 Atticus1806 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not verify the RASR config, but else looks good (only typing/documentation missing)

recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
@@ -664,3 +665,52 @@ def calc_wer(self):
os.chdir(old_dir)

return wer


class IntersectStmCtm(Job):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a docstring what this job does?

use_gpu: bool = False,
extra_config=None,
extra_post_config=None,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as typical in @curufinwes Jobs, a docstring is missing xP
type hints would also be appreciated

self.config,
self.post_config,
) = RescoreLatticeCacheJob.create_config(**kwargs)
self.input_lattice_cache = lattice_cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Job assumes a single lattice cache as input but produces multiple caches as output.
Would the application then be to provide the bundle file for the parameter lattice_cache?
And this is implemented because lattice processing would be done with fewer nodes than lattice generation?

for path, lines in [(self.out_stm, stm_lines), (self.out_ctm, ctm_lines)]:
with uopen(path, "wt") as out:
out.write("".join(lines[None]))
del lines[None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why del lines[None]? I assume because None would be in the set common_recordings and in the for-loop below we would also iterate again over it and write the header-lines a second time.

But now we still do iterate over it and when None shows up, the defaultdict nature of lines prevents a crash and leads to out.write("".join([]) which if fine I guess.
Maybe semantically more nice would be to remove None from the set of common_recordings e.g.

common_recordings = set(stm_lines.keys()).intersection(set(ctm_lines.keys())) - {None}

I leave this open.



class IntersectStmCtm(Job):
def __init__(self, stm_path: tk.Path, ctm_path: tk.Path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc str pls

@michelwi
Copy link
Contributor

I did not verify the RASR config, but else looks good (only typing/documentation missing)

RASR config looks good to me. The job rescores a lattice with a different LM and applies additional pruning as far as I can see.

Marvin84 and others added 2 commits November 7, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants