Is it possible just to get the number of errors? #86

Open
SteveBraich opened this issue Feb 13, 2024 · 3 comments
SteveBraich commented Feb 13, 2024

Is it possible just to get the number of errors?

I know I could probably take the WER and multiply it by the number of words to get the number of errors, but I was hoping that would be unnecessary.

Edit: The reason I am asking is that I want to roll up all of my sentence WERs into an overall document WER.

Collaborator

nikvaessen commented Feb 13, 2024

If you use jiwer.process_words, you get a WordOutput object. The number of errors is the sum of the substitutions, insertions, and deletions attributes of this object.

Collaborator

nikvaessen commented Feb 13, 2024

If you want an overall WER of a document, you can use the wer_contiguous transform instead. This allows the number of reference and hypothesis sentences to differ.

For example:

import jiwer
jiwer.process_words(
    reference,
    hypothesis,
    reference_transform=jiwer.wer_contiguous,
    hypothesis_transform=jiwer.wer_contiguous,
)
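On the roll-up itself: a document WER pools the error and word counts across sentences instead of averaging the per-sentence rates. The sketch below uses hypothetical per-sentence counts (not taken from this thread) and plain Python, only to show the arithmetic:

```python
# Hypothetical per-sentence counts: (errors, reference_words).
sentence_counts = [(2, 6), (1, 2)]

# Pool the counts first, then divide once.
total_errors = sum(errors for errors, _ in sentence_counts)
total_words = sum(words for _, words in sentence_counts)
document_wer = total_errors / total_words

# Averaging the per-sentence rates weights short sentences too heavily.
mean_sentence_wer = sum(e / n for e, n in sentence_counts) / len(sentence_counts)

print(document_wer)       # 0.375
print(mean_sentence_wer)  # about 0.417
```

The two numbers differ whenever sentence lengths differ, which is why the raw counts are needed and not only the per-sentence WER values.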

Author

SteveBraich commented Feb 13, 2024

I asked ChatGPT, and here is the code it generated:

import jiwer

# Assuming 'original_sentences' and 'corrected_sentences' are your lists of sentences
original_sentences = [...]  # Your list of original sentences
corrected_sentences = [...]  # Your list of corrected sentences

# Calculate total WER for the document
total_wer = jiwer.wer(original_sentences, corrected_sentences)

print(f"Total WER for the document: {total_wer}")

Doesn't that seem easier?

EDIT: That's giving me a number greater than one.
Actually, I was using the wrong columns to calculate this. It looks good now. Thanks!
