-
Notifications
You must be signed in to change notification settings - Fork 4
Korektor
Dataset name | Source |
---|---|
korektor-czech-130202 | The current Korektor model |
syn2005 | Czech National Corpus (CNC) - http://hdl.handle.net/11858/00-097C-0000-0023-119E-8 |
syn2010 | Czech National Corpus (CNC) - http://hdl.handle.net/11858/00-097C-0000-0023-119F-6 |
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1-score = 2 * (precision * recall) / (precision + recall)
Measure | Description |
---|---|
TP | Number of words with spelling errors that the spell checker detected correctly |
FP | Number of words identified as spelling errors that are not actually spelling errors |
TN | Number of correct words that the spell checker did not flag as having spelling errors |
FN | Number of words with spelling errors that the spell checker did not flag as having spelling errors |
Measure | Description |
---|---|
TP | Number of words with spelling errors for which the spell checker gave the correct suggestion |
FP | Number of words (with/without spelling errors) for which the spell checker made suggestions, and for those, either the suggestion is not needed (in the case of non-existing errors) or the suggestion is incorrect if indeed there was an error in the original word. |
TN | Number of correct words that the spell checker did not flag as having spelling errors and no suggestions were made. |
FN | Number of words with spelling errors that the spell checker did not flag as having spelling errors or did not provide any suggestions |
Dataset | Max edit distance | Precision | Recall | F1-score |
---|---|---|---|---|
kor-cz-130202 | 1-edit | 94.7 | 90.8 | 92.7 |
syn2005 | “ | 95.7 | 90.8 | 93.2 |
syn2010 | “ | 94.7 | 89.9 | 92.2 |
kor-cz-130202 | 2-edit | 94.1 | 95.4 | 94.8 |
syn2005 | “ | 95.0 | 95.9 | 95.4 |
syn2010 | “ | 94.1 | 95.0 | 94.5 |
kor-cz-130202 | 3edit | 94.1 | 95.4 | 94.8 |
syn2005 | “ | 95.0 | 95.9 | 95.4 |
syn2010 | “ | 94.1 | 95.0 | 94.5 |
kor-cz-130202 | 4-edit | 94.1 | 95.4 | 94.8 |
syn2005 | “ | 95.0 | 95.9 | 95.4 |
syn2010 | “ | 94.1 | 95.0 | 94.5 |
kor-cz-130202 | 5-edit | 94.1 | 95.4 | 94.8 |
syn2005 | “ | 95.0 | 95.9 | 95.4 |
syn2010 | “ | 94.1 | 95.0 | 94.5 |
Note that the results are same for edit distances 2,3,4,5. This maybe due to the edit distance parameter does not really influence the error detection much.
#|top-1|top-1|top-1|top-2|top-2|top-2|top-3|top-3|top-3 ----|----|----|----|----|----|----|----|----|----|---- dataset|precision|recall|F1-score|precision|recall|F1-score|precision|recall|F1-score kor-cz-130202-1-ed|85.2|89.9|87.5|90.9|90.5|90.7|93.3|90.7|92.0 syn2005-1-ed|87.9|90.1|89.0|92.3|90.5|91.4|93.7|90.7|92.2 syn2010-1-ed|86.0|89.0|87.5|91.8|89.6|90.7|92.3|89.7|91.0 kor-cz-130202-2-ed|84.2|94.9|89.2|91.0|95.3|93.1|93.2|95.4|94.3 syn2005-2-ed|86.8|95.5|91.0|91.8|95.7|93.7|93.2|95.8|94.5 syn2010-2-ed|85.0|94.4|89.5|91.4|94.8|93.1|92.3|94.9|93.5 kor-cz-130202-3-ed|84.2|94.9|89.2|91.0|95.3|93.1|93.2|95.4|94.3 syn2005-3-ed|86.8|95.5|91.0|91.4|95.7|93.5|92.7|95.8|94.2 syn2010-3-ed|85.0|94.4|89.5|90.9|94.8|92.8|91.8|94.8|93.3 kor-cz-130202-4-ed|84.2|94.9|89.2|91.0|95.3|93.1|93.2|95.4|94.3 syn2005-4-ed|86.8|95.5|91.0|91.4|95.7|93.5|92.7|95.8|94.2 syn2010-4-ed|85.0|94.4|89.5|90.9|94.8|92.8|91.8|94.8|93.3 kor-cz-130202-5-ed|84.2|94.9|89.2|91.0|95.3|93.1|93.2|95.4|94.3 syn2005-5-ed|86.8|95.5|91.0|91.4|95.7|93.5|92.7|95.8|94.2 syn2010-5-ed|85.0|94.4|89.5|90.9|94.8|92.8|91.8|94.8|93.3