PyTorch implementation of CRASpell, a Chinese spelling correction model. Paper: CRASpell: A Contextual Typo Robust Approach to Improve Chinese Spelling Correction
Requirements:
torch==1.8.1+cpu
transformers==4.10.1
numpy==1.19.2
Project layout:
src
|__common
|__config
|__model
|__utils
|__train.py
|__predictv2.py
Results on the test data, obtained with the chinese-roberta-wwm-ext pretrained model:
token num: gold_n:694, pred_n:783, right_n:586
token check: p=0.748, r=0.844, f=0.793
token correction-1: p=0.954, r=0.805, f=0.873
token correction-2: p=0.714, r=0.805, f=0.757
precision:0.7484026201754532, recall:0.8443791867735061, f1_score:0.7934992640496943
On the SIGHAN15 test set:
Absolute accuracy: 0.7372
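The README does not define "absolute accuracy". One common reading in Chinese spelling correction evaluation is sentence-level exact-match accuracy: a sentence counts as correct only if the whole predicted sentence equals the gold correction. The sketch below assumes that reading; `sentence_accuracy` is a hypothetical helper and is not taken from `predictv2.py`.

```python
from typing import Sequence


def sentence_accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of sentences whose prediction exactly matches the gold sentence.

    Assumes "absolute accuracy" means sentence-level exact match.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(pred == gold for pred, gold in zip(preds, golds))
    return hits / len(golds)


golds = ["我今天很高兴", "他在学校读书"]
preds = ["我今天很高兴", "他在学校读输"]  # second sentence still contains an error
print(sentence_accuracy(preds, golds))  # 0.5
```

Exact match is strict: a sentence with one remaining wrong character scores zero, which is why this figure is usually lower than token-level F1.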