The Spanish Numbers dataset is a small dataset of 485 images containing handwritten sentences of Spanish numbers (298 for training and 187 for testing).
Requirements:

- Laia
- ImageMagick's `convert`
- Optionally: Kaldi's `compute-wer`
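If you want to verify the requirements up front, a quick sanity check like the following should do (assuming you run it from this example's directory, and that `convert` and `compute-wer` live on your `PATH`):

```bash
# Quick sanity check for the required tools.
[ -x ../../laia-train-ctc ] || echo "missing: Laia tools (expected at ../..)"
command -v convert >/dev/null || echo "missing: ImageMagick's convert"
command -v compute-wer >/dev/null || echo "missing: Kaldi's compute-wer (optional)"
```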
To train a new Laia model for the Spanish Numbers dataset, just follow these steps. Since the dataset does not provide a validation partition, we will use the test partition for validation.
- Download the Spanish Numbers dataset:

  ```bash
  mkdir -p data/;
  wget -P data/ https://www.prhlt.upv.es/corpora/spanish-numbers/Spanish_Number_DB.tgz;
  tar -xvzf data/Spanish_Number_DB.tgz -C data/;
  ```
- Execute `steps/prepare.sh`. This script assumes that the Spanish Numbers dataset is inside the `data` folder, and does the following:
  - Transforms the images from PBM to PNG.
  - Scales them to 64px height.
  - Creates the auxiliary files necessary for training.
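
  The per-image conversion and scaling is essentially a single ImageMagick call. A minimal sketch of what the script does for each image (`sample.pbm` is a hypothetical input file; the real paths are handled inside `steps/prepare.sh`):

  ```bash
  # Convert PBM -> PNG, scaling to 64px height ("x64" preserves the aspect ratio).
  convert sample.pbm -resize x64 sample.png
  ```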
- Execute the `laia-create-model` script to create an "empty" Laia model:

  ```bash
  ../../laia-create-model \
    --cnn_batch_norm true \
    --cnn_type leakyrelu \
    -- 1 64 20 model.t7;
  ```

  The positional arguments are the number of input channels (1), the height of the input images (64) and the size of the output layer (20), followed by the output model file.
- Use the `laia-train-ctc` script to train the model:

  ```bash
  ../../laia-train-ctc \
    --adversarial_weight 0.5 \
    --batch_size 16 \
    --log_also_to_stderr info \
    --log_level info \
    --log_file laia.log \
    --progress_table_output laia.dat \
    --use_distortions true \
    --early_stop_epochs 100 \
    --learning_rate 0.0005 \
    model.t7 data/lang/char/symbs.txt \
    data/train.lst data/lang/char/train.txt \
    data/test.lst data/lang/char/test.txt;
  ```
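
  Training may run for hundreds of epochs; since the command above writes its log to `laia.log` and its progress table to `laia.dat` (both set via the flags), you can follow either file from another terminal:

  ```bash
  # Follow the training log as it grows (use laia.dat for the progress table).
  tail -f laia.log
  ```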
After 366 epochs the model achieves a CER of ~2.08% on the test set, with a 95% confidence interval of [1.295%, 2.610%].
You can use `laia-decode` to obtain the transcripts of any set of images:

```bash
../../laia-decode --symbols_table data/lang/char/symbs.txt \
  model.t7 data/test.lst > test_hyp.char.txt;
```
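The `.lst` file is simply a plain-text list of image paths, one per line, so you can decode your own images by building such a list. A sketch, assuming `my_images/` is a hypothetical directory of images preprocessed like the training data (PNG, 64px height):

```bash
# Build a list of images and decode it with the trained model.
ls my_images/*.png > my_images.lst;
../../laia-decode --symbols_table data/lang/char/symbs.txt \
  model.t7 my_images.lst > my_hyp.char.txt;
```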
Once you have created `test_hyp.char.txt`, you can compute the character error rate (CER) using Kaldi's `compute-wer`, for instance:

```bash
compute-wer --mode=strict ark:data/lang/char/test.txt ark:test_hyp.char.txt |
  grep WER | sed -r 's|%WER|%CER|g';
```
In order to compute the WER, you will first need to convert the character-level transcripts into word-level transcripts (a simple AWK script is enough, as shown below). Then you can compute the WER using Kaldi's `compute-wer` as well:

```bash
# Get word-level hypothesis transcripts
awk '{
  printf("%s ", $1);
  for (i = 2; i <= NF; ++i) {
    if ($i == "{space}")
      printf(" ");
    else
      printf("%s", $i);
  }
  printf("\n");
}' test_hyp.char.txt > test_hyp.word.txt;

# ... and compute the WER
compute-wer --mode=strict ark:data/lang/word/test.txt ark:test_hyp.word.txt |
  grep WER;
```
Alternatively, you can simply execute `run.sh`, which runs all of the steps above.