Section 1 : Handwritten text recognition for e-NDP

HTR training experiments for e-NDP: fitting Kraken HTR models to transcribed pages from registers LL105-LL126 of the Archives nationales (AN).

Definitions

Stats for each register volume (LL105 - LL126, 26 volumes)

| Volume | Dates | Type | N. pages |
|---|---|---|---|
| 105 | 1326-1352 | Train | 21 |
| 106A | 1356-1361 | Train | 20 |
| 106B | 1362-1364 | Train | 11 |
| 107 | 1367-1370 | Train | 10 |
| 108A | 1392-1394 | Train | 27 |
| 108B | 1397-1399 | Train | 14 |
| 109A | 1399-1401 | Test | 4 |
| 109B | 1401-1405 | Train | 10 |
| 109C | 1405-1407 | Train | 12 |
| 110 | 1407-1411 | Train | 17 |
| 111 | 1412-1414 | Test | 3 |
| 112 | 1414-1424 | Train | 17 |
| 113 | 1425-1432 | Train | 17 |
| 114 | 1433-1437 | Train | 11 |
| 115 | 1440-1444 | Test | 4 |
| 116 | 1445-1459 | Test | 4 |
| 117 | 1450-1454 | Train | 60 |
| 118 (minutes) | 1453-1456 | Train | 13 |
| 119 | 1456-1460 | Train | 18 |
| 120 | 1460-1465 | Train | 3 |
| 121 | 1465-1474 | Test | 3 |
| 122 | 1474-1481 | Train | 21 |
| 123-124 | 1481-1489 | Test | 4 |
| 125 | 1489-1493 | Train | 20+16 |
| 126 | 1493-1497 | Train | 18 |
| BnF Latin 17740 | 1430-1444 | Test | 687 |

references: Projet e-NDP – Notre-Dame de Paris et son cloître : archives des séances du séminaire

SÉRIE LL MONUMENTS ECCLÉSIASTIQUES REGISTRES

Architecture

Architecture 1:

kraken architecture 1

'hyper_params': {'pad': 24, 'freq': 1.0, 'batch_size': 1, 'lag': 5, 'min_delta': None, 'optimizer': 'Adam', 'lrate': 0.0002, 'momentum': 0.9, 'weight_decay': 0, 'schedule': 'reduceonplateau', 'normalization': None, 'normalize_whitespace': True, 'augment': False, 'step_size': 10, 'gamma': 0.1, 'rop_patience': 3, 'cos_t_max': 50}

The models were trained with Kraken (https://github.com/mittagessen/kraken). Training commands, by Kraken version:

kraken 3.0 : ketos train -N 70 -q dumb -f page --threads 32 -r 0.0001 --schedule reduceonplateau --sched-patience 3 -d cuda:0 --preload --pad 24 -s '[1,128,0,1 Cr4,16,32 Do0.1,2 Mp2,2 Cr4,16,32 Do0.1,2 Mp2,2 Cr3,8,64 Do0.1,2 Mp2,2 Cr3,8,64 Do0.1,2 S1(1x0)1,3 Lbx256 Do0.3,2 Lbx256 Do0.3,2 Lbx256 Do0.3]' --augment training_folder/*.xml

kraken 4.0 : ketos train -N 70 -q dumb -f page --workers 32 -r 0.0001 --schedule reduceonplateau --sched-patience 3 -d cuda:0 --pad 24 -s '[1,128,0,1 Cr4,16,32 Do0.1,2 Mp2,2 Cr4,16,32 Do0.1,2 Mp2,2 Cr3,8,64 Do0.1,2 Mp2,2 Cr3,8,64 Do0.1,2 S1(1x0)1,3 Lbx256 Do0.3,2 Lbx256 Do0.3,2 Lbx256 Do0.3]' --augment training_folder/*.xml
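The `-s` argument in both commands is a VGSL network specification. As a reading aid, the sketch below splits that string into its input definition and its layers and labels each layer by its prefix. It is only a minimal tokenizer for the prefixes that occur in this spec, not Kraken's own parser; the names `SPEC`, `KINDS`, and `describe_spec` are illustrative. The input block is read as batch, height, width, channels, with a width of 0 meaning variable line width.

```python
# Spec copied from the training commands above.
SPEC = ("[1,128,0,1 Cr4,16,32 Do0.1,2 Mp2,2 Cr4,16,32 Do0.1,2 Mp2,2 "
        "Cr3,8,64 Do0.1,2 Mp2,2 Cr3,8,64 Do0.1,2 S1(1x0)1,3 "
        "Lbx256 Do0.3,2 Lbx256 Do0.3,2 Lbx256 Do0.3]")

# Prefix -> readable layer kind. Only the prefixes used in SPEC are covered.
KINDS = [
    ("Cr", "convolution + ReLU"),
    ("Do", "dropout"),
    ("Mp", "max pooling"),
    ("Lbx", "bidirectional LSTM along x"),
    ("S", "reshape"),
]


def describe_spec(spec):
    """Return (input_shape, layers) for a VGSL spec string."""
    tokens = spec.strip().strip("[]").split()
    batch, height, width, channels = (int(v) for v in tokens[0].split(","))
    input_shape = {"batch": batch, "height": height,
                   "width": width, "channels": channels}
    layers = []
    for tok in tokens[1:]:
        for prefix, kind in KINDS:
            if tok.startswith(prefix):
                # Keep the raw parameters after the prefix, e.g. "4,16,32".
                layers.append((kind, tok[len(prefix):]))
                break
        else:
            layers.append(("unknown", tok))
    return input_shape, layers


if __name__ == "__main__":
    shape, layers = describe_spec(SPEC)
    print(shape)
    for kind, params in layers:
        print(f"{kind:30s} {params}")
```

Read this way, the network is four convolutional layers with dropout and three max-pooling steps, a reshape, and three bidirectional LSTM layers of 256 units each.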

Architecture 2:

Training

training board accuracy on validation set

HTR Experiments

Training and testing data-sets

Total :

  • TRAIN: 786 images --> 554 folios --> 1109 pages
  • TEST: 28 images --> 28 pages

External test dataset

Multilingual

  • Odart de Morchesne : 274 formulas --> 94 lat + 180 fro (35%-40% Latin)
  • Clairmarais : 178 acts --> 168 lat + 10 fro (92% Latin)
  • Livre Rouge (35%-40% Latin)
  • e-ndp (almost entirely in Latin)
  • Total: ± 77% Latin / 23% French

Model versions

Successive HTR models, each trained on a larger corpus:

  • 19/10/2021: V1 core --> Formulaire Odart de Morchesne + Cartulaire de Clairmarais + Livre Rouge + LL 108a (e-ndp_V1)

  • 16/11/2021: V2 core --> V1 core + 84 pages (1st e-ndp transcription group)

  • 11/01/2022: V3 core --> V1 core + V2 core + 82 pages (2nd e-ndp transcription group)

  • 10/02/2022: V4 core --> V1 core + V2 core + V3 core + 76 pages (3rd e-ndp transcription group)

  • 30/06/2022: V5 core --> all of V4 core + 78 pages (4th e-ndp transcription group)

  • 17/07/2022: V6 core --> all of V5 core + 40 pages from newly digitized volumes (105, 106A, 111, 107, 118, 119, 120, 123-124, 127-128)

  • 16/08/2022: V7 core --> all of V6 core + 42 pages from newly digitized volumes (105, 106A, 111, 107, 118, 119, 120, 123-124, 127-128)

  • val_acc = accuracy on validation set during training

  • test_acc = accuracy on corpus test after training

  • cer = test character error rate

  • wer = test word error rate
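For readers unfamiliar with these metrics, here is a minimal sketch of how character and word error rates are usually defined: the Levenshtein edit distance between the reference transcription and the model output, divided by the length of the reference. This illustrates the standard definition and is not necessarily the exact computation behind the figures reported here; the function names and the sample strings are illustrative.

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "in capitulo fuerunt"   # made-up sample line
    prediction = "in capitlo fuerunt"   # one character missing
    print(f"cer = {cer(reference, prediction):.4f}")
    print(f"wer = {wer(reference, prediction):.4f}")
```

A single wrong character makes the whole word count as an error, which is why the wer values below are consistently much higher than the cer values.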

| model_name | Content | arch | val_acc | test_acc | cer | wer | logs |
|---|---|---|---|---|---|---|---|
| V1_test | Morchesne, Clairmarais, Livre Rouge, 108a | arch_1 | 92.50% | 69.75% | 34.88% | 71.50% | log_1 |
| V2_test | V1 core, +LL115 (20 pages), +1st e-ndp group | arch_1 | 94.71% | 83.57% | 18.97% | 48.23% | log_2 |
| V3_test | V1 core, V2 core, +2nd e-ndp group | arch_1 | 93.90% | 86.92% | 13.25% | 36.24% | log_3 |
| V3b_test | Only e-ndp transcriptions (193 images) | arch_1 | 91.19% | 81.90% | 18.44% | 46.94% | log_4 |
| V4_test | V1 core, V2 core, V3 core +3rd e-ndp group | arch_1 | 93.52% | 88.55% | 11.43% | 32.47% | log_5 |
| V5_test | V1, V2, V4 cores + 4th e-ndp group | arch_1 | 94.48% | 90.26% | 9.73% | 27.61% | log_6 |

all G1 test metrics

| model_name | Content | arch | val_acc | test_acc | cer | wer | logs |
|---|---|---|---|---|---|---|---|
| V3_Latin_17740 | V3 tested on Latin 17740 manuscript | arch_1 | - | 89.25% | 11.21% | 30.28% | log_7 |
| V3b_Latin_17740 | V3b tested on Latin 17740 manuscript | arch_1 | - | 82.59% | 18.78% | 48.17% | log_8 |
| V7_Latin_17740 | V7 tested on Latin 17740 manuscript | arch_1 | - | 91.52% | 9.27% | 28.28% | log_9 |
| V3_Navarre | V3 tested on Charles II of Navarre manuscript | arch_1 | - | 82.82% | 14.36% | 44.42% | log_10 |
| V3b_Navarre | V3b tested on Charles II of Navarre manuscript | arch_1 | - | 67.81% | 29.02% | 69.80% | log_11 |
| V7_Navarre | V7 tested on Charles II of Navarre manuscript | arch_1 | - | 85.42% | 12.52% | 37.78% | log_12 |

Section 2: Layout Segmentation

Layout segmentation is a required step before HTR: it identifies the distinct zones of a page. Separating these interdependent zones lets the text be recognized in section order rather than in plain line order, which would mix textual and peritextual content.

For e-NDP we define five zone types that cover the page layouts found across all 26 volumes:

  1. Block : The central text blocks, which normally correspond to the main content, called "conclusions" in the registers.
  2. Liste : List of the names of the canons present at the meeting. Normally located before the "conclusions".
  3. Entrée : Marginal notes or entries that summarize the content of the "conclusions".
  4. Date : Paragraph containing the date. Normally at the head of a "conclusion", but separate from the main body.
  5. Numérotation : Page numbers in Roman or Arabic numerals. They usually appear in the corners of the pages.
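To illustrate why zones matter for reading order, the sketch below compares a plain line-sequence order (sorted by vertical position only) with a section-sequence order (zone by zone, then top to bottom within each zone). The `Line` structure, the region identifiers, and the sample coordinates are hypothetical and exist only for this example; they are not taken from the e-NDP data or from Kraken's output format.

```python
from dataclasses import dataclass


@dataclass
class Line:
    region_id: str    # hypothetical identifier of the zone containing the line
    region_type: str  # one of the five zone types listed above
    y: float          # vertical position of the line on the page
    text: str


def line_order(lines):
    """Read the page top to bottom, ignoring zones."""
    return sorted(lines, key=lambda line: line.y)


def section_order(lines, region_order):
    """Read zone by zone (in the given order), top to bottom inside each zone."""
    rank = {region_id: i for i, region_id in enumerate(region_order)}
    return sorted(lines, key=lambda line: (rank[line.region_id], line.y))


# A marginal note sits vertically between two lines of the main block.
PAGE = [
    Line("r1", "Block", 100.0, "first line of the conclusion"),
    Line("r2", "Entrée", 120.0, "marginal note"),
    Line("r1", "Block", 130.0, "second line of the conclusion"),
]

if __name__ == "__main__":
    print([l.text for l in line_order(PAGE)])
    print([l.text for l in section_order(PAGE, ["r1", "r2"])])
```

In line order the marginal note is interleaved with the conclusion; in section order the conclusion stays contiguous and the note follows it.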

Automatic layout segmentation of an e-NDP page.

Layout segmentation experiments

We annotated 376 transcribed pages from the e-NDP ground truth (V1 core to V6 core) and ran experiments with a classical CNN+BiLSTM architecture.

Command to reproduce the training in Kraken 3:

ketos segtrain -f page -o seg_model -d cuda:0 -bl --threads 32 --epochs 50 --schedule reduceonplateau -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]' training_folder/*xml

Another option is to fine-tune the default blla.mlmodel (https://github.com/mittagessen/kraken/blob/master/kraken/blla.mlmodel):

ketos segtrain -i blla.mlmodel -f page -o seg_model -d cuda:0 -bl --threads 32 --resize add --epochs 50 --schedule reduceonplateau -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]' training_folder/*xml

Layout segmentation model versions

  • mean_iu = Mean intersection over union (IU)
  • freq_iu = Frequency intersection over union (IU)
  • mean_acc = Mean accuracy (average of the prediction accuracy over all categories)
  • IU is the ratio of the intersection to the union of the predicted region and the ground-truth region.
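These three metrics can be written out from a pixel-level confusion matrix, following the definitions in Long et al. (2015). The sketch below is a minimal pure-Python illustration under that assumption, not Kraken's evaluation code; `segmentation_metrics` and the sample matrix are illustrative.

```python
def segmentation_metrics(conf):
    """Compute mean_acc, mean_iu and freq_iu.

    conf[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n_cl = len(conf)
    total = sum(sum(row) for row in conf)
    accs, ious, weighted = [], [], 0.0
    for i in range(n_cl):
        t_i = sum(conf[i])                          # pixels whose true class is i
        p_i = sum(conf[j][i] for j in range(n_cl))  # pixels predicted as class i
        union = t_i + p_i - conf[i][i]
        acc = conf[i][i] / t_i if t_i else 0.0      # per-class accuracy
        iou = conf[i][i] / union if union else 0.0  # per-class IU
        accs.append(acc)
        ious.append(iou)
        weighted += t_i * iou                       # weight IU by class frequency
    return {
        "mean_acc": sum(accs) / n_cl,
        "mean_iu": sum(ious) / n_cl,
        "freq_iu": weighted / total,
    }


if __name__ == "__main__":
    # Made-up two-class example: a frequent class and a rare, poorly segmented one.
    sample = [[90, 10],
              [5, 5]]
    for name, value in segmentation_metrics(sample).items():
        print(f"{name} = {value:.4f}")
```

Because freq_iu weights each class by its pixel count, a rare class that is poorly segmented lowers mean_iu much more than freq_iu, which is one possible reading of the gap between the two columns in the table.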
| model_name | Content | mean_iu | freq_iu | mean_acc |
|---|---|---|---|---|
| V1_layout | endp V1-V2 cores | 0.6508 | 0.7918 | 0.9552 |
| V2_layout | endp V1-V4 cores | 0.6744 | 0.8366 | 0.9648 |
| V3_layout | endp V1-V6 cores | 0.6936 | 0.8455 | 0.9673 |

references: Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).