The Hungarian Drama Corpus (HunDraCor) is based on the ELTE Drama Corpus:
Szemes Botond – Bajzát Tímea – Fellegi Zsófia – Kundráth Péter – Horváth Péter – Indig Balázs – Dióssy Anna – Hegedüs Fanni – Pantyelejev Natali – Sziráki Sarolta – Vida Bence – Kalmár Balázs – Palkó Gábor 2022. Az ELTE Drámakorpuszának létrehozása és lehetőségei. In: Tick József – Kokas Károly – Holl András (szerk.): Valós térben – Az online térért: Networkshop 31: országos konferencia. Budapest: HUNGARNET Egyesület. 170–178.
We have implemented an integration workflow that makes it easy to update HunDraCor from the level1 files of the ELTE source repo, applying some minor transformations to make them DraCor-ready.
The XSLT workflow depends on the following tools:
- saxon XSLT processor
- xmlformat XML document formatter
On macOS with Homebrew, xmlformat can be installed with:
brew install xmlformat
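Before running the workflow, it can help to confirm that both tools are reachable from the shell. The following is a minimal sketch, not part of this repo: the helper name `missing_tools` is hypothetical, and the executable names `saxon` and `xmlformat` are assumptions that may differ depending on how the tools were installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print each named command
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names; adjust them to match your installation.
missing=$(missing_tools saxon xmlformat)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If a tool is reported as missing, install it or adjust the name passed to the helper before running the update script.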
To update the entire corpus from the sources, simply run the elte2dracor script from the root directory of this repo:
./elte2dracor
This clones the ELTE-DH/drama-corpus repo and runs the transformation for each file in its level1 directory.
Alternatively, the files can be imported from the dracor-org fork of the ELTE repo by using the --dracor switch:
./elte2dracor --dracor
You can also update individual files, for instance:
./elte2dracor ./source-repo/level1/Madach_ACivilizator.xml
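To update several individual files in one go, the single-file invocation above can be wrapped in a loop. This is a sketch under assumptions: the function name `update_files` and the `ELTE2DRACOR` override variable are hypothetical, and the script is called once per file because it is not documented here whether it accepts more than one path per call.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repo): run the update script once
# for each file given, stopping at the first failure.
# ELTE2DRACOR is an assumed override variable; it defaults to ./elte2dracor.
update_files() {
  for file in "$@"; do
    "${ELTE2DRACOR:-./elte2dracor}" "$file" || return 1
  done
}
```

Run from the root directory of this repo, for example as `update_files ./source-repo/level1/Madach_ACivilizator.xml` followed by any further level1 paths you want to refresh.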