DH2022 files
JSON dump of the structured corpus (a dictionary keyed by position, then by chapter, with each chapter mapping to a list of sentences)
{
    "ENCPOS_ID": {
        "metadata": [
        ],
        "chapter_title": [
            "first sentence",
            "second sentence",
            "…"
        ],
        "chapter_title": [
        ]
    },
    "ENCPOS_ID": {
        "metadata": [
        ],
        "chapter_title": [
            "first sentence",
            "second sentence",
            "…"
        ],
        "chapter_title": [
        ]
    }
}
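
As a rough illustration of how a dump with this layout could be read, the sketch below walks the nested dictionary and yields one row per sentence. The helper name `iter_sentences`, the sample identifier `ENCPOS_SAMPLE_01`, the chapter titles and the file name in the comment are all made up for the example; only the `metadata` key and the position > chapter > sentences nesting come from the schema above.

```python
import json
from typing import Dict, Iterator, List, Tuple

Corpus = Dict[str, Dict[str, List[str]]]


def iter_sentences(corpus: Corpus) -> Iterator[Tuple[str, str, str]]:
    """Yield (encpos_id, chapter_title, sentence) for every sentence in the dump."""
    for encpos_id, entry in corpus.items():
        for key, sentences in entry.items():
            # "metadata" sits next to the chapters but is not a chapter itself.
            if key == "metadata":
                continue
            for sentence in sentences:
                yield encpos_id, key, sentence


# Hypothetical sample with the same shape as the dump (not real corpus data).
# With a real file, one would instead do something like:
#     with open("corpus.json", encoding="utf-8") as f:  # file name is an assumption
#         corpus = json.load(f)
sample: Corpus = json.loads("""
{
    "ENCPOS_SAMPLE_01": {
        "metadata": [],
        "Introduction": ["first sentence", "second sentence"],
        "Sources": []
    }
}
""")

rows = list(iter_sentences(sample))
for encpos_id, chapter, sentence in rows:
    print(encpos_id, chapter, sentence, sep=" | ")
```

Flattening to `(position, chapter, sentence)` rows is only one convenient choice; it makes it easy to feed the sentences to a tokenizer or to load them into a table while keeping track of where each sentence came from.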
A Jupyter Notebook is available as a runnable demo; see the tutorial on Google Colab: