This repository contains code for the paper: Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs
You need some GEDCOM files (.ged). We cannot provide GEDCOM files due to privacy regulations.
You can split the GEDCOM files into sub-graphs using the ged-bfs algorithm (C# pseudocode).
Convert the GEDCOM files to the Gen-SQuAD dataset.
Train and evaluate, or run inference.
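To illustrate the sub-graph splitting step, here is a minimal Python sketch of a depth-limited breadth-first search over a family graph. It is not the repository's ged-bfs code: the function name `bfs_subgraph`, the `max_depth` parameter, and the toy `family` graph (individual ids `I*` linked to family ids `F*`) are all assumptions made for illustration.

```python
from collections import deque


def bfs_subgraph(graph, start, max_depth):
    """Return {node: depth} for every node within max_depth hops of start.

    Hypothetical helper: `graph` maps a record id to the ids it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = depths[node]
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in depths:
                depths[neighbor] = depth + 1
                queue.append(neighbor)
    return depths


# Toy graph (assumed ids): individuals I1..I4 linked through families F1, F2.
family = {
    "I1": ["F1"],
    "F1": ["I1", "I2", "I3"],
    "I2": ["F1", "F2"],
    "I3": ["F1"],
    "F2": ["I2", "I4"],
    "I4": ["F2"],
}

sub = bfs_subgraph(family, "I1", 2)
print(sorted(sub))
```

Starting from a different individual, or raising `max_depth`, yields a different sub-graph. How the actual algorithm bounds each sub-graph is defined by the ged-bfs pseudocode, not by this sketch.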
The training script can be found in training\ (Python).
--model_type bert
--model_name_or_path bert-base-uncased
--dataset_folder /home/Gen_SQuAD_2
--per_gpu_train_batch_size 8
--per_gpu_eval_batch_size 8
--learning_rate 3e-5
--num_train_epochs 20
--max_seq_length 512
--doc_stride 128
--output_dir /home/Uncle_BERT_2
--save_steps 5000
--threads 16
--max_test_examples 512
--max_train_examples 131072
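To show how --max_seq_length and --doc_stride interact, here is a minimal sketch. It assumes the training script follows the sliding-window convention of the original BERT SQuAD script, in which doc_stride is the step between the starts of consecutive windows over a long context. The function name `doc_spans` and the 20-token question length are illustrative and do not come from this repository.

```python
def doc_spans(n_doc_tokens, max_tokens_for_doc, doc_stride):
    """Split a tokenized context into overlapping (start, length) windows."""
    spans = []
    start = 0
    while start < n_doc_tokens:
        length = min(n_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == n_doc_tokens:
            break  # the last window reaches the end of the context
        start += min(length, doc_stride)
    return spans


# Assumed budget: 512 total tokens minus a 20-token question and 3 special tokens.
budget = 512 - 20 - 3
spans = doc_spans(1000, budget, 128)
for span in spans:
    print(span)
```

Under these assumptions, a smaller doc_stride produces more, heavily overlapping windows per context, which increases the number of training features.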
--model_name_or_path can also be set to other models pretrained for SQuAD
--do_train can be removed if only eval is needed
--do_eval can be removed if only training is needed
--dataset_folder should contain the preprocessing output (Gen-SQuAD)
Other parameters are self-explanatory.
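As a rough illustration of how flags such as --do_train and --do_eval toggle the two phases, here is a hypothetical argparse sketch. It does not reproduce the actual training script. The `build_parser` function is an assumption, only some of the flags are shown, and the defaults are copied from the example values above rather than from the script.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a subset of the flags listed above."""
    parser = argparse.ArgumentParser(description="Illustrative flag parser")
    parser.add_argument("--model_type", default="bert")
    parser.add_argument("--model_name_or_path", default="bert-base-uncased")
    parser.add_argument("--dataset_folder", required=True)
    parser.add_argument("--output_dir", required=True)
    # Boolean switches: present means enabled, absent means disabled.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--num_train_epochs", type=float, default=20)
    parser.add_argument("--max_seq_length", type=int, default=512)
    parser.add_argument("--doc_stride", type=int, default=128)
    return parser


# Training only: --do_eval is omitted, so evaluation would be skipped.
args = build_parser().parse_args([
    "--dataset_folder", "/home/Gen_SQuAD_2",
    "--output_dir", "/home/Uncle_BERT_2",
    "--do_train",
])
print(args.do_train, args.do_eval)
```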
Suissa, O., Zhitomirsky-Geffet, M., & Elmalech, A. (2023). Question answering with deep neural networks for semi-structured heterogeneous genealogical knowledge graphs. Semantic Web, 14(2), 209-237.