Note: At the moment, the implementation can only read evaluation files in the QALD format.
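For orientation, a QALD-style file is a JSON document with a top-level `questions` list. The sketch below illustrates this layout and a typical way to read it; the field names follow the public QALD JSON format, and the exact schema this implementation expects may differ.

```python
import json

# Minimal sketch of a QALD-style evaluation file (field names follow the
# public QALD JSON format; the schema expected here may differ in detail).
qald_doc = {
    "questions": [
        {
            "id": "1",
            "question": [
                {"language": "en", "string": "Who wrote Dracula?"}
            ],
            "query": {"sparql": "SELECT ?a WHERE { ... }"},
            "answers": [],
        }
    ]
}

def load_questions(path):
    """Map each question id to its English question string."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {
        q["id"]: next(s["string"] for s in q["question"]
                      if s["language"] == "en")
        for q in data["questions"]
    }
```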
To evaluate the configured pipelines, first modify the eval_config.json file. Afterwards, perform the following steps:
- Set up and start the BERT similarity computation service (see the BERT similarity service README).
- Execute `python run_test.py` to generate the gold and prediction files for all pipelines (check the file for custom arguments).
- Execute `python eval_test.py` to evaluate each prediction file against its gold file using BENG.
- Wait for BENG to finish the evaluation.
- Execute `python gen_eval_results.py` to extract the results from BENG and write them to a TSV file named `evaluation_results.tsv`.
- Optionally, execute `python format_translated_qald.py` to format the predictions back into QALD format.
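Once `evaluation_results.tsv` has been written, it can be inspected with any TSV reader. The sketch below shows one way to load it and pick the best-scoring pipeline; the column names used here (`pipeline`, `precision`, `recall`, `f1`) are an assumption, so check the actual header of the generated file.

```python
import csv

def read_results(path):
    """Load a tab-separated results file into a list of dicts.

    Assumes a header row; the column names below (pipeline, precision,
    recall, f1) are hypothetical and may differ from the actual output.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def best_pipeline(rows):
    # Pick the row with the highest f1 score.
    return max(rows, key=lambda r: float(r["f1"]))
```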