Note: At the moment, the implementation can only read evaluation files in the QALD format.
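For orientation, a QALD-style file is a JSON document with a top-level `questions` list. The sketch below illustrates this layout and a typical way to read it; the field names follow the public QALD JSON format, and the exact schema this implementation expects may differ.

```python
import json

# Minimal sketch of a QALD-style evaluation file (field names follow the
# public QALD JSON format; the schema expected here may differ in detail).
qald_doc = {
    "questions": [
        {
            "id": "1",
            "question": [
                {"language": "en", "string": "Who wrote Dracula?"}
            ],
            "query": {"sparql": "SELECT ?a WHERE { ... }"},
            "answers": [],
        }
    ]
}

def load_questions(path):
    """Map each question id to its English question string."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {
        q["id"]: next(s["string"] for s in q["question"]
                      if s["language"] == "en")
        for q in data["questions"]
    }
```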
To evaluate the configured pipelines, first modify the eval_config.json file. Afterwards, perform the following steps:
- Set up and start the BERT similarity computation service (see the BERT similarity service README).
- Execute `python run_test.py` to generate the gold and prediction files for all pipelines (check the file for custom arguments).
- Execute `python eval_test.py` to evaluate each prediction file against its gold file using BENG.
- Wait for BENG to finish the evaluation.
- Execute `python gen_eval_results.py` to extract the results from BENG and write them to a TSV file named `evaluation_results.tsv`.
- Optionally, execute `python format_translated_qald.py` to format the predictions back into QALD format.
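Once `evaluation_results.tsv` has been written, it can be inspected with any TSV reader. The sketch below shows one way to load it and pick the best-scoring pipeline; the column names used here (`pipeline`, `precision`, `recall`, `f1`) are an assumption, so check the actual header of the generated file.

```python
import csv

def read_results(path):
    """Load a tab-separated results file into a list of dicts.

    Assumes a header row; the column names below (pipeline, precision,
    recall, f1) are hypothetical and may differ from the actual output.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def best_pipeline(rows):
    # Pick the row with the highest f1 score.
    return max(rows, key=lambda r: float(r["f1"]))
```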