Code for the paper: "Attention as a Guide for Simultaneous Speech Translation" published at ACL 2023.
To run the agent, please make sure that SimulEval v1.0.2 (commit d1a8b2f) is installed and set `--port` accordingly.
We release the offline ST models used for EDAtt simultaneous inference.
- Common files: gcmvn.npz | source_vocab.model | source_vocab.txt
- en-de: checkpoint | config_simul.yaml | target_vocab.model | target_vocab.txt
- en-es: checkpoint | config_simul.yaml | target_vocab.model | target_vocab.txt
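Before launching inference, it can help to check that the downloaded files are in place. The function below is a minimal sketch. It assumes that the common files and the per-language files (`config_simul.yaml`, `target_vocab.*`) were all placed in the directory later passed to `--data-bin`, and that the checkpoint is stored elsewhere. This layout and the function name `check_edatt_files` are hypothetical and not part of the released code.

```bash
#!/usr/bin/env bash
# Sketch (not part of the released code): check that the released files exist.
# Hypothetical layout: vocabularies, CMVN stats and config in one data directory,
# checkpoint passed as a separate full path.
check_edatt_files() {
  local data_dir="$1" ckpt_path="$2" missing=0 f
  for f in gcmvn.npz source_vocab.model source_vocab.txt \
           config_simul.yaml target_vocab.model target_vocab.txt; do
    # Report every missing file instead of stopping at the first one.
    [ -f "${data_dir}/${f}" ] || { echo "missing: ${data_dir}/${f}" >&2; missing=1; }
  done
  [ -f "${ckpt_path}" ] || { echo "missing: ${ckpt_path}" >&2; missing=1; }
  return "${missing}"   # 0 if everything was found, 1 otherwise
}
```

For example: `check_edatt_files "${DATA_ROOT}" "${ST_SAVE_DIR}/checkpoint_avg7.pt" && echo "all files found"`.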
Set `--source`, `--target`, and `--config` as described in the Fairseq Simultaneous Translation repository.
`--model-path` is the path to the offline ST model checkpoint (either en-de or en-es), and `--attn-threshold` is the value of alpha used at inference time (alpha = [0.6, 0.4, 0.2, 0.1, 0.05, 0.03] in the paper).
The output will be saved in the directory passed to `--output`.
```bash
simuleval \
    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_edatt.py \
    --source ${SRC_LIST_OF_AUDIO} \
    --target ${TGT_FILE} \
    --data-bin ${DATA_ROOT} \
    --config config_simul.yaml \
    --model-path ${ST_SAVE_DIR}/checkpoint_avg7.pt \
    --extract-attn-from-layer 3 \
    --frame-num 2 --attn-threshold $ALPHA \
    --speech-segment-factor 20 \
    --output ${OUT_DIR} \
    --port ${PORT} \
    --gpu \
    --scores
```
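The paper reports results for several values of alpha. A minimal sketch for running the command above once per value follows. It assumes the same environment variables are already set, writes each run to a hypothetical `${OUT_DIR}/alpha_<value>` sub-directory, and uses a hypothetical `DRY_RUN=1` switch that only prints the commands. The function name `run_edatt_sweep` is illustrative.

```bash
#!/usr/bin/env bash
# Sketch (not part of the released code): sweep the alpha values from the paper.
# Assumes FBK_FAIRSEQ_ROOT, SRC_LIST_OF_AUDIO, TGT_FILE, DATA_ROOT, ST_SAVE_DIR,
# OUT_DIR and PORT are set as for the single run.
# DRY_RUN=1 is a hypothetical switch that prints each command instead of running it.
ALPHAS=(0.6 0.4 0.2 0.1 0.05 0.03)

run_edatt_sweep() {
  local alpha out cmd
  for alpha in "${ALPHAS[@]}"; do
    out="${OUT_DIR}/alpha_${alpha}"   # hypothetical per-alpha output directory
    cmd=(simuleval
      --agent "${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_edatt.py"
      --source "${SRC_LIST_OF_AUDIO}"
      --target "${TGT_FILE}"
      --data-bin "${DATA_ROOT}"
      --config config_simul.yaml
      --model-path "${ST_SAVE_DIR}/checkpoint_avg7.pt"
      --extract-attn-from-layer 3
      --frame-num 2 --attn-threshold "${alpha}"
      --speech-segment-factor 20
      --output "${out}"
      --port "${PORT}"
      --gpu
      --scores)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      mkdir -p "${out}"
      "${cmd[@]}" || return 1   # stop at the first failing run
    fi
  done
}
```

Building the command as a bash array keeps each argument intact even if a path contains spaces, and `DRY_RUN=1 run_edatt_sweep` lets you inspect all six commands before starting the GPU runs.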
To ensure complete reproducibility, we also release the outputs obtained by EDAtt using SimulEval 1.0.2:
- en-de: outputs folder
- en-es: outputs folder
```bibtex
@inproceedings{papi-et-al-2023-edatt,
    title = "Attention as a Guide for Simultaneous Speech Translation",
    author = {Papi, Sara and Negri, Matteo and Turchi, Marco},
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics"
}
```