Code for the paper: "AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation" published at Interspeech 2023.
To run the agent, please make sure that SimulEval v1.0.2 (commit d1a8b2f) is installed and set --port accordingly.
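One way to check the installed SimulEval release before launching the agent is to query the package metadata. This is a minimal sketch under two assumptions that this README does not state: that the package is registered under the distribution name simuleval, and that its metadata reports 1.0.2 for the commit above. The helper name installed_version is only illustrative. Package metadata cannot confirm the exact commit.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_SIMULEVAL = "1.0.2"  # release required by this README


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered under the name "simuleval".
    found = installed_version("simuleval")
    if found is None:
        print("SimulEval is not installed")
    elif found != EXPECTED_SIMULEVAL:
        print(f"Found SimulEval {found}, expected {EXPECTED_SIMULEVAL}")
    else:
        print(f"SimulEval {found} is installed")
```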
We release the offline ST models used for AlignAtt simultaneous inference:
- Common files: config_simul.yaml | gcmvn.npz | source_vocab.model | source_vocab.txt
- en-de: checkpoint | target_vocab.model | target_vocab.txt
- en-es: checkpoint | target_vocab.model | target_vocab.txt
- en-fr: checkpoint | target_vocab.model | target_vocab.txt
- en-it: checkpoint | target_vocab.model | target_vocab.txt
- en-nl: checkpoint | target_vocab.model | target_vocab.txt
- en-pt: checkpoint | target_vocab.model | target_vocab.txt
- en-ro: checkpoint | target_vocab.model | target_vocab.txt
- en-ru: checkpoint | target_vocab.model | target_vocab.txt
In config_simul.yaml, please replace the entries spm_unigram8000_st_target.{model/txt}, spm_unigram.en.{model/txt}, and gcmvn.npz with the absolute paths to the corresponding files on your machine.
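The path replacement can also be scripted. The sketch below rewrites every whitespace-delimited path in the config text that ends in one of the placeholder file names, without depending on the YAML key names. The function name absolutize_config, the /path/to/alignatt directory, and the pairing of spm_unigram.en.model with the released source_vocab.model are assumptions made for illustration, not something this README confirms.

```python
import re
from pathlib import Path


def absolutize_config(config_text, replacements):
    """Replace path tokens ending in a placeholder file name with absolute paths.

    `replacements` maps a file name as written in the config (e.g. "gcmvn.npz")
    to the location of the downloaded file.
    """
    if not replacements:
        return config_text
    absolute = {name: str(Path(loc).resolve()) for name, loc in replacements.items()}
    # Longest names first so that no name shadows a longer one in the alternation.
    names = "|".join(re.escape(n) for n in sorted(absolute, key=len, reverse=True))
    # A token is the file name, optionally preceded by a directory prefix,
    # delimited by whitespace or quotes on both sides.
    pattern = re.compile(r"(?<![^\s'\"])(?:[^\s'\"]*/)?(" + names + r")(?![^\s'\"])")
    return pattern.sub(lambda m: absolute[m.group(1)], config_text)


if __name__ == "__main__":
    config_path = Path("config_simul.yaml")
    # Hypothetical download locations: adjust to where the files were saved.
    mapping = {
        "gcmvn.npz": "/path/to/alignatt/gcmvn.npz",
        # Assumed correspondence between config entry and released file.
        "spm_unigram.en.model": "/path/to/alignatt/source_vocab.model",
    }
    if config_path.exists():
        updated = absolutize_config(config_path.read_text(), mapping)
        config_path.write_text(updated)
```

The remaining entries (spm_unigram.en.txt and the two spm_unigram8000_st_target files) can be added to the mapping in the same way.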
For the simultaneous inference, set --source, --target, and --config as described in the Fairseq Simultaneous Translation repository. --model-path is the path to the offline ST model checkpoint, and --frame-num is the value of f used for the inference (f=[2, 4, 6, 8, 10, 12, 14] in the paper). The output will be saved in the folder passed to --output.
simuleval \
--agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_alignatt.py \
--source ${SRC_LIST_OF_AUDIO} \
--target ${TGT_FILE} \
--data-bin ${DATA_ROOT} \
--config config_simul.yaml \
--model-path ${ST_SAVE_DIR}/checkpoint_avg7.pt \
--extract-attn-from-layer 3 \
--frame-num ${FRAME} \
--speech-segment-factor 25 \
--output ${OUT_DIR} \
--port ${PORT} \
--gpu \
--scores
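To cover all the values of f used in the paper, the command above can be generated once per value. The following is a sketch rather than part of the released code: the helper build_command, the per-run output folder f{frame}, and every example path and port are hypothetical, while the flags are exactly those of the command above.

```python
import shlex

# Values of f evaluated in the paper.
FRAME_VALUES = [2, 4, 6, 8, 10, 12, 14]


def build_command(frame, agent, source, target, data_bin, config,
                  model_path, output_root, port):
    """Build the simuleval argument list for one value of --frame-num."""
    return [
        "simuleval",
        "--agent", agent,
        "--source", source,
        "--target", target,
        "--data-bin", data_bin,
        "--config", config,
        "--model-path", model_path,
        "--extract-attn-from-layer", "3",
        "--frame-num", str(frame),
        "--speech-segment-factor", "25",
        # Assumption: one output folder per value of f.
        "--output", f"{output_root}/f{frame}",
        "--port", str(port),
        "--gpu",
        "--scores",
    ]


if __name__ == "__main__":
    for frame in FRAME_VALUES:
        cmd = build_command(
            frame,
            # All paths and the port below are placeholders.
            agent="/path/to/fbk-fairseq/examples/speech_to_text/"
                  "simultaneous_translation/agents/v1_0/simul_offline_alignatt.py",
            source="/path/to/source_audio_list.txt",
            target="/path/to/target.txt",
            data_bin="/path/to/data",
            config="config_simul.yaml",
            model_path="/path/to/checkpoint_avg7.pt",
            output_root="/path/to/outputs",
            port=12345,
        )
        # Print the command for inspection instead of running it.
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) to launch the runs one after another.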
For the offline inference, please refer to the Speechformer README.
To ensure complete reproducibility, we also release the outputs obtained by AlignAtt using SimulEval 1.0.2:
- en-de: outputs folder
- en-es: outputs folder
- en-fr: outputs folder
- en-it: outputs folder
- en-nl: outputs folder
- en-pt: outputs folder
- en-ro: outputs folder
- en-ru: outputs folder
@inproceedings{papi-et-al-2023-alignatt,
title = "AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation",
author = {Papi, Sara and Negri, Matteo and Turchi, Marco},
booktitle = "Proc. of Interspeech 2023",
year = "2023",
address = "Dublin, Ireland",
}