AlignAtt agent for Simultaneous Speech Translation (Interspeech 2023)

Code for the paper: "AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation" published at Interspeech 2023.

📎 Requirements

To run the agent, make sure that SimulEval v1.0.2 (commit d1a8b2f) is installed, and pass a --port value suited to your setup when launching simuleval.

📌 Pre-trained offline models

We release the offline ST models used for AlignAtt simultaneous inference:

Common files: config_simul.yaml | gcmvn.npz | source_vocab.model | source_vocab.txt
en-de: checkpoint | target_vocab.model | target_vocab.txt
en-es: checkpoint | target_vocab.model | target_vocab.txt
en-fr: checkpoint | target_vocab.model | target_vocab.txt
en-it: checkpoint | target_vocab.model | target_vocab.txt
en-nl: checkpoint | target_vocab.model | target_vocab.txt
en-pt: checkpoint | target_vocab.model | target_vocab.txt
en-ro: checkpoint | target_vocab.model | target_vocab.txt
en-ru: checkpoint | target_vocab.model | target_vocab.txt

In config_simul.yaml, replace the spm_unigram8000_st_target.{model/txt}, spm_unigram.en.{model/txt}, and gcmvn.npz entries with the absolute paths to the corresponding files on your machine.
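
After editing the config, a quick sanity check can catch a path that is still relative or points to a missing file. The following is a minimal sketch, not part of the released code: `check_config_paths` is a hypothetical helper, and it assumes that each path appears in the file as a single token without spaces and ends in .model, .txt, or .npz.

```shell
# Hypothetical helper (not part of the repository): report every
# .model/.txt/.npz path mentioned in a config file.
check_config_paths() {
    config="$1"
    status=0
    # Assumption: each path is one whitespace-free token in the file.
    for path in $(grep -oE "[^[:space:]\"':]+\.(model|txt|npz)" "$config"); do
        case "$path" in
            /*)
                # Absolute path: it must also exist on disk.
                if [ -f "$path" ]; then
                    echo "OK: $path"
                else
                    echo "MISSING: $path"
                    status=1
                fi
                ;;
            *)
                # Anything not starting with "/" still needs to be replaced.
                echo "NOT ABSOLUTE: $path"
                status=1
                ;;
        esac
    done
    return "$status"
}
```

Calling `check_config_paths config_simul.yaml` prints one line per path found and returns a non-zero status if any of them is relative or missing.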

🤖 Inference

For simultaneous inference, set --source, --target, and --config as described in the Fairseq Simultaneous Translation repository. --model-path is the path to the offline ST model checkpoint, and --frame-num is the value of f used for inference (f = 2, 4, 6, 8, 10, 12, or 14 in the paper).
The output is saved in the location given by --output.

simuleval \
    --agent ${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_alignatt.py \
    --source ${SRC_LIST_OF_AUDIO} \
    --target ${TGT_FILE} \
    --data-bin ${DATA_ROOT} \
    --config config_simul.yaml \
    --model-path ${ST_SAVE_DIR}/checkpoint_avg7.pt \
    --extract-attn-from-layer 3 \
    --frame-num ${FRAME} \
    --speech-segment-factor 25 \
    --output ${OUT_DIR} \
    --port ${PORT} \
    --gpu \
    --scores
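
Since the paper reports results for several values of f, the command above can be wrapped in a loop that runs it once per value. This is a minimal sketch under stated assumptions: it reuses the environment variables from the command above, while the `DRY_RUN` switch, the `run_alignatt` function name, and the per-value output directory `${OUT_DIR}/f<value>` are illustrative choices that do not come from the original instructions.

```shell
# Sketch of a sweep over the f values reported in the paper.
# Assumptions (not from the original command): the DRY_RUN switch and the
# per-value output directory ${OUT_DIR}/f<value>.
FRAMES="2 4 6 8 10 12 14"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it

run_alignatt() {
    frame_num="$1"
    # Build the command as positional parameters so it can be printed or run.
    set -- simuleval \
        --agent "${FBK_FAIRSEQ_ROOT}/examples/speech_to_text/simultaneous_translation/agents/v1_0/simul_offline_alignatt.py" \
        --source "${SRC_LIST_OF_AUDIO}" \
        --target "${TGT_FILE}" \
        --data-bin "${DATA_ROOT}" \
        --config config_simul.yaml \
        --model-path "${ST_SAVE_DIR}/checkpoint_avg7.pt" \
        --extract-attn-from-layer 3 \
        --frame-num "${frame_num}" \
        --speech-segment-factor 25 \
        --output "${OUT_DIR}/f${frame_num}" \
        --port "${PORT}" \
        --gpu \
        --scores
    if [ "${DRY_RUN}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

for frame in ${FRAMES}; do
    run_alignatt "${frame}"
done
```

With the default `DRY_RUN=1` the loop only prints the seven commands so they can be inspected first; exporting `DRY_RUN=0` before running the script executes them one after another. Writing each run to its own directory keeps the results for different f values from overwriting each other.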

For the offline inference, please refer to the Speechformer README.

💬 Outputs

To ensure complete reproducibility, we also release the outputs obtained by AlignAtt using SimulEval 1.0.2:

en-de: outputs folder
en-es: outputs folder
en-fr: outputs folder
en-it: outputs folder
en-nl: outputs folder
en-pt: outputs folder
en-ro: outputs folder
en-ru: outputs folder

📍 Citation

@inproceedings{papi-et-al-2023-alignatt,
title = "AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation",
author = {Papi, Sara and Negri, Matteo and Turchi, Marco},
booktitle = "Proc. of Interspeech 2023",
year = "2023",
address = "Dublin, Ireland",
}
