From 4b7966b0b2cf8a2c260cdd0479632da8fd30b56c Mon Sep 17 00:00:00 2001 From: Marco Gaido Date: Thu, 26 Oct 2023 10:42:38 +0200 Subject: [PATCH] [!145][RELEASE] Gradient-reversal and multi-gender models to control gender (CLiC-it 2023) # Which work do we release? How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation (CLiC-it 2023) # What changes does this release refer to? 12884f97dd1a76ee79218dd8d4b790d0a29b38fe 538639e93c7926a6fd5bf1aa1824bb832e5fa172 --- README.md | 1 + fbk_works/MULTIGENDER_CLIC_2023.md | 196 +++++++++++++++++++++++++++++ 2 files changed, 197 insertions(+) create mode 100644 fbk_works/MULTIGENDER_CLIC_2023.md diff --git a/README.md b/README.md index e3719ae3..fa9c461c 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ Dedicated README for each work can be found in the `fbk_works` directory. ### 2023 + - [[CLiC-IT 2023] **How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation**](fbk_works/MULTIGENDER_CLIC_2023.md) - [[EMNLP 2023] **Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection**](fbk_works/SHALLOW_FUSION_GENDER_BIAS.md) - [[WMT 2023] **Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES**](fbk_works/INES_eval.md) - [[ASRU 2023] **No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation**](fbk_works/PITCH_MANIPULATION_ASR.md) diff --git a/fbk_works/MULTIGENDER_CLIC_2023.md b/fbk_works/MULTIGENDER_CLIC_2023.md new file mode 100644 index 00000000..147e87ee --- /dev/null +++ b/fbk_works/MULTIGENDER_CLIC_2023.md @@ -0,0 +1,196 @@ +# How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation (CLiC-it 2023) + +Instructions to reproduce the paper +["How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender 
Translation"](http://arxiv.org/abs/2310.15114).

## 📍 Preprocess and Setup

Download all the corpora listed in our paper and preprocess them as explained [here](SPEECHFORMER.md#preprocessing).

## 🏃 Training
The models in the paper were trained with the following scripts.
All the scripts below assume 4 GPUs with at least 16GB of VRAM each.
On different hardware, you may need to adjust the parameters `--max-tokens` (e.g., lower it if you have less VRAM)
and `--update-freq` so that the product `num_gpus * max_tokens * update_freq` remains the same.

### Multi-gender Baseline

To train multi-gender models, you first need to edit the YAML config file
generated by the preprocessing script, so that it contains:

```
audio_root: $YOUR_AUDIO_ROOT_DIR
bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: $YOUR_TGTLANG_SENTENCEPIECE_MODEL
bpe_tokenizer_src:
  bpe: sentencepiece
  sentencepiece_model: $YOUR_ENGLISH_SENTENCEPIECE_MODEL
input_channels: 1
input_feat_per_channel: 80
sampling_alpha: 1.0
prepend_tgt_lang_tag: True
specaugment:
  freq_mask_F: 27
  freq_mask_N: 1
  time_mask_N: 1
  time_mask_T: 100
  time_mask_p: 1.0
  time_wrap_W: 0
transforms:
  '*':
  - utterance_cmvn
  _train:
  - utterance_cmvn
  - specaugment
vocab_filename: $YOUR_TGTLANG_SENTENCEPIECE_TOKENS_TXT
vocab_filename_src: $YOUR_ENGLISH_SENTENCEPIECE_TOKENS_TXT
```

which we hereinafter name `config_st_mix_multigender.yaml`.
Note the `prepend_tgt_lang_tag: True` setting.

Your SentencePiece models should contain tags for the two genders as the special tokens
`<lang:He>` and `<lang:She>`. In addition, the TSV you have obtained from the preprocessing
of your data must be enriched with a `tgt_lang` column containing either `He` or `She` according to
the gender of the speaker (in the following, we assume the TSV is named `train_st_src_gender_multilang.tsv`).
To know the gender of each speaker, please refer to
[MuST-Speakers](https://mt.fbk.eu/must-speakers/).
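As a minimal sketch of this enrichment step (the `speaker` field name and the `speaker -> gender` mapping are illustrative assumptions: adapt them to your actual TSV schema and to the information you extract from MuST-Speakers):

```python
def add_tgt_lang_column(rows, speaker_gender):
    # Add a tgt_lang value ("He" or "She") to each TSV row, looked up by
    # speaker id. `speaker_gender` would be derived from MuST-Speakers;
    # the "speaker" field name is an assumption about your TSV schema.
    return [{**row, "tgt_lang": speaker_gender[row["speaker"]]} for row in rows]
```

Rows can be read and written back with `csv.DictReader`/`csv.DictWriter` using `delimiter="\t"`.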
Then, train multi-gender models with the following command:

```
python train.py ${DATA_ROOT} \
  --train-subset train_st_src_gender_multilang \
  --valid-subset dev_with_gender_lang \
  --save-dir ${ST_SAVE_DIR} \
  --num-workers 5 --max-update 50000 \
  --max-tokens 10000 --adam-betas '(0.9, 0.98)' \
  --user-dir examples/speech_to_text \
  --task speech_to_text_aux_classification --config-yaml config_st_mix_multigender.yaml \
  --ignore-prefix-size 1 \
  --criterion ctc_multi_loss --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --arch conformer \
  --ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \
  --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
  --warmup-updates 25000 \
  --clip-norm 10.0 \
  --seed 1 --update-freq 8 \
  --skip-invalid-size-inputs-valid-test \
  --log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err
```

### Finetuned Multi-gender Baseline

To obtain a multi-gender model fine-tuned from the base ST one,
add `--allow-extra-tokens --finetune-from-model $BASE_ST_MODEL_CHECKPOINT` to the training command above,
change the learning rate to `5e-4`, and set the `--lr-scheduler` to `fixed`.

### Multi-gender Gradient Reversal

First, add the following lines to the YAML config file, so as to obtain `config_st_mix_multigender_with_aux.yaml`:

```
aux_classes:
  - He
  - She
```

Then, duplicate the `tgt_lang` column in the TSV files,
naming the new column `auxiliary_target`.
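The duplication is mechanical; a minimal sketch over rows already loaded as dicts (reading and writing the TSV itself is omitted):

```python
def add_auxiliary_target(rows):
    # Copy the existing tgt_lang value into a new auxiliary_target column,
    # leaving tgt_lang in place: the former drives the target prefix token,
    # the latter feeds the auxiliary gender classifier.
    return [{**row, "auxiliary_target": row["tgt_lang"]} for row in rows]
```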
+ +The training can be executed with the following script: + +``` +python train.py ${DATA_ROOT} \ + --train-subset train_st_src_gender_multilang \ + --valid-subset dev_with_gender_lang \ + --save-dir ${ST_SAVE_DIR} \ + --num-workers 5 --max-update 50000 --keep-last-epochs 10 \ + --max-tokens 10000 --adam-betas '(0.9, 0.98)' \ + --user-dir examples/speech_to_text \ + --task speech_to_text_aux_classification --config-yaml config_st_mix_multigender_with_aux.yaml \ + --ignore-prefix-size 1 \ + --criterion ctc_multi_loss --underlying-criterion cross_entropy_multi_task --label-smoothing 0.1 \ + --arch multitask_conformer --reverted-classifier --auxiliary-loss-weight 0.5 --reverted-lambda 0.5 \ + --ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \ + --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 25000 \ + --clip-norm 10.0 \ + --seed 1 --update-freq 8 \ + --skip-invalid-size-inputs-valid-test \ + --log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err +``` + +To obtain the **weighted** variant, add `--auxiliary-loss-class-weights 0.8 1.4` to the command above. 
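For intuition on what `--reverted-classifier` and `--reverted-lambda` control, here is a toy, framework-free sketch of a gradient-reversal layer (the actual implementation lives in this repository's fairseq-based code): the forward pass is the identity, while the backward pass flips and scales the gradient flowing from the gender classifier into the encoder, so the encoder is pushed to *discard* speaker-gender cues rather than preserve them.

```python
def grad_reverse_forward(x):
    # Forward pass: identity, the classifier sees the encoder states unchanged.
    return x

def grad_reverse_backward(grad_output, lambda_=0.5):
    # Backward pass: flip the sign and scale by lambda (--reverted-lambda), so
    # minimizing the classifier loss *maximizes* it w.r.t. encoder parameters.
    return [-lambda_ * g for g in grad_output]
```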
### Finetuned Multi-gender Gradient Reversal

To fine-tune from a pre-trained multi-gender model, the procedure is the same as above,
but the training script is the following:

```
python train.py ${DATA_ROOT} \
  --train-subset train_st_src_gender_multilang \
  --valid-subset dev_with_gender_lang \
  --save-dir ${ST_SAVE_DIR} \
  --num-workers 5 --max-update 50000 \
  --max-tokens 10000 --adam-betas '(0.9, 0.98)' \
  --user-dir examples/speech_to_text \
  --task speech_to_text_aux_classification --config-yaml config_st_mix_multigender_with_aux.yaml \
  --ignore-prefix-size 1 \
  --criterion ctc_multi_loss --underlying-criterion cross_entropy_multi_task --label-smoothing 0.1 \
  --arch multitask_conformer --reverted-classifier --auxiliary-loss-weight 0.5 --reverted-lambda 10 \
  --ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \
  --allow-extra-tokens --allow-partial-loading --finetune-from-model $PATH_TO_PRETRAINED_MULTIGENDER_MODEL \
  --optimizer adam --lr 5e-4 --lr-scheduler fixed \
  --clip-norm 10.0 \
  --seed 1 --update-freq 8 \
  --skip-invalid-size-inputs-valid-test \
  --log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err
```

Similarly, the **weighted** variant is obtained by adding
`--auxiliary-loss-class-weights 0.8 1.4` to the command above.

### Audio Manipulation

To enable the audio manipulation that converts speakers' vocal traits into those of the opposite gender,
edit the `config_st_mix_multigender.yaml` file adding:

```
opposite_pitch:
  gender_tsv: $PATH_TO_MUST_SPEAKERS_TSV
  sampling_rate: 16000
  p_male: $PROB_MANIP
  p_female: $PROB_MANIP
raw_transforms:
  _train:
  - opposite_pitch
waveform_sample_rate: 16000
is_input_waveform: True
```

where `$PATH_TO_MUST_SPEAKERS_TSV` points to the [MuST-Speakers](https://mt.fbk.eu/must-speakers/) TSV
(e.g., `MuST-Speakers_v1.1.tsv`) and `$PROB_MANIP` was set to 0.5 and 0.8 in the experiments reported in the paper.
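To clarify the role of `p_male` and `p_female`: each training utterance is independently manipulated with the probability associated with its speaker's gender. A hypothetical sketch of this sampling gate (the pitch conversion itself is performed by the `opposite_pitch` transform, not shown here):

```python
import random

def should_flip_voice(speaker_gender, p_male, p_female, rand=random.random):
    # Return True if this utterance's vocal traits should be converted to the
    # opposite gender. `speaker_gender` is "He" or "She", matching tgt_lang;
    # p_male/p_female correspond to the opposite_pitch config entries.
    p = p_male if speaker_gender == "He" else p_female
    return rand() < p
```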
## 🔍 Evaluation

The evaluation of the system outputs was performed with SacreBLEU v2.0
and the [MuST-SHE Gender Accuracy Script](../examples/speech_to_text/scripts/gender/mustshe_gender_accuracy.py)
v1.1.

## ⭐ Citation

If you use this work, please cite:

```bibtex
@inproceedings{gaido-et-al-multigender,
  title = {{How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation}},
  author = {Gaido, Marco and Fucci, Dennis and Negri, Matteo and Bentivogli, Luisa},
  year = {2023},
  booktitle = {Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)},
  address = {Venice, Italy}
}
```