[!145][RELEASE] Gradient-reversal and multi-gender models to control gender (CLiC-it 2023)

# Which work do we release?

How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation (CLiC-it 2023)

# What changes does this release refer to?

12884f97dd1a76ee79218dd8d4b790d0a29b38fe 538639e93c7926a6fd5bf1aa1824bb832e5fa172
mgaido91 committed Oct 26, 2023
1 parent 71bcaac commit 4b7966b
Showing 2 changed files with 197 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ Dedicated README for each work can be found in the `fbk_works` directory.

### 2023

- [[CLiC-IT 2023] **How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation**](fbk_works/MULTIGENDER_CLIC_2023.md)
- [[EMNLP 2023] **Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection**](fbk_works/SHALLOW_FUSION_GENDER_BIAS.md)
- [[WMT 2023] **Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES**](fbk_works/INES_eval.md)
- [[ASRU 2023] **No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation**](fbk_works/PITCH_MANIPULATION_ASR.md)
196 changes: 196 additions & 0 deletions fbk_works/MULTIGENDER_CLIC_2023.md
@@ -0,0 +1,196 @@
# How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation (CLiC-it 2023)

Instructions to reproduce the paper
["How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation"](http://arxiv.org/abs/2310.15114).

## 📍 Preprocess and Setup

Download all the corpora listed in our paper and preprocess them as explained [here](SPEECHFORMER.md#preprocessing).

## 🏃 Training
The models presented in the paper were trained with the scripts below.
All the scripts assume 4 GPUs with at least 16 GB of VRAM each.
On different hardware, you may need to adjust `--max-tokens` (e.g., lower it if you have less VRAM)
and `--update-freq` so that the product `num_gpus * max_tokens * update_freq` remains the same.
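For reference, the commands in this README correspond to an effective batch of `4 * 10000 * 8 = 320,000` tokens per update.
The sketch below (illustrative only, not part of the repository) shows one way to derive a matching `--update-freq` for a different setup:

```python
# Illustrative helper (not part of the repository): keep the effective batch
# size num_gpus * max_tokens * update_freq equal to the reference setup used
# in this README (4 GPUs, --max-tokens 10000, --update-freq 8).
REFERENCE_EFFECTIVE_BATCH = 4 * 10_000 * 8  # = 320,000 tokens per update


def suggested_update_freq(num_gpus: int, max_tokens: int) -> int:
    """Return an --update-freq that keeps the effective batch size constant."""
    return max(1, round(REFERENCE_EFFECTIVE_BATCH / (num_gpus * max_tokens)))


if __name__ == "__main__":
    # e.g. 2 GPUs, keeping --max-tokens 10000 -> --update-freq 16
    print(suggested_update_freq(num_gpus=2, max_tokens=10_000))
```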

### Multi-gender Baseline

To train multi-gender models, you first need to edit the YAML config file
generated by the preprocessing script, so as to have:

```
audio_root: $YOUR_AUDIO_ROOT_DIR
bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: $YOUR_TGTLANG_SENTENCEPIECE_MODEL
bpe_tokenizer_src:
  bpe: sentencepiece
  sentencepiece_model: $YOUR_ENGLISH_SENTENCEPIECE_MODEL
input_channels: 1
input_feat_per_channel: 80
sampling_alpha: 1.0
prepend_tgt_lang_tag: True
specaugment:
  freq_mask_F: 27
  freq_mask_N: 1
  time_mask_N: 1
  time_mask_T: 100
  time_mask_p: 1.0
  time_wrap_W: 0
transforms:
  '*':
  - utterance_cmvn
  _train:
  - utterance_cmvn
  - specaugment
vocab_filename: $YOUR_TGTLANG_SENTENCEPIECE_TOKENS_TXT
vocab_filename_src: $YOUR_ENGLISH_SENTENCEPIECE_TOKENS_TXT
```

which we refer to as `config_st_mix_multigender.yaml` hereinafter.
Mind the `prepend_tgt_lang_tag: True` setting.

Your SentencePiece models should contain the tags for the two genders as the special tokens
`<lang:He>` and `<lang:She>`. In addition, the TSV obtained from the preprocessing
of your data must be enriched with a `tgt_lang` column containing either `He` or `She`, according to
the gender of the speaker (in the following, we assume the TSV is named `train_st_src_gender_multilang.tsv`).
To know the gender of each speaker, please refer to
[MuST-Speakers](https://mt.fbk.eu/must-speakers/).
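
A minimal sketch of this TSV enrichment is given below. It is not part of the repository's tooling: it assumes pandas,
an `id` column in the fairseq-style manifest, and a hypothetical `speaker_gender` lookup that you implement on top of
MuST-Speakers. If your SentencePiece models were trained without the gender tags, SentencePiece's `user_defined_symbols`
training option is one way to add them.

```python
# Minimal sketch (assumptions, not the repository's tooling): enrich the
# preprocessed TSV with a tgt_lang column set to "He" or "She" per utterance.
import pandas as pd


def speaker_gender(utterance_id: str) -> str:
    """Hypothetical lookup into MuST-Speakers; implement it for your data.

    Should return "He" or "She" for the speaker of the given utterance.
    """
    raise NotImplementedError


manifest = pd.read_csv("train_st_src.tsv", sep="\t")  # placeholder file name
manifest["tgt_lang"] = manifest["id"].map(speaker_gender)
manifest.to_csv("train_st_src_gender_multilang.tsv", sep="\t", index=False)
```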

Then, train multi-gender models with the following command:

```
python train.py ${DATA_ROOT} \
--train-subset train_st_src_gender_multilang \
--valid-subset dev_with_gender_lang \
--save-dir ${ST_SAVE_DIR} \
--num-workers 5 --max-update 50000 \
--max-tokens 10000 --adam-betas '(0.9, 0.98)' \
--user-dir examples/speech_to_text \
--task speech_to_text_aux_classification --config-yaml config_st_mix_multigender.yaml \
--ignore-prefix-size 1 \
--criterion ctc_multi_loss --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--arch conformer \
--ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \
--optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
--warmup-updates 25000 \
--clip-norm 10.0 \
--seed 1 --update-freq 8 \
--skip-invalid-size-inputs-valid-test \
--log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err
```

### Finetuned Multi-gender Baseline

To obtain a multi-gender model that is fine-tuned from the base ST one,
add `--allow-extra-tokens --finetune-from-model $BASE_ST_MODEL_CHECKPOINT` to the training command above,
change the learning rate to `5e-4`, and set the `lr-scheduler` to `fixed`.



### Multi-gender Gradient Reversal

First, you need to add the following lines to the YAML config file, so as to obtain `config_st_mix_multigender_with_aux.yaml`:

```
aux_classes:
- He
- She
```

Then, duplicate the `tgt_lang` column in the TSV files,
naming the new column `auxiliary_target`, as in the sketch below.
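
A one-line sketch of this step with pandas (placeholder file name; not part of the repository's tooling):

```python
import pandas as pd

# Copy the gender tag into the column read by the auxiliary classifier.
tsv = pd.read_csv("train_st_src_gender_multilang.tsv", sep="\t")
tsv["auxiliary_target"] = tsv["tgt_lang"]
tsv.to_csv("train_st_src_gender_multilang.tsv", sep="\t", index=False)
```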

The training can be executed with the following script:

```
python train.py ${DATA_ROOT} \
--train-subset train_st_src_gender_multilang \
--valid-subset dev_with_gender_lang \
--save-dir ${ST_SAVE_DIR} \
--num-workers 5 --max-update 50000 --keep-last-epochs 10 \
--max-tokens 10000 --adam-betas '(0.9, 0.98)' \
--user-dir examples/speech_to_text \
--task speech_to_text_aux_classification --config-yaml config_st_mix_multigender_with_aux.yaml \
--ignore-prefix-size 1 \
--criterion ctc_multi_loss --underlying-criterion cross_entropy_multi_task --label-smoothing 0.1 \
--arch multitask_conformer --reverted-classifier --auxiliary-loss-weight 0.5 --reverted-lambda 0.5 \
--ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \
--optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
--warmup-updates 25000 \
--clip-norm 10.0 \
--seed 1 --update-freq 8 \
--skip-invalid-size-inputs-valid-test \
--log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err
```

To obtain the **weighted** variant, add `--auxiliary-loss-class-weights 0.8 1.4` to the command above.

### Finetuned Multi-gender Gradient Reversal

To fine-tune from a pre-trained multi-gender model, the procedure is the same as above,
but the script is the following:

```
python train.py ${DATA_ROOT} \
--train-subset train_st_src_gender_multilang \
--valid-subset dev_with_gender_lang \
--save-dir ${ST_SAVE_DIR} \
--num-workers 5 --max-update 50000 \
--max-tokens 10000 --adam-betas '(0.9, 0.98)' \
--user-dir examples/speech_to_text \
--task speech_to_text_aux_classification --config-yaml config_st_mix_multigender.yaml \
--ignore-prefix-size 1 \
--criterion ctc_multi_loss --underlying-criterion cross_entropy_multi_task --label-smoothing 0.1 \
--arch multitask_conformer --reverted-classifier --auxiliary-loss-weight 0.5 --reverted-lambda 10 \
--ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg --auxiliary-loss-class-weights 0.8 1.4 \
--allow-extra-tokens --allow-partial-loading --finetune-from-model $PATH_TO_PRETRAINED_MULTIGENDER_MODEL \
--optimizer adam --lr 5e-4 --lr-scheduler fixed \
--clip-norm 10.0 \
--seed 1 --update-freq 8 \
--skip-invalid-size-inputs-valid-test \
--log-format simple >> ${ST_SAVE_DIR}/train.log 2> ${ST_SAVE_DIR}/train.err
```

Similarly, the **weighted** variant is obtained by adding
`--auxiliary-loss-class-weights 0.8 1.4` to the command above.

### Audio Manipulation

To enable the audio manipulation that converts speakers' vocal traits into those of the opposite gender,
edit the `config_st_mix_multigender.yaml` file by adding:

```
opposite_pitch:
  gender_tsv: /home/ubuntu/disk2/corpora/MuST-Speakers_v1.1/MuST-Speakers_v1.1.tsv
  sampling_rate: 16000
  p_male: $PROB_MANIP
  p_female: $PROB_MANIP
raw_transforms:
  _train:
  - opposite_pitch
waveform_sample_rate: 16000
is_input_waveform: True
```

where `$PROB_MANIP` has been set to 0.5 and 0.8 in the experiments reported in the paper.

## 🔍 Evaluation

Evaluation of the system outputs has been performed with SacreBLEU v2.0
and the [MuST-SHE Gender Accuracy Script](../examples/speech_to_text/scripts/gender/mustshe_gender_accuracy.py)
v1.1.
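
As an illustration only, BLEU can be computed with SacreBLEU's Python API as sketched below (file names are placeholders);
gender accuracy is computed separately with the script linked above.

```python
# Illustrative sketch: corpus-level BLEU with SacreBLEU v2 on detokenized
# system outputs and references (one sentence per line, placeholder paths).
from sacrebleu.metrics import BLEU

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

bleu = BLEU()
print(bleu.corpus_score(hypotheses, [references]))
```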

## ⭐ Citation

If you use this work, please cite:

```bibtex
@inproceedings{gaido-et-al-multigender,
  title = {{How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation}},
  author = {Gaido, Marco and Fucci, Dennis and Negri, Matteo and Bentivogli, Luisa},
  year = {2023},
  booktitle = {Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)},
  address = {Venice, Italy}
}
```
