CoBERT

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Introduction

Code BERT (CoBERT) is an approach to self-supervised speech representation learning. The idea is to convert an utterance into a sequence of discrete codes and then perform representation learning on those codes. CoBERT outperforms the most recent state-of-the-art models on the ASR task and brings significant improvements on the SUPERB speech translation (ST) task.
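
As a toy illustration of turning an utterance into discrete codes, the sketch below assigns each frame to its nearest centroid. It assumes a k-means-style codebook (as used for HuBERT labels); the 2-D features, the three centroids, and the function name frames_to_codes are made up for illustration and are not part of this repository.

```python
from math import dist


def frames_to_codes(frames, centroids):
    """Return, for each frame, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(frame, centroids[k]))
        for frame in frames
    ]


# Hypothetical 2-D features and a 3-entry codebook. Real systems use
# high-dimensional features and hundreds of clusters.
centroids = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (-0.8, 1.9)]

codes = frames_to_codes(frames, centroids)
print(codes)  # one integer code per frame
```

The resulting integer sequence plays the role of the "codes" that a code model is then trained on.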

Pre-Trained Models

Model            Pretraining Data     Checkpoint
code teacher 1   LibriSpeech 960 hr   download
code teacher 2   LibriSpeech 960 hr   download
CoBERT base      LibriSpeech 960 hr   download

Extract features using pre-trained models

import torch
import torch.nn.functional as F
from cobert.models.cobert_with_teacher import CobertWithTeacherConfig, CobertWithTeacherModel

# Load onto the CPU so the checkpoint can be read on machines without a GPU.
checkpoint = torch.load("path/to/checkpoint.pt", map_location="cpu")
cfg = CobertWithTeacherConfig(**checkpoint["cfg"]["model"])
model = CobertWithTeacherModel.build_model(cfg)

# The code teacher is not needed for feature extraction, so remove it and its weights.
model.code_teacher_model = None
for k in list(checkpoint["model"].keys()):
    if "code_teacher_model" in k:
        del checkpoint["model"][k]

# Also drop the EMA weights (pop avoids a KeyError if the key is absent).
checkpoint["model"].pop("_ema", None)
model.load_state_dict(checkpoint["model"])
model.eval()

wav_input_16khz = torch.randn(1,10000)
normalize = checkpoint["cfg"]["task"]["normalize"]  # True by default
if normalize:
    wav_input_16khz = F.layer_norm(wav_input_16khz[0], wav_input_16khz[0].shape).unsqueeze(0)

# extract representations for each layer
layer_results = model.extract_features(source=wav_input_16khz, padding_mask=None)["layer_results"]
# T x B x C -> B x T x C
layer_results = [layer[0].transpose(0, 1) for layer in layer_results]
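
To get a feel for the time dimension T of the extracted features, the sketch below computes the number of output frames for a given number of input samples. It assumes the convolutional feature extractor has the same (kernel, stride) layout as wav2vec 2.0 / HuBERT base, which this README does not confirm for CoBERT; the resulting 50 frames per second is at least consistent with model.label_rate=50 used in the training commands.

```python
# Assumed (kernel, stride) layout of the wav2vec 2.0 / HuBERT base
# convolutional feature extractor; not confirmed for CoBERT by this README.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples, conv_layers=CONV_LAYERS):
    """Length of the frame sequence after an unpadded 1-D conv stack."""
    length = num_samples
    for kernel, stride in conv_layers:
        length = (length - kernel) // stride + 1
    return length


# The overall hop size is the product of all strides.
total_stride = 1
for _, stride in CONV_LAYERS:
    total_stride *= stride

print(total_stride)              # samples per frame hop
print(16000 // total_stride)     # frames per second at 16 kHz
print(num_output_frames(10000))  # T for the 10000-sample dummy input
```

Under these assumptions, each tensor in layer_results for the dummy input would have shape B x T x C with B = 1 and T as printed on the last line.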

Implementation

Setup

Follow the instructions below to clone the code and set up the Python environment for CoBERT.

git clone https://github.com/mct10/CoBERT.git
cd CoBERT
git submodule update --init fairseq
pip install --editable fairseq/
cd fairseq
python setup.py build develop
cd ..

Data Preparation

We follow the steps for preparing the manifest in here and the HuBERT labels in here.
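
Before training, it can help to check that the manifest and the label file line up. The sketch below assumes the usual fairseq layout: a .tsv manifest whose first line is the audio root directory followed by "relative_path<TAB>num_samples" lines, and a label file with one line of space-separated integer codes per utterance. The file names train.tsv and train.km, the helper names, and the tolerance of 2 frames are illustrative assumptions; 320 samples per label corresponds to 16 kHz audio at a label rate of 50.

```python
import tempfile
from pathlib import Path


def load_manifest(tsv_path):
    """Parse a manifest: root dir, then 'rel_path<TAB>num_samples' lines."""
    lines = Path(tsv_path).read_text().splitlines()
    root, entries = lines[0], []
    for line in lines[1:]:
        rel_path, num_samples = line.split("\t")
        entries.append((rel_path, int(num_samples)))
    return root, entries


def load_labels(label_path):
    """Parse a label file: one line of space-separated codes per utterance."""
    text = Path(label_path).read_text()
    return [[int(code) for code in line.split()] for line in text.splitlines()]


def check_alignment(entries, labels, samples_per_label=320, tolerance=2):
    """Return indices of utterances whose label count does not fit the audio."""
    assert len(entries) == len(labels), "manifest and label file differ in length"
    mismatched = []
    for i, ((_, num_samples), codes) in enumerate(zip(entries, labels)):
        expected = num_samples / samples_per_label
        if abs(len(codes) - expected) > tolerance:
            mismatched.append(i)
    return mismatched


# Tiny synthetic example written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tsv, km = Path(tmp) / "train.tsv", Path(tmp) / "train.km"
    tsv.write_text("/data/LibriSpeech\na.flac\t6400\nb.flac\t3200\n")
    km.write_text(" ".join(["7"] * 20) + "\n" + " ".join(["3"] * 4) + "\n")
    root, entries = load_manifest(tsv)
    labels = load_labels(km)
    mismatched = check_alignment(entries, labels)

print(root, len(entries), mismatched)
```

In the synthetic data the second utterance deliberately has too few labels, so its index is the one reported.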

code teacher 1

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/code_teacher_1/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    task.label_dir=/path/to/codes \
    model.label_rate=50 \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train -m \
  --config-dir cobert/config/code_teacher_1/finetuning \
  --config-name base_100h \
  task.data=/path/to/manifest \
  task.label_dir=/path/to/label \
  +task.code_dir=/path/to/codes \
  model.w2v_path=/path/to/ckpt \
  common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir cobert/config/code_teacher_1/decode \
  --config-name infer_viterbi \
  task.data=/path/to/manifest \
  task.normalize=false \
  +task.code_dir=/path/to/codes \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/
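
When launching several runs from a script, the overrides above can be kept in a dictionary and turned into an argument list. The helper below is a hypothetical convenience wrapper (hydra_train_command is not part of this repository or of fairseq); it only assembles and prints the command and does not execute it.

```python
import shlex


def hydra_train_command(config_dir, config_name, overrides, multirun=True):
    """Assemble a fairseq-hydra-train argument list from a dict of overrides."""
    cmd = ["fairseq-hydra-train"]
    if multirun:
        cmd.append("-m")
    cmd += ["--config-dir", config_dir, "--config-name", config_name]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = hydra_train_command(
    config_dir="cobert/config/code_teacher_1/pretraining",
    config_name="base_librispeech",
    overrides={
        "task.data": "/path/to/manifest",
        "task.label_dir": "/path/to/codes",
        "model.label_rate": 50,
        "dataset.valid_subset": "dev_other",
        "dataset.train_subset": "train",
        "common.user_dir": "/path/to/CoBERT/cobert/",
    },
)
print(shlex.join(cmd))
# To launch the run: subprocess.run(cmd, check=True)
```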

code teacher 2

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/code_teacher_2/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    +model.no_sin_pos_embed=true \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train \
    --config-dir cobert/config/code_teacher_2/finetuning \
    --config-name base_100h \
    task.data=/path/to/manifest \
    task.label_dir=/path/to/label \
    model.w2v_path=/path/to/ckpt \
    dataset.train_subset=train_100h \
    dataset.valid_subset=dev_other \
    common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir cobert/config/code_teacher_2/decode \
  --config-name infer_viterbi \
  task.data=/path/to/manifest \
  task.normalize=false \
  task.label_dir=/path/to/label \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/

CoBERT

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/cobert/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    model.code_teacher_ckpt=/path/to/teacher \
    model.code_teacher_type=code_teacher_2 \
    +model.multi_outputs=true \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_100h \
    task.data=/path/to/manifest \
    model.w2v_path=/path/to/ckpt \
    +model.normalize=true \
    dataset.train_subset=train_100h \
    dataset.valid_subset=dev_other \
    common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir fairseq/examples/speech_recognition/new/conf \
  --config-name infer \
  task=audio_finetuning \
  task.labels=ltr \
  decoding.type=viterbi \
  task.data=/path/to/manifest \
  task.normalize=true \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/
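
ASR output is typically scored by word error rate (WER). The sketch below is a minimal, standalone illustration of that metric on made-up sentences; it is not the scoring code used by cobert/infer.py, and the function name word_error_rate is an assumption for this example.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edit distance / total reference words."""
    total_edits = total_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        # Single-row dynamic programme for Levenshtein distance over words.
        row = list(range(len(h) + 1))
        for i, r_word in enumerate(r, start=1):
            prev_diag, row[0] = row[0], i
            for j, h_word in enumerate(h, start=1):
                cost = 0 if r_word == h_word else 1
                prev_diag, row[j] = row[j], min(
                    row[j] + 1,        # deletion of a reference word
                    row[j - 1] + 1,    # insertion of a hypothesis word
                    prev_diag + cost,  # substitution or match
                )
        total_edits += row[len(h)]
        total_words += len(r)
    return total_edits / total_words


# Made-up transcripts: one substitution and one deletion in the first pair.
references = ["the cat sat on the mat", "hello world"]
hypotheses = ["the cat sit on mat", "hello world"]

wer = word_error_rate(references, hypotheses)
print(f"WER = {wer:.2%}")
```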

Citation

If you find our work useful in your research, please cite the following paper:

@article{meng2022cobert,
  title   = {CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning},
  author  = {Meng, Chutong and Ao, Junyi and Ko, Tom and Wang, Mingxuan and Li, Haizhou},
  eprint  = {2210.04062},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  year    = {2022}
}
