Official PyTorch code for extracting features and training downstream models with
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
(Logo generated by DALL·E 3)
emotion2vec is the first universal speech emotion representation model. Through self-supervised pre-training, emotion2vec can extract emotion representations across different tasks, languages, and scenarios.
The paper is coming soon.
emotion2vec achieves SOTA with only linear layers on the mainstream IEMOCAP dataset.
emotion2vec outperforms state-of-the-art SSL models on multiple languages (Mandarin, French, German, Italian, etc.). Refer to the paper for more details.
We provide extracted features for the popular emotion dataset IEMOCAP. The features are extracted from the last layer of emotion2vec and stored in `.npy` format, with a frame rate of 50Hz. The utterance-level features are computed by averaging the frame-level features (see the loading sketch below the download links).
- frame-level: Google Drive | Baidu Netdisk (password: zb3p)
- utterance-level: Google Drive | Baidu Netdisk (password: qu3u)
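For reference, here is a minimal sketch of loading the downloaded features with NumPy; the file name `Ses01F_impro01_F000.npy` and the directory layout are hypothetical and may not match the archives exactly.

```python
import numpy as np

# Hypothetical path to one frame-level feature file from the download
# (the actual directory layout and file names inside the archives may differ).
frame_feats = np.load("frame_level/Ses01F_impro01_F000.npy")
print(frame_feats.shape)  # (T, D): T frames at 50Hz, D-dimensional emotion2vec features

# Utterance-level features are the mean over the time axis,
# which matches how the provided utterance-level archive is computed.
utt_feat = frame_feats.mean(axis=0)  # (D,)
```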
All .wav files are extracted from the original dataset to support diverse downstream tasks. If you want to train with the standard 5531 utterances for 4-class emotion classification, please refer to `iemocap_downstream`.
The minimum environment requirements are `python>=3.8` and `torch>=1.13`. Our testing environment uses `python=3.8` and `torch=2.0.1`.
- Install fairseq and git clone the repo:
```bash
pip install fairseq
git clone https://github.com/ddlBoJack/emotion2vec.git
```
- Download the emotion2vec checkpoint from:
    - Google Drive
    - Baidu Netdisk (password: b9fq)
- Modify and run `scripts/extract_features.sh` (a rough loading sketch is shown after this list).
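For orientation, the sketch below shows one way such a fairseq checkpoint could be loaded and applied to a waveform. The `emotion2vec/upstream` user directory, the checkpoint file name, the 16kHz mono input, and the `extract_features()` call are assumptions; the authoritative logic is in `scripts/extract_features.sh` and the script it invokes.

```python
import torch
import torch.nn.functional as F
import soundfile as sf
from argparse import Namespace
from fairseq import checkpoint_utils, utils

# Assumed paths: the cloned repo's model code directory and the downloaded checkpoint.
utils.import_user_module(Namespace(user_dir="emotion2vec/upstream"))
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(["emotion2vec_base.pt"])
model = models[0].eval()

# Assumed input: a 16kHz mono wav file.
wav, sr = sf.read("example.wav")
source = torch.from_numpy(wav).float().view(1, -1)  # (1, T)
if getattr(task.cfg, "normalize", False):
    # Some fairseq audio tasks expect layer-normalized waveforms.
    source = F.layer_norm(source, source.shape)

with torch.no_grad():
    # The exact feature-extraction call depends on the model class; this
    # assumes a data2vec-style extract_features() interface.
    feats = model.extract_features(source, padding_mask=None)
```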
We provide training scripts for the IEMOCAP dataset in `iemocap_downstream`. You can modify the scripts to train your downstream model on other datasets (a minimal linear-probe sketch is shown below).
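Since the downstream setup described above is just linear layers on top of frozen features, a minimal linear-probe sketch might look like the following; the feature dimension (768), the 4 emotion classes, and the `.npy` file names are assumptions for illustration, and the actual training code in `iemocap_downstream` should be used for real experiments.

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed inputs: N utterance-level emotion2vec features of dimension 768,
# and 4-class labels (e.g. angry/happy/neutral/sad for IEMOCAP).
feats = torch.from_numpy(np.load("iemocap_utt_feats.npy")).float()  # (N, 768), hypothetical file
labels = torch.from_numpy(np.load("iemocap_labels.npy")).long()     # (N,), hypothetical file

probe = nn.Linear(feats.size(1), 4)  # a single linear layer, no hidden layers
optim = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optim.zero_grad()
    loss = loss_fn(probe(feats), labels)
    loss.backward()
    optim.step()

pred = probe(feats).argmax(dim=-1)
print("train accuracy:", (pred == labels).float().mean().item())
```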