PyTorch implementation of models for multimodal sentiment analysis.
Note: We strongly recommend that you browse the overall structure of our code first. If you have any questions, feel free to contact us.
In this framework, we support the following methods:
Type | Model Name | From |
---|---|---|
Single-Task | EF_LSTM | MultimodalDNN |
Single-Task | LF_DNN | - |
Single-Task | TFN | Tensor-Fusion-Network |
Single-Task | LMF | Low-rank-Multimodal-Fusion |
Single-Task | MFN | Memory-Fusion-Network |
Single-Task | Graph-MFN | Graph-Memory-Fusion-Network |
Single-Task | MulT(without CTC) | Multimodal-Transformer |
Single-Task | MISA | MISA |
Multi-Task | MLF_DNN | MMSA |
Multi-Task | MTFN | MMSA |
Multi-Task | MLMF | MMSA |
Multi-Task | SELF_MM | Self-MM |
Detailed results are shown in results/result-stat.md.
- Clone this repo and install requirements.

```bash
git clone https://github.com/thuiar/MMSA
cd MMSA
pip install -r requirements.txt
```
Download the dataset features and pre-trained BERT models from the following links.

- Baidu Cloud Drive (code: ctgs)
- Google Cloud Drive

For all features, you can use the SHA-1 hash values below to check file integrity; a small verification sketch follows the table.
File | SHA-1 |
---|---|
MOSI/unaligned_50.pkl | 5da0b8440fc5a7c3a457859af27458beb993e088 |
MOSI/aligned_50.pkl | 5c62b896619a334a7104c8bef05d82b05272c71c |
MOSEI/unaligned_50.pkl | db3e2cff4d706a88ee156981c2100975513d4610 |
MOSEI/aligned_50.pkl | ef49589349bc1c2bc252ccc0d4657a755c92a056 |
SIMS/unaligned_39.pkl | a00c73e92f66896403c09dbad63e242d5af756f8 |
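For example, a minimal Python sketch for the check (the path and expected digest are taken from the table above):

```python
import hashlib

def sha1sum(path, chunk_size=1 << 20):
    """Compute the SHA-1 digest of a file without loading it into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the value listed above for MOSI/unaligned_50.pkl.
assert sha1sum("MOSI/unaligned_50.pkl") == "5da0b8440fc5a7c3a457859af27458beb993e088"
```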
Due to size limitations, the MOSEI features and the SIMS raw videos are available from the Baidu Cloud Drive only. All dataset features are organized as follows:
```python
{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [],                      # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [],   # Negative (< 0), Neutral (0), Positive (> 0)
        "regression_labels": []
    },
    "valid": {***},                    # same structure as "train"
    "test": {***}                      # same structure as "train"
}
```
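As a quick sanity check, the .pkl files can be inspected directly. A minimal sketch (the path is an example; the field names follow the structure above):

```python
import pickle

# Point this at one of the downloaded feature files.
with open("MOSI/unaligned_50.pkl", "rb") as f:
    data = pickle.load(f)

train = data["train"]
print(train.keys())                     # raw_text, audio, vision, id, text, text_bert, ...
print(len(train["regression_labels"]))  # number of training clips
print(train["annotations"][:5])         # sentiment annotations for the first few clips
```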
For MOSI and MOSEI, the pre-extracted text features come from BERT, unlike the original GloVe features in the CMU-Multimodal-SDK.
For SIMS, if you want to extract features from the raw videos, you need to install the OpenFace toolkit first, and then refer to our code in data/DataPre.py:

```bash
python data/DataPre.py --data_dir [path_to_Dataset] --language ** --openface2Path [path_to_FeatureExtraction]
```
For BERT models, you can also download BERT-Base, Chinese from Google-Bert, and then convert the TensorFlow checkpoint into PyTorch format using transformers-cli.
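If you prefer doing the conversion from Python, the sketch below mirrors what the transformers BERT conversion does; the checkpoint paths are placeholders for the downloaded Google release, and TensorFlow must be installed for the weight-loading step:

```python
import torch
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths for the downloaded BERT-Base, Chinese checkpoint.
tf_checkpoint = "chinese_L-12_H-768_A-12/bert_model.ckpt"
config_file = "chinese_L-12_H-768_A-12/bert_config.json"
output_file = "chinese_L-12_H-768_A-12/pytorch_model.bin"

config = BertConfig.from_json_file(config_file)
model = BertForPreTraining(config)
load_tf_weights_in_bert(model, config, tf_checkpoint)  # requires TensorFlow
torch.save(model.state_dict(), output_file)
```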
Then, modify config/config_*.py to update the dataset paths.
```bash
python run.py --modelName *** --datasetName ***
```
- CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality
- Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis
Please cite our paper if you find our work useful for your research:
```
@inproceedings{yu2020ch,
  title={CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality},
  author={Yu, Wenmeng and Xu, Hua and Meng, Fanyang and Zhu, Yilin and Ma, Yixiao and Wu, Jiele and Zou, Jiyun and Yang, Kaicheng},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={3718--3727},
  year={2020}
}

@article{yu2021learning,
  title={Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis},
  author={Yu, Wenmeng and Xu, Hua and Yuan, Ziqi and Wu, Jiele},
  journal={arXiv preprint arXiv:2102.04830},
  year={2021}
}
```