This is the official GitHub repository for the paper "Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks" by Yahui Fu, Haiyue Song, Tianyu Zhao, and Tatsuya Kawahara. This work has been accepted to IWSDS 2024.
- data/ contains the pre-processed corpora (sample data is the placeholder)
- log/ contains the log file where results are saved
- model/ contains the trained model (model.pt is the placeholder)
Install Python 3, create a virtual environment (recommended), and install the Python packages with:
pip install --upgrade pip && pip install -r requirements.txt
We have already put the pre-processed corpora in the data/ folder. If you want to re-run the preprocessing yourself, follow the steps below:
- Big-Five label preparation: convert the personality questionnaire to Big-Five labels
python big5_preprocessing.py
- Speaker-independent corpus splitting for monologue experiments
python nocontext_dataset_split.py
- Speaker-independent corpus splitting for dialogue experiments
python context_dataset_split.py
- Speaker-independent monologue data augmentation
python nocontext_data_augmentation.py
- Speaker-independent dialogue data augmentation
python context_data_augmentation.py
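To illustrate what "speaker-independent" means in the steps above, here is a minimal sketch of a split in which every speaker lands in exactly one subset. It is not the logic of the repository's scripts: the (speaker_id, text) sample layout, the 80/10/10 ratios, the seed, and the function name are all assumptions made for this example.

```python
import random
from collections import defaultdict


def speaker_independent_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples so that no speaker appears in more than one subset.

    `samples` is a list of (speaker_id, text) pairs (hypothetical layout).
    `ratios` gives the assumed share of speakers for train and valid;
    the remaining speakers go to test.
    """
    # Group all samples by speaker so a speaker is never divided.
    by_speaker = defaultdict(list)
    for speaker_id, text in samples:
        by_speaker[speaker_id].append((speaker_id, text))

    # Shuffle the speakers (not the samples) reproducibly.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_train = int(len(speakers) * ratios[0])
    n_valid = int(len(speakers) * ratios[1])
    groups = {
        "train": speakers[:n_train],
        "valid": speakers[n_train:n_train + n_valid],
        "test": speakers[n_train + n_valid:],
    }
    return {
        name: [sample for spk in spks for sample in by_speaker[spk]]
        for name, spks in groups.items()
    }


# Toy data: 10 speakers with 5 utterances each.
samples = [(f"spk{i % 10}", f"utterance {i}") for i in range(50)]
splits = speaker_independent_split(samples)
for name, subset in splits.items():
    print(name, len(subset), sorted({spk for spk, _ in subset}))
```

Because the shuffle operates on speaker IDs rather than on individual samples, a model evaluated on the test subset never sees utterances from a speaker it was trained on.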
- Train an MLP model on the original monologue dataset without data augmentation:
python train.py
- Here are other settings for training:
- MLP model on the augmented monologue dataset.
python train.py --data_folder ./data/monologue_split_500k
- MLP model on the original dialogue dataset.
python train.py --data_folder ./data/dialogue_split_original --context 1 --context_model_type linear
- Proposed HCGNN model on the original dialogue dataset.
python train.py --data_folder ./data/dialogue_split_original --context 1 --context_model_type gcn-nospk2pred-lastnode --model_variant hcgnn
- For more details about the arguments, run:
python train.py --help
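The training settings listed above can be collected in one place, which makes it easier to launch them one after another. The following is a small sketch: the flags and values are copied from the commands above, while the configuration labels and the helper name build_command are hypothetical and exist only in this example.

```python
import shlex

# Flags and values are taken from the commands listed above.
# The dictionary keys are hypothetical labels chosen for this sketch.
CONFIGS = {
    "mlp_monologue_original": [],
    "mlp_monologue_augmented": [
        "--data_folder", "./data/monologue_split_500k",
    ],
    "mlp_dialogue_original": [
        "--data_folder", "./data/dialogue_split_original",
        "--context", "1",
        "--context_model_type", "linear",
    ],
    "hcgnn_dialogue_original": [
        "--data_folder", "./data/dialogue_split_original",
        "--context", "1",
        "--context_model_type", "gcn-nospk2pred-lastnode",
        "--model_variant", "hcgnn",
    ],
}


def build_command(name):
    """Return the argument list for one training configuration."""
    return ["python", "train.py", *CONFIGS[name]]


# Print each command instead of executing it, so the sketch has no side effects.
for name in CONFIGS:
    print(f"{name}: {shlex.join(build_command(name))}")
```

To actually start a run from the repository root, the argument list can be passed to subprocess.run(build_command(name), check=True).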
The following log contains the best result we obtained in the paper; the results on the test set are shown in the last several lines of the log file:
- log/monologue_split_500k_MLP.log
Here are some other results we obtained in the paper:
- log/monologue_split_original_MLP.log
- log/dialogue_split_original_MLP.log
- log/dialogue_split_original_HCGNN.log
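Since the test-set results sit at the end of each log file, a short helper that prints the tail of a log can save some scrolling. This is a minimal sketch: the helper name tail and the default of 10 lines are assumptions, as the exact number of result lines is not stated here.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last `n` lines of a text file (n=10 is an assumed default)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


# Log paths as listed above; files that are missing are skipped.
log_paths = [
    Path("log/monologue_split_500k_MLP.log"),
    Path("log/monologue_split_original_MLP.log"),
    Path("log/dialogue_split_original_MLP.log"),
    Path("log/dialogue_split_original_HCGNN.log"),
]
for log_path in log_paths:
    if log_path.exists():
        print(f"== {log_path} ==")
        print("\n".join(tail(log_path)))
```

Adjust n if the results span more or fewer lines in a given log.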
If you find our work useful in your research, please consider citing:
@article{fu2024enhancing,
title={Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks},
author={Fu, Yahui and Song, Haiyue and Zhao, Tianyu and Kawahara, Tatsuya},
journal={arXiv preprint arXiv:2401.05871},
year={2024}
}
For any queries related to the paper or the implementation, feel free to contact:
- Haiyue Song is in charge of the data augmentation part. [email protected]
- Yahui Fu is in charge of the HC-GNN model part. [email protected]