MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing

Suda-iaiNLP/MODDP

MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing

This repository contains the implementation of the paper MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing, which was accepted to Findings of ACL 2024.

Abstract

Dialogue discourse parsing (DDP) aims to capture the relations between utterances in the dialogue. In everyday real-world scenarios, dialogues are typically multi-modal and cover open-domain topics. However, most existing widely used benchmark datasets for DDP contain only textual modality and are domain-specific. This makes it challenging to accurately and comprehensively understand the dialogue without multi-modal clues, and prevents them from capturing the discourse structures of the more prevalent daily conversations. This paper proposes MODDP, the first multi-modal Chinese discourse parsing dataset derived from open-domain daily dialogues, consisting of 864 dialogues and 18,114 utterances, accompanied by 12.7 hours of video clips. We present a simple yet effective benchmark approach for multi-modal DDP. Through extensive experiments, we present several benchmark results based on MODDP. The significant improvement in performance from introducing multi-modalities into the original textual unimodal DDP model demonstrates the necessity of integrating multi-modalities into DDP.

Requirements

PyTorch >= 2.1.1

Transformers >= 4.18.0
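
The version bounds above can be checked before launching a run. The following is a minimal sketch, not part of this repository: it assumes the packages are installed under the PyPI distribution names `torch` and `transformers`, and the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical. The version parsing is deliberately simple and ignores pre-release ordering.

```python
import re
from importlib import metadata

# Minimum versions as listed in the Requirements section.
# The distribution names "torch" and "transformers" are assumptions.
MINIMUM_VERSIONS = {"torch": "2.1.1", "transformers": "4.18.0"}


def parse_version(text):
    """Turn a string such as '2.1.1+cu121' into (2, 1, 1).

    Local build tags after '+' and non-numeric suffixes are ignored.
    """
    public = text.split("+")[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(required)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, required))
    return report


for package, (installed, ok) in check_requirements().items():
    status = "ok" if ok else "missing or too old"
    print(f"{package}: installed={installed} -> {status}")
```

A package that is not installed is reported with `installed=None` rather than raising, so the check can run in a fresh environment.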

Data Preparation

You can directly load the text data from the dataset folder and download the image and audio features from all_features.pkl.

If the link is broken or you need the original video data, please contact [email protected].
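
The internal layout of all_features.pkl is not described here, so it helps to inspect the file before wiring it into a data loader. The sketch below is an illustration under assumptions: it only assumes the file is a standard Python pickle, and the helpers `load_features` and `summarize` are hypothetical names, not functions from this repository. If the file stores tensors, the corresponding library (for example PyTorch) must be importable when unpickling.

```python
import pickle
from pathlib import Path


def load_features(path):
    """Load a pickled feature file. Only unpickle files from a source you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def summarize(obj, max_items=3, depth=0, max_depth=3):
    """Return indented lines describing the nested structure of obj.

    Array-like objects (anything with a .shape attribute) are reported by
    shape; dicts and sequences are described recursively up to max_depth.
    """
    indent = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict):
        lines = [f"{indent}dict with {len(obj)} keys"]
        if depth < max_depth:
            for key in list(obj)[:max_items]:
                lines.append(f"{indent}  key {key!r}:")
                lines.extend(summarize(obj[key], max_items, depth + 2, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{indent}{type(obj).__name__} of length {len(obj)}"]
        if obj and depth < max_depth:
            # Describe only the first element as a representative.
            lines.extend(summarize(obj[0], max_items, depth + 1, max_depth))
        return lines
    return [f"{indent}{type(obj).__name__}"]


# Usage (the path is a placeholder for wherever the file was downloaded):
# features = load_features("all_features.pkl")
# print("\n".join(summarize(features)))
```

Printing the summary first shows which keys and feature shapes are present, which makes it easier to match the features to the dialogues in the dataset folder.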

Training

python main.py \
    --config_file ./config.cfg \
    --seed 42 \
    --postfix experiments/train \
    --text_plm_name_or_path /path/to/roberta \
    --vision_plm_name_or_path /path/to/vit \
    --audio_plm_name_or_path /path/to/wav2vec2 \
    --bert_path /path/to/bert

Or run the provided script directly:

bash run.sh
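
When several runs with different seeds or output folders are needed, the command line can also be assembled programmatically. This is a minimal sketch that only reuses the flags shown in this README; the helper names `build_command` and `run` are hypothetical, and all paths are the same placeholders as above.

```python
import shlex
import subprocess
import sys


def build_command(postfix, model_paths, seed=42, config_file="./config.cfg", extra=None):
    """Assemble the argument list for main.py from the flags used in this README."""
    command = [
        sys.executable, "main.py",
        "--config_file", config_file,
        "--seed", str(seed),
        "--postfix", postfix,
    ]
    for flag, path in model_paths.items():
        command += [f"--{flag}", path]
    for flag, value in (extra or {}).items():
        command += [f"--{flag}", str(value)]
    return command


def run(command):
    """Execute the command and raise if main.py exits with a non-zero status."""
    subprocess.run(command, check=True)


# Placeholder paths, exactly as in the commands above.
MODEL_PATHS = {
    "text_plm_name_or_path": "/path/to/roberta",
    "vision_plm_name_or_path": "/path/to/vit",
    "audio_plm_name_or_path": "/path/to/wav2vec2",
    "bert_path": "/path/to/bert",
}

train_command = build_command("experiments/train", MODEL_PATHS)
print(shlex.join(train_command))
# Call run(train_command) from the repository root to start training.
```

Passing the arguments as a list avoids shell quoting issues and the line-continuation backslashes needed in the multi-line shell form.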

Prediction and Evaluation

python main.py \
    --config_file ./config.cfg \
    --seed 42 \
    --postfix experiments/predict \
    --text_plm_name_or_path /path/to/roberta \
    --vision_plm_name_or_path /path/to/vit \
    --audio_plm_name_or_path /path/to/wav2vec2 \
    --bert_path /path/to/bert \
    --ckpt_path /path/to/best/model \
    --train False \
    --predict True
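
The evaluation itself is handled by the repository code. As an illustration only, the sketch below shows how discourse parsing output is commonly scored in the DDP literature: a micro-averaged F1 over predicted links, and a stricter F1 that also requires the relation label to match. Whether main.py reports exactly these two scores is an assumption, and the function names, the edge tuple format, and the relation labels in the example are hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 between two collections of hashable items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def link_and_relation_f1(gold_edges, predicted_edges):
    """Score edges given as (dialogue_id, head, dependent, relation) tuples.

    Returns (link_f1, link_and_relation_f1). Including the dialogue id in
    each tuple lets edges from a whole corpus be pooled into one score.
    """
    gold_links = {edge[:3] for edge in gold_edges}
    predicted_links = {edge[:3] for edge in predicted_edges}
    return (
        micro_f1(gold_links, predicted_links),
        micro_f1(gold_edges, predicted_edges),
    )


# Toy example with made-up relation labels: both links are found,
# but one of the two relation labels is wrong.
gold = [("d1", 0, 1, "QAP"), ("d1", 1, 2, "Comment")]
predicted = [("d1", 0, 1, "Elaboration"), ("d1", 1, 2, "Comment")]
print(link_and_relation_f1(gold, predicted))  # (1.0, 0.5)
```

The gap between the two scores separates errors in attaching an utterance to the right parent from errors in labeling a correctly attached link.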

Citation

If you find this repo helpful, please cite the following paper:

@inproceedings{gong-etal-2024-moddp,
    title = "{MODDP}: A Multi-modal Open-domain {C}hinese Dataset for Dialogue Discourse Parsing",
    author = "Gong, Chen  and
      Kong, DeXin  and
      Zhao, Suxian  and
      Li, Xingyu  and
      Fu, Guohong",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.628",
    doi = "10.18653/v1/2024.findings-acl.628",
    pages = "10561--10573",
}
