M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

📄 Paper • 🤗 Dataset • 🔱 Detector

Overview

M4GT-Bench is a multilingual, multidomain, and multi-generator corpus of machine-generated texts (MGTs). The benchmark comprises three tasks: (1) monolingual and multilingual binary MGT detection; (2) multi-way detection, where one needs to identify which particular model generated the text; and (3) mixed human-machine text detection, where the word boundary separating MGT from human-written content should be determined.
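To make task (3) concrete, here is a minimal sketch of what a word-boundary label could look like. It assumes whitespace tokenization and a 0-based boundary index pointing at the first machine-generated word. It also assumes mean absolute error as one plausible way to score predictions. The function names `split_at_boundary` and `mean_absolute_error` are hypothetical and are not taken from the benchmark's code.

```python
from typing import List, Tuple


def split_at_boundary(text: str, boundary: int) -> Tuple[str, str]:
    """Split text into (human_part, machine_part) at a word index.

    Assumption: words are whitespace-separated and `boundary` is the
    0-based index of the first machine-generated word.
    """
    words = text.split()
    if not 0 <= boundary <= len(words):
        raise ValueError("boundary is outside the text")
    return " ".join(words[:boundary]), " ".join(words[boundary:])


def mean_absolute_error(pred: List[int], gold: List[int]) -> float:
    """Average distance, in words, between predicted and gold boundaries.

    Assumption: this is one plausible scoring choice for illustration,
    not necessarily the metric used by the benchmark.
    """
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and equal in length")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


# Toy example (made-up text, not from the dataset).
human_part, machine_part = split_at_boundary(
    "The cat sat on the mat and then it purred", 6
)
error = mean_absolute_error([6, 3], [6, 5])
```

Under these assumptions, a detector for task (3) would output a single integer per document, and a lower error means the predicted boundary lies closer to the true switch point.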

Dataset

English:

Multilingual:

Mixed:

Detectors

Task 1 Results

Monolingual

Multilingual

Task 2 Results

Task 3 Results

Human Evaluation

Sampling

We split 140 examples into four groups, each involving three domains and four generators, with 48 examples including five demonstrations for learning.

Results

Citation

If you find our work useful, you can cite it with the following BibTeX entry:

@article{wang2024m4gt,
  title={M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection},
  author={Wang, Yuxia and Mansurov, Jonibek and Ivanov, Petar and Su, Jinyan and Shelmanov, Artem and Tsvigun, Akim and Afzal, Osama Mohammed and Mahmoud, Tarek and Puccetti, Giovanni and Arnold, Thomas and others},
  journal={to appear in ACL 2024},
  year={2024}
}