M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

Overview

M4GT-Bench is a multilingual, multidomain, and multi-generator corpus of machine-generated texts (MGTs). The benchmark comprises three tasks: (1) monolingual and multilingual binary MGT detection; (2) multi-way detection, where one needs to identify which particular model generated the text; and (3) mixed human-machine text detection, where the word boundary separating MGT from human-written content must be determined.
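To make the third task more concrete, here is a minimal sketch of how a word-boundary label could be represented and scored. It is an illustration only, not the benchmark's own code: the function names `split_at_boundary` and `boundary_mae`, the whitespace tokenization, and the use of mean absolute error as the score are all assumptions made for this example.

```python
from typing import List, Tuple


def split_at_boundary(text: str, boundary: int) -> Tuple[str, str]:
    """Split whitespace-tokenized text at a word index (hypothetical helper).

    Words [0, boundary) form the first segment and the remaining words
    form the second. Which segment is human-written and which is
    machine-generated depends on the dataset; that is not assumed here.
    """
    words = text.split()
    if not 0 <= boundary <= len(words):
        raise ValueError("boundary must lie between 0 and the number of words")
    return " ".join(words[:boundary]), " ".join(words[boundary:])


def boundary_mae(predicted: List[int], gold: List[int]) -> float:
    """Mean absolute error between predicted and gold word boundaries.

    Assumption: boundaries are word indices and a lower score is better.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


first, second = split_at_boundary("the cat sat on the mat", 2)
score = boundary_mae([2, 5], [3, 5])
```

In this sketch a detector for the mixed-text task outputs one integer per document, and the score measures how many words away from the true boundary the predictions land on average.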

Dataset

English:

Multilingual:

Mixed:

Detectors

Task 1 Results

Monolingual

Multilingual

Task 2 Results

Task 3 Results

Human Evaluation

Sampling

We split 140 examples into four groups, each involving three domains and four generators, with 48 examples including five demonstrations for learning.

Results

Citation

If our work is useful for your own, you can cite us with the following BibTeX entry:

@article{wang2024m4gt,
  title={M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection},
  author={Wang, Yuxia and Mansurov, Jonibek and Ivanov, Petar and Su, Jinyan and Shelmanov, Artem and Tsvigun, Akim and Afzal, Osama Mohammed and Mahmoud, Tarek and Puccetti, Giovanni and Arnold, Thomas and others},
  journal={to appear in ACL 2024},
  year={2024}
}
