Code for the paper MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
Main libraries required (can be installed with pip):
- transformers
- datasets
- pandas
- numpy
- openai
To use OpenAI models, the code reads the API key from the environment variable OPENAI_API_KEY. You can add it to your conda environment with the following command:
conda env config vars set OPENAI_API_KEY='your key'
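After setting the variable, conda asks you to reactivate the environment before the change takes effect. As a quick sanity check, the following minimal sketch confirms that the key is visible to Python. The helper name get_openai_api_key is hypothetical and is not part of this repository; only the variable name OPENAI_API_KEY comes from the instructions above.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Set it and reactivate the environment."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY found.")
    except RuntimeError as err:
        print(err)
```

Running this before launching the scripts avoids starting a long debate run that fails on the first API call.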
The following datasets have been used in the experiments:
- TruthfulQA
- MMLU
- MedMCQA
- Scalr
Datasets can be downloaded from the following link: data download
The folder is expected to be saved in the directory: multiagent_debate/data
- main.py: Generates the general debate for all datasets
- advers.py: Generates the debate for the adversaries (currently OpenAI)
- advers_optim.py: Generates the debate for the optimized attacker
- evaluate.py: Runs the evaluation for all the files. Mode: [majority/judge]
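To illustrate what a majority-style evaluation typically computes, here is a minimal sketch. It assumes each debate ends with one final answer per agent and that the gold answer is a single label; the function names and the data layout are hypothetical, and the actual logic in evaluate.py may differ.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("answers must not be empty")
    return Counter(answers).most_common(1)[0][0]


def majority_accuracy(debates, gold):
    """Fraction of questions where the majority answer matches the gold label.

    debates: list of per-question lists of final agent answers (assumed layout)
    gold:    list of gold labels, one per question
    """
    if len(debates) != len(gold):
        raise ValueError("debates and gold must have the same length")
    correct = sum(majority_vote(a) == g for a, g in zip(debates, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy example: three agents, two questions.
    debates = [["A", "B", "A"], ["C", "C", "D"]]
    gold = ["A", "D"]
    print(majority_accuracy(debates, gold))
```

In this toy example the first question is answered correctly by the majority and the second is not, so a single persuasive adversary would only lower accuracy if it swayed enough agents to flip the vote.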
@article{amayuelas2024multiagent,
title={MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate},
author={Amayuelas, Alfonso and Yang, Xianjun and Antoniades, Antonis and Hua, Wenyue and Pan, Liangming and Wang, William},
journal={arXiv preprint arXiv:2406.14711},
year={2024}
}