CodeAttack is one of the strongest jailbreak methods for LLMs so far.
- An enhanced version of CodeAttack, highly effective against the latest GPT-4 and Claude-3 series models, will be available next week! For more details, please refer to our ACL 2024 paper.
✨ An example run:

```bash
python3 main.py --num-sample 1 \
    --prompt-type python_stack_plus \
    --query-file ./data/harmful_behaviors.csv \
    --target-model=gpt-4o \
    --judge-model=gpt-4o \
    --exp-name=main \
    --target-max-n-tokens=1000 \
    --multi-thread \
    --temperature 0 \
    --start-idx 0 --end-idx -1
```
- The `data` folder contains the `harmful_behaviors.csv` dataset from AdvBench. You can add your own dataset here.
- The `prompt_templates` folder provides templates for our CodeAttack in Python, C++, and Go. We recommend using a prompt template with "plus" to get a more detailed model response.
- The `prompts` folder contains the adversarial prompts generated by our CodeAttack. For convenience, we include three versions of CodeAttack curated using AdvBench: `data_python_string_full.json`, `data_python_list_full.json`, and `data_python_stack_full.json`.
Overview of our CodeAttack. CodeAttack constructs a code template with three steps: (1) Input encoding which encodes the harmful text-based query with common data structures; (2) Task understanding which applies a decode() function to allow LLMs to extract the target task from various kinds of inputs; (3) Output specification which enables LLMs to fill the output structure with the user’s desired content.
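
To make the three steps concrete, the sketch below illustrates the general idea with a benign query and hypothetical helper names (`encode_query_as_stack`, `decode`, `output_structure` are not from the repo); the actual templates in `prompt_templates` differ in wording and structure.

```python
# Illustrative sketch of CodeAttack's three steps (not the exact prompt_templates content).

# (1) Input encoding: split the natural-language query into words and push them
#     onto a stack-like list, so the query is no longer expressed as plain text.
def encode_query_as_stack(query: str) -> list[str]:
    stack = []
    for word in query.split():
        stack.append(word)  # push each word onto the stack
    return stack

# (2) Task understanding: the code template asks the model to reason through a
#     decode() function that reconstructs the task from the data structure.
def decode(stack: list[str]) -> str:
    words = []
    while stack:
        words.append(stack.pop())  # pop words in reverse order
    return " ".join(reversed(words))

# (3) Output specification: the template provides an output structure (e.g., a
#     dict with a list of steps) that the model is asked to fill in.
def output_structure() -> dict:
    return {"task": "", "steps": []}

if __name__ == "__main__":
    encoded = encode_query_as_stack("write a friendly greeting message")
    print(decode(encoded))  # reconstructs the original query text
```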
If you find our paper and tool interesting and useful, please feel free to give us a star and cite us:
```bibtex
@inproceedings{Ren2024codeattack,
  title={Exploring Safety Generalization Challenges of Large Language Models via Code},
  author={Qibing Ren and Chang Gao and Jing Shao and Junchi Yan and Xin Tan and Wai Lam and Lizhuang Ma},
  booktitle={The 62nd Annual Meeting of the Association for Computational Linguistics},
  year={2024},
  url={https://arxiv.org/abs/2403.07865}
}
```