Attack to induce hallucinations in LLMs
Papers about red teaming LLMs and multimodal models.
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Restore safety in fine-tuned language models through task arithmetic (see the sketch after this list).
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
Repository accompanying the paper at https://arxiv.org/abs/2407.14937
NeurIPS'24 - LLM Safety Landscape
Some Thoughts on AI Alignment: Using AI to Control AI
A prettified page for MIT's AI Risk Database
Comprehensive LLM testing suite for safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models such as OpenAI's GPT series in real-world applications.
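
The task-arithmetic entry above points at a concrete technique: treat the parameter difference between a base checkpoint and its safety-aligned counterpart as a "safety vector", and add that vector back into a fine-tuned model whose alignment has degraded. The sketch below is a minimal illustration of that idea only; the checkpoint names, the `alpha` coefficient, and the helper function are placeholders and not the API or method of the linked repository.

```python
# Minimal sketch of task arithmetic for restoring safety, assuming access to
# three same-architecture checkpoints: the pre-alignment base model, its
# safety-aligned version, and a fine-tuned model that has lost some alignment.
# All model names below are placeholders, not real checkpoints.
import torch
from transformers import AutoModelForCausalLM

def state_dict_of(name):
    """Load a model and return its parameters as CPU float32 tensors."""
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
    return {k: v.detach().cpu() for k, v in model.state_dict().items()}

base = state_dict_of("base-model")            # pre-alignment checkpoint (placeholder)
aligned = state_dict_of("aligned-model")      # safety-aligned checkpoint (placeholder)
finetuned = state_dict_of("finetuned-model")  # fine-tuned, possibly less safe (placeholder)

alpha = 1.0  # scaling coefficient for the safety vector

# The "safety vector" is the parameter change introduced by alignment.
# Adding it back to the fine-tuned weights is the task-arithmetic step that
# aims to restore safe behavior while keeping the fine-tuned task ability.
restored = {
    k: finetuned[k] + alpha * (aligned[k] - base[k])
    for k in finetuned
}

# Load the edited weights into a model instance for evaluation.
model = AutoModelForCausalLM.from_pretrained("finetuned-model")
model.load_state_dict(restored)
```

In this formulation, `alpha` trades off how strongly safety behavior is reinstated against how much of the fine-tuned task performance is preserved, so it would typically be tuned on held-out safety and task evaluations.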