Attack to induce hallucinations in LLMs
Papers about red teaming LLMs and multimodal models.
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Restore safety in fine-tuned language models through task arithmetic (see the sketch after this list).
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
Repository accompanying the paper at https://arxiv.org/abs/2407.14937
NeurIPS'24 - LLM Safety Landscape
Some Thoughts on AI Alignment: Using AI to Control AI
A prettified page for MIT's AI Risk Database
Comprehensive LLM testing suite for safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models such as OpenAI's GPT series in real-world applications.
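
The task-arithmetic entry above points at a concrete technique: treat the parameter difference between a base checkpoint and its safety-aligned counterpart as a "safety vector", and add that vector back into a fine-tuned model whose alignment has degraded. The sketch below is a minimal illustration of that idea only; the checkpoint names, the `alpha` coefficient, and the helper function are placeholders and not the API or method of the linked repository.

```python
# Minimal sketch of task arithmetic for restoring safety, assuming access to
# three same-architecture checkpoints: the pre-alignment base model, its
# safety-aligned version, and a fine-tuned model that has lost some alignment.
# All model names below are placeholders, not real checkpoints.
import torch
from transformers import AutoModelForCausalLM

def state_dict_of(name):
    """Load a model and return its parameters as CPU float32 tensors."""
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
    return {k: v.detach().cpu() for k, v in model.state_dict().items()}

base = state_dict_of("base-model")            # pre-alignment checkpoint (placeholder)
aligned = state_dict_of("aligned-model")      # safety-aligned checkpoint (placeholder)
finetuned = state_dict_of("finetuned-model")  # fine-tuned, possibly less safe (placeholder)

alpha = 1.0  # scaling coefficient for the safety vector

# The "safety vector" is the parameter change introduced by alignment.
# Adding it back to the fine-tuned weights is the task-arithmetic step that
# aims to restore safe behavior while keeping the fine-tuned task ability.
restored = {
    k: finetuned[k] + alpha * (aligned[k] - base[k])
    for k in finetuned
}

# Load the edited weights into a model instance for evaluation.
model = AutoModelForCausalLM.from_pretrained("finetuned-model")
model.load_state_dict(restored)
```

In this formulation, `alpha` trades off how strongly safety behavior is reinstated against how much of the fine-tuned task performance is preserved, so it would typically be tuned on held-out safety and task evaluations.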