Core Concepts

When evaluating a LLM based application, following are the important dimensions to consider:

1.Persona

LLM Persona refers to a distinct personality or identity that is attributed to the model's responses and interactions. Understanding the persona of LLM as well as of its users is equally important. This understanding can help us with factors such as – Engagement of interaction, tailoring the persona, user’s demographics/context, creating trust and building reliability.

For defining the LLM persona some important questions to be considered are:

Is the system open domain/book or closed domain/book?
What is the domain purpose/scope/boundary?
Is the system Conversational / non conversational?
LLM models used are Proprietary/open source?
Is the LLM model used as-is or RAG based or fine-tuned for the domain?

For defining the user persona some important questions to be considered are:

What is the geo/region of deployment?
Are there any regulatory, audit and compliance requirements?
Who are the end users: Internal (business unit, employees) or external (partners, vendors, customers)?
User attributes such as: expertise, interest, intention, device, OS, Platform, stage in marketing funnel, demographic and language preference?

2.Problem space

Problem space refers to the range of tasks, challenges, and scenarios that the model is designed to address or operate within. For example, an LLM may specialize in medical text, financial data, legal documents, or social media posts, each representing a distinct problem space with its own unique characteristics and challenges. Understanding the problem space is essential for training, fine-tuning, and evaluating LLMs effectively, as well as for designing applications and systems that leverage their capabilities to address real-world challenges in language processing and understanding.

Major problem space where LLMs are used in enterprise are:

Coding and debugging
QnA / Chatbot
Content creation
Translation
Information search & retrieval
Content moderation
Summarization
Automation with agents

3.Modality

Modality refers to the different types of input data or information sources that the model is designed to process and generate responses for. LLMs are now multimodal and can handle following modalities at both input and out layer:

Text
Audio
Image
Video

4.Capabilities of LLMs

LLMs capabilities can be categorized as intrinsic - which are an integral part of it and extrinsic – which need to be set explicitly. These need to be evaluated to make sure they behave in alignment with human values and business expectations. Some important capabilities are listed below, a detailed list can be found in the alignment mind map.

Accuracy
Fairness
Robustness
Intelligence
Audit
Privacy
Conversational
Regulations

5.Risks and Vulnerabilities of LLMs

As per OWASP, following are the major risks and vulnerabilities seen in LLM applications:

Prompt injections and jail breaking
Misinformation and/or hallucinations
Sensitive data leakage
Privacy concerns
Inadequate sandboxing
Security risks
Bias and fairness
Regulatory and legal challenges
Unauthorized access / code execution
Harmful/toxic content generation
Improper error handling
Copyright infringement

Test Types

1. Adversarial Testing

Following figure shows the different limitations and risks posed by LLMs that need to be tested to understand how and how much the LLM can act in malicious ways that are not in line with human values and enterprise policies. This is an exhaustive list so based on each LLM application, we will select the adversaries that needs to be tested.

To perform the adversary test, red teaming exercise is recommended. Red teaming is a best practice in the responsible development of LLM applications as it helps to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations by probing, testing, and attacking the AI systems.

2. Alignment Testing

Following figure shows the different capabilities against which LLM application need to be tested to understand how and how much the LLM is aligned with functional and non-functional requirements of the enterprise use case. This is an exhaustive list so based on each LLM application, we will select the capabilities that needs to be tested.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core Concepts

Test Types

1. Adversarial Testing

2. Alignment Testing

Clone this wiki locally