This repository contains the codebase for my master's thesis, focused on developing DialogueReact, a novel framework for generating human-like dialogues suitable for complex social science simulations. The framework builds upon previous work in conversation synthesis by incorporating react prompting, dialogue acts, and agentic behaviour.
This codebase can also be used as a framework to experiment with agent-based modeling (ABM) simulations focused on conversations. It provides a flexible architecture for creating conversational agents with different capabilities (basic, memory-based, react-based) and managing multi-agent interactions through various LLM backends.
Here's a simple example of how to set up an experiment comparing different agent types:
```python
from dialogue_react_agent import DialogueReactAgent
from chat_llm import Agent, MemoryAgent
from llm_engines import LLMApi
from groupchat_thread import ChatThread
from chat_eval import calc_perplexity, calc_distinct_n

# Set up the LLM backend
llm = LLMApi()  # or ChatgptLLM() for OpenAI

# Create different types of agents
basic_agent = Agent(name="Alice", llm=llm,
                    interests=["art", "music"],
                    behavior="friendly")
memory_agent = MemoryAgent(name="Bob", llm=llm,
                           interests=["science", "books"],
                           behavior="analytical")
react_agent = DialogueReactAgent(name="Carol", llm=llm,
                                 interests=["technology", "philosophy"],
                                 behavior="inquisitive")

# Run conversations with different agent combinations
chat1 = ChatThread(agent_list=[basic_agent, memory_agent], neutral_llm=llm)
chat2 = ChatThread(agent_list=[memory_agent, react_agent], neutral_llm=llm)

# Generate conversations
conv1 = chat1.run_chat(max_messages=20)
conv2 = chat2.run_chat(max_messages=20)

# Analyze results
metrics1 = {
    'perplexity': calc_perplexity(conv1),
    'distinct1': calc_distinct_n(conv1, n=1),
    'distinct2': calc_distinct_n(conv1, n=2)
}
metrics2 = {
    'perplexity': calc_perplexity(conv2),
    'distinct1': calc_distinct_n(conv2, n=1),
    'distinct2': calc_distinct_n(conv2, n=2)
}

# Compare the agent combinations
print("Basic + Memory agents:", metrics1)
print("Memory + React agents:", metrics2)
```
To use this library, you need the following:

- Python 3.x
- oobabooga/text-generation-webui with any instruct model loaded (or any OpenAI-compatible API) set up at `http://127.0.0.1:1200`
- Alternatively, a GPT-3.5 backend is provided as `ChatgptLLM` in `llm_engines.py`. It will try to load an `OPENAI_API_KEY` from a `.env` file in the root directory of the project.

To install:

- Clone the repository: `git clone https://github.com/your-username/your-repo.git`
- Install dependencies: `pip install -r requirements.txt`
- Set up your environment variables in `.env` if using OpenAI's API
The framework supports multiple LLM backends through the `llm_engines.py` module:

- `LLMApi`: Connects to a local instance of oobabooga's text-generation-webui at `http://127.0.0.1:1200`
- `ChatgptLLM`: Uses OpenAI's GPT-3.5 API (requires an API key in `.env`)
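Both engines are used interchangeably by the agents and chat threads; a minimal sketch of instantiating them (both constructors are called without arguments, as in the example above):

```python
from llm_engines import LLMApi, ChatgptLLM

# Local backend: assumes text-generation-webui is serving an
# OpenAI-compatible API at http://127.0.0.1:1200
local_llm = LLMApi()

# OpenAI backend: assumes OPENAI_API_KEY is available in .env
openai_llm = ChatgptLLM()

# Either engine can then be passed to agents or chat threads,
# e.g. Agent(name="Alice", llm=local_llm, ...) or ChatThread(..., neutral_llm=openai_llm)
```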
Baseline (naive) conversation generation is handled by the `places_replication` module:

```python
from places_replication import NaiveConversationAgent, NaiveConversationGeneration
from llm_engines import LLMApi

# Create agents with personas
agents = [
    NaiveConversationAgent("Alice", persona="She likes cats and philosophy of language."),
    NaiveConversationAgent("Bob", persona="He likes dogs and classical opera.")
]

# Initialize the generator with the chosen LLM
generator = NaiveConversationGeneration(agents, neutral_llm=LLMApi())

# Generate a conversation with a minimum number of turns
conversation = generator.generate_conversation(min_turns=15)
```
Generated conversations can be evaluated with the metrics in `chat_eval.py`:

```python
from chat_eval import load_chat_history, calc_perplexity, calc_distinct_n

# Load a previously generated chat
chat_history = load_chat_history("chat_history/naive_chat_history_chat_1719249634.json")

# Calculate metrics
perplexity = calc_perplexity(chat_history)
distinct_1 = calc_distinct_n(chat_history, n=1)  # Distinct-1 metric
distinct_2 = calc_distinct_n(chat_history, n=2)  # Distinct-2 metric
```
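The same metrics can be applied to a whole directory of generated chats; a minimal sketch, assuming every file in `chat_history/` is a JSON chat history accepted by `load_chat_history`:

```python
from pathlib import Path
from chat_eval import load_chat_history, calc_perplexity, calc_distinct_n

results = []
for path in sorted(Path("chat_history").glob("*.json")):
    chat = load_chat_history(str(path))
    results.append({
        "file": path.name,
        "perplexity": calc_perplexity(chat),
        "distinct1": calc_distinct_n(chat, n=1),
        "distinct2": calc_distinct_n(chat, n=2),
    })

# Aggregate: average perplexity across all loaded chats
avg_ppl = sum(r["perplexity"] for r in results) / len(results)
print(f"Average perplexity over {len(results)} chats: {avg_ppl:.2f}")
```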
For more advanced usage, including DialogueReact agents, memory-based agents, and group chats, refer to:

- `dialogue_react_agent.py` for the DialogueReact implementation
- `groupchat_thread.py` for multi-agent conversations (see the sketch below)
- `advanced_agent.py` for enhanced agent capabilities
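Group chats with more than two participants follow the same pattern as the two-agent example above; a minimal sketch mixing the three agent types (constructor arguments as in that example):

```python
from chat_llm import Agent, MemoryAgent
from dialogue_react_agent import DialogueReactAgent
from groupchat_thread import ChatThread
from llm_engines import LLMApi

llm = LLMApi()

# Three agents with different capabilities in a single thread
agents = [
    Agent(name="Alice", llm=llm, interests=["art"], behavior="friendly"),
    MemoryAgent(name="Bob", llm=llm, interests=["science"], behavior="analytical"),
    DialogueReactAgent(name="Carol", llm=llm, interests=["philosophy"], behavior="inquisitive"),
]

group_chat = ChatThread(agent_list=agents, neutral_llm=llm)
conversation = group_chat.run_chat(max_messages=30)
```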
The framework implements several types of agents with different capabilities:

- Basic Agent (`Agent` in `chat_llm.py`): Simple agent with basic chat capabilities
- Memory Agent (`MemoryAgent`): Agent with memory capabilities using vector stores
- DialogueReact Agent (`DialogueReactAgent`): Main implementation incorporating react prompting and dialogue acts
- Advanced Agent (`AdvancedAgent`): Extended agent with additional capabilities
To replicate the thesis results, follow these steps:

1. Dataset Preparation
   - Download the Message-Persona-Context (MPC) dataset using `benchmark_fewshots.py`
   - Run `build_fits_dataset.py` to download and prepare the FITS dataset
   - Use `name_dist_experiment.py` to generate name distributions
   - Execute `synthetise_fits_personas.py` to create agent personas
   - Run `synthetise_dialogue_react.py` to prepare dialogue examples
   - Use `few_shots_benchmark.ipynb` to analyze the MPC dataset
2. Generation Phase

   The generation process creates 100 comparable conversations for each approach:

   a) Initial Setup
      - 100 topics are randomly selected from the FITS dataset
      - For each topic, 2 interested personas are selected from the generated dataset
      - Names are generated for each persona pair

   b) Baseline Generation (`thesis_chat_generation_naive.ipynb`)
      - Generates one-line conversation descriptions from personas
      - Creates conversations using few-shot examples
      - Ensures a minimum of 10 turns per conversation
      - Regenerates failed conversations

   c) Agentic Approaches (`thesis_chat_generation_non_naive.py`)
      - Instantiates agents with generated personas and names
      - Starts with random predefined greetings
      - Generates up to 50 turns per conversation
      - Memory component uses n=10 for generation
      - Reflection component uses n=25 for generation

   d) Cross-LLM Validation (`thesis_chat_generation_non_naive_gemma2.py`)
      - Repeats the process using Gemma 2 27B
      - Uses the same persona pairs and topics
      - Validates consistency across different LLMs
3. Evaluation
   - Execute `running_eval.py` for the main evaluation pipeline
   - Alternative: use `running_eval_gemma2.py` for Gemma 2 evaluations
   - Analyze results using `exploring_eval_results.ipynb`
The repository is organized as follows:

Core modules:

- `dialogue_react_agent.py`: Implementation of the DialogueReact framework
- `chat_llm.py`: Core LLM chat implementation and basic agent classes
- `groupchat_thread.py`: Management of multi-agent conversations
- `agent_factory.py`: Factory pattern for agent creation
- `llm_engines.py`: LLM engine implementations

Evaluation:

- `chat_eval.py`: Evaluation metrics and assessment tools
- `running_eval.py`: Main evaluation execution
- `test_*.py` files: Unit tests for components

Notebooks:

- `reading_chats.ipynb`: Analysis of generated conversations
- `exploring_eval_results.ipynb`: Evaluation metrics visualization
- `chat_interactive.ipynb`: Interactive DialogueReact testing
- `few_shots_benchmark.ipynb`: Few-shot learning analysis

Data and prompts:

- `prompts/`: React prompting and dialogue act templates
- `fits_personas/`: Generated agent personas
- `chat_logs/`: Generated conversations
- `chat_history/`: Historical dialogue data
- `example_messages.json`: Training dialogue dataset
- `names_dist.jsonl`: Name distribution data

Generation scripts:

- `thesis_chat_generation_*.py`: Main generation implementations
- `synthetise_*.py`: Persona and dialogue pattern generation
- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Set up your environment variables in `.env`
- Follow the replication steps above
- DialogueReact framework implementation
- React prompting and dialogue acts integration
- Multiple agent architectures
- Comprehensive evaluation metrics
- Interactive dialogue generation
- Extensive logging and analysis tools
The system generates various logs:
- `chat.log`: Generated conversations
- `evaluation.log`: Evaluation metrics
- `agent_internal.log`: Agent reasoning states
- `running_eval.log`: Evaluation process
- `litellm.log`: LLM interactions
The framework includes tests for:
- DialogueReact components
- Agent behavior and interactions
- Dialogue generation quality
- React prompting effectiveness
The project includes a retro-style conversation viewer (`conversation_stream.py`) that provides a visual way to review generated conversations:

```bash
python conversation_stream.py
```
Features:
- Matrix-style terminal interface with green text on black background
- Load and stream multiple JSON conversation files
- Character-by-character streaming animation
- Alternating colors for different speakers
- Spacebar to skip to next conversation
- ESC to exit fullscreen mode
The project includes a name generation analysis tool (`name_dist_experiment.py`) that studies the relationship between personas and generated names:

```bash
python name_dist_experiment.py
```

This experiment:

- Takes 50 random personas
- Generates 100 names for each persona
- Tracks name frequency distribution
- Saves results to `names_dist.jsonl`
- Helps understand LLM name generation patterns and biases
The results can be analyzed using `name_analysis.ipynb` to understand patterns in how LLMs associate names with different personas.
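For a quick look at the raw output without the notebook, the JSONL file can be read line by line; a minimal sketch, assuming each line is a self-contained JSON record (the `names` field used here is hypothetical):

```python
import json
from collections import Counter

# Read one JSON record per line from the experiment output
records = []
with open("names_dist.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            records.append(json.loads(line))

# Example inspection: aggregate name counts across all personas
# (assumes each record carries a mapping of generated names to counts under "names")
overall = Counter()
for record in records:
    overall.update(record.get("names", {}))

print(overall.most_common(10))
```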