This repository contains an implementation of a Deep Q-Network (DQN) to play Tic-Tac-Toe. The agent is trained to play against an opponent that makes optimal moves with a configurable probability, allowing for gradual difficulty scaling during training.
- Deep Q-Network implementation with target network for stable training
- Configurable opponent difficulty via the `smartMovePlayer1` parameter
- Experience replay buffer for improved learning
- Dynamic epsilon-greedy exploration strategy
- Performance evaluation and model checkpointing
- Visualization of training metrics
The DQN uses a neural network with the following architecture (a sketch follows the list):
- Input layer: 9 neurons (one for each board position)
- Hidden layers: Two layers with 288 neurons each using ReLU activation
- Output layer: 9 neurons (Q-values for each possible move) using linear activation
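A minimal Keras sketch of this architecture is shown below; the Adam optimizer, learning rate, and MSE loss here are assumptions, not necessarily the exact settings used in TicTacToeDQN.py:

```python
import tensorflow as tf

def build_q_network():
    # 9 board inputs -> two hidden layers of 288 ReLU units -> 9 linear Q-values
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(9,)),
        tf.keras.layers.Dense(288, activation="relu"),
        tf.keras.layers.Dense(288, activation="relu"),
        tf.keras.layers.Dense(9, activation="linear"),
    ])
    # MSE on Q-value targets with Adam is a common choice; assumed here
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model
```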
The training process includes several key components:
- Experience Replay: Stores game transitions in a replay buffer with a maximum size of 10,000 experiences
- Target Network: Updated periodically to stabilize training
- Epsilon-Greedy Strategy (see the sketch after this list):
  - Starts with a configurable initial epsilon
  - Decays by a factor of 0.9975
  - Floors at a minimum epsilon of 0.05
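As an illustration, a replay buffer and epsilon schedule matching the numbers above could look like the sketch below; the function and variable names are hypothetical, not the actual PlayerSQN API:

```python
import random
from collections import deque

import numpy as np

REPLAY_CAPACITY = 10_000
EPSILON_DECAY = 0.9975
EPSILON_MIN = 0.05

# Oldest experiences are dropped automatically once the buffer is full
replay_buffer = deque(maxlen=REPLAY_CAPACITY)

def remember(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def choose_action(model, state, epsilon, valid_moves):
    # Explore with probability epsilon, otherwise pick the legal move with the highest Q-value
    if random.random() < epsilon:
        return random.choice(valid_moves)
    q_values = model.predict(np.array([state]), verbose=0)[0]
    return max(valid_moves, key=lambda a: q_values[a])

def decay_epsilon(epsilon):
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```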
The training process tracks several metrics:
- Win rate
- Draw rate
- Combined win and draw rate
- Training loss
- Epsilon values
- Opponent difficulty levels
These metrics are visualized in plots included in the repository.
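As a hypothetical sketch (not the repository's actual plotting code), metrics collected per evaluation interval could be written out with Matplotlib like this:

```python
import os

import matplotlib.pyplot as plt

def plot_metrics(win_rates, draw_rates, losses, epsilons, out_dir="plots"):
    # Each argument is a list with one value per evaluation interval
    os.makedirs(out_dir, exist_ok=True)
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    axes[0, 0].plot(win_rates)
    axes[0, 0].set_title("Win rate")
    axes[0, 1].plot(draw_rates)
    axes[0, 1].set_title("Draw rate")
    axes[1, 0].plot(losses)
    axes[1, 0].set_title("Training loss")
    axes[1, 1].plot(epsilons)
    axes[1, 1].set_title("Epsilon")
    fig.tight_layout()
    fig.savefig(os.path.join(out_dir, "training_metrics.png"))
    plt.close(fig)
```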
To play a game against the trained model:

```bash
python TicTacToeDQN.py <smartMoveProbability>
```

Example:

```bash
python TicTacToeDQN.py 0.5
```
The `smartMoveProbability` parameter (between 0 and 1) determines how often the opponent makes the optimal move (see the sketch after this list):
- 0.0: Completely random moves
- 1.0: Always makes the best possible move
- 0.5: Makes optimal moves 50% of the time
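Conceptually, the opponent's move selection follows this pattern; the function and argument names below are illustrative, not the actual TicTacToePerfectPlayer API:

```python
import random

def opponent_move(board, smart_move_probability, best_move_fn, valid_moves):
    # With probability smart_move_probability, play the optimal move;
    # otherwise fall back to a uniformly random legal move.
    if random.random() < smart_move_probability:
        return best_move_fn(board)
    return random.choice(valid_moves)
```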
To train the model, use the `PlayerSQN` class:

```python
agent = PlayerSQN(
    epsilon=0.4,                        # Initial exploration rate
    smartMovePlayer1=0.2,               # Initial opponent difficulty
    save_interval=100,                  # Episodes between checkpoints
    model_save_path="model.weights.h5"
)
agent.train(num_episodes=1000)
```
- `TicTacToeDQN.py`: Main implementation of the DQN agent
- `TicTacToePerfectPlayer.py`: Implementation of the opponent player
- `model.weights.h5`: Saved model weights
- `plots/`: Directory containing training visualization plots
The repository includes visualization plots showing:
- Win rates over training episodes
- Combined win and draw rates
- Training loss
- Epsilon decay
- Opponent difficulty progression
- TensorFlow
- NumPy
- Matplotlib (for plotting)
- Python 3.x
The DQN implementation includes several optimizations:
- Double Q-learning with target network for stable training
- Dynamic difficulty adjustment based on performance
- State representation using -1 (opponent), 0 (empty), and 1 (agent) for board positions
- Mini-batch training with a batch size of 64
- Discount factor (gamma) of 0.99 (both used in the update sketch below)
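A sketch of one replay-batch update consistent with these settings is shown below; the helper is illustrative, and the exact target computation and array handling in TicTacToeDQN.py may differ:

```python
import random

import numpy as np

BATCH_SIZE = 64
GAMMA = 0.99

def train_step(model, target_model, replay_buffer):
    """One replay-batch update; model and target_model are Keras networks."""
    if len(replay_buffer) < BATCH_SIZE:
        return None
    batch = random.sample(list(replay_buffer), BATCH_SIZE)
    states = np.array([s for s, _, _, _, _ in batch])
    next_states = np.array([ns for _, _, _, ns, _ in batch])

    q_values = model.predict(states, verbose=0)                    # current Q estimates
    next_q_online = model.predict(next_states, verbose=0)          # action selection (double DQN)
    next_q_target = target_model.predict(next_states, verbose=0)   # action evaluation

    for i, (_, action, reward, _, done) in enumerate(batch):
        if done:
            target = reward
        else:
            # Double DQN: the online net picks the best next action, the target net scores it
            best_next = int(np.argmax(next_q_online[i]))
            target = reward + GAMMA * next_q_target[i][best_next]
        q_values[i][action] = target

    history = model.fit(states, q_values, batch_size=BATCH_SIZE, verbose=0)
    return history.history["loss"][0]
```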