Personal project on multi-armed bandits.
- Bandits: GaussianBandit
- Policies: Fixed, Greedy, Random, EpsilonGreedy, UCB, ThompsonSampling
Source: Reinforcement Learning: An Introduction, Chapter 2, by Richard S. Sutton and Andrew G. Barto.
This project was developed with Python 3.7.7.
- Create virtual environment
python -m venv venv
- Activate the virtual environment:
- Linux/macos
source venv/bin/activate
- Windows
venv\Scripts\activate
- Upgrade pip
python -m pip install --upgrade pip
python -m pip install -U pip setuptools wheel
- Install requirements
python -m pip install -r requirements.txt
- Install the package in editable mode
python -m pip install -e .
The repo includes an example notebook with some of the plots discussed in Chapter 2 of the book.
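For readers who want a feel for the kind of experiment involved before opening the notebook, here is a minimal standalone sketch of an epsilon-greedy policy on a Gaussian bandit, using the sample-average update from Chapter 2. It does not use this package's API: the names `GaussianArm` and `run_epsilon_greedy` and all parameter values are hypothetical and chosen for illustration only.

```python
import random


class GaussianArm:
    """Hypothetical arm: rewards are drawn from a normal distribution."""

    def __init__(self, mean, std=1.0):
        self.mean = mean
        self.std = std

    def pull(self, rng):
        return rng.gauss(self.mean, self.std)


def run_epsilon_greedy(means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy with sample-average value estimates.

    Returns the final estimates, the pull counts, and the total reward.
    """
    rng = random.Random(seed)
    arms = [GaussianArm(m) for m in means]
    counts = [0] * len(arms)
    estimates = [0.0] * len(arms)
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            action = rng.randrange(len(arms))
        else:
            # Exploit: pick the arm with the highest estimate,
            # breaking ties at random.
            best = max(estimates)
            action = rng.choice(
                [i for i, q in enumerate(estimates) if q == best]
            )

        reward = arms[action].pull(rng)
        counts[action] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.0, 0.5, 1.0])
    print("estimates:", [round(q, 2) for q in estimates])
    print("pull counts:", counts)
    print("average reward:", round(total / sum(counts), 3))
```

Setting `epsilon=0` gives a purely greedy policy and `epsilon=1` a purely random one, so the same sketch covers three of the policies listed above.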