Implementations of reinforcement learning agents based on Sutton and Barto's "Reinforcement Learning: An Introduction" [1], often called the bible of reinforcement learning.
Multi-armed bandits are implemented for stationary and non-stationary environments using the following action-selection methods:
- Static
- Random
- Greedy
- ε-greedy
- Split
- Linear decay ε-greedy
- Optimistic
- UCB
- Gradient (w/ and w/o baseline)
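As an illustration of one method from the list above, here is a minimal ε-greedy sketch. It is not the repository's code: the function name `run_epsilon_greedy`, its parameters, and the Gaussian reward model are assumptions made for this example. Passing `step_size=None` uses sample averages (a natural fit for stationary problems), while a constant `step_size` weights recent rewards more heavily (a common choice for non-stationary ones).

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=2000,
                       step_size=None, reward_std=1.0, seed=0):
    """Run an epsilon-greedy agent on a Gaussian k-armed bandit.

    All names here are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # action-value estimates
    n = [0] * k    # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform random arm
        else:
            # exploit: highest estimate, ties broken by lowest index
            action = max(range(k), key=lambda i: q[i])
        reward = rng.gauss(true_means[action], reward_std)
        n[action] += 1
        # Sample average (1/n) unless a constant step size is given.
        alpha = step_size if step_size is not None else 1.0 / n[action]
        q[action] += alpha * (reward - q[action])
    return q, n


estimates, pulls = run_epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.1)
```

Setting `epsilon=0.0` turns this into the purely greedy method. With noiseless rewards (`reward_std=0.0`) that agent locks onto the first arm it tries, which is the failure mode exploration is meant to avoid.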
The Dynamic Programming implementation consists of an algebraic solution as well as a random agent with separate-array and in-place iterative solutions.
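The difference between the two iterative variants can be sketched with iterative policy evaluation for the equiprobable random policy. The example below assumes the 4x4 gridworld from [1] (reward of -1 per step, no discounting, terminal states in two opposite corners); the function name `evaluate_random_policy` and its parameters are hypothetical and not taken from this repository.

```python
def evaluate_random_policy(in_place, size=4, theta=1e-10):
    """Iterative policy evaluation for the equiprobable random policy.

    Assumed environment: size x size grid, reward -1 per step, no
    discounting, terminal states in the top-left and bottom-right corners.
    """
    terminals = {(0, 0), (size - 1, size - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    v = {(r, c): 0.0 for r in range(size) for c in range(size)}
    sweeps = 0
    while True:
        # In-place updates read and write the same table; the two-array
        # variant reads from a frozen copy of the previous sweep.
        source = v if in_place else dict(v)
        delta = 0.0
        for s in v:
            if s in terminals:
                continue
            total = 0.0
            for dr, dc in moves:
                nr, nc = s[0] + dr, s[1] + dc
                # A move that would leave the grid keeps the agent in place.
                nxt = (nr, nc) if 0 <= nr < size and 0 <= nc < size else s
                total += 0.25 * (-1.0 + source[nxt])
            delta = max(delta, abs(total - source[s]))
            v[s] = total
        sweeps += 1
        if delta < theta:
            return v, sweeps


v_two, sweeps_two = evaluate_random_policy(in_place=False)
v_in, sweeps_in = evaluate_random_policy(in_place=True)
print(round(v_in[(0, 1)]), round(v_in[(1, 1)]), round(v_in[(0, 3)]))  # -14 -18 -22
```

Both variants converge to the same values, which are also what solving the Bellman equations as a linear system would give for this problem. The in-place version uses freshly updated values within a sweep.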
The following Monte Carlo agents are already implemented:
- First-Visit / Every-Visit
- On-Policy / Off-Policy
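The first-visit / every-visit distinction can be sketched with a small value-prediction routine. This is an illustrative example, not the repository's implementation: the function name `mc_prediction` and the episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state) are assumptions. It covers on-policy prediction only and leaves out off-policy importance sampling.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging returns over complete episodes.

    Hypothetical helper: each episode is a list of (state, reward) pairs.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Earliest time step at which each state occurs in this episode.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)
        g = 0.0
        # Walk backwards so the return G_t accumulates incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            # First-visit: count only the earliest occurrence of a state.
            if first_visit and first_index[state] != t:
                continue
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(mc_prediction([episode], first_visit=True))   # {'B': 2.0, 'A': 3.0}
print(mc_prediction([episode], first_visit=False))  # {'A': 2.5, 'B': 2.0}
```

State `A` is visited twice, with returns 3.0 (from the first visit) and 2.0 (from the second). First-visit keeps only the former, while every-visit averages both.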
The following temporal-difference agents are already implemented:
- Sarsa
- Expected Sarsa
- Q-Learning
- Double Q-Learning
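The four agents above differ mainly in the target they bootstrap from. The sketch below shows those targets side by side. It is an illustration under assumptions, not the repository's code: all function names are hypothetical, action values are stored as a dict mapping each state to a list of per-action estimates, Expected Sarsa assumes an ε-greedy policy, and terminal-state handling is omitted.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    # Bootstraps from the action actually taken in the next state.
    return reward + gamma * q[next_state][next_action]


def expected_sarsa_target(q, reward, next_state, gamma, epsilon):
    # Expectation over an epsilon-greedy policy: with probability epsilon
    # the action is uniform, otherwise it is greedy.
    values = q[next_state]
    expected = epsilon * sum(values) / len(values) + (1 - epsilon) * max(values)
    return reward + gamma * expected


def q_learning_target(q, reward, next_state, gamma):
    # Bootstraps from the best next action, regardless of the behavior policy.
    return reward + gamma * max(q[next_state])


def double_q_target(q_select, q_evaluate, reward, next_state, gamma):
    # One table selects the greedy action, the other evaluates it.
    values = q_select[next_state]
    best = max(range(len(values)), key=lambda a: values[a])
    return reward + gamma * q_evaluate[next_state][best]


def td_update(q, state, action, target, alpha):
    # Shared update rule: move the estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])


q_a = {"s": [0.0, 0.0], "s2": [1.0, 3.0]}
q_b = {"s": [0.0, 0.0], "s2": [4.0, 2.0]}
print(sarsa_target(q_a, 1.0, "s2", 0, gamma=0.5))                    # 1.5
print(expected_sarsa_target(q_a, 1.0, "s2", gamma=0.5, epsilon=0.5)) # 2.25
print(q_learning_target(q_a, 1.0, "s2", gamma=0.5))                  # 2.5
print(double_q_target(q_a, q_b, 1.0, "s2", gamma=0.5))               # 2.0
td_update(q_a, "s", 0, q_learning_target(q_a, 1.0, "s2", gamma=0.5), alpha=0.1)
```

In Double Q-Learning the roles of the two tables are typically swapped at random on each step, so that each table is updated about half the time.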
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: The MIT Press, 2018.