Implementation of Reinforcement Learning Agents

fkabs/rl-agents

rl-agents

Implementations of reinforcement learning agents based on Sutton and Barto's textbook, often called the bible of reinforcement learning [1].

Multi-armed Bandits

Multi-armed bandits are implemented for stationary and non-stationary environments using the following action-selection methods:

  • Static
  • Random
  • Greedy
  • ε-greedy
  • Split
  • Linear decay ε-greedy
  • Optimistic
  • UCB
  • Gradient (w/ and w/o baseline)
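
As an illustration of two of the methods listed above, the following is a minimal sketch of ε-greedy and UCB action selection on a stationary Gaussian bandit with sample-average value estimates. It is not taken from this repository: the function names (`epsilon_greedy`, `ucb`, `run_bandit`), the unit-variance reward model, and the default exploration constant `c=2.0` are assumptions made for the example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random arm with probability epsilon, else a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good arms at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def ucb(q_values, counts, t, c=2.0):
    """Upper-confidence-bound selection; arms never tried are chosen first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return scores.index(max(scores))


def run_bandit(true_means, steps, select, seed=0):
    """Run one stationary bandit; `select(q_values, counts, t, rng)` returns an arm."""
    rng = random.Random(seed)
    k = len(true_means)
    q_values = [0.0] * k
    counts = [0] * k
    for t in range(1, steps + 1):
        arm = select(q_values, counts, t, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance rewards
        counts[arm] += 1
        # Incremental sample-average update of the value estimate.
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
    return q_values, counts


if __name__ == "__main__":
    means = [0.0, 0.5, 2.0]
    _, eg_counts = run_bandit(means, 2000, lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng))
    _, ucb_counts = run_bandit(means, 2000, lambda q, n, t, rng: ucb(q, n, t))
    print("epsilon-greedy pulls per arm:", eg_counts)
    print("UCB pulls per arm:", ucb_counts)
```

For a non-stationary environment, the sample-average step `1 / counts[arm]` would typically be replaced by a constant step size so that recent rewards weigh more.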

Dynamic Programming

The Dynamic Programming implementation consists of an algebraic solution as well as a random agent with separate case and in-place iterative solutions.
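
The three approaches can be sketched on the 4x4 gridworld from [1] (Example 4.1), evaluated under the equiprobable random policy. This is a minimal illustration rather than the repository's code: the helper names, the reading of "separate" as a two-array sweep, and the pure-Python Gauss-Jordan solver are assumptions made for the example.

```python
def build_gridworld(n=4):
    """Return (states, terminals, successors) for an n x n grid, reward -1 per move."""
    terminals = {0, n * n - 1}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def successors(s):
        row, col = divmod(s, n)
        result = []
        for dr, dc in moves:
            r, c = row + dr, col + dc
            # A move that would leave the grid keeps the agent in place.
            result.append(r * n + c if 0 <= r < n and 0 <= c < n else s)
        return result

    return list(range(n * n)), terminals, successors


def evaluate_random_policy(n=4, gamma=1.0, theta=1e-10, in_place=True):
    """Iterative policy evaluation; returns (values, number of sweeps)."""
    states, terminals, successors = build_gridworld(n)
    values = [0.0] * len(states)
    sweeps = 0
    while True:
        # In-place reads freshly updated values; otherwise read a snapshot.
        source = values if in_place else list(values)
        delta = 0.0
        for s in states:
            if s in terminals:
                continue
            new_v = sum(0.25 * (-1.0 + gamma * source[s2]) for s2 in successors(s))
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        sweeps += 1
        if delta < theta:
            return values, sweeps


def solve_algebraically(n=4, gamma=1.0):
    """Solve (I - gamma * P) v = r over non-terminal states by Gauss-Jordan elimination."""
    states, terminals, successors = build_gridworld(n)
    free = [s for s in states if s not in terminals]
    index = {s: i for i, s in enumerate(free)}
    m = len(free)
    aug = [[0.0] * (m + 1) for _ in range(m)]  # augmented matrix [A | r]
    for s in free:
        i = index[s]
        aug[i][i] += 1.0
        aug[i][m] = -1.0  # expected one-step reward under the random policy
        for s2 in successors(s):
            if s2 in index:
                aug[i][index[s2]] -= 0.25 * gamma
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    values = [0.0] * len(states)
    for s in free:
        values[s] = aug[index[s]][m]
    return values


if __name__ == "__main__":
    v_in_place, sweeps_in_place = evaluate_random_policy(in_place=True)
    v_two_array, sweeps_two_array = evaluate_random_policy(in_place=False)
    print("in-place sweeps:", sweeps_in_place, "two-array sweeps:", sweeps_two_array)
    print("algebraic values:", [round(v, 2) for v in solve_algebraically()])
```

All three variants should agree on the state values up to the tolerance `theta`; the in-place sweep usually needs fewer iterations because it reuses updated values within the same sweep.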

Monte Carlo Methods

The following agents are already implemented:

  • First-Visit / Every-Visit
  • On-Policy / Off-Policy
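
The difference between first-visit and every-visit estimation can be sketched with a small Monte Carlo prediction routine. This is an illustrative example, not the repository's agent: the function name `mc_prediction` and the episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state) are assumptions, and the on-policy / off-policy distinction is not covered here.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values as the average of sampled returns."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step at which each state first appears in this episode.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)
        g = 0.0
        # Walk backwards so the return can be accumulated incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            if first_visit and first_index[state] != t:
                continue  # first-visit: ignore later visits to the same state
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    episode = [("A", 1.0), ("A", 1.0), ("B", 1.0)]
    print(mc_prediction([episode], first_visit=True))
    print(mc_prediction([episode], first_visit=False))
```

In the sample episode, state `A` is visited twice with returns 3 and 2, so the first-visit estimate uses only the return 3 while the every-visit estimate averages both.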

Temporal-Difference Learning

The following agents are already implemented:

  • Sarsa
  • Expected Sarsa
  • Q-Learning
  • Double Q-Learning
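
The four agents above differ mainly in the bootstrap target used in the update Q(S, A) ← Q(S, A) + α [target − Q(S, A)]. The sketch below shows those targets side by side. It is a hedged illustration, not the repository's implementation: the function names, the table layout (`q[state]` is a list of action values), and the ε-greedy target policy for Expected Sarsa are assumptions made for the example.

```python
def epsilon_greedy_probs(action_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties share the greedy mass)."""
    n = len(action_values)
    best = max(action_values)
    greedy = [a for a, q in enumerate(action_values) if q == best]
    probs = [epsilon / n] * n
    for a in greedy:
        probs[a] += (1.0 - epsilon) / len(greedy)
    return probs


def td_target(method, next_values, next_action, reward, gamma, epsilon=0.1, done=False):
    """One-step target for Sarsa, Expected Sarsa, or Q-learning."""
    if done:
        return reward  # no bootstrapping from a terminal state
    if method == "sarsa":
        bootstrap = next_values[next_action]  # value of the action actually taken
    elif method == "expected_sarsa":
        probs = epsilon_greedy_probs(next_values, epsilon)
        bootstrap = sum(p * q for p, q in zip(probs, next_values))
    elif method == "q_learning":
        bootstrap = max(next_values)  # value of the greedy action
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap


def double_q_target(q_select, q_evaluate, next_state, reward, gamma, done=False):
    """Double Q-learning: one table picks the action, the other evaluates it."""
    if done:
        return reward
    values = q_select[next_state]
    best_action = values.index(max(values))
    return reward + gamma * q_evaluate[next_state][best_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha towards the target."""
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    q = {"s": [0.0, 0.0], "s_next": [1.0, 3.0]}
    for method in ("sarsa", "expected_sarsa", "q_learning"):
        target = td_target(method, q["s_next"], next_action=0, reward=1.0, gamma=0.5, epsilon=0.2)
        print(method, target)
```

In Double Q-learning the roles of the two tables are usually swapped at random on each step, so that each table is updated with the other serving as evaluator.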

References

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, Massachusetts: The MIT Press, 2018.
