RL: Vanilla REINFORCE algorithms

REINFORCE algorithms

REINFORCE is the most basic policy gradient method. It applies the likelihood-ratio policy gradient to learn a suitable policy.

pseudocode[1]
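For reference, the per-step update in the REINFORCE pseudocode of [1] can be written as below, where $\alpha$ is the step size, $\gamma$ the discount factor, $G_t$ the return from step $t$, and $\pi(a \mid s, \theta)$ the policy with parameters $\theta$:

$$
G_t = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k,
\qquad
\theta \leftarrow \theta + \alpha \, \gamma^{t} \, G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)
$$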

However, in my implementation, the policy gradient is combined with a baseline to increase stability. It is modified as follows:
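A minimal sketch of the baseline idea, assuming the common form in which a baseline is subtracted from the return before it weights the log-probability gradient. The helper names `discounted_returns` and `baseline_advantages` are hypothetical, and the episode-mean return used as the baseline is an assumption for illustration only. A learned state-value estimate is another common choice, and the code in this repository may differ.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def baseline_advantages(returns):
    """Subtract a baseline from the returns.

    Assumption: the baseline is the mean return of the episode.
    """
    return returns - returns.mean()

# Toy episode with three steps and a reward of 1 at each step.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
advantages = baseline_advantages(returns)
print(returns)
print(advantages)
```

In a policy gradient update, `advantages[t]` would take the place of the raw return as the weight on the gradient of the log-probability of the action taken at step `t`.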

Environment and Results

  • Discrete action space: CartPole-v0
  • Continuous action space: 2-link arm

Reference

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction