The REINFORCE algorithm is the most basic policy gradient method. It applies the likelihood-ratio policy gradient to learn a suitable policy.
In my implementation, however, the policy gradient is combined with a baseline to reduce the variance of the gradient estimate and increase stability. The modified algorithm is applied to the following environments:
- Discrete action space: CartPole-v0
- Continuous action space: 2-link arm
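
To make the baseline idea concrete, here is a minimal NumPy sketch of the gradient estimate for the discrete-action case only. It is not the code in this repository. It assumes a linear softmax policy `pi(a|s) = softmax(theta @ s)` and a constant baseline equal to the mean return of the episode. Both are illustrative choices, and the function names are hypothetical. Each step contributes `(G_t - b) * grad log pi(a_t|s_t)`, where `G_t` is the discounted return from step `t` and `b` is the baseline.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce_baseline_grad(theta, states, actions, rewards, gamma=0.99):
    """Likelihood-ratio gradient estimate for one episode.

    theta has shape (n_actions, state_dim). The policy form and the
    mean-return baseline are assumptions made for this sketch.
    """
    returns = discounted_returns(rewards, gamma)
    baseline = returns.mean()  # assumed baseline: mean return of the episode
    grad = np.zeros_like(theta, dtype=float)
    for s, a, g in zip(states, actions, returns):
        s = np.asarray(s, dtype=float)
        probs = softmax(theta @ s)
        # grad of log pi(a|s) w.r.t. theta is (onehot(a) - probs) outer s.
        dlog = -np.outer(probs, s)
        dlog[a] += s
        grad += (g - baseline) * dlog
    return grad
```

A gradient ascent step would then be `theta += learning_rate * reinforce_baseline_grad(...)`, repeated over sampled episodes. For a continuous action space, the softmax would typically be replaced by a Gaussian policy, which this sketch does not cover.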