RL: Vanilla REINFORCE algorithms

REINFORCE algorithms

The REINFORCE algorithm is the most basic policy gradient method. It applies the likelihood-ratio policy gradient to learn a suitable policy.
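For reference, the likelihood-ratio policy gradient is usually written as below. The symbols are the standard textbook ones and are assumptions as far as this repository is concerned: $\pi_\theta$ is the policy with parameters $\theta$, $\tau$ is a sampled trajectory of length $T$, and $G_t$ is the discounted return from step $t$ with discount factor $\gamma$.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right],
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_k
```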

However, in my implementation, the policy gradient is combined with a baseline to increase stability. It is modified as follows:
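A minimal sketch of one common baseline variant is given here; it may differ in detail from the code in `REINFORCE_continuous.py` and `REINFORCE_discrete.py`. It assumes the baseline is simply the mean return of the episode (a learned value function is another frequent choice), and the function names `discounted_returns` and `reinforce_surrogate_loss` are hypothetical, chosen only for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for every step t of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_surrogate_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss whose gradient is the baseline-corrected REINFORCE estimate.

    log_probs[t] is log pi(a_t | s_t) for the action actually taken.
    Assumption: the baseline is the mean return over the episode.
    """
    returns = discounted_returns(rewards, gamma)
    baseline = sum(returns) / len(returns)
    advantages = [g - baseline for g in returns]
    # Negative sign: minimizing this loss ascends the expected return.
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages))
```

With an automatic-differentiation library, `log_probs` would be tensors and the loss would be backpropagated through the policy network. Subtracting the baseline leaves the size of the returns unchanged but centers them, which typically lowers the variance of the gradient estimate.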
