This is a minimal implementation of Path Consistency Learning (PCL). It currently supports only simple environments with a discrete action space.
First, install the requirements:
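For orientation, the quantity PCL drives toward zero can be sketched as follows. This is a minimal illustration of the soft path-consistency error for a single sub-trajectory, not the code in main.py. The function names, argument layout, and the default values for `gamma` (discount) and `tau` (entropy temperature) are assumptions made for this example.

```python
def path_consistency(v_start, v_end, rewards, log_probs, gamma=0.99, tau=0.1):
    """Soft path-consistency error C for one sub-trajectory of length d.

    v_start:   value estimate V(s_0) at the first state of the sub-trajectory
    v_end:     value estimate V(s_d) at the last state
    rewards:   [r_0, ..., r_{d-1}]
    log_probs: [log pi(a_0|s_0), ..., log pi(a_{d-1}|s_{d-1})]
    """
    d = len(rewards)
    if len(log_probs) != d:
        raise ValueError("rewards and log_probs must have the same length")
    # Discounted sum of entropy-regularized rewards along the path.
    path_sum = sum(
        gamma ** j * (rewards[j] - tau * log_probs[j]) for j in range(d)
    )
    return -v_start + gamma ** d * v_end + path_sum


def pcl_loss(v_start, v_end, rewards, log_probs, gamma=0.99, tau=0.1):
    """Squared consistency error, minimized w.r.t. both policy and value."""
    c = path_consistency(v_start, v_end, rewards, log_probs, gamma, tau)
    return 0.5 * c * c
```

In a real training loop the values and log-probabilities would come from differentiable value and policy networks, and the loss would be summed over many sub-trajectories.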
$ pip install -r requirements.txt
To run training on the CartPole environment:
$ python main.py
Training logs the loss, reward, and average sequence length to TensorBoard, which can be viewed with:
$ tensorboard --logdir=runs
The implementation is currently quite sensitive to initialization, so you may need a few runs to get good results.
A very simple model for the CartPole environment is provided under res/models/cart_pole.
You can watch it act by running:
$ python test_model.py
TODO:
- Add unified PCL
- Test on more complex environments
- Use an epsilon-greedy strategy early in training to force exploration
- Implement the prioritized replay buffer described in the paper
- Test how expert trajectories improve convergence speed
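
As a rough illustration of the epsilon-greedy item above, the sketch below anneals epsilon linearly and mixes uniform random actions into the policy's own sampling. It is not part of this repository. The names `epsilon_at` and `select_action`, the linear schedule, and the default values are all assumptions chosen for the example.

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(action_probs, epsilon, rng=random):
    """Pick a uniform random action with probability epsilon,
    otherwise sample an action from the policy distribution."""
    n = len(action_probs)
    if rng.random() < epsilon:
        return rng.randrange(n)
    return rng.choices(range(n), weights=action_probs, k=1)[0]
```

Because PCL can learn from off-policy trajectories, episodes collected with this mixed behavior policy could in principle still be used for training. Whether that helps with the initialization sensitivity noted above is untested here.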