
Implementation of reinforcement learning algorithms for the OpenAI Gym environment LunarLander-v2


lazavgeridis/LunarLander-v2


Introduction

This project attempts to solve the LunarLander-v2 OpenAI Gym environment. For the time being, there are implementations of:

  1. Monte-Carlo
  2. Sarsa
  3. Q-learning
  4. DQN
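The key difference between the two TD methods listed above is the bootstrap target: Q-learning is off-policy and bootstraps from the greedy action, while Sarsa is on-policy and bootstraps from the action actually taken. The sketch below is a generic tabular illustration of those two update rules, not the repository's code; the function names and signatures are hypothetical.

```python
from collections import defaultdict

# Hypothetical tabular sketches of the two TD update rules; the
# repository's actual implementation may differ in structure.

def q_learning_update(Q, s, a, r, s_next, actions, lr=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in s_next.
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += lr * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, lr=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually taken in s_next.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += lr * (target - Q[(s, a)])

# Example: one transition (s=0, a=0, r=1.0, s'=1) with Q initialised to 0.
Q = defaultdict(float)
q_learning_update(Q, 0, 0, 1.0, 1, actions=[0, 1])
```

DQN replaces the table `Q` with a neural network and the in-place update with a gradient step toward the same Q-learning target.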

Training Clips

These training snapshots were captured towards the end of the training phase (~10,000 episodes). A random agent is also provided for comparison:

Random
(training clip)

Monte-Carlo
(training clip)

Sarsa
(training clip)

Q-learning
(training clip)

DQN
(training clip)

Experiments

  1. lr=0.1
    (plot: all agents)
  2. lr=0.01
    (plot: all agents)
  3. lr=0.001
    (plot: all agents)

Note that the learning rate (equivalently, the constant step-size parameter) only affects the performance of Sarsa, Q-learning, and DQN. The Monte-Carlo variant implemented here is Every-Visit Monte-Carlo, which uses a true running average in the policy-evaluation step; consequently, the learning rate has no effect on its training progress.
For the 3 experiments above, we compute the average reward for each RL algorithm (averaged across 10,000 episodes of training):
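To see why the learning rate drops out of Monte-Carlo, compare the two incremental update forms side by side. This is a generic illustration of the distinction, not code from the repository; the function names are hypothetical.

```python
def running_average_update(q, n, g):
    # True running average over observed returns:
    #   q_{n+1} = q_n + (g - q_n) / (n + 1)
    # No learning rate appears anywhere, which is why lr has no
    # effect on the Every-Visit Monte-Carlo agent.
    n += 1
    q += (g - q) / n
    return q, n

def constant_step_update(q, g, lr):
    # Constant step-size form used by the TD-style methods:
    #   q <- q + lr * (g - q)
    return q + lr * (g - q)

# Averaging the returns 1, 2, 3 with the running-average rule:
q, n = 0.0, 0
for g in [1.0, 2.0, 3.0]:
    q, n = running_average_update(q, n, g)
# q is now the exact mean, 2.0, regardless of any step-size setting.
```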

| Algorithm   | lr=0.1   | lr=0.01 | lr=0.001 |
|-------------|----------|---------|----------|
| Monte-Carlo | -23.134  | -53.209 | -44.191  |
| Sarsa       | 95.445   | -55.731 | -182.449 |
| Q-learning  | 95.793   | -77.473 | -173.590 |
| DQN         | -150.694 | -70.869 | 206.948  |

Execution

The purpose of this project is to compare the performance of the implemented RL algorithms. For a complete comparison, in which all of the above agents are trained during program execution, run the following:

python train.py --agents random monte-carlo sarsa q-learning dqn

This trains each agent separately using the default values: n_episodes=10000, lr=0.001, gamma=0.99, final_eps=0.01.
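The final_eps default suggests an epsilon-greedy exploration schedule that decays toward a small floor over training. As a rough sketch only, assuming a linear decay from 1.0 (the repository may well use a different schedule, e.g. exponential, and this function name is hypothetical):

```python
def epsilon_schedule(episode, n_episodes=10000, start_eps=1.0, final_eps=0.01):
    # Assumed linear decay from start_eps to final_eps over training;
    # epsilon is clamped at final_eps once training is complete.
    frac = min(episode / n_episodes, 1.0)
    return start_eps + frac * (final_eps - start_eps)
```

The agent would act randomly with probability epsilon and greedily otherwise, so exploration tapers off as the value estimates improve.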

For a more customized execution, you can explicitly provide a value for each argument. For example, to train only a DQN agent:

python train.py --agents dqn --n_episodes 3000 --lr 0.0001 --gamma 0.99 --final_eps 0.02

After training a dqn agent, you can test how well it generalizes using:

python autopilot.py <num_episodes> models/*/qnetwork_{*}.pt

To compare DQN to a random agent:

python random_agent.py <num_episodes>

Implementation References

  1. OpenAI baselines
  2. Reinforcement Learning (DQN) Tutorial
  3. Solving The Lunar Lander Problem under Uncertainty using Reinforcement Learning
