A Q-learning agent is trained to land a spacecraft on the lunar surface.
The environment is provided by OpenAI Gym [1].
The base environment and agent are written following the RL-Glue standard [2], which provides the library and abstract classes to inherit from for reinforcement learning experiments.
- Added Expected SARSA functionality. Set `agent_parameters['name']` to either `'q-learning'` or `'expected_sarsa'` to choose the learning algorithm; a sketch of how the two targets differ is given after the table below.
| Type of agent        | Reward sum for each episode | Last episode |
| -------------------- | --------------------------- | ------------ |
| Q-learning agent     |                             |              |
| Expected SARSA agent |                             |              |
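The two agents differ only in how the bootstrap target is computed. Below is a minimal sketch of that difference, assuming a batch of next-state action values `q_next` and a softmax behaviour policy with temperature `tau`; the function name and signature are illustrative, not the actual code in this repo.

```python
import torch

def compute_target(rewards, q_next, terminals, gamma, tau, name):
    """Bootstrap target for a batch of transitions (illustrative sketch).

    rewards:   (batch,)              immediate rewards
    q_next:    (batch, num_actions)  action values of the next states
    terminals: (batch,)              1 if the episode ended, else 0
    """
    if name == 'q-learning':
        # Q-learning bootstraps on the maximum next-state action value.
        v_next = q_next.max(dim=1).values
    else:  # 'expected_sarsa'
        # Expected SARSA bootstraps on the expectation of the next-state
        # action values under the softmax policy instead of the max.
        probs = torch.softmax(q_next / tau, dim=1)
        v_next = (probs * q_next).sum(dim=1)
    # Do not bootstrap past terminal states.
    return rewards + gamma * (1 - terminals) * v_next
```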
The landing pad is always at coordinates (0, 0). The coordinates are the first two numbers in the state vector. The reward for moving from the top of the screen to the landing pad with zero speed is about 100 to 140 points. If the lander moves away from the landing pad, it loses reward. The episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100 points respectively. Each leg with ground contact is worth +10 points. Firing the main engine costs -0.3 points per frame, and firing a side engine costs -0.03 points per frame. The environment is considered solved at 200 points.
- s[0] is the horizontal coordinate
- s[1] is the vertical coordinate
- s[2] is the horizontal speed
- s[3] is the vertical speed
- s[4] is the angle
- s[5] is the angular speed
- s[6] is 1 if the first leg has ground contact, else 0
- s[7] is 1 if the second leg has ground contact, else 0
Four discrete actions are available:
- 0: do nothing
- 1: fire left orientation engine
- 2: fire main engine
- 3: fire right orientation engine
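For reference, the state and action spaces above can be inspected directly through the Gym API. The following random-agent rollout is not part of this repo, just a minimal sketch (the older Gym step/reset signatures are assumed; newer versions return slightly different tuples):

```python
import gym

env = gym.make('LunarLander-v2')
state = env.reset()                        # 8-dimensional state vector s[0]..s[7]
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()     # one of the 4 discrete actions
    state, reward, done, info = env.step(action)
    total_reward += reward                 # episode return under the reward scheme above
print('Episode return:', total_reward)
```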
```python
import torch

# Experiment parameters
experiment_parameters = {
    "num_episodes": 500,
    "checkpoint_freq": 100,
    "print_freq": 1,
    "load_checkpoint": None,
    # OpenAI Gym environments allow for a timestep limit timeout,
    # causing episodes to end after some number of timesteps.
    "timeout": 1600
}

# Environment parameters
environment_parameters = {
    "gym_environment": 'LunarLander-v2',
    'solved_threshold': 200,
    'seed': 0
}

# Agent parameters
device = "cuda" if torch.cuda.is_available() else "cpu"
agent_parameters = {
    'network_config': {
        'state_dim': 8,
        'num_hidden_units': 256,
        'action_dim': 4,
        'seed': 0
    },
    'optimizer_config': {
        'step_size': 1e-3,
        'betas': (0.9, 0.999)
    },
    'name': 'q-learning',
    'device': device,
    'replay_buffer_size': 50000,
    'minibatch_size': 64,
    'num_replay_updates_per_step': 4,
    'gamma': 0.99,
    'tau': 0.001,
    'checkpoint_dir': 'model_weights',
    'seed': 0
}
```
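The `network_config` and `optimizer_config` entries describe a small fully connected action-value network trained with Adam. The class below is only a sketch of what such a network might look like given those dimensions; the repo's actual network class may differ.

```python
import torch
import torch.nn as nn

class ActionValueNetwork(nn.Module):
    """Maps an 8-dim state to one value per action (illustrative sketch)."""
    def __init__(self, config):
        super().__init__()
        torch.manual_seed(config['seed'])
        self.net = nn.Sequential(
            nn.Linear(config['state_dim'], config['num_hidden_units']),
            nn.ReLU(),
            nn.Linear(config['num_hidden_units'], config['action_dim'])
        )

    def forward(self, state):
        return self.net(state)

network = ActionValueNetwork(agent_parameters['network_config']).to(device)
optimizer = torch.optim.Adam(
    network.parameters(),
    lr=agent_parameters['optimizer_config']['step_size'],
    betas=agent_parameters['optimizer_config']['betas'])
```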
The softmax is implemented manually to avoid overflow problems when taking the exponential of large numbers, using the identity softmax(x) = softmax(x - c). τ is the temperature parameter, which controls how much the agent focuses on the highest-valued actions. The smaller the temperature, the more the agent selects the greedy action; conversely, when the temperature is high, the agent selects among actions more uniformly at random.
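A numerically stable version of that softmax, subtracting the per-state maximum before exponentiating, can be sketched as follows (this illustrates the idea rather than reproducing the exact code in the repo):

```python
import numpy as np

def softmax(action_values, tau=1.0):
    """Softmax over action values with temperature tau.

    Subtracting the max (the softmax(x) = softmax(x - c) identity)
    keeps np.exp from overflowing for large action values.
    """
    preferences = action_values / tau
    max_preference = np.max(preferences, axis=1, keepdims=True)
    exp_preferences = np.exp(preferences - max_preference)
    return exp_preferences / np.sum(exp_preferences, axis=1, keepdims=True)
```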
- Instructions for installing the OpenAI Gym environment on Windows
- tqdm
- ffmpeg (`conda install -c conda-forge ffmpeg`)
- PyTorch (`conda install pytorch torchvision cudatoolkit=10.2 -c pytorch`)
- numpy
```sh
git clone https://github.com/Jason-CKY/lunar_lander_DQN.git
cd lunar_lander_DQN
```

Edit the experiment parameters in `main.py`, then run:

```sh
python main.py
```
```
usage: test.py [-h] [--env ENV] [--agent AGENT] [--checkpoint CHECKPOINT] [--gif]

optional arguments:
  -h, --help               show this help message and exit
  --env ENV                Environment name
  --agent AGENT            Agent name (q-learning/expected_sarsa)
  --checkpoint CHECKPOINT  Name of checkpoint.pth file under model_weights/env/agent/
  --gif                    Save rendered episode as a gif to model_weights/env/agent/recording.gif
```
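For example, to render a trained Q-learning agent and save the episode as a gif (the checkpoint filename below is a placeholder for whatever `.pth` file you have under `model_weights/LunarLander-v2/q-learning/`):

```sh
python test.py --env LunarLander-v2 --agent q-learning --checkpoint checkpoint.pth --gif
```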