
A deep policy gradient implementation (REINFORCE and N-step A2C) for LunarLander-v2


ibrahim-elshar/Deep_PG


Assignment 3, Part 1 for 10-703 Deep Reinforcement Learning and Control (CMU).

This repository contains two Python scripts that solve the OpenAI Gym LunarLander-v2 environment with deep reinforcement learning: the REINFORCE algorithm (without a baseline) in reinforce.py, and the N-step Advantage Actor-Critic (A2C) algorithm in a2c.py.
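REINFORCE without a baseline weights each action's log-probability by the discounted return that follows it. The sketch below shows that computation with NumPy, assuming a discount factor `gamma` and per-step `log_probs` supplied by the caller; the function names are hypothetical and this is not taken from reinforce.py, which may also scale rewards or use a different gamma. In a real training loop the log-probabilities come from an autodiff framework so the loss can be backpropagated.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Negative REINFORCE objective: -sum_t G_t * log pi(a_t | s_t).

    No baseline is subtracted, matching the 'without baseline' variant.
    """
    returns = discounted_returns(rewards, gamma)
    return -np.sum(returns * np.asarray(log_probs, dtype=np.float64))
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`.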

Note: the current version of a2c.py uses critic network parameters specific to the n = 1 A2C run (a 30x30x30 MLP instead of the 20x20x20 MLP used for n = 20, 50, and 100).
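In N-step A2C the critic's value estimates bootstrap the return after n steps: R_t = sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n V(s_{t+n}), with rewards past the episode end treated as zero and V(s_{t+n}) taken as 0 when t + n reaches the end. The advantage R_t - V(s_t) then weights the actor's log-probabilities. A minimal NumPy sketch, assuming one complete episode of `rewards` and critic outputs `values` (hypothetical names, not necessarily how a2c.py structures it):

```python
import numpy as np


def n_step_targets(rewards, values, n, gamma=0.99):
    """N-step bootstrapped returns for one finished episode of length T."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    T = len(rewards)
    targets = np.zeros(T, dtype=np.float64)
    for t in range(T):
        end = min(t + n, T)
        # Discounted sum of up to n observed rewards starting at t.
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        # Bootstrap from the critic only if s_{t+n} is inside the episode;
        # otherwise the episode terminated and its value is 0.
        if t + n < T:
            g += gamma ** n * values[t + n]
        targets[t] = g
    return targets


def advantages(rewards, values, n, gamma=0.99):
    """A_t = R_t - V(s_t), used to weight the actor's log-probabilities."""
    return n_step_targets(rewards, values, n, gamma) - np.asarray(values, dtype=np.float64)
```

With n = 1 the target is the one-step TD target; once n is at least the episode length, the bootstrap term never applies and the targets reduce to the Monte Carlo returns that REINFORCE uses.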

usage: reinforce.py [-h] [--model-config-path MODEL_CONFIG_PATH]
                    [--num-episodes NUM_EPISODES] [--lr LR]
                    [--render | --no-render]

optional arguments:
  -h, --help            show this help message and exit
  --model-config-path MODEL_CONFIG_PATH
                        Path to the model config file.
  --num-episodes NUM_EPISODES
                        Number of episodes to train on.
  --lr LR               The learning rate.
  --render              Whether to render the environment.
  --no-render           Whether to render the environment.
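The help text above has the shape of Python `argparse` output. A minimal sketch of a parser accepting the same flags follows; the default values are hypothetical placeholders, not the ones in reinforce.py. Putting `--render` and `--no-render` in a mutually exclusive group produces the `[--render | --no-render]` usage line, and giving both the same help string would explain why their descriptions are identical.

```python
import argparse


def parse_reinforce_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--model-config-path', dest='model_config_path',
                        type=str, default=None,
                        help='Path to the model config file.')
    # Hypothetical default; check reinforce.py for the real value.
    parser.add_argument('--num-episodes', dest='num_episodes', type=int,
                        default=50000, help='Number of episodes to train on.')
    # Hypothetical default; check reinforce.py for the real value.
    parser.add_argument('--lr', dest='lr', type=float, default=5e-4,
                        help='The learning rate.')

    # Both flags write to the same 'render' destination; the group rejects
    # passing both at once.
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--render', dest='render', action='store_true',
                       help='Whether to render the environment.')
    group.add_argument('--no-render', dest='render', action='store_false',
                       help='Whether to render the environment.')
    parser.set_defaults(render=False)
    return parser.parse_args(argv)
```

For instance, `parse_reinforce_args(['--lr', '0.001', '--render'])` returns a namespace with `lr == 0.001` and `render == True`.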

=========================================================

usage: a2c.py [-h] [--model-config-path MODEL_CONFIG_PATH]
              [--num-episodes NUM_EPISODES] [--lr LR] [--critic-lr CRITIC_LR]
              [--n N] [--render | --no-render]

optional arguments:
  -h, --help            show this help message and exit
  --model-config-path MODEL_CONFIG_PATH
                        Path to the actor model config file.
  --num-episodes NUM_EPISODES
                        Number of episodes to train on.
  --lr LR               The actor's learning rate.
  --critic-lr CRITIC_LR
                        The critic's learning rate.
  --n N                 The value of N in N-step A2C.
  --render              Whether to render the environment.
  --no-render           Whether to render the environment.
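Because a2c.py currently hardcodes the n = 1 critic, switching to n = 20, 50, or 100 means changing the critic size as well. The sketch below parses the a2c.py flags listed above and picks the critic's hidden-layer widths from `--n`, following the note near the top of this README. The helper `critic_hidden_sizes` and all default values are hypothetical, not part of the repository.

```python
import argparse


def parse_a2c_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--model-config-path', dest='model_config_path',
                        type=str, default=None,
                        help='Path to the actor model config file.')
    # The numeric defaults below are hypothetical placeholders.
    parser.add_argument('--num-episodes', dest='num_episodes', type=int,
                        default=50000, help='Number of episodes to train on.')
    parser.add_argument('--lr', dest='lr', type=float, default=5e-4,
                        help="The actor's learning rate.")
    parser.add_argument('--critic-lr', dest='critic_lr', type=float,
                        default=1e-4, help="The critic's learning rate.")
    parser.add_argument('--n', dest='n', type=int, default=20,
                        help='The value of N in N-step A2C.')
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--render', dest='render', action='store_true',
                       help='Whether to render the environment.')
    group.add_argument('--no-render', dest='render', action='store_false',
                       help='Whether to render the environment.')
    parser.set_defaults(render=False)
    return parser.parse_args(argv)


def critic_hidden_sizes(n):
    """Hypothetical helper: 30x30x30 critic for n = 1, 20x20x20 otherwise
    (the README reports 20x20x20 for n = 20, 50, and 100)."""
    return (30, 30, 30) if n == 1 else (20, 20, 20)
```

For example, `critic_hidden_sizes(parse_a2c_args(['--n', '1']).n)` returns `(30, 30, 30)`.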
