Lunar-Lander-DRL-Dueling-DoubleDQN (D3QN)

This is a Deep Reinforcement Learning solution for the Lunar Lander problem in OpenAI Gym, using a dueling network architecture and the double DQN algorithm. The solution was developed in a Jupyter notebook on the Kaggle platform with a P100 GPU accelerator. The model weights are in the model folder, and the results are in CSV format in the results folder.

Package Requirements

pip install swig "gym[box2d]"

Description of Lunar Lander problem (LunarLander-v2)

Action Space	`Discrete(4)`
Observation Space	`Box([-1.5 -1.5 -5. -5. -3.1415927 -5. -0. -0. ], [1.5 1.5 5. 5. 3.1415927 5. 1. 1. ], (8,), float32)`
Import	`gymnasium.make("LunarLander-v2")`

Possible actions

Action discrete value	Description
0	No action
1	Fire left orientation engine
2	Fire main engine
3	Fire right orientation engine
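
The action table can be mirrored in code with a small enumeration, which keeps magic numbers out of the agent. This is a minimal sketch: the class name `LanderAction` and its member names are illustrative and not part of the Gym API. The integer values follow the table, with action 3 taken to be the right orientation engine as in the Gym documentation.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete LunarLander-v2 actions."""

    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


# IntEnum members behave like plain ints, so they can be used
# wherever an integer action index is expected.
action = LanderAction.FIRE_MAIN
print(int(action), action.name)
```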

Observation Descriptions

Position in the observation or state space list	Description
0	Lander horizontal coordinate
1	Lander vertical coordinate
2	Lander horizontal speed
3	Lander vertical speed
4	Lander angle
5	Lander angular speed
6	Bool: 1 if first leg has contact, else 0
7	Bool: 1 if second leg has contact, else 0
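
To make the eight observation entries easier to read in code, they can be unpacked into named fields. The sketch below assumes the index order given in the table; `LanderState` and `unpack_observation` are hypothetical helper names, not something the notebook is known to define.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-element observation (field names are illustrative)."""

    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    first_leg_contact: bool
    second_leg_contact: bool


def unpack_observation(obs: Sequence[float]) -> LanderState:
    """Convert a raw observation into a LanderState, following the table order."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    *motion, leg1, leg2 = obs
    # The last two entries are 0/1 flags, so they are converted to bool.
    return LanderState(*(float(v) for v in motion), bool(leg1), bool(leg2))


state = unpack_observation([0.1, 1.4, -0.2, -0.5, 0.05, 0.0, 1.0, 0.0])
print(state.y, state.first_leg_contact, state.second_leg_contact)
```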

Reward Descriptions

For each step in the environment, a reward is granted. The total reward for an episode is the sum of the rewards at each step. For each step, the reward:

is increased/decreased the closer/further the lander is to the landing pad.
is increased/decreased the slower/faster the lander is moving.
is decreased the more the lander is tilted (angle not horizontal).
is increased by 10 points for each leg that is in contact with the ground.
is decreased by 0.03 points each frame a side engine is firing.
is decreased by 0.3 points each frame the main engine is firing.
is decreased by 100 points for crashing the lander.
is increased by 100 points for landing safely.

An episode is considered a solution if it scores at least 200 points.
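
The episode score and the fixed engine costs described above can be expressed in a few lines. This is only a sketch of the bookkeeping, not the environment's reward code: the function names are hypothetical, and the position, speed and tilt shaping terms are computed inside the environment and are not reproduced here.

```python
from typing import Iterable

SOLVED_THRESHOLD = 200.0  # an episode counts as a solution at or above this score


def episode_return(step_rewards: Iterable[float]) -> float:
    """Total episode reward is the sum of the per-step rewards."""
    return float(sum(step_rewards))


def is_solved(step_rewards: Iterable[float],
              threshold: float = SOLVED_THRESHOLD) -> bool:
    """True if the summed reward reaches the solution threshold."""
    return episode_return(step_rewards) >= threshold


def engine_penalty(action: int) -> float:
    """Per-frame fuel cost of an action, using the values listed above."""
    if action == 2:          # main engine
        return 0.3
    if action in (1, 3):     # side (orientation) engines
        return 0.03
    return 0.0               # no engine firing


print(is_solved([150.0, 60.0]), engine_penalty(2))
```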

Episode termination conditions

The lander crashes.
The lander gets outside of the viewport (x coordinate is greater than 1).
Episode length > 400.

Dueling Double DQN solution (D3QN)
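
For readers unfamiliar with the two ideas combined in D3QN, the core arithmetic can be sketched in plain Python. The assumptions here are: the mean-subtracted form of the dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), the standard double DQN target in which the online network picks the next action and the target network evaluates it, and a Huber loss with delta = 1.0. The notebook may differ in these details, and all function names are illustrative.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool,
                      online_q_next: Sequence[float],
                      target_q_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """TD target: the online net selects the action, the target net evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]


def huber_loss(td_error: float, delta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones (delta is an assumption)."""
    abs_err = abs(td_error)
    if abs_err <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_err - 0.5 * delta)


q = dueling_q_values(1.0, [1.0, 2.0, 3.0, 2.0])
y = double_dqn_target(1.0, False, [0.1, 0.9, 0.3, 0.2], [5.0, 2.0, 7.0, 1.0])
print(q, y, huber_loss(y - q[1]))
```

Note that in the usage example the target network's largest value (7.0, for action 2) is not used: the online network prefers action 1, so the target bootstraps from 2.0. Decoupling selection from evaluation in this way is what reduces overestimation relative to a plain DQN target.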

Train

Parameter	Value
Number of episodes	5000
Learning rate	0.00015
Discount factor	0.99
Epsilon	1.0
Desired epsilon	0.01
Epsilon decay	(Epsilon - Desired epsilon) / 500,000 per state
Batch size	32
Target network update interval (training steps)	150
Loss function	HuberLoss
Optimizer	Adam
Replay memory size	1,000,000
Minimum stored experiences before training starts	5,000
Experience replay strategy	Uniform

Test

Parameter	Value
Number of episodes	10
Epsilon	0.01
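
The epsilon settings above can be read as a linear schedule feeding an epsilon-greedy policy. The sketch below assumes that epsilon falls from 1.0 to 0.01 in equal decrements over 500,000 states and is then held constant; the helper names `epsilon_at` and `select_action` are hypothetical. At test time the same selection rule would simply be called with a fixed epsilon of 0.01.

```python
import random
from typing import Optional, Sequence

EPS_START = 1.0            # "Epsilon" in the training table
EPS_END = 0.01             # "Desired epsilon"
EPS_DECAY_STEPS = 500_000  # states over which epsilon is decayed


def epsilon_at(step: int) -> float:
    """Linearly interpolate from EPS_START to EPS_END, then hold at EPS_END."""
    fraction = min(max(step, 0) / EPS_DECAY_STEPS, 1.0)
    return EPS_START + fraction * (EPS_END - EPS_START)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    rng = rng or random
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


for step in (0, 250_000, 500_000):
    print(step, round(epsilon_at(step), 4))
```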

Note: You can experiment with various hyperparameters to achieve improved results.
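
When experimenting, it helps to keep the replay and update settings in one place. The following is a minimal sketch of a uniform replay memory with a warm-up condition and a periodic target-network sync check, using the values from the training table as defaults. The class and function names are hypothetical, and the ring-buffer layout is one possible implementation rather than the notebook's.

```python
import random
from typing import Any, List

MEMORY_SIZE = 1_000_000    # maximum number of stored transitions
WARMUP_SIZE = 5_000        # stored transitions required before training starts
BATCH_SIZE = 32
TARGET_UPDATE_EVERY = 150  # training steps between target-network syncs


class ReplayBuffer:
    """Fixed-capacity memory that overwrites the oldest entry when full."""

    def __init__(self, capacity: int = MEMORY_SIZE) -> None:
        self.capacity = capacity
        self._items: List[Any] = []
        self._next = 0  # index that the next write will use once full

    def __len__(self) -> int:
        return len(self._items)

    def add(self, transition: Any) -> None:
        if len(self._items) < self.capacity:
            self._items.append(transition)
        else:
            self._items[self._next] = transition
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size: int = BATCH_SIZE) -> List[Any]:
        """Uniform sampling without replacement."""
        return random.sample(self._items, batch_size)


def ready_to_train(buffer: ReplayBuffer, warmup: int = WARMUP_SIZE) -> bool:
    """Training starts only once enough experience has been stored."""
    return len(buffer) >= warmup


def should_sync_target(train_step: int, every: int = TARGET_UPDATE_EVERY) -> bool:
    """True on every `every`-th training step (never on step 0)."""
    return train_step > 0 and train_step % every == 0
```

Changing `BATCH_SIZE`, `WARMUP_SIZE` or `TARGET_UPDATE_EVERY` here corresponds to changing the matching rows of the training table.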

Repository: EnriqManComp/Lunar-Lander-DRL-Dueling-DoubleDQN
