Maze solving with several reinforcement learning algorithms, and a comparison of their results.

Value Iteration (VI) and Policy Iteration (PI)
Q-learning
Double-Q-learning
SARSA (State-Action-Reward-State-Action)

1) VI and PI
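
As a rough illustration of the value iteration half of this section, here is a minimal sketch on a tiny maze. The 3x3 layout, the reward of -1 per step, the discount factor, and the names `MAZE`, `GOAL`, `step`, and `value_iteration` are all assumptions made for the example, not taken from this repository. Policy iteration is not shown.

```python
# Hypothetical 3x3 maze: 0 = free cell, 1 = wall; the goal is the bottom-right cell.
MAZE = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
GOAL = (2, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; hitting a wall or the border leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] == 0:
        return (r, c)
    return state


def value_iteration(gamma=0.9, tol=1e-6):
    """Sweep the Bellman optimality backup until the largest change is below tol."""
    states = [(r, c) for r in range(len(MAZE))
              for c in range(len(MAZE[0])) if MAZE[r][c] == 0]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == GOAL:
                continue  # terminal state keeps value 0
            # Assumed reward: -1 for every step taken.
            best = max(-1.0 + gamma * V[step(s, a)] for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration()
```

With these assumed settings, cells closer to the goal end up with values closer to zero, and acting greedily with respect to `V` follows the shortest path.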

2) Q-learning
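
For reference, the standard tabular Q-learning update can be sketched as below. The function name, the list-of-lists Q-table, and the default `alpha` and `gamma` values are assumptions for illustration; the repository's own implementation may differ.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q is a list of per-state lists of action values (an assumed layout).
    The target bootstraps from the greedy (max) value of the next state,
    regardless of which action the agent actually takes next (off-policy).
    """
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Usage: two states, two actions each.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, s=0, a=0, r=-1.0, s_next=1, done=False, alpha=0.5)
```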

3) Double-Q-learning

4) SARSA
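
SARSA differs from Q-learning in that its target uses the action the agent actually takes next, which makes it on-policy. A minimal sketch follows, assuming an epsilon-greedy behaviour policy; the function names and default parameters are hypothetical.

```python
import random


def epsilon_greedy(Q, s, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda i: Q[s][i])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.9):
    """One tabular SARSA step using the tuple (s, a, r, s_next, a_next)."""
    # Bootstrap from the action actually chosen in s_next, not from the max.
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Usage: two states, two actions each.
Q = [[0.0, 0.0], [1.0, 2.0]]
sarsa_update(Q, s=0, a=0, r=-1.0, s_next=1, a_next=0, done=False, alpha=0.5)
```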

SARSA using Q-learning

The code for the results below is available here. It collects the rewards from 5 runs and plots them together to reveal any patterns.
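
The collect-and-plot procedure could look roughly like the sketch below. It is not the repository's code: `collect_rewards`, `toy_train`, and `plot_runs` are hypothetical names, and `toy_train` is only a placeholder that generates synthetic numbers so the example runs on its own. In practice it would be replaced by a real training loop that returns the total reward of each episode.

```python
import random


def collect_rewards(train_fn, n_runs=5, n_episodes=100):
    """Run train_fn(seed, n_episodes) once per run.

    Each call is expected to return a list of per-episode total rewards.
    """
    return [train_fn(seed, n_episodes) for seed in range(n_runs)]


def toy_train(seed, n_episodes):
    """Placeholder for a real training loop (synthetic numbers, not results)."""
    rng = random.Random(seed)
    return [-100.0 / (ep + 1) + rng.uniform(-1.0, 1.0)
            for ep in range(n_episodes)]


def plot_runs(rewards):
    """Overlay one reward curve per run; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting
    for i, run in enumerate(rewards):
        plt.plot(run, label=f"run {i + 1}")
    plt.xlabel("episode")
    plt.ylabel("total reward")
    plt.legend()
    plt.show()


rewards = collect_rewards(toy_train)
```

Calling `plot_runs(rewards)` then draws the 5 curves on one set of axes, which makes run-to-run variation easy to compare.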

A common pattern emerges: rewards are initially poor, but as the number of episodes increases the agent improves and the reward approaches an asymptote.
