QLearningAgent learns incorrect results #1247

Open · CrosleyZack opened this issue Jan 20, 2022 · 0 comments

I believe there is a flaw in the `QLearningAgent` implementation in `reinforcement.py`, possibly resulting from how `run_single_trial` is written.
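For reference, `run_single_trial` keeps feeding percepts to the agent program until it returns `None`. A condensed sketch of that loop (paraphrased, so details may differ from the repo's exact code):

```python
import random

def run_single_trial(agent_program, mdp):
    """Feed percepts to the agent until it signals the end of the trial."""
    current_state = mdp.init
    while True:
        percept = (current_state, mdp.R(current_state))
        next_action = agent_program(percept)
        if next_action is None:
            break  # the agent returns None once the percept's state is terminal
        # sample the successor from the transition model T(s, a), which
        # yields (probability, state) pairs in aima-python's MDP class
        probs, states = zip(*mdp.T(current_state, next_action))
        current_state = random.choices(states, weights=probs)[0]
```

Note that nothing here (or in `QLearningAgent` itself) resets the agent's persistent `s`, `a`, `r` between trials, which is what makes the sequence below possible.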

I was testing this with the 4x3 environment problem given in 17.1. Upon reaching a terminal state (`TERMINAL?(s1) == True`), the `__call__` function returns `None`, which causes `run_single_trial` to exit. If it is called again in a loop for multiple trials (e.g. `for _ in range(N): run_single_trial(agent_program, mdp)`), the next call to `QLearningAgent.__call__` sees `s1` as the initial state ((1, 1) for the 4x3 environment), `r1` as the reward for that state (-0.04 for the 4x3 environment), `TERMINAL?(s) == True` (since `s` is still (4, 2) or (4, 3) from the previous trial), and `a == None`. The terminal update then sets `Q[s, None] = r1 = -0.04` instead of the actual termination value of +1 or -1, which results in an incorrect policy. Simply changing line 93 to `Q[s, None] = r` fixes the issue, and the agent learns a correct policy.
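To make the one-line fix concrete, here is roughly the relevant fragment of `QLearningAgent.__call__` (paraphrased, so the surrounding lines may not match `reinforcement.py` exactly):

```python
def __call__(self, percept):
    s1, r1 = self.update_state(percept)  # at the start of a new trial: s1 = (1, 1), r1 = -0.04
    Q, Nsa, s, a, r = self.Q, self.Nsa, self.s, self.a, self.r

    if s in self.terminals:
        # Buggy (line 93): r1 is the reward of the *new* percept, not of s.
        # Q[s, None] = r1
        # Fixed: r still holds the reward observed when the terminal s was reached (+1 or -1).
        Q[s, None] = r
    ...
```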

I recognize this does not match the pseudocode in the book (21.8), and I am not certain whether the problem stems purely from how `run_single_trial` is implemented. A better fix may be available that more closely matches the pseudocode from 21.8.
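One hypothetical alternative, sketched below and not taken from the repo: perform the terminal update when the terminal percept first arrives, using that percept's own reward, and clear the persistent state so nothing stale carries into the next trial:

```python
# Hypothetical sketch of an alternative __call__ body; names follow the existing implementation.
if s is not None:
    # regular Q-learning update for the transition s --a--> s1
    Nsa[s, a] += 1
    Q[s, a] += alpha(Nsa[s, a]) * (
        r + gamma * max(Q[s1, a1] for a1 in actions_in_state(s1)) - Q[s, a])
if s1 in terminals:
    # record the terminal value using *this* percept's reward (+1 or -1) ...
    Q[s1, None] = r1
    # ... and end the episode cleanly so nothing stale leaks into the next trial
    self.s = self.a = self.r = None
    return None
```

This keeps the terminal update inside the trial that produced it, though I have not verified it against the book's exact pseudocode.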
