Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epsilon Greedy: graph for Accuracy, Performance...sensitive to the declaration order of the arm ? #21

Open
phsimon opened this issue Feb 28, 2022 · 0 comments

Comments

@phsimon
Copy link

phsimon commented Feb 28, 2022

By coding for retrieving performance curve as shown in chapter "analyzing Results from Monte Carlo (chapter 4) study" Approach 1 (Proba of selecting best arm) Approach 2 (Average Reward), I noticed that changing the order of the arm, may change dramatically the curves.
For instance if I declare the mean of my five arms as follow :
means=[0.8, 0.9, 0.1, 0.5, 0.5] n_arms=len(means) random.shuffle(means) arms=[BernoulliArm(mu) for mu in means]

The shuffle change the order within means list, therefore the order of the arms, and the performance curve may be very different.

for instance, considering [0.9, 0.5, 0.1, 0.5, 0.8] as order after shuffling I get this curve (average reward per time):

image

whereas considering [0.1, 0.5, 0.9, 0.5, 0.8] , I get

image

Do you have an explanation ? Same phenomenon for proba of selecting best arm (but no so much).

Parameters:
num_sims=1000
horizon=250
epsilon=0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant