Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to calculate MF-Value (eq(10)) in MF-AC/MF-Q #12

Open
rezunli96 opened this issue Mar 14, 2019 · 1 comment
Open

How to calculate MF-Value (eq(10)) in MF-AC/MF-Q #12

rezunli96 opened this issue Mar 14, 2019 · 1 comment

Comments

@rezunli96
Copy link

Hi, recently I am trying to reproduce your work and feel a little confused when implementing MF-AC. According to the algorithm at somewhere the MF-Value (10) should be calculated, where it seems it involves many computations to enumerate all possible mean-field actions and their probabilities. I took a look at you MF-AC implementation in battle-game, but it appears to me (please correct me if i am wrong) here the MF-values are substituted with the returns from the sampled trajectory? Could you explain more about how to calculate the MF-value eq(10), for both MF-AC and MF-Q? Thanks

@rezunli96
Copy link
Author

It just occurred to me that the sampled trajectory is an unbiased estimator of the MF-Value? It works for REINFORCE-like AC. But still confused how to calculated for off-policy RL like MF-Q?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant