Hi, I have recently been trying to reproduce your work and am a little confused while implementing MF-AC. According to the algorithm, the MF value of Eq. (10) needs to be calculated at some point, and it seems this involves many computations to enumerate all possible mean-field actions and their probabilities. I took a look at your MF-AC implementation in the battle game, but it appears to me (please correct me if I am wrong) that the MF values are substituted with the returns from the sampled trajectory. Could you explain more about how to calculate the MF value of Eq. (10), for both MF-AC and MF-Q? Thanks.
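For concreteness, here is how I currently read Eq. (10): the expectation runs only over the agent's own discrete actions under the Boltzmann policy, with the mean action ā taken from the sampled transition rather than enumerated. A minimal numpy sketch of that reading (function and variable names are mine, not from your repo):

```python
import numpy as np

def mf_value(q_values, beta=1.0):
    """Hypothetical sketch of the MF value v^MF(s') for one agent.

    q_values : shape (n_actions,), Q(s', a, a_bar) evaluated at the agent's
               own actions, with the mean action a_bar taken from the sample.
    beta     : Boltzmann temperature.
    """
    q_values = np.asarray(q_values, dtype=float)
    # Boltzmann policy pi(a | s', a_bar) over the agent's own actions
    logits = beta * q_values
    logits -= logits.max()                      # numerical stability
    pi = np.exp(logits) / np.exp(logits).sum()
    # Expectation of Q under pi gives the MF value
    return float(np.dot(pi, q_values))

# Usage: Q-values for 5 own actions at the sampled mean action
v = mf_value([0.1, 0.5, -0.2, 0.3, 0.0], beta=0.5)
```

If this reading is right, the per-step cost is just one pass over the agent's own actions, not an enumeration of all mean-field actions.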
It just occurred to me that the sampled-trajectory return is an unbiased estimator of the MF value, which would make it work for a REINFORCE-like AC. But I am still confused about how to calculate it for an off-policy method like MF-Q.
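To make the off-policy part of the question concrete, here is the one-step target I would expect MF-Q to compute from a replay-buffer sample, with no trajectory return involved (again just a sketch under my reading; names are hypothetical, and ā' would be the mean action stored with the transition):

```python
import numpy as np

def mfq_target(reward, next_q_values, done, gamma=0.95, beta=1.0):
    """Hypothetical one-step MF-Q target: y = r + gamma * v^MF(s').

    next_q_values : Q(s', a, a_bar') for each own action a, with the mean
                    action a_bar' read from the stored transition.
    """
    next_q_values = np.asarray(next_q_values, dtype=float)
    logits = beta * next_q_values
    logits -= logits.max()                        # numerical stability
    pi = np.exp(logits) / np.exp(logits).sum()    # Boltzmann policy
    v_next = float(np.dot(pi, next_q_values))     # MF value of Eq. (10)
    return reward + gamma * (1.0 - done) * v_next

# Usage with a single sampled transition from the replay buffer
y = mfq_target(reward=1.0,
               next_q_values=[0.2, 0.4, -0.1],
               done=0.0)
```

Is this roughly what the MF-Q update does, or does it estimate the MF value differently?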