Hi, I've run into some confusion while trying to reproduce your work, particularly experiment (1) on the Gaussian squeeze task. As I understand it, to implement the MAA2C algorithm as described in DeepMind's NeurIPS 2017 paper, the critic network should represent the Q-value function, which takes the joint action of the players as input. However, the Gaussian squeeze task appears to be a stateless environment. According to your implementation details, there is a discount factor \gamma for the AC methods but not for the Q-learning method. So how do you define the state for the Gaussian squeeze task? And if it is stateless, how can one use A2C methods at all?
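To make the question concrete, here is a minimal sketch (not the authors' code) of the workaround I would expect: treat the stateless task as a one-step episodic MDP whose "state" is a single constant dummy observation, so the actor's logits and the critic's value are unconditioned and \gamma never actually enters the return. The environment below is a hypothetical single-agent Gaussian squeeze with made-up constants (`mu`, `sigma`, `n_actions`); the real multi-agent setup would presumably feed the joint action into the critic rather than using a scalar baseline.

```python
# Minimal A2C-style sketch on a stateless task, modelled as a one-step MDP
# with a constant dummy state. All constants here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 10          # each action is a different "resource" level
mu, sigma = 5.0, 1.0    # assumed Gaussian-squeeze parameters

def reward(a):
    # Gaussian squeeze: x * exp(-((x - mu) / sigma)^2), with x the chosen level
    x = float(a)
    return x * np.exp(-((x - mu) / sigma) ** 2)

theta = np.zeros(n_actions)  # actor logits for the single dummy state
v = 0.0                      # critic: value of the dummy state (a scalar)
lr_actor, lr_critic = 0.1, 0.1

for step in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)
    r = reward(a)

    # One-step episode: the return is just r, so advantage = r - V(dummy).
    adv = r - v
    grad_logp = -probs           # d/dtheta log softmax(theta)[a] = e_a - probs
    grad_logp[a] += 1.0
    theta += lr_actor * adv * grad_logp   # policy-gradient (actor) update
    v += lr_critic * adv                  # baseline (critic) update

print("learned policy mode:", probs.argmax(), "value estimate:", round(v, 3))
```

Under this reading, the episodes have length one, so the discount factor is vacuous for the AC methods and the critic degenerates to a constant baseline. Is that what your implementation does, or is the state defined some other way?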