I've implemented random actions for the obstacles, drawn from an arbitrary distribution.
However, calcReward does not take this new reality into account.
The issue is that calcReward extends the current state using the current action to see whether there would be a crash. A more reliable source of truth would be to calculate the reward from the original state and the state that actually follows it.
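To make the idea concrete, here is a minimal sketch of a transition-based reward. Everything in it is an assumption for illustration and not the project's actual code: the `State` tuple, the grid positions, the `calc_reward` name and signature, and the penalty values are all hypothetical. It also treats an agent and an obstacle swapping cells as a crash, which may or may not match the intended rules.

```python
from typing import NamedTuple, Tuple

Pos = Tuple[int, int]


class State(NamedTuple):
    """Hypothetical state: one agent and a fixed-order tuple of obstacles."""
    agent: Pos
    obstacles: Tuple[Pos, ...]


# Assumed reward values, chosen only for the example.
CRASH_PENALTY = -100.0
STEP_REWARD = -1.0


def calc_reward(state: State, next_state: State) -> float:
    """Reward for the observed transition state -> next_state.

    Nothing is re-simulated here: the obstacle positions in next_state
    already reflect whatever (random) actions the obstacles took.
    """
    # Crash: the agent ended up on a cell occupied by an obstacle.
    if next_state.agent in next_state.obstacles:
        return CRASH_PENALTY
    # Crash (assumed rule): the agent and an obstacle swapped cells,
    # i.e. they passed through each other during the step.
    for before, after in zip(state.obstacles, next_state.obstacles):
        if before == next_state.agent and after == state.agent:
            return CRASH_PENALTY
    return STEP_REWARD
```

Because the reward only reads the two states, the same function works no matter how the obstacle actions were generated (sampled in a simulation or enumerated in a search).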
In MCTS, we simply simulate, so we can use simulated obstacle actions to estimate our expected value. In FS, one option would be to compute the expected value directly.
Basically, assume that "obstacles" draw from a distribution over their actions and act accordingly.
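As a rough sketch of the expected-value option, the reward can be averaged over every joint obstacle action, weighted by its probability. The names here (`OBSTACLE_DIST`, `step`, `expected_reward`), the grid moves, and the reward values are hypothetical, and the sketch assumes each obstacle draws its action independently from the same distribution.

```python
from itertools import product
from typing import Dict, Tuple

Pos = Tuple[int, int]
Move = Tuple[int, int]

# Assumed per-obstacle distribution over moves: stay, up, down.
OBSTACLE_DIST: Dict[Move, float] = {(0, 0): 0.5, (0, 1): 0.25, (0, -1): 0.25}


def step(pos: Pos, move: Move) -> Pos:
    """Apply a move to a grid position."""
    return (pos[0] + move[0], pos[1] + move[1])


def expected_reward(
    agent: Pos,
    agent_move: Move,
    obstacles: Tuple[Pos, ...],
    dist: Dict[Move, float] = OBSTACLE_DIST,
    crash_penalty: float = -100.0,  # assumed value
    step_reward: float = -1.0,      # assumed value
) -> float:
    """Exact expected reward over all joint obstacle actions."""
    next_agent = step(agent, agent_move)
    total = 0.0
    # One (move, probability) pair per obstacle; independence is assumed,
    # so the joint probability is the product of the individual ones.
    for joint in product(dist.items(), repeat=len(obstacles)):
        prob = 1.0
        crashed = False
        for obstacle, (move, p) in zip(obstacles, joint):
            prob *= p
            if step(obstacle, move) == next_agent:
                crashed = True
        total += prob * (crash_penalty if crashed else step_reward)
    return total
```

The loop visits `len(dist) ** len(obstacles)` joint actions, so the cost grows exponentially with the number of obstacles. In MCTS the same quantity could instead be estimated by sampling obstacle moves during rollouts.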
This will likely explode the state space, so we should try to come up with a cool method to solve it :)