Design
Collection of design choices made or to be considered.
- Partial information, custom observations: `Game`
- Full information (state observation) but simultaneous moves: `SimultaneousGame`
- A partial-information simultaneous game is implemented as a plain `Game`, since a separate class would offer no significant advantage.
- Partial information, perfect-recall sequence-based observations: `ObservationGame`
- Full information game (state observation): `CompleteInformationGame`
Note that if `update_state` in a complete information game depends on history, observing state information is not enough!
Internal game state can either be updated in place, or be immutable and created anew on every update.
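The two options could look roughly as follows. This is a minimal sketch on a toy counter; `FrozenCounter`, `MutableCounter`, `update_immutable` and `update_in_place` are hypothetical names used only for illustration and are not part of the library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrozenCounter:
    """Hypothetical immutable state: every update builds a new object."""
    total: int = 0


def update_immutable(state: FrozenCounter, action: int) -> FrozenCounter:
    # The old state stays valid, so it can be kept in a history or a search tree.
    return replace(state, total=state.total + action)


class MutableCounter:
    """Hypothetical mutable state: updates overwrite the same object."""

    def __init__(self) -> None:
        self.total = 0


def update_in_place(state: MutableCounter, action: int) -> None:
    # No new object is created, but the earlier state is lost unless copied first.
    state.total += action


old = FrozenCounter()
new = update_immutable(old, 3)   # old is untouched, new holds the result

m = MutableCounter()
update_in_place(m, 3)            # m itself now holds the result
```

The immutable form makes it safe to share states between branches of a search, while the in-place form avoids allocating a new object per move.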
`ActivePlayer` can be renamed to `StateInfo`. `StateInfo` should also contain (new/all) observations.
Some variants of `update_state` could accept `Situation` instead of `State`. While implementations could use some of the information there (history, observations, rewards, past `StateInfo`), it could also make the API more brittle. (But I would prefer `situation` anyway - Tomas).
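To make the trade-off concrete, here is a hedged sketch of the two signatures on a toy scoring rule. The `Situation` fields below are assumptions taken from the parenthetical list above (history, observations, rewards); the real class may be shaped differently, and both update functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Situation:
    """Hypothetical bundle; field names are assumptions, not the library's API."""
    state: Any
    history: Tuple[int, ...] = ()
    observations: Tuple[Any, ...] = ()
    rewards: Tuple[float, ...] = ()


def update_state_from_state(state: int, action: int) -> int:
    # Only the raw state is visible: the update cannot depend on earlier actions.
    return state + action


def update_state_from_situation(situation: Situation, action: int) -> int:
    # The history is visible too: here, repeating the previous action counts double.
    repeated = bool(situation.history) and situation.history[-1] == action
    bonus = 2 if repeated else 1
    return situation.state + bonus * action


sit = Situation(state=5, history=(1, 3))
plain = update_state_from_state(5, 3)               # 5 + 3
with_history = update_state_from_situation(sit, 3)  # 5 + 2 * 3, since 3 repeats
```

The second form is more expressive, but every implementation now depends on the exact shape of `Situation`, which is the brittleness mentioned above.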
Having `Game.state_info()` may be slower (recalculation or caching of some info computed on update) and may not be cleaner.
Game.initial_state() -> (state, state_info)
Game.update_state(situation, action) -> (state, state_info)

Game.initial_state() -> state
Game.update_state(situation, action) -> state
Game.state_info(state) -> state_info

Game.initial_state() -> (state, state_info)
Game.update_state(state, action) -> state_info

Game.initial_state() -> state
Game.update_state(state, action)
Game.state_info(state) -> state_info
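The last two signature groups can be sketched side by side. This is only an illustration under assumptions: the state is taken to be updated in place (which the missing return value suggests but the notes do not confirm), `StateInfo` is given two hypothetical fields, and `EagerGame`, `LazyGame` and `_info` are made-up names.

```python
from typing import List, NamedTuple, Tuple


class StateInfo(NamedTuple):
    """Hypothetical contents; the real StateInfo may hold more (e.g. observations)."""
    player: int
    terminal: bool


def _info(state: List[int]) -> StateInfo:
    # Two players alternate; this toy game ends after four moves.
    return StateInfo(player=len(state) % 2, terminal=len(state) >= 4)


class EagerGame:
    """update_state returns the info computed during the update."""

    def initial_state(self) -> Tuple[List[int], StateInfo]:
        state: List[int] = []
        return state, _info(state)

    def update_state(self, state: List[int], action: int) -> StateInfo:
        state.append(action)  # assumed in-place update
        return _info(state)


class LazyGame:
    """state_info is a separate call that recomputes the info from the state."""

    def initial_state(self) -> List[int]:
        return []

    def update_state(self, state: List[int], action: int) -> None:
        state.append(action)  # assumed in-place update

    def state_info(self, state: List[int]) -> StateInfo:
        return _info(state)


eager = EagerGame()
s1, info1 = eager.initial_state()
info1 = eager.update_state(s1, 7)

lazy = LazyGame()
s2 = lazy.initial_state()
lazy.update_state(s2, 7)
info2 = lazy.state_info(s2)
```

Both styles yield the same information here. The difference is whether the info is a by-product of the update or has to be recomputed (or cached) in a separate call.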
One fixed action list for a `Game` instance (e.g. in `Game.actions`); every action list is a subset, always indexing into this set.
(+): Easy to record and interpret (just indices), easy to encode as NN output.
(-): Some games may be hard to fit (examples needed; games with huge action sets, e.g. card shuffling, are hard to learn anyway).
(+): Supports large sets, seems flexible
(-): Type mishmashes (indices vs. actual numerical actions), reverse indexing, unintended very large action sets (e.g. floats).
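The fixed-list option can be illustrated with a small sketch. Everything here is hypothetical: the `ACTIONS` tuple stands in for something like `Game.actions`, and `legal_mask` and `masked_policy` are illustrative helpers showing how a fixed-size network output might be restricted to the currently available subset.

```python
from typing import List, Sequence

# Hypothetical fixed action list for a rock-paper-scissors-like game.
ACTIONS: Sequence[str] = ("rock", "paper", "scissors")


def legal_mask(legal_indices: Sequence[int]) -> List[bool]:
    """One flag per entry of ACTIONS; True where the action is currently available."""
    legal = set(legal_indices)
    return [i in legal for i in range(len(ACTIONS))]


def masked_policy(scores: Sequence[float], legal_indices: Sequence[int]) -> List[float]:
    """Renormalize non-negative per-action scores over the legal subset only.

    Assumes at least one legal action has a positive score.
    """
    mask = legal_mask(legal_indices)
    kept = [s if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(kept)
    return [k / total for k in kept]


available = [0, 2]                                   # indices into ACTIONS
policy = masked_policy([0.5, 0.3, 0.2], available)   # "paper" gets probability 0
history = [2, 0]                                     # a recorded play is just indices
played = [ACTIONS[i] for i in history]
```

Because every index refers to the same fixed list, a recorded history can be interpreted without knowing which actions were available at each step.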
- Shuffle the whole deck (infeasible anyway, can be replaced by drawing individual cards)
- Select from arbitrarily many cards (e.g. Port Royal card selection)
  - In Port Royal, you can just offer all card types (would work better anyway)
- Some other card games?
- Bidding games have large/unbounded bids (quantize? how does DeepStack do it?)
- Allow one unbounded set of actions (e.g. negative indices or all indices beyond last action)
- Encoding from NN still has to be customized (e.g. bidding quantization, recurrent NN decoder, ...)
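As a rough illustration of the last two points, the sketch below reads every index past a fixed action list as a bid and snaps raw bids to a small set of buckets. The fixed actions, the index-to-bid mapping and the bucket values are all assumptions made for the example, not decisions recorded in these notes.

```python
from typing import Sequence, Tuple, Union

FIXED_ACTIONS: Tuple[str, ...] = ("fold", "call")   # hypothetical fixed part


def decode_action(index: int) -> Union[str, int]:
    """Indices past the fixed list are read as bids: len(FIXED_ACTIONS) is bid 1, and so on."""
    if index < 0:
        raise ValueError("negative indices are not used in this sketch")
    if index < len(FIXED_ACTIONS):
        return FIXED_ACTIONS[index]
    return index - len(FIXED_ACTIONS) + 1


def quantize_bid(bid: int, buckets: Sequence[int]) -> int:
    """Snap a bid to the nearest allowed bucket (ties go to the smaller bucket)."""
    return min(buckets, key=lambda b: (abs(b - bid), b))


buckets = [10, 20, 50, 100]           # hypothetical quantization levels
snapped = quantize_bid(37, buckets)   # 37 is closer to 50 than to 20
```

With quantization the network only needs one output per bucket, at the cost of not being able to express bids between the buckets.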