Design
Collection of design choices made or to be considered.
Proposed classes:
- `Game` - Abstract base class.
- `PartialInformationGame` - Partial information, custom observations (information set IDs).
- `ObservationSequenceGame` - Partial information, perfect recall sequence-based observations.
- `SimultaneousGame` - One-turn simultaneous game.
- `MatrixGame` - Payoff determined by a matrix/ndarray.
- `PerfectInformationGame` - Full information game (full situation observation).
Note that if `update_state` in a complete information game depends on history, "observing" state information is generally not enough (e.g. to decide on a strategy or to send to the client). Sending the whole situation may then be necessary.
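As a rough illustration of the `MatrixGame` entry in the list above, a one-turn simultaneous game whose payoffs come from a matrix could look like the sketch below. The constructor arguments, the `payoffs` attribute and the `rewards` method are assumptions made for this example, not the library's API; nested tuples stand in for an ndarray.

```python
from typing import Sequence, Tuple


class MatrixGame:
    """Hypothetical sketch: a one-turn, two-player simultaneous game."""

    def __init__(self, payoffs: Sequence[Sequence[Tuple[float, float]]],
                 actions: Sequence[Sequence[str]]):
        # payoffs[i][j] holds the (row player, column player) rewards when
        # the row player picks action i and the column player picks action j.
        self.payoffs = payoffs
        self.actions = actions  # one action list per player

    def rewards(self, row_index: int, col_index: int) -> Tuple[float, float]:
        # Both players move at once, so the payoff is a single table lookup.
        return self.payoffs[row_index][col_index]


# Matching pennies: the row player wins on a match, the column player otherwise.
pennies = MatrixGame(
    payoffs=(((1, -1), (-1, 1)),
             ((-1, 1), (1, -1))),
    actions=(("H", "T"), ("H", "T")),
)
print(pennies.rewards(0, 1))  # row plays H, column plays T
```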
Potential classes:
- `SimultaneousSequentialGame` - Full information (state observation) but simultaneous moves. A partial information simultaneous game is implemented just as `Game`, as there are no significant advantages to a separate class.
- `SymmetricGame` - ?
Indices are more useful and compact while computing. The actions themselves may be better for visualisation (e.g. on the client side).
In a sequential-observation game, a player's own-action observations are likely better stored as the actions directly (rather than as indices).
In a sequential-observation game, `StateInfo.observations` are only the "new" observations, while `Situation.observations` are the complete observations. In a general `Game`, `Situation.observations` are the complete tuple (usually an alias of `StateInfo.observations`).
All observations in complete information games may be just `None`.
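The difference between "new" and complete observations in a sequential-observation game can be sketched as follows. The field layout of `StateInfo` and `Situation`, the `extended` helper, and the convention that `None` means "nothing new observed" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class StateInfo:
    # Hypothetical layout: per-player observations produced by the last update only.
    observations: Tuple[Optional[Any], ...]


@dataclass(frozen=True)
class Situation:
    # Hypothetical layout: per-player tuple of everything observed so far.
    observations: Tuple[Tuple[Any, ...], ...]

    def extended(self, info: StateInfo) -> "Situation":
        # Append each player's new observation; None adds nothing for that player.
        return Situation(tuple(
            seen if new is None else seen + (new,)
            for seen, new in zip(self.observations, info.observations)
        ))


start = Situation(observations=((), ()))
after_deal = start.extended(StateInfo(observations=("my card: K", "my card: Q")))
after_bet = after_deal.extended(StateInfo(observations=(None, "opponent bets")))
print(after_bet.observations)
```

Here each `StateInfo` carries only the increment, and the `Situation` accumulates it into the per-player sequence.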
The internal game state can either be updated in place, or be immutable and always created anew.
`ActivePlayer` can be renamed to `StateInfo`. `StateInfo` should also contain (new/all) observations.
Some variants of `update_state` could accept `Situation` instead of `State`. While implementations could use some of the information there (history, observations, rewards, past `StateInfo`), it could also make the API more brittle. (But I would prefer `situation` anyway - Tomas).
Having `Game.state_info()` may be slower (recalculation or caching of some info computed on update) and may not be cleaner.
Game.initial_state() -> (state, state_info)
Game.update_state(situation, action) -> (state, state_info)

Game.initial_state() -> state
Game.update_state(situation, action) -> state
Game.state_info(state) -> state_info

Game.initial_state() -> (state, state_info)
Game.update_state(state, action) -> state_info

Game.initial_state() -> state
Game.update_state(state, action)
Game.state_info(state) -> state_info
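To make the trade-off more concrete, here is a minimal sketch of the variant where the state is immutable and `update_state(situation, action)` returns `(state, state_info)`. The toy game, the fields of `StateInfo` and `Situation`, the `play` helper, and the use of `-1` as the terminal player marker are assumptions made for this example only.

```python
from typing import NamedTuple, Tuple


class StateInfo(NamedTuple):
    player: int               # player to move, or -1 when terminal (assumed convention)
    actions: Tuple[int, ...]  # actions available in this state


class Situation(NamedTuple):
    state: int                # game-specific state; here, the remaining tokens
    state_info: StateInfo
    history: Tuple[int, ...]


class TakeAwayGame:
    """Toy game: players alternately take 1 or 2 tokens; purely illustrative."""

    def __init__(self, tokens: int):
        self.tokens = tokens

    def _info(self, state: int, player: int) -> StateInfo:
        if state == 0:
            return StateInfo(player=-1, actions=())
        return StateInfo(player=player, actions=tuple(a for a in (1, 2) if a <= state))

    def initial_state(self) -> Tuple[int, StateInfo]:
        return self.tokens, self._info(self.tokens, 0)

    def update_state(self, situation: Situation, action: int) -> Tuple[int, StateInfo]:
        # The state is never mutated; a new value is returned together with its info.
        new_state = situation.state - action
        return new_state, self._info(new_state, 1 - situation.state_info.player)


def play(game: TakeAwayGame, situation: Situation, action: int) -> Situation:
    # The caller, not the game, assembles the next situation and its history.
    state, info = game.update_state(situation, action)
    return Situation(state, info, situation.history + (action,))


game = TakeAwayGame(3)
state, info = game.initial_state()
s0 = Situation(state, info, ())
s1 = play(game, s0, 2)
s2 = play(game, s1, 1)
print(s1.state_info, s2.state_info)
```

Because `state_info` is returned alongside the new state, nothing has to be recomputed or cached afterwards, and earlier situations such as `s0` remain valid.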
One fixed action list for a `Game` instance (e.g. in `Game.actions`); every action list is a subset, always indexing into this set.
(+): Easy to record and interpret (just indices), easy to encode as NN output.
(-): Some games may be hard to fit (examples? but games with huge action sets (card shuffling) are hard to learn anyway).
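The fixed-action-list option could be sketched roughly as below. The class name `FixedActionGame` and the `available` and `decode` helpers are hypothetical and serve only to show how subsets become indices into one list.

```python
from typing import Sequence, Tuple


class FixedActionGame:
    """Hypothetical sketch of a game with one fixed action list per instance."""

    def __init__(self, actions: Sequence[str]):
        self.actions: Tuple[str, ...] = tuple(actions)
        self._index = {a: i for i, a in enumerate(self.actions)}

    def available(self, allowed: Sequence[str]) -> Tuple[int, ...]:
        # Every available-action list is a subset, expressed as indices into self.actions.
        return tuple(sorted(self._index[a] for a in allowed))

    def decode(self, index: int) -> str:
        # Turn a recorded index back into the action itself, e.g. for display.
        return self.actions[index]


game = FixedActionGame(["fold", "call", "raise"])
legal = game.available(["call", "fold"])
print(legal, [game.decode(i) for i in legal])
```

A recorded history is then just a sequence of small integers, and a network output of length `len(game.actions)` can be masked by the `legal` indices.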
(+): Supports large sets, seems flexible
(-): Type mishmashes (indices vs. actual numerical actions), reverse indexing, unintended very large action sets (e.g. floats).
- Shuffle the whole deck (infeasible anyway, can be replaced by drawing individual cards)
- Select from arbitrarily many cards (e.g. Port Royal card selection)
- In Port Royal, you can just offer all card types (would work better anyway)
- Some other card games?
- Bidding games have large/unbounded bids (quantize? how does DeepStack do it?)
- Allow one unbounded set of actions (e.g. negative indices or all indices beyond the last action)
- Encoding from NN still has to be customized (e.g. bidding quantization, recurrent NN decoder, ...)
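One possible reading of "bidding quantization" is to map an unbounded bid onto a fixed list of bid levels, so that the result is again an index into a fixed action list. The sketch below assumes this reading; the `quantize_bid` helper and the bucket values are made up for illustration, and it is not a description of how DeepStack handles bids.

```python
import bisect
from typing import Sequence


def quantize_bid(bid: float, levels: Sequence[float]) -> int:
    """Return the index of the largest level not exceeding `bid` (clamped to 0)."""
    # `levels` must be sorted ascending; bisect_right counts the levels <= bid.
    return max(bisect.bisect_right(levels, bid) - 1, 0)


levels = [0, 10, 20, 50, 100]  # hypothetical bid buckets
print([quantize_bid(b, levels) for b in (0, 9.5, 10, 75, 1000)])
# prints [0, 0, 1, 3, 4]
```

Bids above the last level all fall into the final bucket, which is one way to keep the action set bounded at the cost of precision.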