Tomáš Gavenčiak edited this page Jan 8, 2019 · 3 revisions

Collection of design choices made or to be considered.

Game types and classes

Proposed classes:

  • Game - Abstract base class.
    • PartialInformationGame - Partial information, custom observations (information set IDs).
      • ObservationSequenceGame - Partial information, perfect recall sequence-based observations.
      • SimultaneousGame - One-turn simultaneous game.
        • MatrixGame - Payoff determined by a matrix/ndarray.
    • PerfectInformationGame - Full information game (full situation observation).
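The proposed hierarchy could be sketched in Python roughly as follows. This is only an illustration: the `players` and `payoff` methods, the two-player restriction, and the nested-list payoff table (standing in for an ndarray) are assumptions made for the example, not part of the proposal.

```python
from abc import ABC, abstractmethod


class Game(ABC):
    """Abstract base class. The method below is a hypothetical placeholder."""

    @abstractmethod
    def players(self) -> int:
        """Return the number of players."""


class PartialInformationGame(Game):
    """Partial information, custom observations (information set IDs)."""


class ObservationSequenceGame(PartialInformationGame):
    """Partial information, perfect-recall sequence-based observations."""


class SimultaneousGame(PartialInformationGame):
    """One-turn simultaneous game."""


class MatrixGame(SimultaneousGame):
    """Payoff determined by a matrix (here a nested list, two players only)."""

    def __init__(self, payoffs):
        # payoffs[a0][a1] is a tuple with one payoff per player.
        self.payoffs = payoffs

    def players(self) -> int:
        return 2

    def payoff(self, a0: int, a1: int):
        return self.payoffs[a0][a1]


class PerfectInformationGame(Game):
    """Full information game (full situation observation)."""


# Matching pennies: player 0 wins on a match, player 1 on a mismatch.
pennies = MatrixGame([[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]])
```

Only `MatrixGame` is concrete here; the intermediate classes stay abstract because they do not implement `players`.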

Note that if update_state in a complete information game depends on history, "observing" only the state information is generally not enough (e.g. to decide on a strategy or to send to a client). Sending the whole situation may then be necessary.

Potential classes:

  • SimultaneousSequentialGame - Full information (state observation) but simultaneous moves.
  • A partial-information simultaneous game is implemented directly as Game, as a dedicated class would bring no significant advantage.
  • SymmetricGame - ?

Minor choices

History as actions vs. action indices

Indices are more useful and compact while computing. The actions themselves may be better for visualisation (e.g. on the client side).

In a sequential-observation game, a player's own-action observations are likely better stored as the actions themselves (rather than as indices).
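A small sketch of keeping the history as indices and converting to actions only when needed. The `ACTIONS` tuple and both helper names are hypothetical and only serve the example.

```python
ACTIONS = ("rock", "paper", "scissors")  # hypothetical fixed action list

# Compact form used while computing: indices into ACTIONS.
history_idx = (0, 2, 1)


def history_as_actions(indices, actions=ACTIONS):
    """Expand an index history into actual actions, e.g. for visualisation."""
    return tuple(actions[i] for i in indices)


def history_as_indices(history, actions=ACTIONS):
    """Compress an action history back into indices."""
    lookup = {a: i for i, a in enumerate(actions)}
    return tuple(lookup[a] for a in history)
```

The reverse lookup requires actions to be hashable and unique, which is one more constraint the index form does not have.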

Observations in StateInfo vs Situation

In a sequential-observation game, StateInfo.observations holds only the "new" observations, while Situation.observations holds the complete observations. In a general Game, Situation.observations is the complete tuple (usually an alias of StateInfo.observations).

In complete-information games, all observations may simply be None.
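The relation between the two fields in the sequential-observation case might look like the sketch below. The field layout, the `extend` helper, and the convention that `None` means "nothing new observed" are assumptions for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class StateInfo(NamedTuple):
    player: int
    # One *new* observation per player (None = nothing new; assumed convention).
    observations: Tuple[Optional[str], ...]


class Situation(NamedTuple):
    history: tuple
    # Per player: the complete sequence of observations so far.
    observations: Tuple[tuple, ...]


def extend(situation, action, state_info):
    """Append the new per-player observations to the accumulated ones."""
    accumulated = tuple(
        seen + ((new,) if new is not None else ())
        for seen, new in zip(situation.observations, state_info.observations)
    )
    return Situation(situation.history + (action,), accumulated)


start = Situation(history=(), observations=((), ()))
s1 = extend(start, "deal", StateInfo(0, ("K", "Q")))
s2 = extend(s1, "bet", StateInfo(1, (None, "bet")))
```

After the two steps, player 0 has seen only their card, while player 1 has seen their card and the bet.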

Game specification style

Internal game state can either be updated in place, or be immutable and created anew on every update. ActivePlayer can be renamed to StateInfo. StateInfo should also contain the (new/all) observations.

Some variants of update_state could accept Situation instead of State. While implementations could use some of the information there (history, observations, rewards, past StateInfo), it could also make the API more brittle. (But I would prefer situation anyway - Tomas).

Having a separate Game.state_info() may be slower (it needs to recalculate or cache information already computed during the update) and may not be cleaner.

Const state [proposed]

  • Game.initial_state() -> (state, state_info)
  • Game.update_state(situation, action) -> (state, state_info)
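A minimal sketch of the const-state variant on a toy game. The `CountdownGame`, the `Situation` and `StateInfo` fields, the `play` driver, and the use of `-1` as the terminal player are all hypothetical choices for this example.

```python
from typing import NamedTuple, Tuple


class StateInfo(NamedTuple):
    player: int               # active player, -1 when terminal (assumed)
    actions: Tuple[int, ...]  # currently available actions


class Situation(NamedTuple):
    state: Tuple[int, ...]
    state_info: StateInfo
    history: Tuple[int, ...]


class CountdownGame:
    """Toy game: players alternately subtract 1 or 2 from a counter."""

    def __init__(self, start=3):
        self.start = start

    def _info(self, counter, player):
        if counter == 0:
            return StateInfo(-1, ())
        return StateInfo(player, tuple(a for a in (1, 2) if a <= counter))

    def initial_state(self):
        return (self.start,), self._info(self.start, 0)

    def update_state(self, situation, action):
        # Build a new state tuple; the old situation is left untouched.
        (counter,) = situation.state
        next_player = 1 - situation.state_info.player
        return (counter - action,), self._info(counter - action, next_player)


def play(game, actions):
    """Apply a sequence of actions and return the final situation."""
    state, info = game.initial_state()
    situation = Situation(state, info, ())
    for action in actions:
        if action not in situation.state_info.actions:
            raise ValueError(f"illegal action {action!r}")
        state, info = game.update_state(situation, action)
        situation = Situation(state, info, situation.history + (action,))
    return situation
```

Because every situation is immutable, earlier situations can be kept around (e.g. in a search tree) without copying.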

Separate StateInfo

  • Game.initial_state() -> state
  • Game.update_state(situation, action) -> state
  • Game.state_info(state) -> state_info
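With a separate state_info, the information has to be recomputed or cached on each call. One possible way to cache it, assuming states are hashable, is `functools.lru_cache`; the toy game and the `(player, actions)` tuple used as state info are hypothetical.

```python
from functools import lru_cache
from typing import NamedTuple, Tuple


class Situation(NamedTuple):
    state: Tuple[int, int]  # (counter, active player)
    history: Tuple[int, ...]


class SeparateInfoCountdown:
    """Toy game: players alternately subtract 1 or 2 from a counter."""

    def initial_state(self):
        return (3, 0)

    def update_state(self, situation, action):
        counter, player = situation.state
        return (counter - action, 1 - player)

    @staticmethod
    @lru_cache(maxsize=None)
    def state_info(state):
        # Derived purely from the state, so it can be cached per state.
        counter, player = state
        if counter == 0:
            return (-1, ())
        return (player, tuple(a for a in (1, 2) if a <= counter))
```

The cache only works because the state is an immutable tuple; this variant therefore fits the const-state style better than the in-place one.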

Updating state in-place

  • Game.initial_state() -> (state, state_info)
  • Game.update_state(state, action) -> state_info
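For comparison, a sketch of the in-place variant on the same kind of toy game. The dict-based state and the `(player, actions)` tuple returned as state info are assumptions; the point is that any earlier reference to the state changes too, unless it was copied explicitly.

```python
import copy


class InPlaceCountdown:
    """Toy game: players alternately subtract 1 or 2 from a counter."""

    def _info(self, state):
        if state["counter"] == 0:
            return (-1, ())
        legal = tuple(a for a in (1, 2) if a <= state["counter"])
        return (state["player"], legal)

    def initial_state(self):
        state = {"counter": 3, "player": 0}
        return state, self._info(state)

    def update_state(self, state, action):
        # Mutates the caller's state object and returns only the new info.
        state["counter"] -= action
        state["player"] = 1 - state["player"]
        return self._info(state)


game = InPlaceCountdown()
state, info = game.initial_state()
alias = state                    # same object as `state`
snapshot = copy.deepcopy(state)  # independent copy taken before the update
info = game.update_state(state, 2)
```

After the update, `alias` reflects the new counter while `snapshot` still holds the initial one, so anything that stores past states has to pay for the copy.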

Separate info

  • Game.initial_state() -> state
  • Game.update_state(state, action)
  • Game.state_info(state) -> state_info

Action space

Fixed action set [proposed]

One fixed action list per Game instance (e.g. in Game.actions); every list of available actions is a subset of it, and indices always refer to this list.

(+): Easy to record and interpret (just indices), easy to encode as NN output.

(-): Some games may be hard to fit (examples?), although games with huge action sets (e.g. card shuffling) are hard to learn anyway.
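With a fixed action set, a network can emit one logit per action and the currently available subset can be applied as a mask. The sketch below assumes a hypothetical three-action list and uses plain Python in place of a tensor library.

```python
import math

ACTIONS = ("fold", "call", "raise")  # hypothetical Game.actions


def legal_mask(legal, actions=ACTIONS):
    """Boolean mask over the fixed action list."""
    return [a in legal for a in actions]


def masked_policy(logits, mask):
    """Softmax over legal entries only; illegal actions get probability 0."""
    if not any(mask):
        raise ValueError("no legal action")
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

The output always has `len(ACTIONS)` entries, so recorded policies and histories stay comparable across situations.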

Arbitrary (open) action set [current]

(+): Supports large sets, seems flexible

(-): Type mishmashes (indices vs. actual numerical actions), reverse indexing, unintended very large action sets (e.g. floats).

Games with unlimited/large action sets

  • Shuffle the whole deck (infeasible anyway, can be replaced by drawing individual cards)
  • Select from arbitrarily many cards (e.g. Port Royal card selection)
    • In Port Royal, you can just offer all card types (would work better anyway)
  • Some other card games?
  • Bidding games have large/unbounded bids (quantize? how does DeepStack do it?)

Solutions

  • Allow one unbounded set of actions (e.g. negative indices or all indices beyond the last action)
    • Encoding from NN still has to be customized (e.g. bidding quantization, recurrent NN decoder, ...)
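One way the "indices beyond the last action" idea could look, together with a simple bid quantization. The fixed actions, the bucket values, and all function names are hypothetical; this is not how any particular system (DeepStack included) is known to do it.

```python
import bisect

FIXED_ACTIONS = ("pass", "double")      # hypothetical bounded part
BUCKETS = (0, 1, 2, 5, 10, 20, 50)      # hypothetical bid buckets, ascending


def encode(action):
    """Fixed actions map to 0..n-1; a bid of size b maps to index n + b."""
    if isinstance(action, int):
        if action < 0:
            raise ValueError("bids must be non-negative")
        return len(FIXED_ACTIONS) + action
    return FIXED_ACTIONS.index(action)


def decode(index):
    """Inverse of encode."""
    if index >= len(FIXED_ACTIONS):
        return index - len(FIXED_ACTIONS)
    return FIXED_ACTIONS[index]


def quantize(bid, buckets=BUCKETS):
    """Round a bid down to the nearest bucket (clamped to the bucket range)."""
    i = bisect.bisect_right(buckets, bid) - 1
    return buckets[max(i, 0)]
```

Histories stay plain integers under this encoding, but the network output still needs a custom head for the unbounded part, e.g. one logit per bucket.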