Bots to play VGC; a random player and a DQN player were created. This project has since been abandoned in favor of Elite Furret AI. Here are my initial notes from readings:
- You and your opponent choose moves simultaneously
- Your opponent's moves affect the outcome of your moves (e.g. if you're KO'ed before you act)
- Large aspect of luck / randomness:
- Damage Rolls
- Move success (accuracy, multiple protects, OHKOs)
- Secondary Move effects
- Luck-based abilities (Super Luck, Effect Spore)
- Critical Hits
- Moves made now can affect outlook through next couple of turns (e.g. tailwind, reflect, protect)
- Imperfect knowledge
- Abilities
- Items
- Moves
- EV spreads
- IV spreads
- Pokemon brought
- Why VGC will be more computationally difficult
- Action Space: the Singles action space is 13 while the Doubles action space is 418.
- Turn Action Space (both players choosing simultaneously, so the per-player space squared): Singles is 13^2 = 169 while Doubles is 418^2 = 174,724 (and this ignores chance outcomes, which would exacerbate things further because the probabilities compound)
- More emphasis on teampreview (since there's an extra element of picking 4 out of 6). However, while this is certainly another type of problem, it will be comparatively trivial to solve.
- We need to learn a “policy”: given the game state, and thus a set of possible actions, the ML will choose the optimal action
- Because of concepts like “reverse sweeping” (e.g. being down 4-1 and still winning with your last mon), it's hard to build heuristics for how won or lost a position actually is; the safest approach is to reward the ML with +1 for a win and -1 for a loss. We can still add shaping heuristics to speed up training, though (see the sketch below)
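A minimal sketch of what such a reward could look like, assuming we track fainted counts per step; the function name, arguments, and shaping weight are hypothetical, not taken from any library:

```python
from typing import Optional

def compute_reward(battle_won: Optional[bool], my_fainted: int,
                   opp_fainted: int, shaping_weight: float = 0.05) -> float:
    """Reward for one step: sparse +1/-1 terminal signal plus small shaping.

    battle_won is True/False once the battle ends, None while it is ongoing;
    my_fainted / opp_fainted are KO counts since the previous step.
    """
    reward = 0.0
    # Dense shaping: small bonus for KOs scored, small penalty for KOs conceded.
    reward += shaping_weight * (opp_fainted - my_fainted)
    # Sparse terminal signal: the only reward we truly trust.
    if battle_won is True:
        reward += 1.0
    elif battle_won is False:
        reward -= 1.0
    return reward
```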
- Input:
- State: All knowable information of the current battle, your pokemon and the opponent's team
- Possible Actions:
- Singles: 4 moves * 2 (tera or not) + 5 switches = 13
- Doubles: (4 moves * 3 targets * 2 (tera or not) + 2 switches)^2 = 676 options
- In reality, the options are more limited (530 total, detailed below; a counting sketch follows this list), but we can stick w/ 676 for simplicity
- if both mons switch: 2 ways to send in the 2 bench mons = 2
- if exactly one mon switches: 2 (either active mon could be the one switching) * 2 possible switch-ins * (4 moves * 3 targets * 2 tera options) for the non-switching mon = 96
- if neither mon switches:
- if one teras: 2 possible teras * (4 moves * 3 targets) * (4 moves * 3 targets) = 288
- if neither teras: (4 moves * 3 targets) * (4 moves * 3 targets) = 144
- = 288 + 144 = 432
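A quick arithmetic check of the counts above (the names below are just labels for the factors in these formulas, not part of any API):

```python
# Sanity-check the action-space counts listed above; "tera" contributes a factor of 2.
MOVES, TARGETS, TERA = 4, 3, 2

# Singles: 4 moves * 2 (tera or not) + 5 switches
singles = MOVES * TERA + 5
assert singles == 13

# Doubles, simple upper bound: treat each active mon independently.
per_mon = MOVES * TARGETS * TERA + 2            # 26 options per active mon
doubles_simple = per_mon ** 2                    # 676

# Doubles, tighter count (the 530 above):
both_switch = 2                                  # 2 ways to send in the 2 bench mons
one_switch = 2 * 2 * (MOVES * TARGETS * TERA)    # which mon switches * switch-in * other mon's options = 96
no_switch = 2 * (MOVES * TARGETS) ** 2 + (MOVES * TARGETS) ** 2  # one mon teras (288) or neither (144)
doubles_tight = both_switch + one_switch + no_switch
assert doubles_tight == 530

# The joint turn space (both players acting at once) squares the per-player
# space, e.g. 13 ** 2 == 169 for Singles.
print(singles, doubles_simple, doubles_tight)
```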
- Output:
- Best action (or output of our policy given the battle's state)
- There is no such thing as a single “best action”, because a perfectly predictable strategy is exploitable. An estimate of how good each action is would be ideal (see the sketch below)
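One hedged sketch of that idea: rather than always playing the argmax, sample from a softmax over estimated action values, so similarly-valued actions all get played sometimes. The value estimates and temperature here are placeholders:

```python
import numpy as np

def sample_action(action_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample an action index from a softmax over estimated action values."""
    logits = action_values / temperature
    logits = logits - logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.random.choice(len(probs), p=probs))

# Three candidate actions with similar values all get played with non-trivial
# probability, so the resulting policy is harder for an opponent to read.
print(sample_action(np.array([1.2, 1.0, -0.5])))
```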
- For now, ignore the existence of Zoroark and Ditto :)
- Pokemon is a game of wincons
- Goal is to eliminate all other pokemon that are threats to your wincon and outspeed/KO the rest
- The ability to set yourself up and “look ahead” will be key
- Trainers think about HP not as a raw number, but in terms that matter: “can this mon survive two hits from Kartana?” (see the sketch after this list)
- Players don't bank on unlikely secondary effects unless absolutely necessary
- If we can eliminate pokemon each turn regardless of what the other player does, we win
- They work backwards:
- In what situations will I win and how can I set up those situations?
- Is it possible to identify 2+ vs 2 endgame situations (where the opponent only has 2 mons left) in which we win, and work backwards from there? As in, what HP/conditions will I need to beat those last two mons?
- In what situations will I lose and how can I avoid those situations?
- Is it possible to look at the situations in which I only have 2 mons left and I lose?
- In winning positions, they start making decisions according to probabilities: how can you minimize P(loss)?
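A small sketch of the “survive two hits” style of check, assuming we already have a base damage number from some damage calculator (the inputs here are made up; crits and other modifiers are ignored):

```python
import itertools

# Showdown-style damage rolls multiply base damage by one of 16 factors, 0.85-1.00.
ROLLS = [0.85 + 0.01 * i for i in range(16)]

def p_survive(hp: int, base_damage: int, n_hits: int = 2) -> float:
    """Probability the defender survives n_hits, ignoring crits and modifiers."""
    survive = total = 0
    for combo in itertools.product(ROLLS, repeat=n_hits):
        total += 1
        dealt = sum(int(base_damage * r) for r in combo)
        if dealt < hp:
            survive += 1
    return survive / total

# e.g. "does a mon at 180 HP survive two hits that deal ~95 base damage?"
print(p_survive(hp=180, base_damage=95, n_hits=2))
```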
- Markov Property: previous states don't matter given the current battle state. However, two caveats here:
- This is certainly not true in a competitive format where certain players have tendencies (e.g. playing more or less aggressively). Whether players currently play optimally enough that we can skip modeling an individual opponent's likelihood of making a given move is still an unknown.
- The cardinality of battle states is, for all intents and purposes, infinite (e.g. PP, HP values, timer)
- Stochastic Game (players make moves simultaneously in decision-making scenarios; the joint action results in a transition to a different game state)
- Multi-Agent (two player)
- Zero Sum (aka constant sum; there is no mutually beneficial reward)
- Partially Observable (each player has imperfect information)
- Reinforcement Learning because:
- You don't know the probability of getting to the next state (due to the opponent's decision)
- You don’t immediately know which states lead you to the reward
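This is why a model-free update fits: something like tabular Q-learning only needs sampled (state, action, reward, next state) transitions, never the transition probabilities themselves. A minimal sketch with placeholder state/action encodings:

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> estimated value
ALPHA, GAMMA = 0.1, 0.99

def q_update(state, action, reward, next_state, next_actions):
    """One tabular Q-learning step on a single sampled transition."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```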
- Second Choice: Monte Carlo Tree Search (MCTS)?
- This would only really be applicable in Singles, since the branching factor makes it very expensive in Doubles
- We could create heuristics to limit the search space (e.g. not targeting your own pokemon unless healing it or triggering Weakness Policy)
- “Backwards Induction” (implemented as Alpha-Beta pruning) can help prune game states
- Intelligent prioritization of branches to search
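A sketch of the alpha-beta idea over a hypothetical battle-state interface (legal_actions / apply / evaluate are assumed helpers, not a real API); since Pokemon turns are simultaneous, a real version would have to serialize the two players' choices or use a simultaneous-move variant:

```python
def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited alpha-beta search over a hypothetical state interface."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()              # heuristic score from our side
    actions = state.legal_actions()          # heuristics can pre-filter this list
    if maximizing:
        value = float("-inf")
        for a in actions:
            value = max(value, alphabeta(state.apply(a), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # opponent would never allow this line
        return value
    else:
        value = float("inf")
        for a in actions:
            value = min(value, alphabeta(state.apply(a), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value
```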
- Twitter Discussion: https://twitter.com/TBFUnreality/status/1059894045177200645
- Simulator Github: https://github.com/smogon/pokemon-showdown
- Paper on Poker ML using Imperfect Information: http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
- Someone creating DeepLearning for 1:1 battle: https://towardsdatascience.com/poke-agent-pokemon-battling-reinforcement-learning-27ef10885c5c
- Create a showdown bot to operate easily: https://github.com/dramamine/leftovers-again
- ^ It will make a request and pass a game state, to which we respond with a move
- Working Pokemon AI: https://github.com/vasumv/pokemon_ai/ (windows only)
- Minimax AI: http://doublewise.net/pokemon/ (named Technical Machine)
- Someone doing simple deep learning: https://web.stanford.edu/class/aa228/reports/2018/final151.pdf
- Schoolwork: https://docplayer.net/63514819-Artificial-intelligence-for-pokemon-showdown.html
- Schoolwork: https://nyuscholars.nyu.edu/en/publications/showdown-ai-competition
- http://julian.togelius.com/Lee2017Showdown.pdf
- Includes primers on how to model AI using showdown API
- Also a bunch of fairly simple starter algorithms and their performance
- https://papers.nips.cc/paper/2012/file/3df1d4b96d8976ff5986393e8767f5b2-Paper.pdf
- Lit review on MARL: https://arxiv.org/pdf/2011.00583.pdf
- Zero-sum games can always be reformulated as linear programs (Dantzig, 1951); see the LP sketch after this list
- Training against itself? (AlphaZero/AlphaGo approach)
- “Portfolio Optimization” would be a team builder algorithm, and is unsolved
- Minimax Q?
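For reference, a sketch of the LP formulation for a zero-sum matrix game using scipy (the 2x2 payoff matrix is a toy example, not Pokemon data; per-turn payoffs would have to be estimated from the game state):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-0.5, 0.5]])        # toy payoffs for the row player
m, n = A.shape

# Variables: [x_1 .. x_m, v].  Maximize the game value v  <=>  minimize -v.
c = np.zeros(m + 1)
c[-1] = -1.0

# For every opponent column j:  v - sum_i x_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# The mixed strategy must sum to 1.
A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
b_eq = np.array([1.0])

bounds = [(0, 1)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("strategy:", res.x[:m], "game value:", res.x[-1])
```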
- Create DeepLearning stuff using poke-env: https://poke-env.readthedocs.io/en/stable/getting_started.html
- This is what I go with
- Implemented Bot: https://github.com/pmariglia/showdown
- https://github.com/pkmn/EPOke
- https://www.yuzeh.com/assets/CoG-2019-Pkmn.pdf
- Forked Hsahovic's PokeEnv
- Added more Doubles Support
- Install Node and the requirements for PokeEnv (e.g. python3.6, tensorflow, orjson, keras-rl2==1.0.3)
- Set up Bash Profile
To run an example where we simulate random battles, from the home directory:
- source ~/.bash_profile
- node Pokemon-Showdown/pokemon-showdown
- python3.8 simulators/simulate_random_doubles.py
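For context, a minimal sketch of what a random-doubles simulation script could look like with poke-env; the actual simulate_random_doubles.py in this fork may differ, and import paths vary between poke-env versions:

```python
# Hypothetical sketch; assumes a local pokemon-showdown server is already
# running (see the commands above) and that this poke-env version exposes
# RandomPlayer under poke_env.player.
import asyncio

from poke_env.player import RandomPlayer


async def main():
    # Two random agents battling each other in random doubles.
    p1 = RandomPlayer(battle_format="gen8randomdoublesbattle")
    p2 = RandomPlayer(battle_format="gen8randomdoublesbattle")
    await p1.battle_against(p2, n_battles=5)
    print(f"p1 won {p1.n_won_battles} / {p1.n_finished_battles} battles")


if __name__ == "__main__":
    asyncio.run(main())
```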