Bots to play VGC; a random player and a DQN player were created. This project has since been abandoned in favor of Elite Furret AI. Here are my initial notes from readings:
- You and your opponent choose moves simultaneously
- Your opponent's moves affect the outcome of your moves (e.g. if you're KO'ed before you act)
- Large aspect of luck / randomness:
- Damage Rolls
- Move success (accuracy, multiple protects, OHKOs)
- Secondary Move effects
- Luck-based abilities (Super Luck, Effect Spore)
- Critical Hits
- Moves made now can affect outlook through next couple of turns (e.g. tailwind, reflect, protect)
- Imperfect knowledge
- Abilities
- Items
- Moves
- EV spreads
- IV spreads
- Pokemon brought
- Why VGC will be more computationally difficult
- Action Space: the Singles action space is 13 while the Doubles action space is 418.
- Turn Action Space (both players choosing simultaneously, so the per-player space squared): Singles is 13^2 = 169 while Doubles is 418^2 = 174,724 (and this ignores chance outcomes, which would exacerbate things further because the probabilities compound)
- More emphasis on teampreview (since there's an extra element of picking 4 out of 6). However, while this is certainly another type of problem, it will be comparatively trivial to solve.
- We need to learn a “policy”: given the game state, and thus a set of possible actions, the ML will choose the optimal action
- Because of concepts like “reverse sweeping” (e.g. being down 4-1 and still winning with your last mon), it's hard to build heuristics for how won or lost a position actually is; the safest approach is to reward the ML with +1 for a win and -1 for a loss. We can still add shaping heuristics to speed up training, though (see the sketch below)
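A minimal sketch of what such a reward could look like, assuming we track fainted counts per step; the function name, arguments, and shaping weight are hypothetical, not taken from any library:

```python
from typing import Optional

def compute_reward(battle_won: Optional[bool], my_fainted: int,
                   opp_fainted: int, shaping_weight: float = 0.05) -> float:
    """Reward for one step: sparse +1/-1 terminal signal plus small shaping.

    battle_won is True/False once the battle ends, None while it is ongoing;
    my_fainted / opp_fainted are KO counts since the previous step.
    """
    reward = 0.0
    # Dense shaping: small bonus for KOs scored, small penalty for KOs conceded.
    reward += shaping_weight * (opp_fainted - my_fainted)
    # Sparse terminal signal: the only reward we truly trust.
    if battle_won is True:
        reward += 1.0
    elif battle_won is False:
        reward -= 1.0
    return reward
```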
- Input:
- State: All knowable information of the current battle, your pokemon and the opponent's team
- Possible Actions:
- Singles: 4 moves * 2 (tera or not) + 5 switches = 13
- Doubles: (4 moves * 3 targets * 2 (tera or not) + 2 switches)^2 = 676 options
- In reality, the options are more limited (530 total, detailed below; a counting sketch follows this list), but we can stick w/ 676 for simplicity
- if both mons switch: 2 ways to send in the 2 bench mons = 2
- if exactly one mon switches: 2 (either active mon could be the one switching) * 2 possible switch-ins * (4 moves * 3 targets * 2 tera options) for the non-switching mon = 96
- if neither mon switches:
- if one teras: 2 possible teras * (4 moves * 3 targets) * (4 moves * 3 targets) = 288
- if neither teras: (4 moves * 3 targets) * (4 moves * 3 targets) = 144
- = 288 + 144 = 432
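A quick arithmetic check of the counts above (the names below are just labels for the factors in these formulas, not part of any API):

```python
# Sanity-check the action-space counts listed above; "tera" contributes a factor of 2.
MOVES, TARGETS, TERA = 4, 3, 2

# Singles: 4 moves * 2 (tera or not) + 5 switches
singles = MOVES * TERA + 5
assert singles == 13

# Doubles, simple upper bound: treat each active mon independently.
per_mon = MOVES * TARGETS * TERA + 2            # 26 options per active mon
doubles_simple = per_mon ** 2                    # 676

# Doubles, tighter count (the 530 above):
both_switch = 2                                  # 2 ways to send in the 2 bench mons
one_switch = 2 * 2 * (MOVES * TARGETS * TERA)    # which mon switches * switch-in * other mon's options = 96
no_switch = 2 * (MOVES * TARGETS) ** 2 + (MOVES * TARGETS) ** 2  # one mon teras (288) or neither (144)
doubles_tight = both_switch + one_switch + no_switch
assert doubles_tight == 530

# The joint turn space (both players acting at once) squares the per-player
# space, e.g. 13 ** 2 == 169 for Singles.
print(singles, doubles_simple, doubles_tight)
```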
- Output:
- Best action (or output of our policy given the battle's state)
- There is no such thing as a single “best action”, because a perfectly predictable strategy is exploitable. An estimate of how good each action is would be ideal (see the sketch below)
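One hedged sketch of that idea: rather than always playing the argmax, sample from a softmax over estimated action values, so similarly-valued actions all get played sometimes. The value estimates and temperature here are placeholders:

```python
import numpy as np

def sample_action(action_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample an action index from a softmax over estimated action values."""
    logits = action_values / temperature
    logits = logits - logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.random.choice(len(probs), p=probs))

# Three candidate actions with similar values all get played with non-trivial
# probability, so the resulting policy is harder for an opponent to read.
print(sample_action(np.array([1.2, 1.0, -0.5])))
```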
- For now, ignore the existence of Zoroark and Ditto :)
- Pokemon is a game of wincons
- Goal is to eliminate all other pokemon that are threats to your wincon and outspeed/KO the rest
- The ability to set yourself up and “look ahead” will be key
- Trainers think about HP not as a raw number, but in terms that matter: “can this mon survive two hits from Kartana?” (see the sketch after this list)
- Players don't bank on unlikely secondary effects unless absolutely necessary
- If we can eliminate pokemon each turn regardless of what the other player does, we win
- They work backwards:
- In what situations will I win and how can I set up those situations?
- Is it possible to identify 2+ vs 2 endgame situations (where the opponent only has 2 mons left) in which we win, and work backwards from there? As in, what HP/conditions will I need to beat those last two mons?
- In what situations will I lose and how can I avoid those situations?
- Is it possible to look at the situations in which I only have 2 mons left and I lose?
- In winning positions, they start making decisions according to probabilities: how can you minimize P(loss)?
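A small sketch of the “survive two hits” style of check, assuming we already have a base damage number from some damage calculator (the inputs here are made up; crits and other modifiers are ignored):

```python
import itertools

# Showdown-style damage rolls multiply base damage by one of 16 factors, 0.85-1.00.
ROLLS = [0.85 + 0.01 * i for i in range(16)]

def p_survive(hp: int, base_damage: int, n_hits: int = 2) -> float:
    """Probability the defender survives n_hits, ignoring crits and modifiers."""
    survive = total = 0
    for combo in itertools.product(ROLLS, repeat=n_hits):
        total += 1
        dealt = sum(int(base_damage * r) for r in combo)
        if dealt < hp:
            survive += 1
    return survive / total

# e.g. "does a mon at 180 HP survive two hits that deal ~95 base damage?"
print(p_survive(hp=180, base_damage=95, n_hits=2))
```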
- Markov Property: previous states don't matter given the current battle state. However, two caveats here:
- This is certainly not true in a competitive format where certain players have tendencies (e.g. playing more or less aggressively). Whether players currently play optimally enough that we can skip modeling an individual opponent's likelihood of making a given move is still an unknown.
- The cardinality of battle states is, for all intents and purposes, infinite (e.g. PP, HP values, timer)
- Stochastic Game (players make moves simultaneously in decision-making scenarios; the joint action results in a transition to a different game state)
- Multi-Agent (two player)
- Zero Sum (aka constant sum; there is no mutually beneficial reward)
- Partially Observable (each player has imperfect information)
- Reinforcement Learning because:
- You don't know the probability of getting to the next state (due to the opponent's decision)
- You don’t immediately know which states lead you to the reward
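This is why a model-free update fits: something like tabular Q-learning only needs sampled (state, action, reward, next state) transitions, never the transition probabilities themselves. A minimal sketch with placeholder state/action encodings:

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> estimated value
ALPHA, GAMMA = 0.1, 0.99

def q_update(state, action, reward, next_state, next_actions):
    """One tabular Q-learning step on a single sampled transition."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```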
- Second Choice: Monte Carlo Tree Search (MCTS)?
- This would only really be applicable in Singles, since the branching factor makes it very expensive in Doubles
- We could create heuristics to limit the search space (e.g. not targeting your own pokemon unless healing it or triggering Weakness Policy)
- “Backwards Induction” (implemented as Alpha-Beta pruning) can help prune game states
- Intelligent prioritization of branches to search
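A sketch of the alpha-beta idea over a hypothetical battle-state interface (legal_actions / apply / evaluate are assumed helpers, not a real API); since Pokemon turns are simultaneous, a real version would have to serialize the two players' choices or use a simultaneous-move variant:

```python
def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited alpha-beta search over a hypothetical state interface."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()              # heuristic score from our side
    actions = state.legal_actions()          # heuristics can pre-filter this list
    if maximizing:
        value = float("-inf")
        for a in actions:
            value = max(value, alphabeta(state.apply(a), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # opponent would never allow this line
        return value
    else:
        value = float("inf")
        for a in actions:
            value = min(value, alphabeta(state.apply(a), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value
```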
- Twitter Discussion: https://twitter.com/TBFUnreality/status/1059894045177200645
- Simulator Github: https://github.com/smogon/pokemon-showdown
- Paper on Poker ML using Imperfect Information: http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
- Someone creating DeepLearning for 1:1 battle: https://towardsdatascience.com/poke-agent-pokemon-battling-reinforcement-learning-27ef10885c5c
- Create a showdown bot to operate easily: https://github.com/dramamine/leftovers-again
- ^ It will make a request and pass a game state, to which we respond with a move
- Working Pokemon AI: https://github.com/vasumv/pokemon_ai/ (windows only)
- Minimax AI: http://doublewise.net/pokemon/ (named Technical Machine)
- Someone doing simple deep learning: https://web.stanford.edu/class/aa228/reports/2018/final151.pdf
- Schoolwork: https://docplayer.net/63514819-Artificial-intelligence-for-pokemon-showdown.html
- Schoolwork: https://nyuscholars.nyu.edu/en/publications/showdown-ai-competition
- http://julian.togelius.com/Lee2017Showdown.pdf
- Includes primers on how to model AI using showdown API
- Also a bunch of fairly simple starter algorithms and their performance
- https://papers.nips.cc/paper/2012/file/3df1d4b96d8976ff5986393e8767f5b2-Paper.pdf
- Lit review on MARL: https://arxiv.org/pdf/2011.00583.pdf
- Zero-sum games can always be reformulated as linear programs (Dantzig, 1951); see the LP sketch after this list
- Training against itself? (AlphaZero/AlphaGo approach)
- “Portfolio Optimization” would be a team builder algorithm, and is unsolved
- Minimax Q?
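For reference, a sketch of the LP formulation for a zero-sum matrix game using scipy (the 2x2 payoff matrix is a toy example, not Pokemon data; per-turn payoffs would have to be estimated from the game state):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-0.5, 0.5]])        # toy payoffs for the row player
m, n = A.shape

# Variables: [x_1 .. x_m, v].  Maximize the game value v  <=>  minimize -v.
c = np.zeros(m + 1)
c[-1] = -1.0

# For every opponent column j:  v - sum_i x_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# The mixed strategy must sum to 1.
A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
b_eq = np.array([1.0])

bounds = [(0, 1)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("strategy:", res.x[:m], "game value:", res.x[-1])
```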
- Create DeepLearning stuff using poke-env: https://poke-env.readthedocs.io/en/stable/getting_started.html
- This is what I go with
- Implemented Bot: https://github.com/pmariglia/showdown
- https://github.com/pkmn/EPOke
- https://www.yuzeh.com/assets/CoG-2019-Pkmn.pdf
- Forked Hsahovic's PokeEnv
- Added more Doubles Support
- Install Node and the requirements for PokeEnv (e.g. python3.6, tensorflow, orjson, keras-rl2==1.0.3)
- Set up Bash Profile
To run an example where we simulate random battles, from the home directory:
- source ~/.bash_profile
- node Pokemon-Showdown/pokemon-showdown
- python3.8 simulators/simulate_random_doubles.py
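For context, a minimal sketch of what a random-doubles simulation script could look like with poke-env; the actual simulate_random_doubles.py in this fork may differ, and import paths vary between poke-env versions:

```python
# Hypothetical sketch; assumes a local pokemon-showdown server is already
# running (see the commands above) and that this poke-env version exposes
# RandomPlayer under poke_env.player.
import asyncio

from poke_env.player import RandomPlayer


async def main():
    # Two random agents battling each other in random doubles.
    p1 = RandomPlayer(battle_format="gen8randomdoublesbattle")
    p2 = RandomPlayer(battle_format="gen8randomdoublesbattle")
    await p1.battle_against(p2, n_battles=5)
    print(f"p1 won {p1.n_won_battles} / {p1.n_finished_battles} battles")


if __name__ == "__main__":
    asyncio.run(main())
```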