VGC Problem Definition #268

caymansimpson · 2022-03-05T00:28:29Z

First Discussion! I wanted to formalize the problem definition of Singles and VGC from an AI perspective, and lay out what makes them such hard problems to solve. All of us working in this area have already thought deeply about this -- and I'm sure this stream-of-consciousness will miss a ton! I would love people's thoughts and feedback, and I can go back and edit/add to this; I hope to use this as a foundation on which we can start discussing promising angles/approaches to tackling this problem.

Problem Area	Singles	VGC
Agents	This is a multi-agent problem.	Ditto.
Actions	Players choose their actions simultaneously, and the joint action produces a transition to a new game state that depends on several random factors, so the problem is stochastic. One thing that makes this problem very hard is that actions very often interact (e.g. Roar before Trick Room).	In VGC, this is a lot worse, because the interactions between actions are even more complex (e.g. trying to self-Swagger in front of a mon with Follow Me).
Action Space	For Singles, the action space is very straightforward: 13 (assuming dynamax). This means the joint action space for a turn (both agents) is 13 × 13 = 169. Because each agent has so few actions, Singles is a prime candidate for Minimax. However, Singles games.	For VGC, the action space is 418 (assuming dynamax), which means the joint action space for a turn is 418 × 418 ≈ 175K. This means that any Minimax approach will need to focus heavily on pruning.
Game Length	Average turns per game are far higher in Singles, which means that reward-shaping is critical.	For VGC, especially in the dynamax world, games can end very quickly, which means that reward-shaping may not be as critical, as reaching terminal nodes (end of games) is more computationally feasible (with efficient pruning).
Imperfect Information	Pokemon is a game that is partially observable; this means that no player has perfect information (EV spreads, movesets, items). At high levels of play, agents should consider the information they expose to their opponent, and how that could increase an opponent's efficacy.	In VGC, there is an extra element to this as well, which is "out of the 6 mons in team selection, which 4 did my opponent bring?"
Team Choice	Not relevant for Singles.	VGC offers this extra layer of strategic complexity -- given imperfect information and past information learned from a previous game (if applicable), what team offers the best chance of winning? This can be considered a separate problem.
Teambuilding	How to build a team to optimally perform in the meta is an interesting problem, but can be considered a separate problem.	This is more intricate in VGC, because of the "4 out of 6" mechanic mentioned above.
Contextual Knowledge	In Pokemon, there is a huge amount of contextual knowledge needed -- from items, to the meta, to movesets, abilities, stats and interactions between different battle mechanics. A successful agent will probably need to predict an opponent's moveset/items/EV spreads, and refine those predictions as it learns more information through a battle, in order to improve its decision making. An additional complexity is that this is a game of edge cases. Thinking about the actual battle engine coded out makes me cringe!	This is slightly more complex due to the ability to affect your own mons with your moves.
Luck	Because of move probabilities (accuracy and secondary effects), abilities (Moody), item probabilities (Quick Claw), damage rolls, critical hit chances, etc., actions don't reliably translate to states.	This is even harder to predict in VGC, where more actions mean more random variables, which means more possible states.
State Space	For all intents and purposes, we should treat it as infinite due to PP, HP and timer.	State space is technically bigger in VGC due to more possible interactions, but for all intents and purposes, they are the same.
Markov property	We can't assume that this game follows Markov properties, since past player behaviors will offer insight into future behaviors (eg risk-taking, defensive playstyles); this is especially true because there is often no optimal move.	Ditto.
Strategic Optimizations	Pokemon is a game of win conditions; the best players see multiple ways of winning and aim to get closer to those setups. Because "reverse sweeps" are not uncommon, reward shaping to bias an AI towards win conditions may prove especially difficult. This makes tree-based approaches very attractive over RL-based ones.	Ditto.
Other Properties	Pokemon is a zero-sum game, so there is technically a Nash equilibrium. This means minimax solutions are possible (despite the large action space and an essentially continuous state space).	Ditto.
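
To make the "often no optimal move" and Nash equilibrium rows concrete, here is a minimal sketch that solves a 2x2 zero-sum simultaneous-move game in closed form. The payoff matrix is hypothetical (made-up win probabilities for the row player, not measured from real battles), and `solve_2x2_zero_sum` is an illustrative helper, not part of poke-env or any other library.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Solve the zero-sum matrix game [[a, b], [c, d]] for the row (maximizing) player.

    Returns (p, q, value): p is the probability the row player picks row 0,
    q is the probability the column player picks column 0.
    """
    rows = [(a, b), (c, d)]
    cols = [(a, c), (b, d)]
    maximin = max(min(r) for r in rows)
    minimax = min(max(col) for col in cols)
    if maximin == minimax:
        # Saddle point: a pure strategy is optimal for both players.
        row = max(range(2), key=lambda i: min(rows[i]))
        col = min(range(2), key=lambda j: max(cols[j]))
        return Fraction(1 - row), Fraction(1 - col), maximin
    # No saddle point: both players must mix to stay unexploitable.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical win probabilities for player 1 (rows) against player 2 (columns).
p, q, value = solve_2x2_zero_sum(
    Fraction(7, 10), Fraction(3, 10),
    Fraction(4, 10), Fraction(6, 10),
)
print(p, q, value)  # 1/3 1/2 1/2
```

Neither pure choice is safe in this toy game: the equilibrium has the row player picking the first action only a third of the time. A real turn has the same structure, just with a 418 x 418 matrix whose entries are unknown.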

From the above, my view is there are two approaches that might work for VGC: tree-based and RL.

	Pros	Cons
Tree-based	Don't need to encode contextual knowledge and edge-cases	Need ability to simulate. Most of this problem is how to prune efficiently.
RL	Will handle continuous state spaces well	Reward shaping is difficult. Encoding knowledge of move interactions will require incredible amounts of custom logic and training data.
Both	Most promising RL AIs are a combination of both. They're able to take Pros of both Tree-based and RL-approaches.	Cons are that there are no RL AIs that have been successful in the area that VGC lives in (continuous state space, high degree of chance). Also heavy computational cost.
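
As a rough illustration of why pruning dominates the tree-based column, here is a sketch of the simplest possible scheme: keep only the top-k actions per side before expanding joint actions. The heuristic scores are dummy placeholders, and `prune_joint_actions` is a hypothetical helper; how to score actions well is exactly the open problem.

```python
from itertools import product


def prune_joint_actions(my_scores, opp_scores, k):
    """Keep the k best-scoring actions per side and return their joint combinations.

    my_scores / opp_scores map an action label to a heuristic score (higher is
    better for the player choosing it). The evaluator that would produce these
    scores is assumed and not shown here.
    """
    def top_k(scores):
        return sorted(scores, key=scores.get, reverse=True)[:k]

    return list(product(top_k(my_scores), top_k(opp_scores)))


# Toy example: 418 placeholder actions per side with dummy scores.
my_scores = {f"a{i}": i for i in range(418)}
opp_scores = {f"b{i}": -i for i in range(418)}

full = len(my_scores) * len(opp_scores)
pruned = prune_joint_actions(my_scores, opp_scores, k=20)
print(full, len(pruned))  # 174724 400
```

With k = 20 the branching factor per turn drops from 174,724 to 400, at the cost of trusting the heuristic not to discard the action that matters.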

hsahovic · 2022-03-09T10:45:38Z

Maintainer

This is a good summary of a lot of important issues related to this problem.
One thing I would add regarding the last section is that you can combine RL and Tree-based approaches - that's what AlphaZero does, for instance.

7 replies

caymansimpson Jun 9, 2024
Author

Agreed. I do think Stochastic MuZero can solve VGC (MuZero only applies to perfect information games), but given the complex game mechanics, I would consider the solution intractable since Stochastic MuZero's solution requires such high amounts of training data for simpler games like chess, and hundreds of GPUs with custom-built infrastructure. Stochastic MuZero has also not shown great promise in stochastic environments either, so that’s another thing to solve.

acxz Jun 9, 2024

the solution intractable since Stochastic MuZero's solution requires such high amounts of training data

Agreed. For folks without infinite hardware there must be a trade-off between theoretically elegant solutions and practical ones (i.e. something that I can throw in Pokemon Showdown).

Stochastic MuZero has also not shown great promise in stochastic environments either, so that’s another thing to solve.

Could you clarify this? For example, Stochastic MuZero has not shown promise in stochastic environments as opposed to what alternatives?

caymansimpson Aug 14, 2024
Author

Sorry for the lag in response; I think Stochastic MuZero has shown promise in more mechanistically straightforward, low-stochasticity games; I think it's a cool and valuable algorithm! I chatted with a few folks at DeepMind who think that the scale and training requirements of the models for VGC would be intractable, even for professional teams at their respective companies.

The two approaches that I think have shown promise in domains similar to VGC (within Deep RL) are ESCHER and R-NaD, but neither has been validated or tried on VGC, though their teams seem to think their algorithms may be good fits.

acxz Dec 2, 2024

Came across ReBeL, i.e. AlphaZero for imperfect information.

caymansimpson Dec 2, 2024
Author

Yeah! I also talked with the team that developed it, and they said it would likely be impossible due to the number of possible infostates we could be in (in VGC: mons brought/in back, items, moves, abilities, EVs). I think we could probably do something with abstraction, but it's unclear whether that would make the problem computationally feasible.

melondonkey · 2022-09-12T20:47:22Z

Great discussion here. I'm a bit late to it, but want to get involved as I've been thinking about RL for VGC (here's my current blog, though it's more basic analytics). I do believe deep RL + MCTS + self-play is the ultimate answer to this problem, but I don't know how attainable a good enough version is for pedestrians like me with both limited brain power and compute power.

Here are some intuitions and random thoughts I have about the problem currently, but have not yet gotten far along enough to put ideas into practice:

Seed the algorithm with data from the meta. Most people describe the team space in terms of combinatorics, but when you think of it in terms of entropy the uncertainty is vastly reduced (I have some blog posts tracking meta entropy in VGC). This does go against the design philosophy of AlphaGo Zero in that it uses data, but it also greatly reduces the search space and could be a path to topping the ladder. For example, we could have a team generator that constructs teams by marginalizing the usage statistics and then iteratively sampling based on the joint distribution. I've written some scrapers for the meta in R but would need to port them to Python to work into a team constructor on poke-env.
Probabilistic programming? I'm kind of iffy on this one because I'm not sure how much this adds with deep models, but I was wondering if tf-probability could aid with representing the uncertainty.
Updating the network appropriately for simultaneous moves. Assuming the architecture has a combined policy and state network like AlphaGo Zero, I'm unclear on how the network would update "simultaneously" given that it has two different imperfect perspectives of the game without one player getting to "cheat" with updated parameters. This problem has probably already been tackled or solved though; I just don't know the answer.
Make use of pre-trained mon-embeddings/move-embeddings, etc for transfer learning. Perhaps even the move descriptions could be used.
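
One possible reading of the entropy and team-generator ideas above, as a minimal sketch. All usage and teammate numbers are made up for illustration, and `entropy_bits` and `sample_team` are hypothetical helpers (not poke-env APIs). The sketch assumes `size` is no larger than the number of mons in `usage`.

```python
import math
import random


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample_team(usage, teammates, size, rng):
    """Sample a team: first mon from marginal usage, the rest from teammate weights.

    usage maps mon -> usage weight; teammates maps mon -> {teammate: weight}.
    """
    names = list(usage)
    team = rng.choices(names, weights=[usage[n] for n in names])
    while len(team) < size:
        candidates = [n for n in names if n not in team]
        # Weight each candidate by its summed co-occurrence with the picks so far.
        weights = [
            sum(teammates.get(picked, {}).get(n, 0.0) for picked in team)
            for n in candidates
        ]
        if sum(weights) == 0:
            # No teammate data for these picks: fall back to marginal usage.
            weights = [usage[n] for n in candidates]
        team.append(rng.choices(candidates, weights=weights)[0])
    return team


# A uniform meta over 4 mons has maximal entropy; a skewed meta has less.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy_bits([0.7, 0.1, 0.1, 0.1]) < 2.0)  # True

# Made-up usage and teammate statistics, for illustration only.
usage = {"Incineroar": 0.45, "Rillaboom": 0.35, "Zacian": 0.30,
         "Kyogre": 0.25, "Tornadus": 0.20, "Regieleki": 0.15}
teammates = {
    "Incineroar": {"Rillaboom": 0.5, "Zacian": 0.4, "Kyogre": 0.3},
    "Kyogre": {"Tornadus": 0.6, "Incineroar": 0.3},
}
team = sample_team(usage, teammates, size=4, rng=random.Random(0))
print(team)
```

Summing teammate weights is only one way to approximate the joint distribution; with real scraped data, a proper conditional model over full teams would likely behave better.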

As an added bonus, here's a scraper script to extract meta info for a given format. Like I said, I'll try to submit a PR one day with something like this, but I have a pretty steep learning curve ahead of me getting acquainted with all the current tooling.

Gist here

2 replies

NoahvdV Sep 12, 2022

Hi, big fan of your blog, been following it for quite a while.
Thanks for sharing your meta scraping code!

melondonkey Sep 13, 2022

Thanks! Actually means a lot as sometimes it feels like I’m posting into a void. The chef post got a lot of attention but everything else has been pretty mum. However, that has taught me that the abstract really needs to connect with the concrete to resonate. That’s part of why I think VGC is the best meta to try for a proof of concept with AI.

melondonkey · 2022-09-15T17:10:07Z

Can you go over the calcs for the action space for VGC being 418? I calculated it as:

4 moves into 4 targets = 16
4 special moves (dynamax, terastalize, etc) into 4 targets = 16
2 switch options
16 + 16 + 2 = 34 options for one Pokemon
34*34 = 1156 for two Pokemon
1156 - 2 = 1154 (since both Pokemon cannot switch into the same mon, and there are two benched mons this could happen with)

The edge cases I didn't include are pivot moves like U-Turn and Parting Shot.
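
Enumerating the pairs directly is a quick way to check a count like this, including how many same-target double switches have to be excluded (with two benched mons there are two such pairs). This is a minimal sketch under the counting assumptions listed above; the action tuples are placeholders, not a real battle-order encoding.

```python
from itertools import product

# Counting assumptions taken from the list above.
MOVES, TARGETS, BENCH = 4, 4, 2


def single_mon_actions():
    """All options for one active mon: regular moves, special moves, switches."""
    regular = [("move", m, t) for m in range(MOVES) for t in range(TARGETS)]
    special = [("special", m, t) for m in range(MOVES) for t in range(TARGETS)]
    switches = [("switch", b) for b in range(BENCH)]
    return regular + special + switches


def is_legal(left, right):
    """Both active mons cannot switch into the same benched mon."""
    return not (left[0] == "switch" and left == right)


actions = single_mon_actions()
joint = [(l, r) for l, r in product(actions, actions) if is_legal(l, r)]
print(len(actions), len(joint))  # 34 1154
```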

0 replies

caymansimpson · 2022-09-25T04:41:06Z

Author

This is my math, where I break the action space down by how many mons switch:

- if two switches: 2 (cuz positions matter)
- if one switch: 2 (cuz either mon could switch) * 2 possible switches * (sum of the non-switching mon's options below) = 80
  - if non-switch dynamaxes: 4 moves * 2 targets
  - if non-switch doesn't dynamax: 4 moves * 3 targets
- if no mon switches: sum of the two cases below = 336
  - if one dynamaxes: 2 possible dynamaxes * (4 moves * 2 targets) * (4 moves * 3 targets)
  - if none dynamaxes: (4 moves * 3 targets) * (4 moves * 3 targets)

Total: 2 + 80 + 336 = 418.

I think where we differ is:

1/ position matters if we switch two
2/ dynamax moves only have a max of two targets
3/ im assuming moves have a max of three targets (can’t think of ones that have 4 off the top of my head; but I’m probably missing multiple 😛)
4/ only one mon can dynamax on a turn
5/ pivot moves, which is something I definitely consider
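
The breakdown above can be checked with a few lines of arithmetic. This is a sketch that only restates the counting assumptions from this comment (3 targets for a regular move, 2 for a dynamax move, 2 benched mons, one dynamax per turn); the variable names are illustrative.

```python
MOVES = 4
REGULAR_TARGETS = 3  # assumed max targets for a regular move
DYNAMAX_TARGETS = 2  # assumed max targets for a dynamax move
BENCH = 2            # benched mons available to switch in

regular = MOVES * REGULAR_TARGETS    # 12 options for a non-dynamaxed mon
dynamaxed = MOVES * DYNAMAX_TARGETS  # 8 options for a dynamaxed mon

# Both mons switch: the two benched mons can take the two slots in 2 orders.
two_switches = 2
# One mon switches (2 choices of which) into one of BENCH mons, while the
# other mon either dynamaxes or uses a regular move.
one_switch = 2 * BENCH * (dynamaxed + regular)
# Nobody switches: either exactly one mon dynamaxes (2 choices) or none does.
no_switch = 2 * dynamaxed * regular + regular * regular

total = two_switches + one_switch + no_switch
print(one_switch, no_switch, total, total ** 2)  # 80 336 418 174724
```

Squaring the per-player total gives the joint turn action space of roughly 175K quoted in the original post.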

0 replies

acxz · 2024-06-09T19:38:10Z

Imperfect Information
Pokemon is a game that is partially observable; this means that no player has perfect information (EV spreads, movesets, items). At high levels of play, agents should consider the information they expose to their opponent, and how that could possibly increase an opponent's efficacy.

I'd like to add more here.

Observability, Im/perfect information, and In/complete information are all different concepts, although closely related.

As you mentioned, Pokemon Battles are partially observable, as the state of pokemon battles includes opponent stats, movesets, items that are unknown to the agent.

Pokemon battles are also imperfect information games, as the stats, movesets and items of the opponent are hidden from the agent, but it is important to keep in mind that the action (i.e. the opponent's move on a specific turn) is not hidden.

Pokemon battles are (I believe) incomplete information games: while the objective of each agent (knock out the opponent's pokemon) is known to all other agents, the policies and strategies are not.
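
One common way to work with hidden movesets is to keep a belief over candidate sets and narrow it as moves are revealed. Below is a minimal sketch of that idea; the candidate sets and prior probabilities are made up for illustration, and `update_belief` is a hypothetical helper rather than anything from poke-env.

```python
from fractions import Fraction


def update_belief(belief, revealed_move):
    """Drop candidate movesets that lack the revealed move, then renormalize."""
    consistent = {s: p for s, p in belief.items() if revealed_move in s}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("revealed move is inconsistent with every candidate set")
    return {s: p / total for s, p in consistent.items()}


# Hypothetical prior over one opposing mon's moveset (numbers are invented).
set_a = frozenset({"Fake Out", "Flare Blitz", "Parting Shot", "Snarl"})
set_b = frozenset({"Fake Out", "Flare Blitz", "Knock Off", "Protect"})
set_c = frozenset({"Flare Blitz", "Knock Off", "Parting Shot", "Protect"})
prior = {set_a: Fraction(1, 2), set_b: Fraction(3, 10), set_c: Fraction(1, 5)}

# The opponent reveals Knock Off, which rules out set_a.
posterior = update_belief(prior, "Knock Off")
for moveset, prob in posterior.items():
    print(sorted(moveset), prob)
```

The same filtering applies to items, abilities and which 4 of 6 mons were brought, which is where the number of possible infostates grows quickly.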

Just my two cents.

0 replies

Replies: 5 comments 9 replies
