This is the code for optimappo (paper, website), which enables optimism in multi-agent policy gradient methods by shaping the advantage estimation. It is a simple but effective way to improve MAPPO on deterministic tasks by overcoming the relative overgeneralization problem.
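For intuition, the sketch below shows one possible form of optimistic advantage shaping: a leaky clipping that downweights negative advantages so an agent is not overly discouraged by low returns caused by teammates' exploratory actions. The function name, the `beta` parameter, and the exact transformation are illustrative assumptions, not necessarily the formulation used in the paper or this repository.

```python
import numpy as np

def shape_advantage(advantages: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Leaky-ReLU-style shaping (illustrative): keep positive advantages
    as-is and scale negative ones by `beta`, biasing each agent's policy
    gradient toward optimism about its teammates' actions.

    beta=1.0 recovers the standard (unshaped) advantage;
    beta=0.0 discards negative advantages entirely.
    """
    return np.where(advantages >= 0.0, advantages, beta * advantages)

# Example: negative advantages are scaled down before the PPO update.
adv = np.array([2.0, -1.0, 0.5, -3.0])
print(shape_advantage(adv))  # [ 2.  -0.1  0.5 -0.3]
```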
- Please refer to MAPPO to install the Python virtual environment.
- We also need to install Multi-Agent MuJoCo.
To train on Multi-Agent MuJoCo, run:

```bash
cd scripts
./train_mujoco_local.sh
```
If you find this code useful for your work, please cite our paper:
```bibtex
@inproceedings{zhao2024optimistic,
  title={Optimistic Multi-Agent Policy Gradient},
  author={Zhao, Wenshuai and Zhao, Yi and Li, Zhiyuan and Kannala, Juho and Pajarinen, Joni},
  booktitle={Proceedings of the International Conference on Machine Learning},
  year={2024}
}
```