Objectives
- v1.0
- Modular design (independent environment, agent, training arena, learning algorithm, function approximator and auxiliary modules)
- Open-closed principle (implemented through wrapper design)
- Multi-platform (supports PyTorch and TensorFlow for convenient performance comparison with other open-source projects)
- Multi-algorithm; Multi-model (implements A2C, ACKTR, PPO, ... algorithms; CNN, RNN, ... models)
- Multi-CPU sampling; Multi-GPU calculation
- Modules for custom environments (using pygame for 2D cases and Panda3D for 3D cases)
- Automatic hyperparameter search (implemented with Optuna)
- v2.0
- Modules for POMDPs (Inference in vision)
- Modules for multi-agent problems (training arena for multi-player, multi-type, multi-agent settings without explicit communication)
- v3.0
- Distributional calculation
- Model based algorithms
- Explicit communication
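The open-closed principle through wrapper design, listed under v1.0, can be illustrated with a minimal sketch. Every name below (`CounterEnv`, `EnvWrapper`, `RewardScale`, `TimeLimit`, `run_episode`) and the simplified `(obs, reward, done)` step interface are assumptions made for illustration, not this project's actual API. The idea shown is only that new behaviour is added by stacking wrappers while the base environment stays unmodified.

```python
class CounterEnv:
    """Hypothetical toy environment: the observation is a step counter."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, False  # this toy env never terminates on its own


class EnvWrapper:
    """Base wrapper: forwards every call to the wrapped environment."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class RewardScale(EnvWrapper):
    """Extension 1: multiply every reward by a constant factor."""

    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return obs, reward * self.scale, done


class TimeLimit(EnvWrapper):
    """Extension 2: end the episode after a fixed number of steps."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.steps += 1
        return obs, reward, done or self.steps >= self.max_steps


def run_episode(env, action=1):
    """Run one episode with a fixed action and return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(action)
        total += reward
    return total
```

For example, `run_episode(TimeLimit(RewardScale(CounterEnv(), 0.5), max_steps=4))` runs four steps with halved rewards. Neither wrapper required a change to `CounterEnv`, and the two can be combined in either order.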
Results
- v1.0
- Matches or surpasses OpenAI Baselines on Atari and MuJoCo cases
- v2.0
- Performance comparable to state-of-the-art algorithms on flickering Atari cases (POMDP cases)
- Solved StarCraft-minigame-like micromanagement problems (multi-agent cases)
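For readers unfamiliar with the flickering setup mentioned above: a common way to turn a fully observable task into a POMDP is to obscure the whole observation with some probability at each step, so the agent must integrate information over time. The sketch below is an assumption about how such a wrapper might look, not this project's code. `FlickerObservation`, `ConstantEnv`, the list-based observations, the default `p=0.5` and the `(obs, reward, done)` interface are all hypothetical.

```python
import random


class ConstantEnv:
    """Hypothetical stand-in environment that always shows the same frame."""

    def reset(self):
        return [1, 2, 3]

    def step(self, action):
        return [1, 2, 3], 0.0, False


class FlickerObservation:
    """Replace each observation with zeros with probability p."""

    def __init__(self, env, p=0.5, seed=None):
        self.env = env
        self.p = p
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    def _maybe_blank(self, obs):
        # random() is in [0, 1), so p=0 never blanks and p=1 always blanks.
        if self.rng.random() < self.p:
            return [0] * len(obs)
        return obs

    def reset(self):
        return self._maybe_blank(self.env.reset())

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return self._maybe_blank(obs), reward, done
```

Rewards and termination pass through unchanged; only the observation is hidden, which is what makes a memory-based model such as an RNN useful in this setting.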
Animations
Left: BeamRider's feature maps, Middle: Breakout's feature maps, Right: Visualizations of filters for MNIST
Objectives
- Optimize the control of multiple kinds of unmanned vehicles
- Capture targets efficiently while avoiding collision with obstacles
- Adapt to POMDP environments
Results
- Best Paper Award in SPIE Journal of Applied Remote Sensing, 2019 https://doi.org/10.1117/1.JRS.13.044509
Sketches
Task descriptions by stages
Objectives
- Train topology controllers to control electricity transportation in power grids, while keeping people and equipment safe from irregular wave motion or natural disasters.
Results
- Ranked 4th in the Learning to Run a Power Network (L2RPN) challenge https://l2rpn.chalearn.org/
- Invited talk at the Annual Meeting of IEEJ (The Institute of Electrical Engineers of Japan), 2020
Animations
A tiny example case of power grids
Objectives
- General agent to escape from different mazes
Results
- Solved
Animations
Objectives
- General agents to win in very complex environments
Results
- Became familiar with the pysc2 APIs
Objectives
- StarCraft-minigame-like setting for learning optimal tactics
Results
- Long-range units learned a hit-and-run tactic; short-range units learned a surround tactic.
Animations
Left: Hit and run, Right: Surround
Objectives
- Control the controllable cells to guide a controllable cellular automaton to a specific target state
Results
- Foundational cases are solved.
Animations
Left: Small case, Middle: Large case, Right: Loop tree case
Objectives
- Kaggle competition: a resource management game in which the agent builds and controls a small armada of ships to collect halite, a luminous energy source. https://www.kaggle.com/c/halite
Results
- The purely machine-learned agent was an excellent collector with high collection efficiency, but not aggressive enough to beat the top agents (with some hand-crafted features).
Animations
Custom rendering for agent development
Objectives
- Control multiple robots of multiple types to scan an area as quickly as possible while avoiding static and dynamic obstacles.
Results
- Well learned.
Animations