Brain Agent

Brain Agent is a distributed agent learning system for large-scale and multi-task reinforcement learning, developed by Kakao Brain. Brain Agent is based on the V-trace actor-critic framework IMPALA and modifies Sample Factory to utilize multiple GPUs and CPUs for a much higher throughput rate during training.

Features

First publicly available implementations of reproducing SOTA results on DMLAB30.
Scalable & massive throughput.
BrainAgent can produce and train 20B frames/week, or 34K fps, with 16 V100 GPUs, by scaling up high throughput single node system Sample Factory.
Based on following algorithms and architectures.
- TransformerXL-I core and ResNet encoder
- V-trace for update algorithm
- IMPALA for system framework
- PopArt for multitask handling
For self-supervised representation learning, we include 2 additional features.
- ResNet-based decoder to reconstruct the original input image (trxl_recon)
- Additional autoregressive transformer to predict the images of future steps from the current state embedding and future action sequence (trxl_future_pred)
Provide codes for both training and evaluation, along with SOTA model checkpoint with 28M params.

How to Install

Python 3.7
Pytorch 1.9.0
CUDA 11.1
Install DMLab envrionment - DMLab Github
pip install -r requirements.txt

Description of Codes

dist_launch.py -> distributed training launcher
eval.py -> entry point for evaluation
train.py -> entry point for training
brain_agent
- core
  - agents
    - dmlab_multitask_agent.py
  - algos
    - aux_future_predict.py -> Computes auxiliary loss by predicting future state transitions with autoregressive transformer. Used only for (trxl_future_pred).
    - popart.py
    - vtrace.py
  - actor_worker.py
  - learner_worker.py
  - policy_worker.py
  - shared_buffer.py -> Defines SharedBuffer class for zero-copy communication between workers.
- envs
  - dmlab
- utils
  - ...
configs
- ... -> Hyperparam configs for each of training/evaluation.

How to Run

Training

1 node x 1 GPU

   python train.py cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR

1 node x 4 GPUs = 4 GPUs

   python -m dist_launch --nnodes=1 --node_rank=0 --nproc_per_node=4 -m train \ 
     cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR

4 nodes x 4 GPUs each = 16 GPUs

   sleep 120; python -m dist_launch --nnodes=4 --node_rank=0 --nproc_per_node=4 --master_addr=$MASTER_ADDR -m train \ 
     cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR 
   sleep 120; python -m dist_launch --nnodes=4 --node_rank=1 --nproc_per_node=4 --master_addr=$MASTER_ADDR -m train \ 
     cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR
   sleep 120; python -m dist_launch --nnodes=4 --node_rank=2 --nproc_per_node=4 --master_addr=$MASTER_ADDR -m train \ 
     cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR
   sleep 120; python -m dist_launch --nnodes=4 --node_rank=3 --nproc_per_node=4 --master_addr=$MASTER_ADDR -m train \ 
     cfg=configs/trxl_recon_train.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR

Evaluation

python eval.py cfg=configs/trxl_recon_eval.yaml train_dir=$TRAIN_DIR experiment=$EXPERIMENT_DIR test.checkpoint=$CHECKPOINT_FILE_PATH

Setting Hyperparameters

All the default hyperparameters are defined at configs/default.yaml
Other config files override on configs/default.yaml.
You can use pre-defined hyperparameters for our experiments with configs/trxl_recon_train.yaml or configs/trxl_future_pred.yaml.

Results for DMLAB30

Settings
- 3 runs with different seeds
- 100 episodes per each run
- HNS : Human Normalised Score

Results

Model	Mean HNS	Median HNS	Mean Capped HNS
MERLIN	115.2	-	89.4
GTrXL	117.6	-	89.1
CoBERL	115.47	110.86	-
R2D2+	-	99.5	85.7
LASER	-	97.2	81.7
PBL	104.16	-	81.5
PopArt-IMPALA	-	-	72.8
IMPALA	-	-	58.4
Ours (lstm_baseline, 20B ckpt)	103.03 ± 0.37	92.04 ± 0.73	81.35 ± 0.25
Ours (trxl_baseline, 20B ckpt)	111.95 ± 1.00	105.43 ± 2.61	85.57 ± 0.20
Ours (trxl_recon, 20B ckpt)	123.60 ± 0.84	108.63 ± 1.20	91.25 ± 0.41
Ours (trxl_future_pred, 20B ckpt)	128.00 ± 0.43	108.80 0.99	90.53 ± 0.26

Results for all 30 tasks

Level	lstm_baseline	trxl_baseline	trxl_recon	trxl_future_pred
rooms_collect_good_objects_(train / test)	94.22 ± 0.84 / 95.13 ± 0.61	97.85 ± 0.31 / 95.20 ± 1.26	97.58 ± 0.20 / 89.39 ± 1.42	98.19 ± 0.18 / 98.52 ± 0.95
rooms_exploit_deferred_effects_(train / test)	37.84 ± 2.23 / 4.36 ± 1.84	38.40 ± 3.82 / 1.73 ± 0.63	38.86 ± 3.48 / 4.04 ± 0.89	40.93 ± 3.12 / 2.26 ± 0.71
rooms_select_nonmatching_object	50.13 ± 2.95	98.78 ± 1.38	99.52 ± 0.97	113.20 ± 1.14
rooms_watermaze	45.09 ± 4.70	36.92 ± 6.90	111.20 ± 2.29	55.82 ± 0.74
rooms_keys_doors_puzzle	51.75 ± 8.90	55.86 ± 4.25	61.24 ± 9.09	64.95 ± 8.43
language_select_described_object	150.57 ± 0.58	154.90 ± 0.22	155.35 ± 0.17	158.23 ± 0.90
language_select_located_object	225.97 ± 1.93	244.46 ± 1.56	252.04 ± 0.31	261.20 ± 1.15
language_execute_random_task	126.49 ± 2.35	139.63 ± 1.23	145.21 ± 0.36	150.20 ± 1.35
language_answer_quantitative_question	153.92 ± 2.35	162.99 ± 2.42	163.72 ± 1.36	166.07 ± 1.72
lasertag_one_opponent_small	234.90 ± 6.19	243.52 ± 3.96	249.99 ± 6.64	279.54 ± 4.14
lasertag_three_opponents_small	235.61 ± 1.92	242.61 ± 3.75	246.68 ± 5.99	264.20 ± 3.76
lasertag_one_opponent_large	74.88 ± 5.06	83.51 ± 1.31	82.55 ± 2.15	94.86 ± 3.64
lasertag_three_opponents_large	84.78 ± 2.42	92.04 ± 2.17	96.54 ± 0.67	105.83 ± 0.47
natlab_fixed_large_map	98.10 ± 1.77	110.74 ± 1.34	120.53 ± 1.79	118.17 ± 1.79
natlab_varying_map_regrowth	108.54 ± 1.20	107.16 ± 2.68	108.14 ± 1.25	104.83 ± 1.26
natlab_varying_map_randomized	85.33 ± 6.52	86.33 ± 7.30	85.53 ± 6.69	77.74 ± 0.84
skymaze_irreversible_path_hard	55.29 ± 9.08	60.63 ± 4.73	61.63 ± 2.52	66.30 ± 5.69
skymaze_irreversible_path_varied	77.02 ± 3.57	77.41 ± 0.67	81.31 ± 2.34	79.36 ± 7.95
psychlab_arbitrary_visuomotor_mapping	52.17 ± 2.06	51.46 ± 0.45	101.82 ± 0.19	101.80 ± 0.00
psychlab_continuous_recognition	52.57 ± 0.46	52.41 ± 0.92	102.46 ± 0.32	102.30 ± 0.00
psychlab_sequential_comparison	76.82 ± 0.45	75.48 ± 1.16	75.74 ± 0.58	76.13 ± 0.77
psychlab_visual_search	101.54 ± 0.10	101.58 ± 0.04	101.91 ± 0.00	101.90 ± 0.00
explore_object_locations_small	118.89 ± 0.93	121.47 ± 0.26	123.54 ± 2.61	126.67 ± 2.08
explore_object_locations_large	111.46 ± 2.91	120.70 ± 2.12	115.43 ± 1.64	129.83 ± 2.41
explore_obstructed_goals_small	136.92 ± 6.02	148.05 ± 1.96	166.75 ± 3.63	174.30 ± 3.72
explore_obstructed_goals_large	92.36 ± 5.81	106.73 ± 7.86	153.44 ± 3.20	176.43 ± 1.50
explore_goal_locations_small	143.21 ± 8.21	154.87 ± 4.41	177.16 ± 0.37	193.00 ± 3.75
explore_goal_locations_large	98.50 ± 9.61	117.33 ± 6.75	160.39 ± 3.32	178.13 ± 7.15
explore_object_rewards_few	76.29 ± 1.52	108.64 ± 0.89	109.58 ± 3.53	110.07 ± 1.42
explore_object_rewards_many	72.33 ± 0.87	105.33 ± 1.52	105.15 ± 0.75	107.23 ± 1.59

Learning curves

Distributed RL System Overview

Notes

Acknowledgement
- Sample Factory for optimized single node training, Transformer-XL for neural net core architecture.
License
- This repository is released under the MIT license, included here.
- This repository includes some codes from sample-factory (MIT license) and transformer-xl (Apache 2.0 License).
Contact
- Agent learning team, Kakao Brain.
- If you have any question or feedback regarding this repository, please email to contact@kakaobrain.com

Citation

@misc{kakaobrain2022brain_agent, title = {Brain Agent}, 
author = {Donghoon Lee, Taehwan Kwon, Seungeun Rho, Daniel Wontae Nam, Jongmin Kim, Daejin Jo, and Sungwoong Kim}, 
year = {2022}, howpublished = {\url{https://github.com/kakaobrain/brain_agent}} }

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Brain Agent

Features

How to Install

Description of Codes

How to Run

Training

Evaluation

Setting Hyperparameters

Results for DMLAB30

Distributed RL System Overview

Notes

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Brain Agent

Features

How to Install

Description of Codes

How to Run

Training

Evaluation

Setting Hyperparameters

Results for DMLAB30

Distributed RL System Overview

Notes

Citation