To get started with MiniHack environments, we provide baseline agents using the TorchBeast framework. TorchBeast provides a PyTorch implementation of IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
TorchBeast comes in two variants: MonoBeast and PolyBeast. PolyBeast is the more powerful version of the framework and allows training agents across multiple machines. For further details, see the TorchBeast paper.
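At the heart of IMPALA is the V-trace off-policy correction, which turns rollouts collected by slightly stale actor policies into value targets for the learner. The following is a minimal PyTorch sketch of the V-trace targets from the IMPALA paper, for illustration only; TorchBeast ships its own optimized implementation, and the function and argument names here are ours:

```python
import torch

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets (Espeholt et al., 2018) for [T, B] tensors.

    behaviour_logp / target_logp: log-probs of the taken actions under the
    actor (behaviour) and learner (target) policies, respectively.
    """
    ratios = torch.exp(target_logp - behaviour_logp)
    rhos = torch.clamp(ratios, max=rho_bar)  # importance weights for the TD errors
    cs = torch.clamp(ratios, max=c_bar)      # "trace-cutting" coefficients
    values_tp1 = torch.cat([values[1:], bootstrap_value.unsqueeze(0)], dim=0)
    deltas = rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    acc = torch.zeros_like(bootstrap_value)
    out = []
    for t in reversed(range(rewards.shape[0])):
        acc = deltas[t] + gamma * cs[t] * acc
        out.append(acc)
    return torch.stack(out[::-1]) + values  # v_s targets for the value loss
```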
For MiniHack, we use the PolyBeast implementation of TorchBeast and additionally provide an implementation of the following exploration methods:
- RND: Exploration by Random Network Distillation
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
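The core idea behind RND, for reference, is simple to sketch: a fixed, randomly initialized target network is distilled by a trained predictor network, and the prediction error on an observation serves as the intrinsic reward, which is high for novel states and shrinks as the predictor catches up. Below is a minimal, illustrative PyTorch sketch; the network sizes and names are ours, not those used by the MiniHack baselines:

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Sketch of the RND intrinsic reward (Burda et al., 2019)."""

    def __init__(self, obs_dim, embed_dim=128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                    nn.Linear(256, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, embed_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)  # the random target network stays fixed

    def forward(self, obs):
        # Per-sample prediction error: used both as the intrinsic reward
        # and as the training loss for the predictor network.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```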
To train a PolyBeast agent in MiniHack, first install PolyBeast by following the instructions here, then run:
```bash
pip install ".[polybeast]"

# Test IMPALA run
python3 -m minihack.agent.polybeast.polyhydra env=MiniHack-Room-5x5-v0 total_steps=100000
```
We use the Hydra framework for configuring our experiments. All environment and training parameters can be specified using command-line arguments (or edited directly in `config.yaml`). See the `config.yaml` file in `minihack.agent.polybeast` for more information. Be sure to set up appropriate parameters for logging with wandb (disabled by default).
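For context, a Hydra entry point looks roughly like the simplified sketch below (this is not the actual polyhydra source): the decorated function receives `config.yaml` with any command-line overrides already merged in.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path=".", config_name="config")
def main(flags: DictConfig) -> None:
    # Command-line overrides such as `env=... total_steps=...` are merged
    # into config.yaml before this function is called.
    print(OmegaConf.to_yaml(flags))

if __name__ == "__main__":
    main()
```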
Some example training commands:

```bash
# Single IMPALA run
python3 -m minihack.agent.polybeast.polyhydra model=baseline env=MiniHack-Room-5x5-v0 total_steps=1000000

# Single RND run
python3 -m minihack.agent.polybeast.polyhydra model=rnd env=MiniHack-Room-5x5-v0 total_steps=1000000

# Single RIDE run
python3 -m minihack.agent.polybeast.polyhydra model=ride state_counter=coordinates env=MiniHack-Room-5x5-v0 total_steps=1000000

# To perform a sweep on the cluster, add the --multirun flag and comma-separate the values
python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline,rnd env=MiniHack-Room-Random-15x15-v0,MiniHack-Room-Monster-15x15-v0 total_steps=10000000
```
To replicate the results in the paper obtained using PolyBeast, simply run a sweep of five runs with the IMPALA, RND, or RIDE agents on the desired environments:
```bash
python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline name=1,2,3,4,5 env=MiniHack-Room-Random-15x15-v0,MiniHack-Room-Monster-15x15-v0 total_steps=10000000
```
For navigation tasks, the default parameters are already set. For skill acquisition tasks, additionally set `learning_rate=0.00005 msg.model=lt_cnn`.
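For example, a skill acquisition sweep with those overrides might look like the following (the environment name is illustrative; substitute your task):

```bash
python3 -m minihack.agent.polybeast.polyhydra --multirun model=baseline name=1,2,3,4,5 env=MiniHack-Eat-v0 learning_rate=0.00005 msg.model=lt_cnn total_steps=10000000
```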
The learning curves for all of our PolyBeast experiments can be accessed in our Weights & Biases repository.
The following script evaluates the performance of a model pre-trained with PolyBeast:
```bash
# Watch the learned behaviour step-by-step in the terminal
python3 -m minihack.agent.polybeast.evaluate --env MiniHack-Room-5x5-v0 -c /path/to/checkpoint/directory --watch

# Evaluate the pre-trained model for 1 episode and save the replay as a GIF file
python3 -m minihack.agent.polybeast.evaluate --env MiniHack-Room-5x5-v0 -c /path/to/checkpoint/directory -n 1 --no-watch --save_gif --gif_path replay.gif

# Print all options of the evaluation script
python3 -m minihack.agent.polybeast.evaluate --help
```
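If you prefer to script your own evaluation, the loop itself is straightforward. Below is an illustrative sketch with a random policy standing in for a trained agent; the bundled evaluate script additionally handles model construction, checkpoint loading, and rendering:

```python
import gym
import minihack  # noqa: F401  (importing registers the MiniHack-* environments)

env = gym.make("MiniHack-Room-5x5-v0")
returns = []
for _ in range(10):
    env.reset()
    done, ep_return = False, 0.0
    while not done:
        # A trained agent would select the action here; we sample randomly.
        _, reward, done, _ = env.step(env.action_space.sample())
        ep_return += reward
    returns.append(ep_return)
print(f"mean return over {len(returns)} episodes: {sum(returns) / len(returns):.2f}")
```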