Awesome Exploration Methods in Reinforcement Learning

Updated on 2024.11.29

  • Here is a collection of research papers on exploration methods in Reinforcement Learning (ERL). The repository will be continuously updated to track the frontier of ERL. Welcome to follow and star!

  • The balance of exploration and exploitation is one of the most central problems in reinforcement learning. To give readers an intuitive feeling for exploration, we provide a visualization of a typical hard-exploration environment in MiniGrid below. In this task, reaching the goal often requires a sequence of dozens or even hundreds of actions, during which the agent must thoroughly explore different parts of the state-action space in order to learn the skills required to achieve the goal.

A typical hard-exploration environment: MiniGrid-ObstructedMaze-Full-v0.
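To see why this task is hard, a quick sanity check is to roll out a purely random policy in it. The sketch below is only illustrative; it assumes the `gymnasium` and `minigrid` packages are installed, and the environment id is taken from the MiniGrid suite (it may differ across package versions).

```python
# Sketch: a random agent in the hard-exploration MiniGrid task shown above.
# Assumes the `gymnasium` and `minigrid` packages; env id may vary by version.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid environments)

env = gym.make("MiniGrid-ObstructedMaze-Full-v0")
obs, info = env.reset(seed=0)

total_reward, episodes = 0.0, 10
for _ in range(episodes):
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()          # purely random exploration
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward                      # sparse: nonzero only at the goal
    obs, info = env.reset()

print(f"return over {episodes} random episodes: {total_reward}")
```

With random actions the return here is essentially always zero, which is exactly the failure mode that the exploration methods collected below try to address.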

Table of Contents

  • A Taxonomy of Exploration RL Methods
  • Papers
  • Contributing
  • License

A Taxonomy of Exploration RL Methods


In general, we can divide the reinforcement learning process into two phases: a collect phase and a train phase. In the collect phase, the agent chooses actions according to its current policy and interacts with the environment to gather useful experience. In the train phase, the agent uses the collected experience to update the current policy and obtain a better-performing one.
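As a rough, hypothetical skeleton of this two-phase loop (the names `env`, `policy`, and `buffer` and their methods are illustrative placeholders, not any particular library's API), the sketch below marks where exploration techniques typically hook in:

```python
# Hypothetical skeleton of the collect/train loop described above.
# `env`, `policy`, and `buffer` are illustrative placeholders, not a real API.
def run(env, policy, buffer, iterations, steps_per_iter):
    obs, _ = env.reset()
    for _ in range(iterations):
        # ---- collect phase: act with the current policy and store experience ----
        for _ in range(steps_per_iter):
            action = policy.act(obs)          # collecting-side exploration perturbs/guides this choice
            next_obs, reward, terminated, truncated, _ = env.step(action)
            buffer.add(obs, action, reward, next_obs, terminated)
            obs = env.reset()[0] if (terminated or truncated) else next_obs
        # ---- train phase: improve the policy from the collected experience ----
        policy.update(buffer.sample())        # training-side exploration reshapes rewards/objectives here
    return policy
```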

According to the phase in which the exploration component is explicitly applied, we divide Exploration RL methods into two main categories: Augmented Collecting Strategy and Augmented Training Strategy:

  • Augmented Collecting Strategy covers a variety of exploration strategies commonly used in the collect phase, which we further divide into four categories:

    • Action Selection Perturbation
    • Action Selection Guidance
    • State Selection Guidance
    • Parameter Space Perturbation
  • Augmented Training Strategy covers a variety of exploration strategies commonly used in the train phase, which we further divide into seven categories:

    • Count Based
    • Prediction Based
    • Information Theory Based
    • Entropy Augmented
    • Bayesian Posterior Based
    • Goal Based
    • (Expert) Demo Data

Note that these categories may overlap, and a single algorithm may belong to several of them. For other detailed surveys on exploration methods in RL, you can refer to Tianpei Yang et al. and Susan Amin et al.
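To make the distinction between the two families concrete, here is a minimal sketch with one illustrative instance of each: an ε-greedy perturbation applied while collecting (Action Selection Perturbation) and a simple count-based bonus added to the reward while training (Count Based). Both are simplified illustrations of the general idea, not the specific algorithms cited in the taxonomy below.

```python
import random
from collections import defaultdict

# Augmented Collecting Strategy / Action Selection Perturbation (e.g. epsilon-greedy):
# with probability epsilon, override the greedy action with a uniformly random one.
def epsilon_greedy(q_values, num_actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return max(range(num_actions), key=lambda a: q_values[a])

# Augmented Training Strategy / Count Based:
# add an intrinsic bonus that decays with the visitation count of a state.
visit_counts = defaultdict(int)

def count_based_bonus(state_key, beta=0.1):
    visit_counts[state_key] += 1
    return beta / (visit_counts[state_key] ** 0.5)  # bonus ~ beta / sqrt(N(s))

# During training, the agent would then optimize r_total = r_extrinsic + count_based_bonus(s).
```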


A non-exhaustive but useful taxonomy of methods in Exploration RL. We provide example methods for each category, shown in the blue area of the figure above.

Here are the links to the papers that appeared in the taxonomy:

[1] Go-Explore: Adrien Ecoffet et al., 2021
[2] NoisyNet: Meire Fortunato et al., 2018
[3] DQN-PixelCNN: Marc G. Bellemare et al., 2016
[4] #Exploration: Haoran Tang et al., 2017
[5] EX2: Justin Fu et al., 2017
[6] ICM: Deepak Pathak et al., 2018
[7] RND: Yuri Burda et al., 2018
[8] NGU: Adrià Puigdomènech Badia et al., 2020
[9] Agent57: Adrià Puigdomènech Badia et al., 2020
[10] VIME: Rein Houthooft et al., 2016
[11] EMI: Wang et al., 2019
[12] DIAYN: Benjamin Eysenbach et al., 2019
[13] SAC: Tuomas Haarnoja et al., 2018
[14] BootstrappedDQN: Ian Osband et al., 2016
[15] PSRL: Ian Osband et al., 2013
[16] HER: Marcin Andrychowicz et al., 2017
[17] DQfD: Todd Hester et al., 2018
[18] R2D3: Caglar Gulcehre et al., 2019
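As one concrete instance of the Prediction Based family ([7] RND above), the sketch below shows an RND-style intrinsic reward in PyTorch: a fixed, randomly initialized target network and a trained predictor, with the predictor's error on a state used as the novelty bonus. This is a simplified illustration of the idea under assumed network sizes and hyperparameters, not the paper's exact implementation; it assumes `torch` is installed.

```python
import torch
import torch.nn as nn

# Simplified RND-style intrinsic reward: a predictor is trained to match a fixed,
# randomly initialized target network; large prediction error marks a novel state.
class RNDBonus(nn.Module):
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)              # the target network is never trained
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus(self, obs):                        # obs: float tensor of shape (batch, obs_dim)
        with torch.no_grad():
            target_feat = self.target(obs)
        error = ((self.predictor(obs) - target_feat) ** 2).mean(dim=-1)
        self.opt.zero_grad()
        error.mean().backward()                  # fit the predictor on visited states
        self.opt.step()
        return error.detach()                    # use as the intrinsic reward r_int

# Usage sketch: rnd = RNDBonus(obs_dim=...); r_total = r_extrinsic + coef * rnd.bonus(obs_batch)
```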

Papers

format:
- [title](paper link) (presentation type, openreview score [if the score is public])
  - author1, author2, author3, ...
  - Key: key problems and insights
  - ExpEnv: experiment environments

NeurIPS 2024

  • Learning Formal Mathematics From Intrinsic Motivation

    • Gabriel Poesia, David Broman, Nick Haber, Noah Goodman
    • Key: Jointly learns to prove formal mathematical theorems and propose harder provable conjectures in a self-improving loop; utilizes dependent type theory and hindsight relabeling to improve sample efficiency.
    • ExpEnv: Propositional logic, arithmetic, and group theory.
  • RL-GPT: Integrating Reinforcement Learning and Code-as-policy

    • Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
    • Key: A two-level hierarchical framework combining reinforcement learning and large language models (LLMs); achieves high efficiency by integrating code-as-policy for high-level planning with RL for low-level actions, reporting SOTA performance.
    • ExpEnv: Minecraft and MineDojo tasks.
  • SeeA*: Efficient Exploration-Enhanced A* Search by Selective Sampling

    • Dengwei Zhao, Shikui Tu, Lei Xu
    • Key: Enhances A* search by constructing a dynamic OPEN subset through selective sampling, enabling exploration of promising branches; theoretical and empirical efficiency improvements.
    • ExpEnv: Retrosynthetic planning (organic chemistry), logic synthesis (IC design), and Sokoban game.

ICML 2024

ICLR 2024

NeurIPS 2023

ICML 2023

ICLR 2023

NeurIPS 2022

ICML 2022

ICLR 2022

NeurIPS 2021

Classic Exploration RL Papers

Contributing

Our purpose is to provide a starting paper guide for those who are interested in exploration methods in RL. If you are interested in contributing, please refer to HERE for contribution instructions.

License

Awesome Exploration RL is released under the Apache 2.0 license.

(Back to top)