Agents are entities that, at least in principle, expose a `sample_action` and an `update` method.

Exploration strategies and curricula are excluded from this list.

"Implement" means either writing new code directly from the paper, or porting an implementation from elsewhere, provided that implementation is modular enough.
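To make the intended contract concrete, here is a minimal sketch, assuming a Python codebase. The method names `sample_action` and `update` come from this issue; the signatures and types are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of the agent contract described above. Only the method
# names come from this issue; signatures and types are assumptions.
from typing import Any, Protocol


class Agent(Protocol):
    def sample_action(self, observation: Any) -> Any:
        """Return an action for the given observation."""
        ...

    def update(self, transition: Any) -> None:
        """Update the agent's parameters from a piece of experience."""
        ...
```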
### Vanilla value learning

- [ ] Vanilla DQN
- [ ] SAC
- [ ] SAC Discrete
- [ ] Categorical DQN
- [ ] Double DQN
- [ ] Dueling DQN
- [ ] N-step DQN
- [ ] Noisy DQN
- [ ] Rainbow
- [ ] DQN with Prioritised Experience Replay
- [ ] Agent57
- [ ] Expected Eligibility Traces

### Vanilla actor-critic

- [ ] A2C
- [ ] DDPG
- [ ] ACER
- [ ] PPO
- [ ] TD3
- [ ] ACKTR

### Model-based

- [ ] DreamerV1
- [ ] DreamerV2
- [ ] DreamerV3
- [ ] AlphaZero
- [ ] MuZero
- [ ] Forward-Backward RL

### Credit assignment

- [ ] RUDDER
- [ ] Temporal Value Transport

### Hindsight methods

- [ ] Hindsight Credit Assignment
- [ ] Hindsight Policy Gradients
- [ ] Hindsight Experience Replay
- [ ] Upside-Down RL
- [ ] Policy Gradients incorporating the future

### Sequence modelling

- [ ] Decision Transformers
- [ ] Online Decision Transformers
- [ ] Trajectory Transformer
- [ ] UniMASK

### Distributed

- [ ] R2D2
- [ ] IMPALA
- [ ] SEED RL

### Meta RL

- [ ] RL2
- [ ] Learned Policy Gradient
- [ ] Algorithm Distillation
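As an illustration of what implementing one of these entries might look like, here is a hedged sketch of a vanilla DQN shaped to fit the contract above. The network architecture, hyperparameters, and replay buffer are simplifying assumptions, and a faithful DQN would also keep a separate target network; this is not a reference implementation.

```python
# Illustrative sketch only: a vanilla DQN fitting the sample_action/update
# contract. All names and hyperparameters are assumptions; a faithful DQN
# would also maintain a separate target network.
import random
from collections import deque

import torch
import torch.nn as nn


class VanillaDQN:
    def __init__(self, obs_dim: int, n_actions: int, gamma: float = 0.99,
                 epsilon: float = 0.1, buffer_size: int = 10_000,
                 batch_size: int = 32):
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )
        self.optim = torch.optim.Adam(self.q_net.parameters(), lr=1e-3)
        self.gamma, self.epsilon = gamma, epsilon
        self.n_actions, self.batch_size = n_actions, batch_size
        self.buffer: deque = deque(maxlen=buffer_size)

    def sample_action(self, observation) -> int:
        # Epsilon-greedy over the current Q-value estimates.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.q_net(torch.as_tensor(observation, dtype=torch.float32))
        return int(q.argmax())

    def update(self, transition) -> None:
        # transition = (obs, action, reward, next_obs, done)
        self.buffer.append(transition)
        if len(self.buffer) < self.batch_size:
            return
        batch = random.sample(self.buffer, self.batch_size)
        obs, act, rew, nxt, done = (
            torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch)
        )
        # One-step TD target: r + gamma * max_a' Q(s', a') on non-terminal steps.
        q = self.q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rew + self.gamma * (1 - done) * self.q_net(nxt).max(1).values
        loss = nn.functional.mse_loss(q, target)
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
```

A driver loop would then simply alternate `action = agent.sample_action(obs)` with `agent.update((obs, action, reward, next_obs, done))`.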