Comment:
This is the SOTA model-based reinforcement learning agent, published in March 2020.
I can't see why it performs so well. Maybe it uses a good world model? Parallelism?
Problem:
In the past, it has been challenging to learn accurate world models and leverage them to learn successful behaviors. While recent research, such as our Deep Planning Network (PlaNet), has pushed these boundaries by learning accurate world models from images, model-based approaches have still been held back by ineffective or computationally expensive planning mechanisms, limiting their ability to solve difficult tasks.
Innovation:
By learning to compute compact model states from raw images, the agent (Dreamer) is able to efficiently learn from thousands of predicted sequences in parallel using just one GPU. Dreamer achieves a new state of the art in performance, data efficiency, and computation time on a benchmark of 20 continuous control tasks given raw image inputs.
Agent components: The classical components of agents that learn in imagination are dynamics learning (Yu: transition model), behavior learning (Yu: trajectories), and environment interaction (Yu: reward) (Sutton, 1991); see the sketch after this list.
• Learning the latent dynamics model from the dataset of past experience to predict future rewards from actions and past observations. Any learning objective for the world model can be incorporated with Dreamer. We review existing methods for learning latent dynamics in Section 4.
• Learning action and value models from predicted latent trajectories, as described in Section 3. The value model optimizes Bellman consistency for imagined rewards, and the action model is updated by propagating gradients of value estimates back through the neural network dynamics.
• Executing the learned action model in the world to collect new experience for growing the dataset.
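A minimal sketch of one training iteration under this three-part split, for intuition only. The objects (world_model, actor, critic, env, replay_buffer) and their method names are hypothetical placeholders standing in for the paper's components, not an actual API:

```python
# Sketch of one Dreamer iteration; all object/method names below are
# illustrative placeholders, not the official implementation.

def dreamer_iteration(world_model, actor, critic, env, replay_buffer,
                      batch_size=50, horizon=15):
    # 1) Dynamics learning: fit the latent world model on past experience.
    batch = replay_buffer.sample(batch_size)      # sequences of (o_t, a_t, r_t)
    world_model.update(batch)                     # any world-model objective can be used here

    # 2) Behavior learning: imagine latent trajectories and train actor/critic.
    start_states = world_model.encode(batch)      # posterior model states s_t
    states, rewards = world_model.imagine(actor, start_states, horizon)
    values = critic.update(states, rewards)       # Bellman consistency on imagined rewards
    actor.update(states, values)                  # value gradients flow back through the dynamics

    # 3) Environment interaction: run the learned policy to grow the dataset.
    obs, done = env.reset(), False
    while not done:
        state = world_model.observe(obs)          # infer s_t from the current observation
        action = actor.act(state)
        obs, reward, done = env.step(action)
        replay_buffer.add(obs, action, reward)
```

Note that behavior learning never decodes images: it rolls out thousands of predicted latent sequences in parallel, which is what keeps the whole loop on a single GPU.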
Latent dynamics
Dreamer uses a latent dynamics model that consists of three components. The representation model encodes observations and actions to create continuous vector-valued model states s_t with Markovian transitions (Watter et al., 2015; Zhang et al., 2019; Hafner et al., 2018). The transition model predicts future model states without seeing the corresponding observations that will later cause them. The reward model predicts the rewards given the model states:
Representation model: p(s_t | s_{t-1}, a_{t-1}, o_t)
Transition model:     q(s_t | s_{t-1}, a_{t-1})
Reward model:         q(r_t | s_t)
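For intuition, here is a small Python (PyTorch) sketch of these three components as Gaussian heads on simple MLPs. The layer sizes, softplus parameterisation, and class names are my own assumptions; the actual Dreamer model is an RSSM with a recurrent (GRU) deterministic path plus a stochastic state, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.distributions as D

# Illustrative only: the paper's architecture differs (RSSM), but the three
# conditional distributions below match the factorization listed above.

class RepresentationModel(nn.Module):
    """p(s_t | s_{t-1}, a_{t-1}, o_t): infer the model state from the observation."""
    def __init__(self, state_dim, action_dim, obs_embed_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + obs_embed_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))
    def forward(self, prev_state, prev_action, obs_embed):
        mean, std = self.net(torch.cat([prev_state, prev_action, obs_embed], -1)).chunk(2, -1)
        return D.Normal(mean, nn.functional.softplus(std) + 1e-4)

class TransitionModel(nn.Module):
    """q(s_t | s_{t-1}, a_{t-1}): predict the next state without seeing o_t."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))
    def forward(self, prev_state, prev_action):
        mean, std = self.net(torch.cat([prev_state, prev_action], -1)).chunk(2, -1)
        return D.Normal(mean, nn.functional.softplus(std) + 1e-4)

class RewardModel(nn.Module):
    """q(r_t | s_t): predict the reward from the model state alone."""
    def __init__(self, state_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ELU(), nn.Linear(hidden, 1))
    def forward(self, state):
        return D.Normal(self.net(state), 1.0)
```

Because the transition and reward models never condition on observations, imagined rollouts can stay entirely in the compact latent space.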
Link: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
Link2: https://arxiv.org/pdf/1912.01603.pdf