
Introducing Dreamer: Scalable Reinforcement Learning Using World Models, by Danijar Hafner et al. #39

QiXuanWang opened this issue Apr 16, 2020 · 0 comments


Link: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
Link2: https://arxiv.org/pdf/1912.01603.pdf

Comment:
This is the state-of-the-art model-based reinforcement learning method as of its publication in March 2020.
It is not obvious to me why it performs so well. Perhaps the good world model? The parallel imagination?

Problem:
In the past, it has been challenging to learn accurate world models and leverage them to learn successful behaviors. While recent research, such as our Deep Planning Network (PlaNet), has pushed these boundaries by learning accurate world models from images, model-based approaches have still been held back by ineffective or computationally expensive planning mechanisms, limiting their ability to solve difficult tasks.

Innovation:
By learning to compute compact model states from raw images, the agent (Dreamer) is able to efficiently learn from thousands of predicted sequences in parallel using just one GPU. Dreamer achieves a new state-of-the-art in performance, data efficiency, and computation time on a benchmark of 20 continuous control tasks given raw image inputs.


Agent components: the classical components of agents that learn in imagination are dynamics learning (Yu: transition model), behavior learning (Yu: trajectories), and environment interaction (Yu: reward) (Sutton, 1991).

• Learning the latent dynamics model from the dataset of past experience to predict future rewards from actions and past observations. Any learning objective for the world model can be incorporated with Dreamer; existing methods for learning latent dynamics are reviewed in Section 4.
• Learning action and value models from predicted latent trajectories, as described in Section 3. The value model optimizes Bellman consistency for imagined rewards, and the action model is updated by propagating gradients of value estimates back through the neural network dynamics (see the sketch after this list).
• Executing the learned action model in the world to collect new experience for growing the dataset.
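
To make the behavior-learning step concrete, here is a minimal sketch of imagining trajectories under a learned transition model, training a value model toward imagined returns, and updating the action model by backpropagating value estimates through the dynamics. Everything here is an illustrative assumption, not the paper's implementation: the single-layer `transition`, `reward_model`, `actor`, and `critic` modules, the random `start_states` batch, and the plain discounted return (the paper uses the RSSM and lambda-returns).

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon, gamma = 30, 6, 15, 0.99

# Toy stand-ins for the learned components (single layers, not the paper's networks).
transition = nn.Linear(state_dim + action_dim, state_dim)         # q(s_t | s_{t-1}, a_{t-1})
reward_model = nn.Linear(state_dim, 1)                            # q(r_t | s_t)
actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic = nn.Linear(state_dim, 1)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

# Imagination starts from model states inferred from real experience;
# a random batch stands in for them here.
start_states = torch.randn(64, state_dim)

# Roll out the action model inside the learned dynamics -- no environment calls.
state, rewards, values = start_states, [], []
for _ in range(horizon):
    action = actor(state)
    state = transition(torch.cat([state, action], dim=-1))
    rewards.append(reward_model(state))
    values.append(critic(state))

# Discounted imagined return with a bootstrapped value at the horizon
# (a simplification of the paper's lambda-returns).
returns = values[-1]
for r in reversed(rewards):
    returns = r + gamma * returns

# Action model: maximize imagined returns by backpropagating through the dynamics.
actor_loss = -returns.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Value model: regress toward the imagined returns (Bellman consistency).
value_loss = ((critic(start_states) - returns.detach()) ** 2).mean()
critic_opt.zero_grad()
value_loss.backward()
critic_opt.step()
```

The key design point is that the imagined rollout is fully differentiable, so the actor receives analytic gradients through the transition, reward, and value networks rather than high-variance score-function estimates.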

Latent dynamics
Dreamer uses a latent dynamics model that consists of three components. The representation model encodes observations and actions to create continuous vector-valued model states s_t with Markovian transitions (Watter et al., 2015; Zhang et al., 2019; Hafner et al., 2018). The transition model predicts future model states without seeing the corresponding observations that will later cause them. The reward model predicts the rewards given the model states:

Representation model: p(s_t | s_{t-1}, a_{t-1}, o_t)
Transition model: q(s_t | s_{t-1}, a_{t-1})
Reward model: q(r_t | s_t)
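
A compact sketch of these three components, assuming Gaussian model states and an already-encoded observation embedding (`obs_embed`). The layer sizes, dimensions, and parameterization are illustrative assumptions; the paper itself reuses the recurrent state-space model (RSSM) from PlaNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class LatentDynamics(nn.Module):
    """Toy versions of the representation, transition, and reward models."""

    def __init__(self, embed_dim=64, action_dim=6, state_dim=30, hidden=200):
        super().__init__()
        # Representation model p(s_t | s_{t-1}, a_{t-1}, o_t): posterior that sees the observation.
        self.repr_net = nn.Sequential(
            nn.Linear(state_dim + action_dim + embed_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))
        # Transition model q(s_t | s_{t-1}, a_{t-1}): prior that predicts without the observation.
        self.trans_net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * state_dim))
        # Reward model q(r_t | s_t): predicts the reward from the model state alone.
        self.reward_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 1))

    @staticmethod
    def _gaussian(params):
        mean, std = params.chunk(2, dim=-1)
        return Normal(mean, F.softplus(std) + 0.1)   # keep the scale positive

    def representation(self, prev_state, prev_action, obs_embed):
        return self._gaussian(self.repr_net(torch.cat([prev_state, prev_action, obs_embed], -1)))

    def transition(self, prev_state, prev_action):
        return self._gaussian(self.trans_net(torch.cat([prev_state, prev_action], -1)))

    def reward(self, state):
        return self.reward_net(state)
```

Training would fit this model on replayed experience by maximizing the likelihood of observations and rewards while keeping the representation (posterior) close to the transition (prior) with a KL term, as in PlaNet; see Section 4 of the paper for the learning objectives.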
