Building blocks for PEBBLE #625
base: master
Conversation
Thanks for the implementations!
@AdamGleave: responding to your comments here together:
OK, it required a larger refactor, but you can see how it looks in the last couple of commits. A bonus is that this change also addresses your other comment. It simplifies the entropy reward classes (a separate entropy reward, plus the logic for switching away from the pre-training reward) and allows for more configurability, at the expense of making the wiring a little more complicated (in train_preference_comparison.py). It also results in two changes internally:
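To illustrate the separation described above, here is a minimal sketch of decoupling the entropy reward from the switch-over logic: a generic reward that delegates to one reward function during pre-training and another afterwards. All names here (`SwitchingReward`, `initial_fn`, `main_fn`, `switch_step`) are hypothetical, not the actual classes introduced in this PR.

```python
from typing import Callable

import numpy as np

# A reward function maps a batch of states to a batch of scalar rewards.
RewardFn = Callable[[np.ndarray], np.ndarray]


class SwitchingReward:
    """Uses `initial_fn` for the first `switch_step` calls, then `main_fn`.

    Keeping the switching logic here means the entropy reward itself stays
    a plain reward function, with no knowledge of the training schedule.
    """

    def __init__(self, initial_fn: RewardFn, main_fn: RewardFn, switch_step: int):
        self.initial_fn = initial_fn
        self.main_fn = main_fn
        self.switch_step = switch_step
        self.steps = 0

    def __call__(self, states: np.ndarray) -> np.ndarray:
        fn = self.initial_fn if self.steps < self.switch_step else self.main_fn
        self.steps += 1
        return fn(states)
```

With this split, swapping in a different pre-training reward (or a different switching rule) only requires changing the wiring, which is the configurability/complexity trade-off mentioned above.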
Description
Creates an entropy reward replay wrapper to support the unsupervised, state-entropy-based pre-training of an agent, as described in the PEBBLE paper.
https://sites.google.com/view/icml21pebble
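For context, PEBBLE's unsupervised pre-training rewards each state by a particle-based entropy estimate: the (log) distance to its k-th nearest neighbor among states stored in a replay buffer. The sketch below illustrates that idea; the function name and signature are illustrative, not the code added in this PR.

```python
import numpy as np


def knn_entropy_reward(states: np.ndarray, buffer: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the log k-th nearest-neighbor distance for each state.

    states: (n, d) array of query states.
    buffer: (m, d) array of previously seen states, with m >= k.
    """
    # Pairwise Euclidean distances between query states and buffer entries.
    dists = np.linalg.norm(states[:, None, :] - buffer[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor (columns sorted ascending).
    kth = np.sort(dists, axis=1)[:, k - 1]
    # +1 inside the log keeps the reward finite when the distance is zero.
    return np.log(kth + 1.0)
```

States far from everything previously seen get a high reward, which drives the agent toward broad state coverage before any human preference feedback is collected.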
Testing
Added unit tests.