
Building blocks for PEBBLE #625

Open · wants to merge 55 commits into base: master
Conversation

dan-pandori (Contributor)

Description

Creates an entropy-reward replay wrapper to support unsupervised, state-entropy-based pre-training of an agent, as described in the PEBBLE paper.
https://sites.google.com/view/icml21pebble
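For context, PEBBLE's unsupervised pre-training phase rewards the agent with a particle-based estimate of state entropy: the distance from each state to its k-th nearest neighbor among states sampled from the replay buffer. A minimal sketch of that idea (the function name and the log transform are illustrative, not this PR's actual wrapper code):

```python
import numpy as np

def knn_entropy_reward(states: np.ndarray, replay: np.ndarray, k: int = 5) -> np.ndarray:
    """Particle-based state-entropy intrinsic reward (PEBBLE-style sketch).

    Each state's reward is based on the Euclidean distance to its k-th
    nearest neighbor in a replay sample: a larger distance means the
    state lies in a less-visited region, so exploration is rewarded.
    """
    # Pairwise distances between batch states and replay states:
    # shape (len(states), len(replay)).
    dists = np.linalg.norm(states[:, None, :] - replay[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor (k is 1-indexed here).
    kth = np.sort(dists, axis=1)[:, k - 1]
    # log(1 + d_k) keeps the reward non-negative and dampens outliers.
    return np.log(kth + 1.0)
```

In the actual algorithm this intrinsic reward replaces the environment reward only during pre-training, after which the learned preference-based reward takes over.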

Testing

Added unit tests.

@yawen-d (Contributor) commented Nov 14, 2022

Thanks for the implementations!

mifeet pushed a commit that referenced this pull request Nov 29, 2022
@mifeet force-pushed the dpandori_wellford branch 4 times, most recently from efc5ae0 to a0bacca on December 10, 2022 at 20:46
@mifeet (Contributor) commented Dec 10, 2022

@AdamGleave: reacting to your comments here together:

I'd prefer wrapping it with a NormalizedRewardNet; they're conceptually doing very different things, and we might want to use different normalization schemes (RunningNorm often works worse than EMANorm)

Ok, it required a larger refactor, but you can see how it looks in the last couple of commits.

One benefit is that this change also addresses your other comment: it simplifies the entropy reward classes (the separate entropy reward and the switch away from the pre-training reward) and allows more configurability, at the expense of making the wiring a little more complicated (in train_preference_comparison.py).

It also results in two changes internally:

  • Previously, the running mean/variance statistics for normalization were updated first and normalization was applied afterwards. Now the order is swapped.
  • Previously, reward calculation required numpy -> torch -> numpy conversions; now it internally converts numpy -> torch -> numpy -> torch -> numpy (because that is what the existing NormalizedRewardNet code does). This applies only during pre-training.
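The ordering distinction in the first bullet can be illustrated with a minimal exponential-moving-average normalizer (a hedged sketch, not the imitation library's actual EMANorm class):

```python
import numpy as np

class EMANorm:
    """Minimal exponential-moving-average reward normalizer (illustrative only)."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.mean = 0.0
        self.var = 1.0

    def update(self, batch: np.ndarray) -> None:
        # Fold the current batch's statistics into the running estimates.
        self.mean = self.decay * self.mean + (1 - self.decay) * batch.mean()
        self.var = self.decay * self.var + (1 - self.decay) * batch.var()

    def normalize(self, batch: np.ndarray) -> np.ndarray:
        return (batch - self.mean) / np.sqrt(self.var + 1e-8)

# Ordering matters: normalize-then-update scales the current batch using
# statistics from previous batches only, whereas update-then-normalize
# (the old behavior described above) lets the current batch influence
# its own normalization.
```

Either ordering converges to similar statistics over time; the difference is mainly visible in the first few batches, where the running estimates are still far from the true reward scale.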

@mifeet force-pushed the dpandori_wellford branch 7 times, most recently from 7434ee6 to 4fd0758 on December 13, 2022 at 10:41
4 participants