Hi Brax team,
I’m working on a reinforcement learning project using Brax to train a PPO agent and I’m trying to implement curriculum learning by adjusting the environment's difficulty dynamically based on the training progress (e.g., current_steps or number of episodes). My goal is to pass this information to the environment during training so that I can change certain parameters (like gravity, object mass, etc.) as the agent progresses.
I’ve thought of a solution where I modify the training code to pass the current training progress into the environment’s reset function. Here’s a simplified example of what I have in mind:
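Roughly, as a sketch rather than real Brax code: `CurriculumEnv`, `gravity_for_step`, `total_steps`, the easy/hard gravity values and the dict returned by `reset` are all hypothetical stand-ins. A real Brax environment's `reset` takes only `rng` and returns a `State`, which is exactly why the extra argument would have to be threaded through the training loop and the wrappers.

```python
import jax
import jax.numpy as jnp


def gravity_for_step(current_step, total_steps, easy=-2.0, hard=-9.81):
    """Linearly interpolate gravity from an easy to a hard value (hypothetical schedule)."""
    progress = jnp.clip(current_step / total_steps, 0.0, 1.0)
    return easy + progress * (hard - easy)


class CurriculumEnv:
    """Hypothetical stand-in for a Brax env; not the real brax.envs API."""

    def __init__(self, total_steps):
        self.total_steps = total_steps

    def reset(self, rng, current_step):
        # current_step is the extra argument the training loop would have to pass.
        gravity = gravity_for_step(current_step, self.total_steps)
        obs = jax.random.normal(rng, (4,))  # placeholder observation
        return {"obs": obs, "gravity": gravity}


env = CurriculumEnv(total_steps=1_000_000)
reset_fn = jax.jit(env.reset)
# Halfway through training, gravity sits halfway between the two values.
state = reset_fn(jax.random.PRNGKey(0), jnp.asarray(500_000))
```

Because `current_step` is passed in as an array argument, the jitted `reset` sees its current value on every call instead of a value frozen at trace time.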
However, this requires modifying the reset_fn in the training loop (brax/training/agents/ppo/train.py) to pass the training progress manually. And I also need to modify all the reset functions of the wrappers to allow the current_step to be passed into the reset function.
I've also tried simply storing a scalar value in the environment, like self.num_episodes = 0, and calling self.num_episodes = self.num_episodes + 1 in the reset function. Unfortunately, this value never actually changes despite the reset calls. So I wonder if there's a way to achieve this without changing the training code of Brax itself.
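A minimal reproduction of this behaviour, under the assumption that the cause is `reset` being compiled with `jax.jit`: Python-side mutation of `self` runs only while JAX traces the function, not on later calls to the compiled version. `CountingEnv` is a hypothetical stand-in, not a Brax class.

```python
import jax
import jax.numpy as jnp


class CountingEnv:
    """Hypothetical stand-in env that tries to count resets on `self`."""

    def __init__(self):
        self.num_episodes = 0

    def reset(self, rng):
        # Python-side mutation: executed during tracing, not on each compiled call.
        self.num_episodes = self.num_episodes + 1
        return jax.random.normal(rng, (4,))


env = CountingEnv()
reset_fn = jax.jit(env.reset)

# Three calls with same-shaped keys reuse one compiled function.
for seed in range(3):
    reset_fn(jax.random.PRNGKey(seed))

print(env.num_episodes)  # counts traces, not calls
```

Any counter that should change during training therefore has to live in the arrays that flow through the jitted functions (for example the state), not in attributes of the environment object.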
Question:
Is there a better practice for passing or storing training progress information (like current_steps) in Brax for curriculum learning? Specifically:
Is modifying the training code the best approach, or can this be handled more elegantly by the environment itself?
Can we store or retrieve the training progress (e.g., current_steps) in the environment without needing to modify the reset function directly?
I’d appreciate any advice or best practices you can suggest for implementing this kind of feature in Brax.
Thanks for your help!
Hi, I ended up modifying the step function of AutoResetWrapper and adding an episode_num item to keep track of the number of episodes for each environment. The modified wrapper looks like this:
inside the step function of AutoResetWrapper:
and add the 'episode_num' key to the state dict elsewhere. You can also track the total number of environment steps similarly.
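A rough sketch of the counting logic described above, not the actual modified wrapper: a plain dict stands in for `state.info`, `count_episodes` is a hypothetical helper, and `done` is assumed to be a per-environment 0/1 array. In the wrapper, this would be applied to the `done` flag returned by the inner `env.step`.

```python
import jax
import jax.numpy as jnp


def count_episodes(info, done):
    """Return a copy of `info` with `episode_num` incremented where `done` is set."""
    episode_num = info["episode_num"] + done.astype(info["episode_num"].dtype)
    return {**info, "episode_num": episode_num}


count_fn = jax.jit(count_episodes)

# Two updates for a batch of three environments.
info = {"episode_num": jnp.zeros(3, dtype=jnp.int32)}
info = count_fn(info, jnp.array([0.0, 1.0, 0.0]))  # env 1 finished an episode
info = count_fn(info, jnp.array([1.0, 1.0, 0.0]))  # envs 0 and 1 finished
```

Since the counter is an array carried in the state rather than a Python attribute, it keeps updating under `jax.jit`.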
However, this workaround can't keep track of the progress inside the environment class, so I'd like to keep the issue open for now.