In the RL training pipeline (for SAC and PPO), there seems to be an issue with the mse values computed/tracked during evaluation runs. They match neither the mse reported in the "info" dict from env.step nor the rmse results from direct policy evaluation through rl_experiment.sh. (A deeper dive suggests the issue lies in how mse is handled in "RecordEpisodeStatistics".)
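For reference, here is a minimal sketch of how a `RecordEpisodeStatistics`-style wrapper could aggregate a per-step `mse` from `info` into an episode-level statistic, along with the kind of normalization slip that would produce the symptom above. This is not the repository's actual wrapper; only the `info["mse"]` key is taken from this issue, everything else is illustrative:

```python
class RecordEpisodeStatisticsSketch:
    """Illustrative wrapper: accumulates per-step 'mse' from info into an
    episode-level statistic. Assumes a gym-style env whose step() returns
    (obs, reward, done, info) with a scalar info['mse'] each step."""

    def __init__(self, env):
        self.env = env
        self._mse_sum = 0.0
        self._steps = 0

    def reset(self, **kwargs):
        self._mse_sum, self._steps = 0.0, 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._mse_sum += info["mse"]
        self._steps += 1
        if done:
            # Correct: average over the episode, so taking sqrt() later
            # yields the same rmse a direct evaluation would report.
            info["episode_mse"] = self._mse_sum / self._steps
            # A buggy variant consistent with this issue would skip the
            # division (sum instead of mean), inflating the tracked value:
            # info["episode_mse"] = self._mse_sum
        return obs, reward, done, info
```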
To replicate, consider the following test case. Train an RL controller on a quadrotor and go through the training logs. Then execute the trained policy using rl_experiment.sh, which again prints out the run stats. The mse values from the training run (after taking a square root) are higher than the rmse values printed during policy execution.
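As a sanity check, the per-step mse values from `env.step` can be aggregated by hand and compared against both numbers. A rough sketch, where `env` and `policy` are placeholders for the configured quadrotor environment and the trained SAC/PPO policy (both assumed, not actual identifiers from the repo):

```python
import numpy as np

# Collect info['mse'] from each step of one evaluation episode.
mse_steps = []
obs = env.reset()
done = False
while not done:
    action = policy(obs)  # trained controller (placeholder)
    obs, reward, done, info = env.step(action)
    mse_steps.append(info["mse"])

# sqrt of the per-step mean should match the rmse printed by direct
# policy evaluation; in the buggy runs, the value tracked during
# training comes out higher than this.
rmse_direct = np.sqrt(np.mean(mse_steps))
print(f"rmse from per-step info: {rmse_direct:.4f}")
```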
Here's a test run I did for PPO with a quadrotor (with the attitude control interface).
Next, the run stats from the policy evaluation: