In the RL training pipeline (for SAC and PPO), there seems to be an issue with the mse values computed/tracked during evaluation runs. They match neither the mse reported in the "info" dict from env.step nor the rmse results from direct policy evaluation through rl_experiment.sh. (A deeper dive suggests the issue lies in how mse is handled in "RecordEpisodeStatistics".)
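For reference, here is a minimal sketch of how a `RecordEpisodeStatistics`-style wrapper could aggregate a per-step `mse` from `info` into an episode-level statistic, along with the kind of normalization slip that would produce the symptom above. This is not the repository's actual wrapper; only the `info["mse"]` key is taken from this issue, everything else is illustrative:

```python
class RecordEpisodeStatisticsSketch:
    """Illustrative wrapper: accumulates per-step 'mse' from info into an
    episode-level statistic. Assumes a gym-style env whose step() returns
    (obs, reward, done, info) with a scalar info['mse'] each step."""

    def __init__(self, env):
        self.env = env
        self._mse_sum = 0.0
        self._steps = 0

    def reset(self, **kwargs):
        self._mse_sum, self._steps = 0.0, 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._mse_sum += info["mse"]
        self._steps += 1
        if done:
            # Correct: average over the episode, so taking sqrt() later
            # yields the same rmse a direct evaluation would report.
            info["episode_mse"] = self._mse_sum / self._steps
            # A buggy variant consistent with this issue would skip the
            # division (sum instead of mean), inflating the tracked value:
            # info["episode_mse"] = self._mse_sum
        return obs, reward, done, info
```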
To replicate, consider the following test case. Train an RL controller on a quadrotor and go through the training logs. Then execute the trained policy using rl_experiment.sh, which again prints out the run stats. The mse values from the training run (after taking a square root) are higher than the rmse values printed during policy execution.
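As a sanity check, the per-step mse values from `env.step` can be aggregated by hand and compared against both numbers. A rough sketch, where `env` and `policy` are placeholders for the configured quadrotor environment and the trained SAC/PPO policy (both assumed, not actual identifiers from the repo):

```python
import numpy as np

# Collect info['mse'] from each step of one evaluation episode.
mse_steps = []
obs = env.reset()
done = False
while not done:
    action = policy(obs)  # trained controller (placeholder)
    obs, reward, done, info = env.step(action)
    mse_steps.append(info["mse"])

# sqrt of the per-step mean should match the rmse printed by direct
# policy evaluation; in the buggy runs, the value tracked during
# training comes out higher than this.
rmse_direct = np.sqrt(np.mean(mse_steps))
print(f"rmse from per-step info: {rmse_direct:.4f}")
```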
Here's a test run I did for PPO with a quadrotor (with the attitude control interface).
Next, the run stats from the policy evaluation: