diff --git a/index.html b/index.html
index 83e15d1..07e079c 100644
--- a/index.html
+++ b/index.html
@@ -304,10 +304,11 @@
- We propose the Mean Maximum Rank Violation (MMRV) metric to better assess the real-and-sim policy ranking consistency.
- The key underlying quantity is the rank violation between two policies, which weighs the significance of the
- simulator incorrectly ranking the items by the corresponding margin in real-world performance.
- MMRV aggregates the N^2 rank violations by averaging the worst-case rank violation for each policy.
+ Besides the traditional Pearson correlation metric ("r"), we also introduce the Mean Maximum Rank Violation (MMRV) metric (lower the better)
+ to assess the real-and-sim policy ranking consistency and address Pearson correlation's limitations.
+ The key underlying quantity is the rank violation between two policies, which weighs the significance of the
+ simulator incorrectly ranking the policies by the corresponding margin in real-world performance.
+ MMRV then aggregates the N^2 rank violations by averaging the worst-case rank violation for each policy.
Control with SysID
-
-
@@ -365,10 +362,11 @@
- SIMPLER can be used to evaluate four types of high level tasks, with many intra-task variations, for each of two robot embodiments (Google Robot and WidowX.
+ SIMPLER can be used to evaluate four types of high level tasks, with many intra-task variations, for each of two robot embodiments (Google Robot and WidowX).
+ It can also be used to compare the performance of different policies and perform checkpoint selection.
- Our approach yields a strong correlation between real-world and simulated performance for various open-source robot policies,
- across two commonly used robot embodiments (Google Robot and WidowX) and over ∼1500 evaluation episodes.
+ SIMPLER can be used to study policies' finegrained behaviors, such as their robustness to common distribution shifts like lighting, background, camera pose,
+ distractor objects, and table texture changes. The simulation findings are highly correlated with those in the real-world.
+ Additionally, SIMPLER can be used to predict how policies will behave under novel distribution shifts in the real world, such as changes in arm textures.
+
+
+ SIMPLER yields a strong correlation between real-world and simulated performance across ∼1500 evaluation episodes.
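
Review note on the MMRV wording in the first hunk: the description (a pairwise rank violation weighted by the real-world performance margin, then the mean over policies of each policy's worst-case violation) can be sketched in a few lines of Python. This is only one reading of the prose in the diff, not the project's implementation. The function names `rank_violation` and `mmrv`, the use of per-policy success rates as inputs, and the strict `<` comparison for ordering (including how ties are treated) are all assumptions made for illustration.

```python
from typing import Sequence


def rank_violation(real_i: float, real_j: float, sim_i: float, sim_j: float) -> float:
    """Penalty for one policy pair (hypothetical helper).

    Assumption: the simulator "incorrectly ranks" a pair when the strict
    ordering of the two policies differs between sim and real. The penalty
    is then the real-world performance margin; otherwise it is zero.
    """
    misordered = (sim_i < sim_j) != (real_i < real_j)
    return abs(real_i - real_j) if misordered else 0.0


def mmrv(real: Sequence[float], sim: Sequence[float]) -> float:
    """Mean over policies of the worst-case rank violation (lower is better)."""
    if len(real) != len(sim):
        raise ValueError("real and sim must have one entry per policy")
    n = len(real)
    if n == 0:
        raise ValueError("at least one policy is required")
    # For each policy i, take the largest violation against any policy j,
    # then average those N worst cases (N^2 pairwise terms in total).
    worst_cases = [
        max(rank_violation(real[i], real[j], sim[i], sim[j]) for j in range(n))
        for i in range(n)
    ]
    return sum(worst_cases) / n


if __name__ == "__main__":
    # Illustrative numbers only, not results from the project.
    real_success = [0.2, 0.5, 0.9]
    sim_success = [0.6, 0.3, 0.8]  # sim swaps the order of the first two policies
    print(mmrv(real_success, sim_success))
```

Under this reading, a simulator that preserves the real-world ordering of every pair scores 0, and a swap between two policies that differ greatly in real performance costs more than a swap between two near-equal ones, which is the property Pearson correlation alone does not capture.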