diff --git a/index.html b/index.html
index 83e15d1..07e079c 100644
--- a/index.html
+++ b/index.html
@@ -304,10 +304,11 @@

Approach

Metrics for Real-to-Sim Evaluation

- We propose the Mean Maximum Rank Violation (MMRV) metric to better assess the real-and-sim policy ranking consistency.
- The key underlying quantity is the rank violation between two policies, which weighs the significance of the
- simulator incorrectly ranking the items by the corresponding margin in real-world performance.
- MMRV aggregates the N^2 rank violations by averaging the worst-case rank violation for each policy.
+ Besides the traditional Pearson correlation metric ("r"), we also introduce the Mean Maximum Rank Violation (MMRV) metric (lower is better)
+ to assess the real-and-sim policy ranking consistency and address the limitations of Pearson correlation.
+ The key underlying quantity is the rank violation between two policies, which weighs the significance of the
+ simulator incorrectly ranking the policies by the corresponding margin in real-world performance.
+ MMRV then aggregates the N^2 rank violations by averaging the worst-case rank violation for each policy.
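As a minimal sketch of the metric described above, the Python below computes MMRV from one aggregate success rate per policy in the real world and in simulation. It assumes the rank violation between policies i and j equals the real-world margin |real[i] - real[j]| when the simulator orders the pair differently from the real world, and zero otherwise; ties are compared with a strict `<`, which is an assumption of this sketch. The function names `rank_violation` and `mmrv` and the example numbers are hypothetical.

```python
from typing import Sequence


def rank_violation(real: Sequence[float], sim: Sequence[float], i: int, j: int) -> float:
    """Rank violation between policies i and j.

    Returns the real-world performance margin |real[i] - real[j]| when the
    simulator orders the pair differently from the real world, and 0 otherwise.
    Ties are compared with strict "<", which is an assumption of this sketch.
    """
    misordered = (sim[i] < sim[j]) != (real[i] < real[j])
    return abs(real[i] - real[j]) if misordered else 0.0


def mmrv(real: Sequence[float], sim: Sequence[float]) -> float:
    """Mean Maximum Rank Violation over N policies (lower is better).

    Evaluates all N^2 pairwise rank violations, takes the worst case for each
    policy i, and averages those N maxima.
    """
    n = len(real)
    if n == 0 or n != len(sim):
        raise ValueError("real and sim must be non-empty and have equal length")
    worst_cases = [max(rank_violation(real, sim, i, j) for j in range(n)) for i in range(n)]
    return sum(worst_cases) / n


# Hypothetical success rates for three policies (illustrative, not measured data).
real_success = [0.2, 0.5, 0.8]
sim_success = [0.4, 0.1, 0.9]  # simulator swaps the order of the first two policies
print(mmrv(real_success, sim_success))
```

Here the simulator misorders only the first two policies, whose real-world margin is 0.3, so each of them has a worst-case violation of 0.3 and the third has 0, and the sketch prints about 0.2. Weighting by the real-world margin means that swapping two policies with nearly equal real performance costs little, while swapping two clearly separated policies is penalized heavily.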

Visual Matching to mitigate Real-to-Sim Visual Gap

@@ -346,10 +347,6 @@

System Identification to mitigate Real-to-Sim Control Gap

Control with SysID

@@ -365,10 +362,11 @@

System Identification to mitigate Real-to-Sim Control Gap

Applications


Evaluating Policies


Evaluating and Comparing Policies

- SIMPLER can be used to evaluate four types of high level tasks, with many intra-task variations, for each of two robot embodiments (Google Robot and WidowX.
+ SIMPLER can be used to evaluate four types of high-level tasks, with many intra-task variations, for each of two robot embodiments (Google Robot and WidowX).
+ It can also be used to compare the performance of different policies and perform checkpoint selection.

@@ -394,7 +392,6 @@

Evaluating Policies


Paired Evaluations in Real and Sim


Studying and Predicting Policy Behaviors under Distribution Shifts

- Our approach yields a strong correlation between real-world and simulated performance for various open-source robot policies,
- across two commonly used robot embodiments (Google Robot and WidowX) and over ∼1500 evaluation episodes.
+ SIMPLER can be used to study policies' fine-grained behaviors, such as their robustness to common distribution shifts like lighting, background, camera pose,
+ distractor objects, and table texture changes. The simulation findings are highly correlated with those in the real world.
+ Additionally, SIMPLER can be used to predict how policies will behave under novel real-world distribution shifts, such as changes in arm textures.


Gallery: Paired Evaluations in Real and Sim


+ SIMPLER yields a strong correlation between real-world and simulated performance across ∼1500 evaluation episodes.
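The real-to-sim correlation claimed here is conventionally quantified with the Pearson correlation coefficient r, the traditional metric this page reports alongside MMRV. The sketch below is a minimal illustration, assuming one aggregate success rate per policy in each setting; the function name `pearson_r` and the example numbers are hypothetical, not measured data.

```python
import math
from typing import Sequence


def pearson_r(real: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation between paired real and simulated success rates."""
    n = len(real)
    if n < 2 or n != len(sim):
        raise ValueError("need at least two paired real/sim values")
    mean_real = sum(real) / n
    mean_sim = sum(sim) / n
    cov = sum((r - mean_real) * (s - mean_sim) for r, s in zip(real, sim))
    var_real = sum((r - mean_real) ** 2 for r in real)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    # Raises ZeroDivisionError if either series is constant (r is undefined there).
    return cov / math.sqrt(var_real * var_sim)


# Hypothetical paired success rates for four policies (illustrative only).
print(pearson_r([0.1, 0.3, 0.5, 0.7], [0.2, 0.35, 0.55, 0.8]))
```

On Python 3.10 and later, `statistics.correlation(real, sim)` computes the same coefficient. Because r measures linear agreement rather than ordering, it is reported together with a ranking-based metric such as MMRV.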

Real World Rollouts for Google Robot

@@ -520,116 +538,8 @@

Simulation Rollouts for WidowX

diff --git a/simpler.pdf b/simpler.pdf
index 7982cb8..bbed884 100644
Binary files a/simpler.pdf and b/simpler.pdf differ
diff --git a/static/images/results_bridge.png b/static/images/results_bridge.png
new file mode 100644
index 0000000..6bd1382
Binary files /dev/null and b/static/images/results_bridge.png differ
diff --git a/static/images/results_google_robot (copy).png b/static/images/results_google_robot (copy).png
new file mode 100644
index 0000000..e5aa4c1
Binary files /dev/null and b/static/images/results_google_robot (copy).png differ
diff --git a/static/images/results_google_robot.png b/static/images/results_google_robot.png
new file mode 100644
index 0000000..02fe5f1
Binary files /dev/null and b/static/images/results_google_robot.png differ