doc: add benchmark section

- add state and action space descriptions
- add benchmark details
Farama-Foundation · Sep 3, 2024 · 9c8a992
1 parent 16480c0
commit 9c8a992
Showing 14 changed files with 146 additions and 0 deletions.
diff --git a/docs/_static/ml1-1.gif b/docs/_static/ml1-1.gif
diff --git a/docs/_static/ml1.gif b/docs/_static/ml1.gif
diff --git a/docs/_static/ml10-1.gif b/docs/_static/ml10-1.gif
diff --git a/docs/_static/ml10.gif b/docs/_static/ml10.gif
diff --git a/docs/_static/ml45-1.gif b/docs/_static/ml45-1.gif
diff --git a/docs/_static/ml45.gif b/docs/_static/ml45.gif
diff --git a/docs/_static/mt1-1.gif b/docs/_static/mt1-1.gif
diff --git a/docs/_static/mt1.gif b/docs/_static/mt1.gif
diff --git a/docs/_static/mt10-1.gif b/docs/_static/mt10-1.gif
diff --git a/docs/benchmark/action_space.md b/docs/benchmark/action_space.md
@@ -0,0 +1,17 @@
+---
+layout: "contents"
+title: Action Space
+firstpage:
+---
+
+# Action Space
+
+The action space of the Sawyer robot is a `Box(-1.0, 1.0, (4,), float32)`.
+An action consists of the Cartesian displacement (dx, dy, dz) of the end effector and one additional component for gripper control.
+
+| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit |
+|-----|--------|-------------|-------------|---------------------|-------|------|
+| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) |
+| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) |
+| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) |
+| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) |
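
Illustration for `docs/benchmark/action_space.md` (not part of the committed file): a minimal sketch of how a four-component action could be clamped to the documented `[-1, 1]` bounds and labeled in the order given by the table above. `SawyerAction` and `clip_action` are hypothetical helpers introduced here, not Metaworld API, and whether the environment clips out-of-range values itself is not stated in this patch.

```python
from typing import NamedTuple, Sequence

# Bounds taken from the Box(-1.0, 1.0, (4,), float32) description above.
ACTION_LOW, ACTION_HIGH, ACTION_DIM = -1.0, 1.0, 4


class SawyerAction(NamedTuple):
    """Hypothetical helper type; field order follows the action table."""
    dx: float       # end-effector displacement along x
    dy: float       # end-effector displacement along y
    dz: float       # end-effector displacement along z
    gripper: float  # gripper adjustment (closing/opening)


def clip_action(raw: Sequence[float]) -> SawyerAction:
    """Clamp each component into [ACTION_LOW, ACTION_HIGH] and label it."""
    if len(raw) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} components, got {len(raw)}")
    clipped = [min(ACTION_HIGH, max(ACTION_LOW, float(v))) for v in raw]
    return SawyerAction(*clipped)


action = clip_action([0.5, -2.0, 0.0, 1.5])
print(action)  # SawyerAction(dx=0.5, dy=-1.0, dz=0.0, gripper=1.0)
```

Naming the components keeps policy code from relying on bare indices, which is where the table's ordering is easiest to get wrong.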
diff --git a/docs/benchmark/benchmark_descriptions.md b/docs/benchmark/benchmark_descriptions.md
@@ -0,0 +1,82 @@
+---
+layout: "contents"
+title: Benchmark Descriptions
+firstpage:
+---
+
+# Benchmark Descriptions
+
+The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL).
+Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL.
+Unlike in typical RL benchmarks, agents are evaluated under a strict split between a training phase and a testing phase.
+
+## Multi-Task Problems
+
+The multi-task setting challenges the agent to learn a predefined set of skills simultaneously.
+Below, different levels of difficulty are described.
+
+### Multi-Task (MT1)
+
+In the easiest setting, **MT1**, the agent learns a single task in which it must *reach*, *push*, or *pick and place* a goal object.
+There is no testing of generalization involved in this setting.
+
+```{figure} _static/mt1.gif
+   :alt: Multi-Task 1 
+   :width: 500
+```
+
+### Multi-Task (MT10)
+
+The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below.
+There is no testing of generalization involved in this setting.
+
+
+
+```{figure} _static/mt10.gif
+   :alt: Multi-Task 10 
+   :width: 500
+```
+
+### Multi-Task (MT50)
+
+In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in Metaworld.
+This is the most challenging multi-task setting and involves no evaluation on test tasks.
+
+
+## Meta-Learning Problems
+
+Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) capabilities of agents that learn skills from a predefined set of training tasks,
+by measuring generalization on a held-out set of test tasks.
+In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks.
+
+### Meta-RL (ML1)
+
+The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location.
+For the test evaluation, unseen goal locations are used to measure generalization capabilities.
+
+
+
+```{figure} _static/ml1.gif
+   :alt: Meta-RL 1 
+   :width: 500
+```
+
+
+### Meta-RL (ML10)
+
+The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase.
+
+```{figure} _static/ml10.gif
+   :alt: Meta-RL 10 
+   :width: 500
+```
+
+### Meta-RL (ML45)
+
+The most difficult setting in Metaworld, **ML45**, trains the agent on 45 distinct manipulation tasks and evaluates it on 5 unseen test tasks.
+
+
+```{figure} _static/ml45.gif
+   :alt: Meta-RL 45
+   :width: 500
+```
diff --git a/docs/benchmark/env_task_vs_task_init.md b/docs/benchmark/env_task_vs_task_init.md
diff --git a/docs/benchmark/state_space.md b/docs/benchmark/state_space.md
@@ -0,0 +1,37 @@
+---
+layout: "contents"
+title: State Space 
+firstpage:
+---
+
+# State Space
+
+The observation array consists of the position and state of the gripper (end effector), the position and orientation of the object of interest, the same quantities from the previous step, and the goal position. The table below details each component typically present in these environments:
+
+| Num | Observation Description                       | Min     | Max     | Site Name (XML)        | Joint Name (XML) | Joint Type | Unit        |
+|-----|-----------------------------------------------|---------|---------|------------------------|-------------------|------------|-------------|
+| 0   | End effector x position in global coordinates | -Inf    | Inf     | hand                   | -                 | -          | position (m)|
+| 1   | End effector y position in global coordinates | -Inf    | Inf     | hand                   | -                 | -          | position (m)|
+| 2   | End effector z position in global coordinates | -Inf    | Inf     | hand                   | -                 | -          | position (m)|
+| 3   | Gripper distance apart                       | 0.0     | 1.0     | -                      | -                 | -          | dimensionless|
+| 4   | Object x position in global coordinates       | -Inf    | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 5   | Object y position in global coordinates       | -Inf    | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 6   | Object z position in global coordinates       | -Inf    | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 7   | Object x quaternion component in global coordinates | -Inf    | Inf | objGeom (derived)      | -                 | -          | quaternion  |
+| 8   | Object y quaternion component in global coordinates | -Inf    | Inf | objGeom (derived)      | -                 | -          | quaternion  |
+| 9   | Object z quaternion component in global coordinates | -Inf    | Inf | objGeom (derived)      | -                 | -          | quaternion  |
+| 10  | Object w quaternion component in global coordinates | -Inf    | Inf | objGeom (derived)      | -                 | -          | quaternion  |
+| 11  | Previous end effector x position              | -Inf    | Inf     | hand                   | -                 | -          | position (m)|
+| 12  | Previous end effector y position              | -Inf    | Inf     | hand                   | -                 | -          | position (m)| 
+| 13  | Previous end effector z position              | -Inf    | Inf     | hand                   | -                 | -          | position (m)|
+| 14  | Previous gripper distance apart               | 0.0     | 1.0     | -                      | -                 | -          | dimensionless|
+| 15  | Previous object x position in global coordinates | -Inf | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 16  | Previous object y position in global coordinates | -Inf | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 17  | Previous object z position in global coordinates | -Inf | Inf     | objGeom (derived)      | -                 | -          | position (m)|
+| 18  | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 19  | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 20  | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 21  | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion |
+| 22  | Goal x position                                | -Inf    | Inf     | goal (derived)         | -                 | -          | position (m)|
+| 23  | Goal y position                                | -Inf    | Inf     | goal (derived)         | -                 | -          | position (m)|
+| 24  | Goal z position                                | -Inf    | Inf     | goal (derived)         | -                 | -          | position (m)|
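
Illustration for `docs/benchmark/state_space.md` (not part of the committed file): a minimal sketch that splits a flat observation into named parts using the index ranges from the table above. `OBS_LAYOUT` and `split_observation` are hypothetical helpers, and the 25-element length is an assumption read off the table; the array returned by an actual environment may be sized or laid out differently, so it is worth checking `observation_space.shape` before relying on these indices.

```python
from typing import Dict, List, Sequence

# Index ranges copied from the table above (slice end is exclusive).
OBS_LAYOUT: Dict[str, slice] = {
    "hand_pos": slice(0, 3),
    "gripper": slice(3, 4),
    "obj_pos": slice(4, 7),
    "obj_quat": slice(7, 11),
    "prev_hand_pos": slice(11, 14),
    "prev_gripper": slice(14, 15),
    "prev_obj_pos": slice(15, 18),
    "prev_obj_quat": slice(18, 22),
    "goal_pos": slice(22, 25),
}
OBS_DIM = 25  # assumption: one row per table entry (indices 0 to 24)


def split_observation(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat observation into the named components of the table."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    return {name: list(obs[part]) for name, part in OBS_LAYOUT.items()}


# Dummy observation whose values equal their own indices.
obs = [float(i) for i in range(OBS_DIM)]
parts = split_observation(obs)
print(parts["goal_pos"])  # [22.0, 23.0, 24.0]
```

Because the current and previous frames share the same layout, a finite-difference velocity estimate can be formed by subtracting `prev_hand_pos` from `hand_pos` component-wise.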
diff --git a/docs/index.md b/docs/index.md
@@ -47,6 +47,16 @@ rendering/rendering
 usage/basic_usage
 ```
 
+```{toctree}
+:hidden:
+:caption: Benchmark Information
+benchmark/state_space
+benchmark/action_space
+benchmark/benchmark_descriptions
+benchmark/env_task_vs_task_init
+benchmark/reward_functions
+```
+
 
 ```{toctree}
 :hidden: