NeoRL2 is an extension of the offline reinforcement learning benchmark NeoRL. The repository contains datasets for training and the corresponding environments for evaluating the trained policies. The current datasets are collected from seven open-source tasks: Pipeline, Simglucose, RocketRecovery, RandomFrictionHopper, DMSD, Fusion, and SafetyHalfCheetah. For each task, we train policies online with reinforcement learning algorithms or PID controllers and then select suboptimal policies whose returns range from 50% to 80% of the expert's return to generate the offline datasets. Datasets sampled from such suboptimal policies align better with real-world task scenarios than datasets collected by random or expert policies.
The NeoRL2 interface can be installed as follows:
git clone https://agit.ai/Polixir/neorl2.git
cd neorl2
pip install -e .
After installation, the Pipeline, Simglucose, RocketRecovery, DMSD, and Fusion environments will be available. However, the RandomFrictionHopper and SafetyHalfCheetah tasks rely on MuJoCo. To use these two environments, obtain a MuJoCo license, follow its setup instructions, and then run:
pip install -e .[mujoco]
NeoRL2 uses the Gymnasium (OpenAI Gym) API. Tasks can be created as follows:
import neorl2
import gymnasium as gym
# Create an environment
env = gym.make("Pipeline")
env.reset()
env.step(env.action_space.sample())
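As a slightly fuller sketch, here is a random-policy rollout loop. This assumes the standard Gymnasium five-value return from step() (observation, reward, terminated, truncated, info); the policy and the episode-return bookkeeping are only for illustration.

import neorl2
import gymnasium as gym

env = gym.make("Pipeline")
obs, info = env.reset()

episode_return = 0.0
while True:
    action = env.action_space.sample()                    # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:                           # episode ends on done flag or time limit
        break

print("Random-policy episode return:", episode_return)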
After creating the environment, you can use the `get_dataset()` function to obtain the training data and validation data:
train_data, val_data = env.get_dataset()
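A quick way to sanity-check the returned data is to print the array shapes. This sketch assumes each split is a dict of NumPy arrays keyed by the field names listed in the dataset-format section below.

train_data, val_data = env.get_dataset()

# Each split is assumed to be a dict of NumPy arrays with one row per transition
for name, split in [("train", train_data), ("val", val_data)]:
    print(name,
          "obs:", split["obs"].shape,
          "action:", split["action"].shape,
          "reward:", split["reward"].shape)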
Each environment supports getting and setting its reward function and done function, which is useful when you need to adjust the environment's behavior.
# Set reward function
env.set_reward_func(reward_func)
# Get reward function
reward_func = env.get_reward_func()
# Set done function
env.set_done_func(done_func)
# Get done function
done_func = env.get_done_func()
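The exact signature expected by set_reward_func is not shown here; the following is a rough, hypothetical sketch that assumes the reward function maps a batch of transitions (a dict with obs, action, and next_obs arrays) to an array of rewards, and that get_reward_func() returns the currently set function.

import numpy as np

# Hypothetical reward function; the real signature is defined by NeoRL2.
def reward_func(data):
    obs, action = data["obs"], data["action"]
    return -np.linalg.norm(action, axis=-1)   # e.g. penalize large actions

env.set_reward_func(reward_func)
recovered = env.get_reward_func()             # assumed to return the function just set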
The following environments are currently available:
| Env Name | Observation Shape | Action Shape | Has Done | Max Timesteps |
| --- | --- | --- | --- | --- |
| Pipeline | 52 | 1 | False | 1000 |
| Simglucose | 31 | 1 | True | 480 |
| RocketRecovery | 7 | 2 | True | 500 |
| RandomFrictionHopper | 13 | 3 | True | 1000 |
| DMSD | 6 | 2 | False | 100 |
| Fusion | 15 | 6 | False | 100 |
| SafetyHalfCheetah | 18 | 6 | False | 1000 |
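A small sanity check of the table above, assuming each task name can be passed directly to gym.make and that the MuJoCo-dependent tasks (RandomFrictionHopper, SafetyHalfCheetah) are installed:

import neorl2
import gymnasium as gym

TASKS = ["Pipeline", "Simglucose", "RocketRecovery",
         "RandomFrictionHopper", "DMSD", "Fusion", "SafetyHalfCheetah"]

for task in TASKS:
    env = gym.make(task)
    # Print the observation and action space shapes for comparison with the table
    print(f"{task}: obs {env.observation_space.shape}, "
          f"action {env.action_space.shape}")
    env.close()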
In NeoRL2, the training data and validation data returned by the `get_dataset()` function are dicts with the same format:

- `obs`: An N by observation-dimension array of current-step observations.
- `next_obs`: An N by observation-dimension array of next-step observations.
- `action`: An N by action-dimension array of actions.
- `reward`: An N-dimensional array of rewards.
- `done`: An N-dimensional array of episode termination flags.
- `index`: An array with one entry per trajectory; each number marks the index at which a trajectory begins (see the sketch below for recovering individual trajectories).
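Given this layout, individual trajectories can be recovered by slicing between consecutive start indices. A minimal sketch, assuming `index` holds the start offset of each trajectory as described above:

import numpy as np

def split_trajectories(data):
    """Split the flat transition arrays into per-trajectory chunks."""
    starts = np.asarray(data["index"], dtype=int)
    ends = np.append(starts[1:], len(data["obs"]))      # end offset of each trajectory
    return [
        {k: v[s:e] for k, v in data.items() if k != "index"}
        for s, e in zip(starts, ends)
    ]

trajectories = split_trajectories(train_data)
print("Number of trajectories:", len(trajectories))
print("First trajectory length:", len(trajectories[0]["obs"]))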
Simglucose: Jinyu Xie. "Simglucose v0.2.1" (2018) [Online]. Available: https://github.com/jxx123/simglucose. Accessed: 2024-05-17.
DMSD: Char, Ian, et al. "Correlated Trajectory Uncertainty for Adaptive Sequential Decision Making." NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2023.
MuJoCo: Todorov, E., Erez, T., Tassa, Y. "MuJoCo: A Physics Engine for Model-Based Control." Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033, 2012.
Gym: Brockman, Greg, et al. "OpenAI Gym." arXiv preprint arXiv:1606.01540, 2016.
All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.