sparkmxy/my-offlinerl

This repository is forked from OfflineRL for a course assignment. I only made some optimizations to the MOPO algorithm; the only file changed relative to the original repository is offlinerl/algo/modelbase/mopo.py.

If you want to use it, I strongly recommend that you work from the original repository.


OfflineRL

OfflineRL is a repository for offline reinforcement learning (also known as batch reinforcement learning).

Re-implemented Algorithms

Model-free methods

  • CQL: Kumar, Aviral, et al. “Conservative Q-Learning for Offline Reinforcement Learning.” Advances in Neural Information Processing Systems, vol. 33, 2020. paper code
  • PLAS: Zhou, Wenxuan, et al. “PLAS: Latent Action Space for Offline Reinforcement Learning.” ArXiv Preprint ArXiv:2011.07213, 2020. website paper code
  • BCQ: Fujimoto, Scott, et al. “Off-Policy Deep Reinforcement Learning without Exploration.” International Conference on Machine Learning, 2019, pp. 2052–2062. paper code

Model-based methods

  • COMBO: Yu, Tianhe, et al. “COMBO: Conservative Offline Model-Based Policy Optimization.” ArXiv Preprint ArXiv:2102.08363, 2021. paper
  • MOPO: Yu, Tianhe, et al. “MOPO: Model-Based Offline Policy Optimization.” Advances in Neural Information Processing Systems, vol. 33, 2020. paper code

Install Datasets

NeoRL

git clone https://agit.ai/Polixir/neorl.git
cd neorl
pip install -e .

For more details on use, please see neorl.

D4RL (Optional)

pip install git+https://github.com/rail-berkeley/d4rl@master#egg=d4rl

For more details on use, please see d4rl.

Install offlinerl

pip install -e .

Example

# Training in HalfCheetah-v3-L-9 task using default parameters of cql algorithm
python examples/train_task.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 99

# Parameter search in the default parameter space using the cql algorithm in the HalfCheetah-v3-L-9 task
python examples/train_tune.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 99

# Training in D4RL halfcheetah-medium task using default parameters of cql algorithm (D4RL needs to be installed)
python examples/train_d4rl.py --algo_name=cql --exp_name=d4rl-halfcheetah-medium-cql --task d4rl-halfcheetah-medium-v0

Parameters:

  • algo_name: Algorithm name. The bc, cql, plas, bcq and mopo algorithms are currently available.
  • exp_name: Experiment name, used to identify the run when visualizing results with aim.
  • task: Task name. See neorl for details.
  • task_data_type: Data level. In neorl, the data for each task is collected using low, medium, and high level strategies.
  • task_train_num: Number of training trajectories. For each task, neorl provides up to 9999 training trajectories.
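
The parameters above can also be assembled programmatically, for example to queue the same task for several algorithms. The following is only a minimal sketch: the helper `build_train_command` and the `halfcheetah-<algo>` experiment names are hypothetical and not part of this repository. It uses only the flags shown in the example commands, and it prints each command rather than running it.

```python
import shlex


def build_train_command(algo_name, exp_name, task, task_data_type, task_train_num,
                        script="examples/train_task.py"):
    """Return the argv list for one training run (hypothetical helper)."""
    return [
        "python", script,
        f"--algo_name={algo_name}",
        f"--exp_name={exp_name}",
        "--task", task,
        "--task_data_type", task_data_type,
        "--task_train_num", str(task_train_num),
    ]


# Algorithm names taken from the parameter list above.
ALGOS = ["bc", "cql", "plas", "bcq", "mopo"]

# Assumed sweep: each algorithm on the low-level HalfCheetah-v3 data.
commands = [
    build_train_command(algo, f"halfcheetah-{algo}", "HalfCheetah-v3", "low", 99)
    for algo in ALGOS
]

for cmd in commands:
    # Print only; to launch a run, pass `cmd` to subprocess.run(cmd, check=True)
    # from the repository root.
    print(shlex.join(cmd))
```

Keeping each command as a list, rather than a single string, avoids shell quoting problems if a command is later passed to `subprocess.run`.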

View experimental results

We use Aim to store and visualize results. Aim is an experiment logger that makes it easy to manage thousands of experiments. For more details, see aim.

To visualize results in this repository:

cd offlinerl_tmp
aim up

You can then view the results at http://127.0.0.1:43800.
