Simplified Rating and Preference RL

This repository aims to make Reinforcement Learning from Human Feedback (RLHF) more accessible by providing a simplified, modernized implementation of Rating-based and Preference-based Reinforcement Learning (RbRL and PbRL). It uses the latest versions of dm_control, stable-baselines3, and gymnasium, ensuring compatibility with modern systems, including Apple Silicon.

For more information, see the respective papers:

  • Rating-Based Reinforcement Learning
  • BPref: Benchmarking Preference-Based Reinforcement Learning

✨ Recent Updates ✨

See it for yourself

Videos like this are generated after 4,000,000 timesteps using 1,000 ratings! Training took only 30 minutes on a Mac M3 Max!

Demo

Experimental Results:

The figure below shows the performance of this implementation for RbRL (2–6 rating classes) and PbRL, achieving results similar to those in the original RbRL paper in a single run:

Results

Key Features:

  • Simplified RbRL and PbRL: Easy-to-understand implementation of Rating-based and Preference-based RL algorithms.
  • Modernized Codebase: Utilizes the latest versions of dm_control, stable-baselines3, gymnasium, and mujoco.
  • Apple Silicon Compatibility: Designed to work seamlessly on Apple Silicon.
  • Stable-Baselines3 Integration: Leverages the structure and functionality of stable-baselines3.
  • Custom Wrappers: Includes custom wrappers that expose DeepMind Control Suite tasks as Gymnasium environments, plus a vectorized DeepMind Control Suite environment (a rough sketch of the idea follows this list).
  • Performance Visualization: Generates videos showcasing the model's performance after training.
  • Reward Correlation Analysis: Calculates the correlation between predicted and actual rewards.
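
For intuition, here is a minimal sketch of what a dm_control-to-Gymnasium wrapper typically looks like. This is not the repository's actual wrapper; the class name DMControlToGym and all details are illustrative assumptions.

import numpy as np
import gymnasium as gym
from dm_control import suite

class DMControlToGym(gym.Env):
    """Expose a DeepMind Control Suite task through the Gymnasium API (sketch)."""

    def __init__(self, domain='walker', task='walk'):
        self._env = suite.load(domain_name=domain, task_name=task)
        act_spec = self._env.action_spec()
        low = np.asarray(act_spec.minimum, dtype=np.float32)
        high = np.asarray(act_spec.maximum, dtype=np.float32)
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
        # Flatten the dict observation spec into a single Box space.
        obs_dim = sum(int(np.prod(v.shape)) for v in self._env.observation_spec().values())
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)

    def _flatten(self, obs_dict):
        return np.concatenate([np.asarray(v, dtype=np.float32).ravel() for v in obs_dict.values()])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        time_step = self._env.reset()
        return self._flatten(time_step.observation), {}

    def step(self, action):
        time_step = self._env.step(action)
        obs = self._flatten(time_step.observation)
        # dm_control signals true termination with discount == 0; otherwise a final
        # step is just the time limit, which Gymnasium treats as truncation.
        terminated = time_step.last() and time_step.discount == 0.0
        truncated = time_step.last() and not terminated
        return obs, float(time_step.reward), terminated, truncated, {}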

Installation:

conda create -n simple_rlhf python=3.9
conda activate simple_rlhf
pip install -r requirements.txt
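
As an optional sanity check (not part of the repository's scripts), you can verify that the core packages import and that a DeepMind Control Suite environment loads:

python -c "from dm_control import suite; import stable_baselines3, gymnasium; print(suite.load('walker', 'walk').action_spec())"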

How to run

Choosing the environment you would like to run:

At the top of run_ppo.py, run_pref.py, and run_ratings.py you will see:

env_name = 'walker'
task_name = 'walk'

Set env_name to the environment you would like (e.g., cheetah, walker, quadruped) and task_name to the corresponding task (e.g., run, walk).
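
If you are unsure which combinations are valid, dm_control can list them for you (an optional helper, not part of the repository's scripts):

from dm_control import suite

# Print every (domain, task) pair that ships with the DeepMind Control Suite.
for domain, task in sorted(suite.ALL_TASKS):
    print(domain, task)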

Rating-Based Reinforcement Learning:

For ratings, you can also change the number of rating classes by setting num_ratings in run_ratings.py; this implementation supports between 2 and 6 classes.

Once you have set the environment you would like, just run this command:

python run_ratings.py

NOTE: You may need to adjust max_reward in the reward predictor to get better results.
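
For intuition, below is a heavily simplified sketch of how a rating-based reward-model loss can be formed: predicted segment returns are mapped to rating-class logits and trained with cross-entropy against the human rating labels. It is not the repository's actual reward predictor; RewardPredictor, rating_loss, and the fixed, evenly spaced class centers are illustrative assumptions, and the actual RbRL loss handles rating boundaries differently. Note how max_reward enters the normalization, which is why it may need tuning.

import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        # Per-step reward model; Tanh bounds each predicted reward to [-1, 1].
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),
        )

    def forward(self, obs, act):
        # obs: (batch, segment_len, obs_dim), act: (batch, segment_len, act_dim)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def rating_loss(predictor, obs, act, ratings, num_ratings, max_reward=1.0):
    # ratings: (batch,) integer class labels in {0, ..., num_ratings - 1}
    seg_return = predictor(obs, act).sum(dim=1)
    # Normalize the segment return into [0, 1] using an assumed max per-step reward.
    norm = (seg_return / (max_reward * obs.shape[1]) + 1.0) / 2.0
    # Score each rating class by closeness to evenly spaced class centers.
    centers = (torch.arange(num_ratings, dtype=norm.dtype, device=norm.device) + 0.5) / num_ratings
    logits = -(norm.unsqueeze(1) - centers.unsqueeze(0)).abs() * num_ratings
    return nn.functional.cross_entropy(logits, ratings)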

Preference-Based Reinforcement Learning:

Once you have set the environment you would like, just run this command:

python run_pref.py
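
Under the hood, preference-based reward learning typically optimizes a Bradley-Terry style loss over pairs of trajectory segments. The sketch below is illustrative, not the repository's exact implementation; preference_loss is an assumed name, and predictor can be any per-step reward model such as the one sketched above.

import torch
import torch.nn.functional as F

def preference_loss(predictor, obs_0, act_0, obs_1, act_1, prefs):
    # prefs: (batch,) with 1 if segment 1 is preferred, 0 if segment 0 is preferred.
    return_0 = predictor(obs_0, act_0).sum(dim=1)   # predicted return of segment 0
    return_1 = predictor(obs_1, act_1).sum(dim=1)   # predicted return of segment 1
    # Bradley-Terry model: the preferred segment should have the larger predicted return.
    logits = torch.stack([return_0, return_1], dim=1)
    return F.cross_entropy(logits, prefs)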

PPO:

Once you have set the environment you would like, just run this command:

python run_ppo.py
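
For reference, training vanilla PPO with stable-baselines3 boils down to a few lines. The snippet below uses a standard Gymnasium task rather than the DM Control wrappers, so it is only an approximation of what run_ppo.py does, not the script itself.

import gymnasium as gym
from stable_baselines3 import PPO

# Train PPO on a standard Gymnasium environment (Pendulum-v1 is just an example).
env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_pendulum")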

Contributing:

Contributions are welcome! Feel free to open issues or submit pull requests.

Citing

@inproceedings{white2024rating,
  title={Rating-Based Reinforcement Learning},
  author={White, Devin and Wu, Mingkang and Novoseller, Ellen and Lawhern, Vernon J and Waytowich, Nicholas and Cao, Yongcan},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={9},
  pages={10207--10215},
  year={2024}
}
