Soft Actor-Critic (SAC) Trainer #2517

Open
3 tasks
AMindToThink opened this issue Dec 23, 2024 · 0 comments
Labels
✨ enhancement New feature or request

Comments

AMindToThink commented Dec 23, 2024

Method description

Soft Actor-Critic (SAC) is a popular reinforcement learning algorithm that meets or exceeds the performance of PPO on a variety of tasks. Oddly, it is not used for LLM post-training, and I have not been able to find a satisfactory explanation as to why. I intend to do a research project investigating how SAC performs for RL-based optimization of LLMs. The hope is that SAC's entropy maximization results in better exploration and more varied responses, and perhaps makes the LLM more robust to jailbreaking because it has had more varied experience. If this hypothesis is true, it would allow the community to create more interesting and robust LLMs that maintain alignment.
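To make the entropy-maximization idea concrete, here is a minimal sketch of how an entropy bonus could be attached to a reward when the policy is a categorical distribution over tokens. The function names, the temperature `alpha`, and the toy logits are illustrative assumptions, not part of TRL or of any existing trainer.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_augmented_reward(reward, logits, alpha):
    """Hypothetical helper: task reward plus alpha times the policy entropy.

    `alpha` is the temperature that trades reward against entropy;
    its value here is an assumption chosen for illustration.
    """
    return reward + alpha * entropy(softmax(logits))

# Toy next-token distributions over a 4-token vocabulary (assumed values).
uniform_logits = [0.0, 0.0, 0.0, 0.0]  # maximally uncertain policy
peaked_logits = [10.0, 0.0, 0.0, 0.0]  # nearly deterministic policy

print(entropy(softmax(uniform_logits)))
print(entropy(softmax(peaked_logits)))
```

With the same task reward, the uniform policy earns a larger bonus than the peaked one, which is the mechanism expected to encourage more varied responses.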

The SAC algorithm is given in this paper.

I plan to first implement and evaluate the algorithm as written. I'll compare it against PPO on time to train, performance, robustness to jailbreaks, and output diversity.
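As one possible way to quantify output diversity in that comparison, a distinct-n score (unique n-grams divided by total n-grams across a set of responses) could be used. This is an assumption about the metric, not a settled part of the evaluation plan; the whitespace tokenization and the function name `distinct_n` are illustrative only.

```python
def distinct_n(responses, n=2):
    """Fraction of n-grams across all responses that are unique.

    Values near 1.0 suggest varied outputs; values near 0.0 suggest repetition.
    Whitespace tokenization is a simplification for illustration.
    """
    total = 0
    unique = set()
    for text in responses:
        tokens = text.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Identical responses share every bigram; disjoint responses share none.
print(distinct_n(["a b c", "a b c"], n=2))
print(distinct_n(["a b c", "d e f"], n=2))
```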

Then, I will try to improve the algorithm. Notably, SAC (as written) requires 4 additional models, which would be prohibitively expensive in VRAM for most users, since each would need its own large set of parameters. Switching from Clipped Double Q-learning to Double DQN, or using regularization techniques like CQL, may maintain performance while reducing the overhead to 1 or 2 additional models.
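For reference, the Clipped Double Q-learning step mentioned above can be sketched as follows. This assumes a discrete action space (tokens), so the expectation over the next action is computed exactly from the policy's probabilities rather than sampled; that is a discrete-action adaptation and not necessarily the formulation in the linked paper. All names and numbers are hypothetical.

```python
import math

def soft_td_target(reward, done, gamma, alpha, next_probs, q1_next, q2_next):
    """Soft Bellman target using the minimum of two target critics.

    next_probs: policy probabilities pi(a | s') over next actions
    q1_next, q2_next: the two target critics' Q-values for each next action
    alpha: entropy temperature; gamma: discount; done: 1.0 if terminal else 0.0
    """
    soft_value = 0.0
    for p, q1, q2 in zip(next_probs, q1_next, q2_next):
        if p > 0.0:
            # Taking the minimum of the two critics counters overestimation bias.
            soft_value += p * (min(q1, q2) - alpha * math.log(p))
    return reward + gamma * (1.0 - done) * soft_value

# Toy example with two actions (values are assumptions for illustration).
target = soft_td_target(
    reward=1.0, done=0.0, gamma=0.9, alpha=0.0,
    next_probs=[0.5, 0.5], q1_next=[1.0, 2.0], q2_next=[1.5, 0.5],
)
print(target)
```

The two critics plus their two target copies are what drive the memory overhead; a variant that keeps a single critic would replace `min(q1, q2)` with one estimate and rely on some other mechanism to limit overestimation.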

If I find that SAC or my variations offer a useful improvement along any of the dimensions described, I'll offer a pull request.

Open source status

  • The method implementation is available
  • The model weights are available
  • The training datasets are available

Provide useful links for the implementation

No response

@August-murr August-murr added the ✨ enhancement New feature or request label Dec 24, 2024