Add mamba-ssm support #4830
Comments
I found a repo with basic support here: https://github.com/trap20/text-generation-webui/tree/mamba-ssm. Since the owner didn't open a PR here and the branch had merge conflicts, I took the changes and manually merged them into the current main. Here is the pull request: #5228
Training support has been added too.
@IggoOnCode, do you have any plans to support this? I'd love to see Mamba make its way to text-gen-webui, or just a fork with support. With all the benefits it has over transformers, I suspect it'll grow in popularity. Especially considering its better efficiency in both space and compute at low parameter counts, it'll likely be the best option for local LLMs and edge AI.
@hchasens Luckily, I don't need to, at least not as a stand-alone solution. Mamba support was merged into transformers two days ago. I just tried the transformers main branch in text-generation-webui, and inference with the demo Mamba models from ArthurZ works out of the box. For the original Mamba model from state-spaces, I'm still trying to find the correct config. After that I'll try training. Once text-generation-webui updates to the next transformers release, it will get Mamba support.
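For readers who want to try the transformers route described above outside the web UI, a minimal sketch might look like the following. It assumes a transformers release that already includes Mamba support. The checkpoint id `state-spaces/mamba-130m-hf` and the helper names `load_hf_mamba` and `complete_with_hf` are illustrative assumptions and do not come from this thread.

```python
# Assumption: a Mamba checkpoint already converted to the transformers format.
# This id is illustrative; the thread does not name a specific checkpoint.
DEFAULT_MODEL_ID = "state-spaces/mamba-130m-hf"


def load_hf_mamba(model_id=DEFAULT_MODEL_ID):
    """Load a Mamba checkpoint and its tokenizer through transformers."""
    # Imported lazily so the helpers can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer


def complete_with_hf(model, tokenizer, prompt, max_new_tokens=20):
    """Generate a continuation of `prompt` and return the decoded text."""
    # Only input_ids are passed on to generate().
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Usage (downloads weights on first call):
#   model, tokenizer = load_hf_mamba()
#   print(complete_with_hf(model, tokenizer, "Hey how are you doing?"))
```

Loading and generation are kept in separate functions so the model is loaded once and reused across prompts.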
This is awesome news! Would you know if there were any API changes, or will it work out of the box with text-gen?
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Description
Recently a new SSM-based model, Mamba, was released, trained on 300B tokens. It already has weights on HF. The issue is that it's weights only: the repo has no tokenizer (it uses neox) and no custom modeling_mamba to use trust_remote_code with the standard loader. So this is a request to add a new mamba-ssm loader to be able to use it.
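As a rough illustration of what such a loader would need to do, the sketch below pairs weights loaded through the mamba-ssm package with a separately loaded GPT-NeoX tokenizer. The checkpoint id `state-spaces/mamba-2.8b`, the tokenizer id `EleutherAI/gpt-neox-20b`, and the helper names are assumptions made for illustration (the description only says "uses neox"). This is not the loader from the linked branch or PR.

```python
# Assumptions: both ids are illustrative; the issue does not name them.
MAMBA_WEIGHTS = "state-spaces/mamba-2.8b"
NEOX_TOKENIZER = "EleutherAI/gpt-neox-20b"


def load_mamba_ssm(model_name=MAMBA_WEIGHTS, device="cuda"):
    """Load weights with mamba-ssm and pair them with the GPT-NeoX tokenizer."""
    # Imported lazily: mamba-ssm requires a CUDA-enabled PyTorch install.
    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # The weights repo ships no tokenizer, so it is loaded from a separate repo.
    tokenizer = AutoTokenizer.from_pretrained(NEOX_TOKENIZER)
    model = MambaLMHeadModel.from_pretrained(
        model_name, device=device, dtype=torch.float16
    )
    model.eval()
    return model, tokenizer


def complete_with_mamba_ssm(model, tokenizer, prompt, max_new_tokens=20, device="cuda"):
    """Generate a continuation of `prompt` and return the decoded text."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    # mamba-ssm's generate() takes a total length (prompt plus new tokens).
    out = model.generate(
        input_ids=input_ids,
        max_length=input_ids.shape[1] + max_new_tokens,
        return_dict_in_generate=True,
    )
    return tokenizer.batch_decode(out.sequences)[0]


# Usage (needs a CUDA GPU and downloads weights on first call):
#   model, tokenizer = load_mamba_ssm()
#   print(complete_with_mamba_ssm(model, tokenizer, "Mamba is"))
```

Because the model class comes from the mamba-ssm package itself, this path does not depend on trust_remote_code or on modeling code being present in the weights repo.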
Additional Context
Example of generation from the official repo