Models #4

Hi, thanks for releasing Samba! Are there any plans to release the pretrained models? Thanks!

Comments
Yep! It would be great to see the 3B and 7B or 8B models...
The release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲
Nice! Please release the base models and smaller models!
This architecture looks great. I used Microsoft's DeBERTa-V3 models a lot in different projects, because they are so small that I can run quick experiments with them at home. So I am really looking forward to a new small model :)
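For readers who want to try the same kind of quick at-home experiment, here is a minimal sketch, assuming the Hugging Face `transformers` library and the `microsoft/deberta-v3-xsmall` checkpoint; the example sentence and the mean-pooling choice are illustrative, not anything prescribed in this thread:

```python
# A quick local experiment with a small pretrained model.
# Assumes: pip install transformers torch sentencepiece
# (DeBERTa-V3 tokenizers need sentencepiece.)
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/deberta-v3-xsmall"  # ~22M backbone params, easily fits on a home machine
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Samba is a hybrid Mamba + attention architecture.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One simple sentence representation: mean-pool the last hidden states.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 384]) for the xsmall checkpoint
```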
After the internal business review, we are sorry to say that we cannot release the Samba-421M and Samba-1.3B models trained on SlimPajama. This is because the SlimPajama dataset contains the Books3 dataset, which has copyright infringement issues. 🥲 We will continue to push for the release of the Samba-1.7B and 3.8B models trained on the Phi datasets.
It depends on how many tokens we want to train it on. The Samba-3.8B model takes around the same amount of GPU time as the Phi-3 models. I personally think it is definitely possible to beat Llama-3 70B on benchmarks with better data mixtures customized for a 14B model.
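To make that cost intuition concrete, here is a back-of-envelope sketch using the common FLOPs ≈ 6·N·D approximation for dense-model training; the token counts, the utilization figure, and the 14B variant are hypothetical assumptions for illustration, not numbers from this thread:

```python
# Back-of-envelope training cost using the common approximation
# total FLOPs ≈ 6 * N (parameters) * D (training tokens).
# All token counts below are hypothetical placeholders.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense model."""
    return 6.0 * n_params * n_tokens

for name, n_params, n_tokens in [
    ("Samba-3.8B", 3.8e9, 3.0e12),              # hypothetical 3T-token run
    ("Samba-14B (hypothetical)", 14.0e9, 3.0e12),
]:
    flops = train_flops(n_params, n_tokens)
    # A100 peak is ~312 TFLOPS in BF16; assume ~40% utilization.
    gpu_seconds = flops / (312e12 * 0.40)
    print(f"{name}: {flops:.2e} FLOPs ≈ {gpu_seconds / 86400:.0f} A100-days")
```

Doubling the token budget doubles the estimate, which is exactly why the answer "depends on how many tokens we want to train it on."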