
Models #4

Open
fakerybakery opened this issue Jun 13, 2024 · 7 comments

Comments

@fakerybakery

Hi, thanks for releasing Samba! Are there any plans to release the pretrained models? Thanks!

@0wwafa

0wwafa commented Jun 14, 2024

Yep! It would be great to see the 3B and 7B or 8B models...

@renll
Collaborator

renll commented Jun 14, 2024

Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲

@fakerybakery
Author

Nice! Please release the base models and smaller models!

@AshD

AshD commented Jun 15, 2024

> Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲

This architecture looks great.
How much GPU time would be required to train 7B and 14B models? In your opinion, could they beat a Llama 3 70B transformer model on benchmarks?

@maksymdolgikh

> Samba-421M

I've used Microsoft's DeBERTa-V3 models a lot in different projects because they are so small that I can run quick experiments with them at home. So I am really looking forward to a new small model :)

@renll
Collaborator

renll commented Jun 28, 2024

After the internal business review, we are sorry to say that we cannot release the Samba 421M and 1.3B models trained on SlimPajama. This is because SlimPajama contains the Books3 dataset, which is subject to copyright-infringement claims. 🥲 We will continue to push for the release of the Samba 1.7B and 3.8B models trained on the Phi datasets.

@renll
Collaborator

renll commented Jun 28, 2024

> > Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲
>
> This architecture looks great. How much GPU time would be required to train 7B and 14B models? In your opinion, could they beat a Llama 3 70B transformer model on benchmarks?

It depends on how many tokens we want to train it on. The Samba 3.8B model takes around the same amount of GPU time as the Phi-3 models. I personally think it is definitely possible to beat Llama 3 70B on benchmarks with better data mixtures customized for a 14B model.
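
To make the "depends on how many tokens" point concrete, here is a rough back-of-the-envelope sketch (not from this thread) using the common C ≈ 6·N·D FLOPs approximation for dense-model training. The per-GPU throughput, utilization, and token budget below are all assumptions for illustration, not figures confirmed by the authors:

```python
# Rough training-cost estimate via the common C ~= 6 * N * D FLOPs
# rule of thumb for dense models (N = parameters, D = training tokens).
# Hardware numbers are assumptions: ~989 TFLOP/s BF16 peak per H100
# and 40% model FLOPs utilization (MFU).

def estimated_gpu_hours(n_params: float, n_tokens: float,
                        peak_flops: float = 989e12, mfu: float = 0.4) -> float:
    total_flops = 6 * n_params * n_tokens       # total training compute
    seconds = total_flops / (peak_flops * mfu)  # aggregate GPU-seconds
    return seconds / 3600

if __name__ == "__main__":
    n_tokens = 3.3e12  # assumed token budget, roughly Phi-3-scale
    for n_params in (7e9, 14e9):
        hrs = estimated_gpu_hours(n_params, n_tokens)
        print(f"{n_params / 1e9:.0f}B params @ {n_tokens / 1e12:.1f}T tokens: "
              f"~{hrs:,.0f} GPU-hours")
```

Under these assumptions, a 7B model comes out around 100K GPU-hours and a 14B model around twice that; real numbers would shift with sequence length, parallelism overhead, and the hybrid Mamba/attention layout.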
