
Models #4

Open
fakerybakery opened this issue Jun 13, 2024 · 7 comments

Comments

@fakerybakery

Hi, thanks for releasing Samba! Are there any plans to release the pretrained models? Thanks!

@0wwafa

0wwafa commented Jun 14, 2024

Yep! It would be great to see the 3B and 7B or 8B models...

@renll
Collaborator

renll commented Jun 14, 2024

Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲

@fakerybakery
Author

Nice! Please release the base models and smaller models!

@AshD

AshD commented Jun 15, 2024

> Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲

This architecture looks great.
How much GPU time would be required to train 7B and 14B models? In your opinion, could they beat a Llama 3 70B transformer model on benchmarks?

@maksymdolgikh

> Samba-421M

I've used Microsoft's DeBERTa-V3 models a lot in different projects because they are so small that I can run quick experiments with them at home. So I am really looking forward to a new small model :)

@renll
Collaborator

renll commented Jun 28, 2024

After the internal business review, we are sorry to say that we cannot release the Samba 421M and 1.3B models trained on SlimPajama. This is because SlimPajama contains the Books3 dataset, which is subject to copyright-infringement claims. 🥲 We will continue to push for the release of the Samba 1.7B and 3.8B models trained on the Phi datasets.

@renll
Collaborator

renll commented Jun 28, 2024

> > Release of the 3.8B instruction-tuned model is planned! We may also release smaller base models like Samba-421M and Samba-1.3B trained on SlimPajama (hopefully in a week or two). We currently don't have plans to train a larger 7B model due to a shortage of GPUs. 🥲
>
> This architecture looks great. How much GPU time would be required to train 7B and 14B models? In your opinion, could they beat a Llama 3 70B transformer model on benchmarks?

It depends on how many tokens we want to train it on. The Samba 3.8B model takes around the same amount of GPU time as the Phi-3 models. I personally think it is definitely possible to beat Llama 3 70B on benchmarks with better data mixtures customized for a 14B model.
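
To make the "depends on how many tokens" point concrete, here is a rough back-of-the-envelope sketch (not from this thread) using the common C ≈ 6·N·D FLOPs approximation for dense-model training. The per-GPU throughput, utilization, and token budget below are all assumptions for illustration, not figures confirmed by the authors:

```python
# Rough training-cost estimate via the common C ~= 6 * N * D FLOPs
# rule of thumb for dense models (N = parameters, D = training tokens).
# Hardware numbers are assumptions: ~989 TFLOP/s BF16 peak per H100
# and 40% model FLOPs utilization (MFU).

def estimated_gpu_hours(n_params: float, n_tokens: float,
                        peak_flops: float = 989e12, mfu: float = 0.4) -> float:
    total_flops = 6 * n_params * n_tokens       # total training compute
    seconds = total_flops / (peak_flops * mfu)  # aggregate GPU-seconds
    return seconds / 3600

if __name__ == "__main__":
    n_tokens = 3.3e12  # assumed token budget, roughly Phi-3-scale
    for n_params in (7e9, 14e9):
        hrs = estimated_gpu_hours(n_params, n_tokens)
        print(f"{n_params / 1e9:.0f}B params @ {n_tokens / 1e12:.1f}T tokens: "
              f"~{hrs:,.0f} GPU-hours")
```

Under these assumptions, a 7B model comes out around 100K GPU-hours and a 14B model around twice that; real numbers would shift with sequence length, parallelism overhead, and the hybrid Mamba/attention layout.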
