Feature request
Saw a blog post where together.ai is advertising 3x inference performance via their API; I'm sure there are some optimization techniques they're using that this repo could benefit from: https://www.together.ai/blog/together-inference-engine-v1
Motivation
Faster inference!
Your contribution
Happy to help if there is overlap with my skillset
They are using Medusa, which is just a different model.
It's going to get support very soon (#1308), but making it fast will require creating those Medusa models, and there are very few open source ones currently (although we hope people will add more, since the speedup is quite significant, even more than advertised here).
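For context, the core idea of Medusa is to train a few small extra heads on top of the base model's last hidden state, each predicting a token one further step ahead, and then verify those candidates in a single forward pass. A minimal sketch of what such heads look like (class names, block structure, and the SiLU residual block are illustrative, not the actual Medusa code):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual feed-forward block, used inside each Medusa head."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.act(self.linear(x))

class MedusaHeads(nn.Module):
    """n_heads extra LM heads; head i predicts the token at offset i + 1
    beyond the base model's own next-token prediction."""
    def __init__(self, hidden_size, vocab_size, n_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                ResBlock(hidden_size),
                nn.Linear(hidden_size, vocab_size, bias=False),
            )
            for _ in range(n_heads)
        )

    def forward(self, last_hidden_state):
        # Each head produces logits from the same hidden state, giving
        # candidate tokens for several future positions at once.
        return [head(last_hidden_state) for head in self.heads]
```

Only these heads need to be trained (the base model stays frozen), which is why each architecture needs its own Medusa checkpoint before the speedup applies.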
The PR will also add support for regular speculative decoding, which should give you a significant speedup on any model without needing any modifications.
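For anyone unfamiliar with regular speculative decoding, here is a minimal greedy sketch, assuming a small drafter and a large target model that share a tokenizer (the model pair, helper name, and `k` are illustrative; this is not the PR's actual implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")                   # shared tokenizer
draft = AutoModelForCausalLM.from_pretrained("gpt2").eval()         # small, fast drafter
target = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()  # model being served

@torch.no_grad()
def speculative_step(input_ids, k=4):
    """Draft k tokens with the small model, verify them in one big-model pass."""
    # 1) Draft: greedily propose k candidate tokens with the small model.
    draft_ids = input_ids
    for _ in range(k):
        logits = draft(draft_ids).logits[:, -1, :]
        draft_ids = torch.cat([draft_ids, logits.argmax(-1, keepdim=True)], dim=-1)

    # 2) Verify: a single target forward pass over prompt + drafted tokens.
    logits = target(draft_ids).logits
    prompt_len = input_ids.shape[1]
    # Target's greedy choice at each drafted position, plus one bonus position.
    verify = logits[:, prompt_len - 1 :, :].argmax(-1)   # shape [1, k + 1]
    drafted = draft_ids[:, prompt_len:]                  # shape [1, k]

    # 3) Keep the longest prefix where draft and target agree, then append
    #    the target's own token at the first mismatch (or the bonus token).
    n_accept = 0
    while n_accept < k and verify[0, n_accept] == drafted[0, n_accept]:
        n_accept += 1
    return torch.cat(
        [draft_ids[:, : prompt_len + n_accept], verify[:, n_accept : n_accept + 1]],
        dim=-1,
    )

ids = tokenizer("Speculative decoding works by", return_tensors="pt").input_ids
for _ in range(8):                 # each step emits between 1 and k + 1 tokens
    ids = speculative_step(ids)
print(tokenizer.decode(ids[0]))
```

The output matches plain greedy decoding of the target model exactly; the speedup comes from the target verifying several drafted tokens per forward pass instead of producing one token at a time.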
@nikshepsvn it seems together-inference-engine-v1 is not an open source project? We can only host our models on their cloud but cannot deploy the engine locally.