Together AI Inference Engine #1271

Closed
nikshepsvn opened this issue Nov 20, 2023 · 3 comments

nikshepsvn commented Nov 20, 2023

Feature request

I saw a blog post in which together.ai advertises 3x inference performance via their API. I'm sure some of the optimization techniques they use could benefit this repo:
https://www.together.ai/blog/together-inference-engine-v1

Motivation

Faster inference!

Your contribution

Happy to help if there is overlap with my skillset

Collaborator

Narsil commented Dec 5, 2023

They are using a Medusa version of the model, which is just a different model.

Support is coming very soon in #1308, but making it fast requires creating those Medusa models, and very few open-source ones exist right now (although we hope people will add more, since the speedup is quite significant, even more than advertised here).

The PR will also add support for regular speculative decoding, which should give you a significant speedup on any model too, without needing any modifications to it.
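
For readers unfamiliar with the idea, here is a minimal sketch of greedy speculative decoding. It is not the code from the PR: `speculative_decode`, `greedy_decode`, `target_next`, and `draft_next` are hypothetical names, and the "models" are toy functions that map a token list to the next token. The sketch only illustrates why the large model needs no changes: a cheap draft model proposes a few tokens, the target model verifies them, and the output is identical to plain greedy decoding with the target.

```python
from typing import Callable, List

# Hypothetical type: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target_next: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding; the result matches greedy_decode on the target."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        draft: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. The target model checks each proposed position. A real engine would
        #    score all positions in one batched forward pass; this toy loop calls
        #    the target once per position for clarity.
        for i, proposed in enumerate(draft):
            expected = target_next(tokens + draft[:i])
            if expected != proposed:
                # First mismatch: keep the accepted prefix plus the target's own token.
                tokens.extend(draft[:i])
                tokens.append(expected)
                break
        else:
            # Every proposal was accepted.
            tokens.extend(draft)
    return tokens


# Toy stand-ins (assumptions for illustration only): the target counts modulo 10,
# and the draft agrees with it except that it wrongly predicts 0 instead of 5.
def target_next(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    nxt = (seq[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


result = speculative_decode(target_next, draft_next, [0], max_new_tokens=12)
baseline = greedy_decode(target_next, [0], max_new_tokens=12)
print(result == baseline)
```

The speedup in a real system comes from step 2: verifying several draft tokens costs roughly one forward pass of the large model, so each accepted draft token saves a full sequential step.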

@fancyerii

@nikshepsvn it seems together-inference-engine-v1 is not an open source project? It looks like we can only host our models on its cloud and cannot deploy them locally.


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Jan 14, 2024
github-actions bot closed this as not planned Jan 20, 2024
3 participants