Vision models #150
Comments
We had intended to merge vllm support soon; we started it here: this is roughly the outline of what we think it should look like. Basically we want to introduce a --runtime flag, kinda like the podman one that switches between crun, runc, and krun, but in this case it lets one switch between llama.cpp, vllm, and whatever other runtimes people would like to integrate in the future. That is a key feature we want, and it's one of the reasons we don't simply use Ollama. Now that we have vllm v0.6.1, we are ready to complete that work. Vision models like this would be useful for sure. Personally I'm gonna be out a little bit in the next week or two, I have a wedding and other things I need to take some time for. Anybody who wants to pick up --runtime, vllm support, or vision model support, like you @p5 or others, be my guest.
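As a rough sketch of how such a --runtime switch could be wired up on the CLI side (the helper functions, defaults, and dispatch table here are assumptions for illustration, not ramalama's actual code):

```python
# Hypothetical sketch of a --runtime switch; not ramalama's actual code.
import argparse


def run_with_llama_cpp(model: str) -> None:
    # Placeholder: launch the llama.cpp runtime for the given model.
    print(f"would serve {model} with llama.cpp")


def run_with_vllm(model: str) -> None:
    # Placeholder: launch vllm's OpenAI-compatible server for the given model.
    print(f"would serve {model} with vllm")


RUNTIMES = {
    "llama.cpp": run_with_llama_cpp,
    "vllm": run_with_vllm,
}


def main() -> None:
    parser = argparse.ArgumentParser(prog="ramalama")
    parser.add_argument(
        "--runtime",
        choices=sorted(RUNTIMES),
        default="llama.cpp",
        help="inference runtime to use, similar in spirit to podman's --runtime",
    )
    parser.add_argument("model")
    args = parser.parse_args()
    RUNTIMES[args.runtime](args.model)


if __name__ == "__main__":
    main()
```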
@p5 still interested in this?
Hey Dan, Eric. My free time is very limited at the minute. Starting a new job in 2 weeks and there's a lot to get in order. I still feel vision models would be a great addition to ramalama, but I'm going to be in a Windows-only environment :sigh: so I'm unsure how much I'll be able to help out.
Thanks @p5, good luck with the new job.
Best of luck @p5. @bmahabirbu did have success running on Windows recently: https://github.com/containers/ramalama/tree/main/docs/readme
Indirectly, maybe; we inherit from the same backend, llama.cpp, but we don't actually use any Ollama stuff directly, even though to a user it might appear that way!
Oh, apologies. I thought Ramalama used both the llama.cpp and ollama runtimes 🤦
And we wrote the Ollama transport from scratch, so we use zero Ollama code. What a lot of people don't realize is that it's llama.cpp that does most of the heavy lifting for Ollama.
Value Statement
As someone who wants a boring way to use AI
I would like to expose an image/PDF/document to the LLM
So that I can make requests and extract information, all within Ramalama
Notes
Various models now include vision functionality: they can ingest image data and answer questions about those images. The accuracy of these LLM-based text extractions can now exceed that of dedicated OCR tooling (even paid products like AWS Textract). The same vision models can also be used to extract information from PDF documents fairly easily after converting them to images.
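As a hedged sketch of that PDF-to-image step (assuming the third-party pdf2image package and a poppler install on the host; none of this is part of ramalama today):

```python
# Sketch: render PDF pages to PNG images so a vision model can ingest them.
# Assumes `pip install pdf2image` and a poppler installation.
from pdf2image import convert_from_path


def pdf_to_pngs(pdf_path: str, out_prefix: str = "page") -> list[str]:
    pages = convert_from_path(pdf_path, dpi=200)  # one PIL image per page
    paths = []
    for i, page in enumerate(pages, start=1):
        out = f"{out_prefix}-{i}.png"
        page.save(out, "PNG")
        paths.append(out)
    return paths


print(pdf_to_pngs("invoice.pdf"))  # hypothetical input file
```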
We can use a similar interface to the planned Whisper.cpp implementation, since both are just contexts or data we provide to the LLMs. This has not been detailed anywhere, so below is a proposal/example of how it could look.
The primary issue is that neither ollama nor llama.cpp supports vision models at the moment, so this would either need a custom implementation or require adding something like vllm.
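As an illustration of how that flow could look (a sketch, not ramalama's actual interface or the original proposal from this issue; the endpoint, port, model name, and image path are assumptions), the request below sends a base64-encoded image to an OpenAI-compatible chat endpoint of the kind vllm can serve:

```python
# Sketch: ask a vision model about a local image via an OpenAI-compatible
# /v1/chat/completions endpoint (vllm exposes one; host, port, and model are assumptions).
import base64
import json
import urllib.request


def ask_about_image(image_path: str, question: str,
                    url: str = "http://localhost:8000/v1/chat/completions",
                    model: str = "llava-hf/llava-1.5-7b-hf") -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


print(ask_about_image("page-1.png", "What is the total amount on this invoice?"))
```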