Make mlx-vlm examples in swift #132
Currently, I am working on porting Llama 3.2 VLM to Swift. It would be great if we could make the VLM code a separate package so that people can easily pull it in as a dependency and integrate it into their applications, for example, to add VLM support to ChatMLX. |
If someone can put together the basic pipeline for one vision model, I can probably port the others to Swift fairly quickly. |
I am working on it right now and have paligemma done (well, not debugged but callable). I am working on how to structure the code with regard to the LLM library -- they should share code where possible. I will try and put up the branch with what I have today. Next week will be busy so it might be two weeks from now before it is really ready. |
Fantastic, thank you! Once that's in place, I'll start working on some of the other models (and will post here first to avoid duplication of work). |
- based on models from https://github.com/Blaizzy/mlx-vlm - for #132
OK, you can see what I have -- more work to be done but the eval loop is worked out. |
This continues -- I have most of the refactoring done. As mentioned before, this will be a breaking change in the API (so I will do a major version bump), but it should be pretty easy to adopt. Hopefully a new |
Thanks @davidkoski, your work is much appreciated! Once the API is stable, I'll try to port some of the other VLMs. |
@davidkoski @DePasqualeOrg did either of you get Qwen2VL working in Swift? |
It is implemented in the branch right now but still lacks the image processor -- that is what I am starting on next. |
You are doing god's work, @davidkoski! If you need help, let me know! Also, do you know what would be necessary to go from image processing to video processing? |
@davidkoski Blaizzy/mlx-vlm#97 here is a PR from mlx-vlm that might help! |
Yes, this first version won't have it but it should be straightforward to add. Qwen2VL treats an array of images and a video roughly the same but handles them slightly differently in the processor. The video ends up with a different value in the |
Yes, you're right about the array-of-images handling! I tried out a rough version of Qwen2VL and the memory usage on any reasonably sized video is absurd! Seems like this might not be the architecture to support practical on-device video processing... By the way, @davidkoski, is there a way to set up an LLM API on MLX, as is done with llama.cpp or tools like LM Studio? I have done this with llama.cpp but want the performance boost of MLX to see what's possible :) Thanks again for all your great work; I know you have been really involved with MLX from the start! |
I am not sure what kind of API you mean -- certainly there is an API for preparing a prompt and generating tokens, but I think you mean something different. Probably the answer is yes, but it might be something you would have to build, e.g. if you wanted a web service. |
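To make the "something you would have to build" part concrete, here is a minimal sketch of the glue a web service would need around a prompt-in, text-out generator. Everything named here is an assumption for illustration: `TextGenerator`, `EchoGenerator`, `CompletionRequest`, `CompletionResponse`, and `handleCompletion` are hypothetical and not part of MLX or this repository. The stand-in generator only echoes words so the sketch runs without a model; a real version would wrap the model's prompt preparation and token generation.

```swift
import Foundation

// Hypothetical abstraction over "prepare a prompt and generate tokens".
// This protocol is an assumption for the sketch, not an MLX API.
protocol TextGenerator {
    func generate(prompt: String, maxTokens: Int) -> String
}

// Stand-in generator so the sketch runs without loading a model.
struct EchoGenerator: TextGenerator {
    func generate(prompt: String, maxTokens: Int) -> String {
        // Return at most `maxTokens` space-separated words of the prompt.
        prompt.split(separator: " ").prefix(maxTokens).joined(separator: " ")
    }
}

// Hypothetical JSON shapes for a completion endpoint.
struct CompletionRequest: Codable {
    let prompt: String
    let maxTokens: Int
}

struct CompletionResponse: Codable {
    let text: String
}

// Decode a JSON request body, run the generator, and encode a JSON response.
// The HTTP server of your choice would call this from its route handler.
func handleCompletion<G: TextGenerator>(body: Data, generator: G) throws -> Data {
    let request = try JSONDecoder().decode(CompletionRequest.self, from: body)
    let text = generator.generate(prompt: request.prompt, maxTokens: request.maxTokens)
    return try JSONEncoder().encode(CompletionResponse(text: text))
}
```

Keeping the handler independent of any particular HTTP framework means the same function could sit behind a local server or be called directly in tests; swapping `EchoGenerator` for a model-backed type is the only change the rest of the code would see.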
Consider porting some models from https://github.com/Blaizzy/mlx-vlm to Swift