metal-flash-attention support #22

Open
czkoko opened this issue Aug 21, 2023 · 5 comments

Comments

@czkoko

czkoko commented Aug 21, 2023

Could this project be helpful to you? https://github.com/philipturner/metal-flash-attention

So far, metal-flash-attention does indeed provide the fastest generation speed for Stable Diffusion on macOS.

@leejet
Owner

leejet commented Aug 21, 2023

Thank you for the feedback. I'm currently focusing on making it run faster, and I'll make time to take a look at this project and see if I can offer any assistance.

@sroussey

I came here to suggest the same thing.

@Green-Sky
Contributor

Linking this here for reference: ggerganov/ggml#293

@GaidamakUA

What needs to be done to make this happen? I'm not very good with C++, but I want to help.

@Green-Sky
Contributor

Green-Sky commented Sep 12, 2024

#386
I am in the dark on metal-flash-attention support, or Metal support in general.
So it would be nice if someone with the hardware could test the PR. :)
