unsupported op 'MUL_MAT' #4998

Closed
NeevJewalkar opened this issue Jan 17, 2024 · 12 comments

Comments

@NeevJewalkar

ggml_metal_graph_compute_block_invoke: error: unsupported op 'MUL_MAT'
GGML_ASSERT: ggml-metal.m:779: !"unsupported op"

System: MacBook Air (Intel)
Happens when I try to run phi-2.

@XieWeikai

I encountered the same problem

@ggerganov
Owner

Likely your device is missing the Apple7 family feature set (more info: #4794)

Show the logs of ggml_metal_init to confirm:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ggerganov/development/github/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true      <---- this is required for llama.cpp
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 154618.82 MB

@NeevJewalkar
Author

NeevJewalkar commented Jan 18, 2024

yep, that value is set to false:

ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) UHD Graphics 617
ggml_metal_init: picking default device: Intel(R) UHD Graphics 617
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/neevjewalkar/Documents/Dev/llama/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Intel(R) UHD Graphics 617
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = false            <----
ggml_metal_init: simdgroup matrix mul. support = false
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  1610.61 MB

is there any fix?

@MBarti

MBarti commented Jan 18, 2024

Same problem on a MacBook Pro 2018, 16 GB (Intel, AMD Radeon Pro 555X)
ML Model: mistral-7b-dpo-v5.Q6_K.gguf

ggml_metal_graph_compute_block_invoke: error: unsupported op 'MUL_MAT'
GGML_ASSERT: ggml-metal.m:779: !"unsupported op"
Abort trap: 6

also:
ggml_metal_graph_compute_block_invoke: error: unsupported op 'RMS_NORM'

@ggerganov
Owner

The only way is to implement the respective Metal kernels without using simd_ calls. It's not very difficult, but I don't plan on officially supporting it as it will increase the Metal code by a lot and I'm not convinced it will result in significant gains compared to CPU-only for these machines.

If somebody implements the kernels, we can put them in ggml-metal-intel.metal and have them built as a separate backend for Intel machines.
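
For illustration, a minimal sketch of what such a simdgroup-free kernel could look like (hypothetical code, not part of ggml; the real kernels also handle quantized types, broadcasting and non-contiguous tensors): one thread per output element, accumulating with a plain scalar loop instead of simd_* matrix/reduction intrinsics.

#include <metal_stdlib>
using namespace metal;

// hypothetical naive f32 mat-mul: dst[M x N] = src0[M x K] * src1[K x N]
// one thread computes one element of dst, no simd_* calls anywhere
kernel void kernel_mul_mat_f32_naive(
        device const float * src0 [[buffer(0)]],
        device const float * src1 [[buffer(1)]],
        device       float * dst  [[buffer(2)]],
        constant       uint & M   [[buffer(3)]],
        constant       uint & N   [[buffer(4)]],
        constant       uint & K   [[buffer(5)]],
        uint2 tpig [[thread_position_in_grid]]) {
    const uint row = tpig.y;
    const uint col = tpig.x;
    if (row >= M || col >= N) {
        return;
    }

    // plain scalar accumulation; runs on MTLGPUFamilyCommon3 devices,
    // but much slower than the simdgroup path
    float sum = 0.0f;
    for (uint k = 0; k < K; ++k) {
        sum += src0[row*K + k] * src1[k*N + col];
    }

    dst[row*N + col] = sum;
}

Each unsupported op (MUL_MAT, RMS_NORM, ...) would need a similar fallback, which is why the Metal code would grow considerably.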

@NeevJewalkar
Author

NeevJewalkar commented Jan 20, 2024

From what I understood, this error occurs because all of this is running on the GPU. This may seem dumb, but how do I run llama.cpp on the CPU instead?
Edit: I tried running the model on LangChain using llamacpp and it works, so why doesn't it work when I try to run the model using llama.cpp in the terminal?

@ggerganov
Owner

Most of the examples support the -ngl 0 argument, which makes llama.cpp not use the GPU
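
For example, a typical CPU-only invocation would look something like this (the main example binary and the model path below are just placeholders):

./main -m ./models/phi-2.Q4_K_M.gguf -p "Hello" -ngl 0

With -ngl 0 no layers are offloaded, so the missing Metal kernels are never reached.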

@0xez

0xez commented Feb 13, 2024

Most of the examples support the -ngl 0 argument, which makes llama.cpp not use the GPU

This worked for me, thank you!

I was running llama-2-7b-chat.Q4_K_M on my MacBook Pro 2016 (8 GB RAM) and got an error saying:
ggml_metal_graph_compute_block_invoke: error: unsupported op 'RMS_NORM'
The -ngl 0 param solved the problem.

It is running successfully now, but very slowly: it takes almost 2~3 minutes to predict each word. I could see very high I/O load via the iostat command, and the CPU sys time was high too, which means it was swapping data between disk and memory.

Conclusion: I definitely need a new MacBook!

@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024
Contributor

github-actions bot commented Apr 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 3, 2024
@lalitya-sawant

Passing the -ngl 0 param solved the problem.

@Umkus

Umkus commented Jun 18, 2024

This used to help me on my mid-2015 MacBook, but no more 😢
