Unit test "test-backend-ops" crashed on macOS #4672

Closed
nguoithichkhampha opened this issue Dec 28, 2023 · 13 comments · Fixed by #4794

Comments

@nguoithichkhampha
Contributor

I'm using macOS 13.6 (Intel chip). Here is the stack trace:
[Screenshot 2023-12-29 at 12 16 00 AM]

@slaren
Collaborator

slaren commented Dec 29, 2023

A lot of the output that would hint at the issue is missing. I assume this is because the buffer allocation failed. I will add more checks so that these cases are detected and reported instead of crashing, but actually fixing this would require someone with an Intel Mac to figure out what the issue is.

@nguoithichkhampha
Contributor Author

@slaren, I have uploaded the log file (LastTest.log) from running ctest in verbose mode.
You are right, there seems to be an issue when allocating the buffer:
MOE(n_experts=8,n_experts_per_tok=2,n_tokens=1,n_embd=4096,n_ff=8192): ggml_backend_metal_buffer_type_alloc_buffer: error: failed to allocate buffer, size = 3072.58 MiB

@slaren
Collaborator

slaren commented Jan 2, 2024

Thank you. The out-of-memory issue in the MoE test is not really a concern: it requires a larger buffer than can be allocated on your system. The log also shows that many MUL_MAT and MUL_MAT_ID tests are failing, and that is a problem, since it may cause the Metal backend to silently produce wrong results. I think there are already some checks for the GPU family in the Metal matrix multiplication, but they may not be enough.

@nguoithichkhampha
Contributor Author

So, to debug this issue, should I look at the first MUL_MAT failure?
MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1]): ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.04 MiB, ( 17.11 / 1536.00) 19: [MUL_MAT] NMSE = 3.110048 FAIL
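
For context on the `NMSE = 3.110048 FAIL` line: NMSE is the normalized mean squared error between a reference result and the result under test. A minimal sketch of the metric follows. The function name `nmse_sketch` is hypothetical, and the code is an illustration of the formula rather than a copy of the test source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error: sum((ref - out)^2) / sum(ref^2).
// `ref` is the reference result, `out` is the result under test.
// Hypothetical helper for illustration; assumes `ref` is not all zeros.
double nmse_sketch(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    double err  = 0.0;
    double norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = r - static_cast<double>(out[i]);
        err  += d * d;  // squared error against the reference
        norm += r * r;  // energy of the reference itself
    }
    return err / norm;
}
```

Under this definition, a value above 1 (as in the log line) means the error carries more energy than the reference signal itself.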

@slaren
Collaborator

slaren commented Jan 2, 2024

All the failed MUL_MAT tests are important, not just the first one.

@ggerganov
Owner

ggerganov commented Jan 6, 2024

@nguoithichkhampha Please check out #4794 and try again:

make clean
make -j tests && ./tests/test-backend-ops -b Metal

If the matrix multiplication tests continue to fail, please run the following and post the output:

MTL_DEBUG_LAYER=1 ./tests/test-backend-ops -b Metal

@nguoithichkhampha
Contributor Author

nguoithichkhampha commented Jan 7, 2024

I no longer see any failed tests, but it still crashes (see test-metal-backend.txt).
I also got a clearer stack trace:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x7ff80338b1e2 __pthread_kill + 10
1 libsystem_pthread.dylib 0x7ff8033c2ee6 pthread_kill + 263
2 libsystem_c.dylib 0x7ff8032e9b45 abort + 123
3 libsystem_c.dylib 0x7ff8032e8e5e __assert_rtn + 314
4 Metal 0x7ff80cbbd182 MTLReportFailure.cold.1 + 43
5 Metal 0x7ff80cb98bef MTLReportFailure + 529
6 Metal 0x7ff80cb8d4e0 _MTLMessageContextEnd + 1282
7 MetalTools 0x7ff803bcbefd -[MTLDebugDevice newBufferWithBytesNoCopy:length:options:deallocator:] + 237
8 test-backend-ops 0x107ee691c ggml_backend_metal_buffer_type_alloc_buffer + 252
9 test-backend-ops 0x107ecc44e ggml_backend_alloc_ctx_tensors_from_buft + 158
10 test-backend-ops 0x107e61155 test_case::eval(ggml_backend*, ggml_backend*, char const*) + 549
11 test-backend-ops 0x107e609d9 test_backend(ggml_backend*, test_mode, char const*) + 28505
12 test-backend-ops 0x107e598b1 main + 465
13 dyld 0x7ff80306941f start + 1903

It seems there is an OS-level assertion that prevents allocating a buffer larger than 2048 MB.

@nguoithichkhampha
Contributor Author

I think this makes sense, since my GPU only has 1536 MB of VRAM.
So we should check the max buffer length before calling
ctx->buffers[0].metal = [device newBufferWithBytesNoCopy:ctx->all_data length:size_aligned options:MTLResourceStorageModeShared deallocator:nil];
in the function ggml_backend_metal_buffer_type_alloc_buffer.
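
A minimal sketch of such a pre-allocation check, written in plain C++ so it can be read without the Metal headers. The helper names `align_up` and `fits_device_limit` are hypothetical, and `max_buffer_length` is passed in as a parameter. In the Objective-C code the limit would presumably come from the device's `maxBufferLength` property.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to the next multiple of `page_size` (page_size must be > 0).
std::uint64_t align_up(std::uint64_t size, std::uint64_t page_size) {
    const std::uint64_t rem = size % page_size;
    return rem == 0 ? size : size + (page_size - rem);
}

// Return true if the page-aligned request fits under the device limit.
// `max_buffer_length` is a stand-in for whatever limit the device reports.
bool fits_device_limit(std::uint64_t size, std::uint64_t page_size,
                       std::uint64_t max_buffer_length) {
    return align_up(size, page_size) <= max_buffer_length;
}
```

With a guard like this, an oversized request could be reported as an ordinary allocation failure before the buffer-creation call is reached, instead of tripping the validation-layer assertion.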

@ggerganov
Owner

ggerganov commented Jan 7, 2024

Yes, the MOE test is expected to fail due to out of memory - that's not a big concern.
The main problem is that your GPU should support the Metal3 feature set as defined by Apple's documentation:

https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

However, we currently fail to detect that:

Backend 2/2 (Metal)
ggml_metal_init: allocating
2024-01-07 17:36:34.077 test-backend-ops[2294:105408] Metal API Validation Enabled
ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: picking default device: Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/Emotiv/llama.cpp/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name:   Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = false
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  1610.61 MB
ggml_metal_init: maxTransferRate               = built-in GPU

There should be a log message stating:

ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)

I just pushed another change to #4794 that would hopefully fix this.
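
The expected log suggests a device can report more than one family at once (here `MTLGPUFamilyCommon3` and `MTLGPUFamilyMetal3`). One way to avoid missing a family is to probe each range independently, sketched below. The `supports` callback is a hypothetical stand-in for the device query, the values 3003 and 5001 are taken from the log, and the lower bound 3001 for the Common range is an assumption. This is not necessarily how #4794 implements it.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Return the highest supported id in [first, last], or -1 if none is supported.
int highest_in_range(int first, int last, const std::function<bool(int)> & supports) {
    for (int id = last; id >= first; --id) {
        if (supports(id)) {
            return id;
        }
    }
    return -1;
}

// Probe each family range on its own, so a match in one range
// does not hide a match in another.
std::vector<int> detect_families(const std::function<bool(int)> & supports) {
    // Assumed ranges: Common1..Common3 (3001..3003) and Metal3 (5001).
    const int ranges[][2] = { {3001, 3003}, {5001, 5001} };
    std::vector<int> found;
    for (const auto & r : ranges) {
        const int id = highest_in_range(r[0], r[1], supports);
        if (id >= 0) {
            found.push_back(id);
        }
    }
    return found;
}
```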

@nguoithichkhampha
Contributor Author

I tried with the latest commit. I see the message GPU family: MTLGPUFamilyMetal3 as expected, but now I get another error followed by an assertion:

ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: picking default device: Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/Emotiv/llama.cpp/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name:   Intel(R) Iris(TM) Plus Graphics 650
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  1610.61 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_metal_init: error: load pipeline error: Error Domain=CompilerError Code=2 "AIR builtin function was called but no definition was found." UserInfo={NSLocalizedDescription=AIR builtin function was called but no definition was found.}
GGML_ASSERT: /Users/Emotiv/llama.cpp/tests/test-backend-ops.cpp:1703: backend != NULL

@ggerganov
Owner

Thanks! I think it should work now. When you get the chance, please give it another try with the latest version and, if it fails, post the output again. It will be more verbose now.

@nguoithichkhampha
Contributor Author

OK, I read your code change. It seems that my GPU does not support mul_mat.
But it is still crashing, and it seems to be back to the first issue (see output.txt).

@ggerganov
Owner

Yup, it is unexpected that the MUL_MAT tests fail. Even though SIMD matrix multiplications are not available, it falls back to the other kernels, which only use SIMD reductions. These should be supported and should work correctly. I'm not sure what the issue is in this case.
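
The fallback described above can be pictured as a capability-based dispatch. The sketch below is only an illustration: the `DeviceCaps` struct, the function `pick_mul_mat_path`, and the returned labels are all hypothetical and do not correspond to real kernel names in the Metal backend.

```cpp
#include <cassert>
#include <string>

// Hypothetical capability flags, mirroring the distinction made in the thread.
struct DeviceCaps {
    bool simdgroup_mm;        // SIMD-group matrix multiplication available
    bool simdgroup_reduction; // SIMD-group reductions available
};

// Pick a matrix-multiplication path from the device capabilities.
// The returned strings are illustrative labels, not real kernel names.
std::string pick_mul_mat_path(const DeviceCaps & caps) {
    if (caps.simdgroup_mm) {
        return "simdgroup_mm";
    }
    if (caps.simdgroup_reduction) {
        return "reduction_fallback";
    }
    return "unsupported";
}
```

For the device in this thread, the later log reports simdgroup reduction support as true, so under this sketch it would take the "reduction_fallback" path. That is the path the comment above expects to work.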
