Check CUDA memory pool support #3931

Closed

Conversation

young-developer
Contributor

@young-developer commented Nov 3, 2023

Some devices don't support memory pools but still overwrite the mempool array element from nullptr with a garbage value. I added an additional check based on the device properties.
Also includes multi-GPU pool access support and a memory pool access check.
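For illustration, a minimal sketch of the device-property check (not the exact patch; the helper name is made up, and it assumes a CUDA toolkit new enough that cudaDeviceProp has the memoryPoolsSupported field):

#include <cuda_runtime.h>

// Only configure a CUDA memory pool for devices that report support;
// otherwise the pool entry stays nullptr and the custom pool is used.
static bool ggml_cuda_device_supports_mempool(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    return prop.memoryPoolsSupported != 0;
}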

Collaborator

@cebtenzzre left a comment


This fixes the error I was seeing.

@young-developer
Contributor Author

@cebtenzzre Could you please check with the latest changes? The device property is not in CUDA 17 and it only works for 12+.

@cebtenzzre
Collaborator

That does work, but the command line I need to build unmodified llama.cpp is getting long:

cmake -B build \
  -DLLAMA_CUBLAS=ON \
  -DCMAKE_CUDA_HOST_COMPILER=gcc-12 -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 \
  -DLLAMA_CUDA_FORCE_MMQ=ON \
  -DCMAKE_CUDA_FLAGS='-DGGML_CUDA_FORCE_CUSTOM_MEMORY_POOL' \
&& make -C build
  • Force gcc 12 because a recent CUDA update broke support for gcc 13
  • LLAMA_CUDA_FORCE_MMQ to avoid a massive performance hit since I don't have tensor cores
  • GGML_CUDA_FORCE_CUSTOM_MEMORY_POOL since one of my GPUs doesn't have memory pool support

If there is any benefit to the built-in memory pool, I'd like to have it enabled when I have my GTX 970 disabled with CUDA_VISIBLE_DEVICES. I'd also like to avoid the extra compile-time option, especially if we don't clearly document that this option is needed on older cards (MMQ isn't documented either). Could we check CUDART_VERSION and use prop.memoryPoolsSupported if available?
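Something along these lines, sketched under the assumption (per the comment above) that the property is only usable from CUDA 12 on; the version threshold and helper name are assumptions, not code from this PR:

#include <cuda_runtime.h>

// Compile-time guard so older toolkits (whose cudaDeviceProp has no
// memoryPoolsSupported field) still build and simply use the custom pool.
static bool cuda_pool_available(int device) {
#if defined(CUDART_VERSION) && CUDART_VERSION >= 12000
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    return prop.memoryPoolsSupported != 0;
#else
    (void) device;
    return false; // toolkit too old: fall back to the custom pool
#endif
}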

@young-developer
Contributor Author

@cebtenzzre Try without GGML_CUDA_FORCE_CUSTOM_MEMORY_POOL

@cebtenzzre
Collaborator

Without that flag I get a similar error to what I was getting originally:

CUDA error 801 at /home/cebtenzzre/src/forks/llama.cpp/ggml-cuda.cu:6807: operation not supported
current device: 0

@young-developer
Contributor Author

So it somehow loaded the memory pool but fails on allocation into it. I can add an option based on the CUDA version, as you mentioned.

@Ph0rk0z

Ph0rk0z commented Nov 3, 2023

I get a "cuda buffer pool full" error when I try this out. It doesn't crash anymore, but it outputs one character over and over.

Identical behavior on dual 3090s and dual P40s.

@young-developer
Contributor Author

@cebtenzzre @Ph0rk0z I added different checks (device props, alloc/dealloc test) during the init phase. If one of those checks fails, the custom pool implementation is used instead. Please retest with the new changes.
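Roughly, the alloc/dealloc test amounts to something like this sketch (not the exact code in the commit; the helper name and allocation size are illustrative, and it assumes a toolkit with the stream-ordered allocator API, CUDA 11.2+):

#include <cuda_runtime.h>

// Try a small allocation from the device's default memory pool; if any step
// fails, report false so the caller switches to the custom pool.
static bool cuda_mempool_smoke_test(int device) {
    if (cudaSetDevice(device) != cudaSuccess) {
        return false;
    }
    cudaMemPool_t pool = nullptr;
    if (cudaDeviceGetDefaultMemPool(&pool, device) != cudaSuccess) {
        return false;
    }
    void * ptr = nullptr;
    cudaStream_t stream = nullptr; // legacy default stream
    if (cudaMallocFromPoolAsync(&ptr, 1024, pool, stream) != cudaSuccess) {
        return false;
    }
    if (cudaFreeAsync(ptr, stream) != cudaSuccess) {
        return false;
    }
    return cudaStreamSynchronize(stream) == cudaSuccess;
}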

@young-developer changed the title from "Check CUDA memory pool support in device properties." to "Check CUDA memory pool support" on Nov 4, 2023
@Ph0rk0z

Ph0rk0z commented Nov 4, 2023

Well, the check works. Now both with and without GGML_CUDA_FORCE_CUSTOM_MEMORY_POOL it scrolls "increase MAX_CUDA_BUFFERS" warnings at me and returns one character over and over.

FWIW, I'm using CUDA 11.8 in my Python environment. It's been working for months. Did 12.1 become a requirement at some point, and if so, why?

@young-developer
Contributor Author

young-developer commented Nov 4, 2023

@Ph0rk0z If it shows you the same character over and over, it looks like it is related to another bug (not to the CUDA memory pool). Try different versions and track down when it stopped working for you.

@Ph0rk0z

Ph0rk0z commented Nov 4, 2023

The PR that merged yarn caused this issue for me.

@cebtenzzre
Collaborator

Commit 56e5162 works fine individually on my GTX 970 and Tesla P40, but multi-GPU gives "an illegal memory access was encountered".

On 81931b2 I get this with multi-GPU:

main: build = 1487 (81931b2e)
main: built with gcc-12 (GCC) 12.3.0 for x86_64-pc-linux-gnu
main: seed  = 1699115827
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla P40, compute capability 6.1, CUDA memory pool is supported
  Device 1: NVIDIA GeForce GTX 970, compute capability 5.2Warning: Device 1 doesnt support CUDA memory pool, skipping pool access config
Cant give access for main device memory pool to device 1

CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:5929: operation not supported
current device: 0
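For reference, "give access for main device memory pool to device 1" corresponds roughly to the following CUDA call (a sketch, not this PR's code; the function name is made up):

#include <cuda_runtime.h>

// Allow peer_device to read/write allocations made from main_device's
// default memory pool; this is the step that fails when the peer cannot
// access the pool, as in the log above.
static bool give_pool_access(int main_device, int peer_device) {
    cudaMemPool_t pool = nullptr;
    if (cudaDeviceGetDefaultMemPool(&pool, main_device) != cudaSuccess) {
        return false;
    }
    cudaMemAccessDesc desc = {};
    desc.location.type = cudaMemLocationTypeDevice;
    desc.location.id   = peer_device;
    desc.flags         = cudaMemAccessFlagsProtReadWrite;
    return cudaMemPoolSetAccess(pool, &desc, 1) == cudaSuccess;
}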

@cebtenzzre
Collaborator

cebtenzzre commented Nov 4, 2023

The PR that merged yarn caused this issue for me.

If you can reproduce the problem on latest master then you should open a new issue. There were some mistakes when I merged YaRN, but CUDA and Metal seem to be working fine for most people after the fixup PRs.

edit: I see that you already opened an issue. I don't think what you're seeing is directly related to the YaRN PR.

@young-developer
Contributor Author

Commit 56e5162 works fine individually on my GTX 970 and Tesla P40, but multi-GPU gives "an illegal memory access was encountered".

On 81931b2 I get this with multi-GPU:

main: build = 1487 (81931b2e)
main: built with gcc-12 (GCC) 12.3.0 for x86_64-pc-linux-gnu
main: seed  = 1699115827
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla P40, compute capability 6.1, CUDA memory pool is supported
  Device 1: NVIDIA GeForce GTX 970, compute capability 5.2Warning: Device 1 doesnt support CUDA memory pool, skipping pool access config
Cant give access for main device memory pool to device 1

CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:5929: operation not supported
current device: 0

Please retest once again.

@cebtenzzre
Collaborator

Please retest once again.

The latest commit seems to work fine with either GPU or with both. Thanks!

@young-developer
Contributor Author

young-developer commented Nov 4, 2023

@cebtenzzre hm, so one uses CUDA pool and another custom pool , hm interesting. I have only one GPU so it is blind fixing :)

@staviq
Contributor

staviq commented Nov 4, 2023

Still broken for me.

RTX 2070 plus p106-100:

@ 4ff1046: works
@ d606905: CUDA error 1 at ggml-cuda.cu:7036: invalid argument

This PR:
@ 863166b: CUDA error 217 at ggml-cuda.cu:6881: peer access is not supported between these two devices

@cebtenzzre
Collaborator

cebtenzzre commented Nov 4, 2023

Actually, I just tested with multi-GPU and full GPU offload instead of partial offload and got this console spam:

WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
...

So, not fixed yet.
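For context, the custom pool that prints this warning keeps a fixed number of cached buffers per device; a simplified sketch of that scheme (names approximate ggml-cuda.cu but are not verbatim, and the per-device dimension is omitted):

#include <cuda_runtime.h>
#include <cstdio>
#include <cstddef>

#define MAX_CUDA_BUFFERS 256

struct cuda_buffer {
    void * ptr  = nullptr;
    size_t size = 0;
};

static cuda_buffer g_buffer_pool[MAX_CUDA_BUFFERS];

static void pool_free(void * ptr, size_t size) {
    for (int i = 0; i < MAX_CUDA_BUFFERS; ++i) {
        cuda_buffer & b = g_buffer_pool[i];
        if (b.ptr == nullptr) { // free slot: cache the buffer for reuse
            b.ptr  = ptr;
            b.size = size;
            return;
        }
    }
    // every slot occupied: warn and release the memory immediately
    fprintf(stderr, "WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS\n");
    cudaFree(ptr);
}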

@staviq
Contributor

staviq commented Nov 4, 2023

@young-developer

I have only one GPU so it is blind fixing :)

I have my RTX 2070 and P106 together in a VM; if you send me your SSH public key and IP address, I can let you in for testing: staviq at gmail.com

@young-developer
Contributor Author

I think I will make the CUDA memory pool optional, because many different multi-GPU combinations can be used and it will take me some time to check most cases.

@young-developer
Contributor Author

young-developer commented Nov 4, 2023

@cebtenzzre @staviq I added LLAMA_CUDA_USE_CUDA_POOL so you can recompile if you want to test multiple GPUs using CUDA pools. Once it is stable, we can enable it by default.
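A hedged sketch of how such a CMake option usually reaches the CUDA source: the option defines a preprocessor symbol that gates the pool selection. The define name and the helper it calls are assumptions, not the PR's actual code:

// GGML_CUDA_USE_CUDA_POOL here is a hypothetical define set by the CMake
// option; cuda_pool_available() is the property check sketched earlier in
// this thread.
static bool ggml_cuda_want_cuda_pool(int device) {
#if defined(GGML_CUDA_USE_CUDA_POOL)
    return cuda_pool_available(device);
#else
    (void) device;
    return false; // option off: always use the custom pool
#endif
}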

@slaren
Collaborator

slaren commented Nov 4, 2023

This is getting more complex than I would like for what is supposed to be a temporary solution, so if we have to resort to disabling this by default I would prefer to revert the original PR instead.

@cebtenzzre
Collaborator

cebtenzzre commented Nov 4, 2023

I added LLAMA_CUDA_USE_CUDA_POOL so you can recompile if you want to test multiple GPUs using CUDA pools. Once it will be stable we can enable it by default.

With 2b0303a I am back to this with multi-GPU (LLAMA_CUDA_USE_CUDA_POOL is OFF):

CUDA error 700 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:7178: an illegal memory access was encountered
current device: 1

@young-developer
Contributor Author

young-developer commented Nov 4, 2023

This is getting more complex than I would like for what is supposed to be a temporary solution, so if we have to resort to disabling this by default I would prefer to revert the original PR instead.

Yep, definitely an option. Then I will close this PR.

@Ph0rk0z

Ph0rk0z commented Nov 4, 2023

Actually, I just tested with multi-GPU and full GPU offload instead of partial offload and got this console spam:

WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
WARNING: cuda buffer pool full, increase MAX_CUDA_BUFFERS
...

So, not fixed yet.

This is the error I was talking about. Did it also output nonsense?

After YaRN, I had the allocation error I mentioned and the model would never load. Subsequent commits brought it to this point. I only reference it because that is the point where I stopped being able to span a model across multiple GPUs.

@cebtenzzre
Collaborator

This is the error I was talking about. Did it also output nonsense?

If #3944 works for you then YaRN is no longer a problem. I can't check it right now because my GPU is busy, but I've basically reverted the same commits without issue on my local fork.

@Ph0rk0z

Ph0rk0z commented Nov 5, 2023

It's finally working now. Also, Pascal prompt processing is fixed without #3816.

Speed is mostly the same, only a difference of a few tenths. Not for the better, but oh well.
