
ggml : add support for dynamic loading of backends #10469

Merged: 11 commits merged into master on Nov 25, 2024
Conversation

slaren (Collaborator) commented Nov 23, 2024

Adds support for loading backends dynamically at run time, without needing to link to them in the build.

  • Building the backends as dynamic libraries can be enabled with the CMake option GGML_BACKEND_DL
  • Adds the function ggml_backend_load(const char * path) to load a backend dynamically
  • Adds the convenience function ggml_backend_load_all(void) to load all known backends
  • Adds the function ggml_backend_unload(ggml_backend_reg_t reg) to unregister and unload a backend
  • Adds the optional function ggml_backend_get_features to obtain a list of feature flags from a backend. This replaces the calls to the ggml_cpu_has_xx functions from the CPU backend in llama.cpp
  • In addition to the CPU backend, the CUDA backend also implements ggml_backend_get_features, returning the list of archs included in the build and the build flags used, such as GGML_CUDA_FORCE_MMQ. Other backends should also implement this function to report their compile-time flags and features.

TODO

  • Version checking to avoid loading incompatible backends
  • Fix ggml_backend_load_all search paths
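Enabling the dynamic backends at configure time would look something like the following (GGML_BACKEND_DL is the option introduced by this PR; the rest is ordinary CMake usage):

```shell
# Configure with backends built as dynamic libraries
cmake -B build -DGGML_BACKEND_DL=ON
cmake --build build --config Release
```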

github-actions bot added labels on Nov 23, 2024: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), examples, ggml (changes relating to the ggml tensor library for machine learning), SYCL (GPU programming language)
Review thread on ggml/src/ggml-backend-impl.h (outdated, resolved)
slaren and others added 3 commits on November 24, 2024:
  • use MODULE target type for dl backend
  • set backend output directory to the runtime directory
  • ggml_backend_load_all searches backends in the system path first, then in the executable directory (ggml-ci)
Diff in the Makefile:

@@ -251,7 +251,7 @@ endif
 #
 # keep standard at C11 and C++11
-MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon
+MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -DGGML_USE_CPU
slaren (Collaborator, Author) commented:

GGML_USE_CPU now needs to be defined to use the CPU backend with the backend registry. This is necessary because the CPU backend now may be loaded dynamically, so it cannot be assumed that it is linked in the build. This may break other build scripts.

On Linux, it may also be necessary to link against libdl for dlopen.
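For downstream build scripts that consume ggml through CMake, the adjustment might look roughly like this (a sketch; my_app is a placeholder target name, not something from the repository):

```cmake
# Define GGML_USE_CPU so the backend registry registers the
# statically linked CPU backend (required after this PR)
target_compile_definitions(my_app PRIVATE GGML_USE_CPU)

# On Linux, link the dl library for dlopen; CMAKE_DL_LIBS expands
# to nothing on platforms where it is not needed
target_link_libraries(my_app PRIVATE ${CMAKE_DL_LIBS})
```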

Vali-98 commented Nov 27, 2024:

May want to mention this change in: #9289

I spent a few hours scratching my head on why I had no devices.

As a side note, when no devices are loaded, this causes a segfault because cpu_dev is a nullptr:

llama.cpp/src/llama.cpp

Lines 7291 to 7292 in c9b00a7

auto * cpu_dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
auto * cpu_reg = ggml_backend_dev_backend_reg(cpu_dev);

We should probably assert here, or anywhere that zero devices are present, to let the user know something is wrong.

@slaren slaren merged commit 5931c1f into master Nov 25, 2024
55 checks passed
@slaren slaren deleted the sl/dl-backend-2 branch November 25, 2024 14:13
@slaren slaren mentioned this pull request Nov 25, 2024
4 tasks
MaggotHATE (Contributor) commented:

Is ggml_backend_load_all() supposed to be called in static builds too? If I don't use it, there's a noticeable reduction in the quality of generated answers. When used, it tries to load any backend .dll it can find, which probably shouldn't happen with a static build (especially one with a backend such as OPENBLAS). Am I doing something wrong?

slaren (Collaborator, Author) commented Dec 12, 2024:

It is fine to call ggml_backend_load_all even in static builds, since it allows loading external backends. If it cannot find any dynamically loadable backends in the search paths, it does nothing. The reduction in quality you are observing is unlikely to be caused by this.

MaggotHATE (Contributor) replied:

it allows loading external backends.

It does, but it allocates memory again, essentially doubling total memory usage. I suppose it's a mistake on my end? It shouldn't behave like that with a static build combined with a dynamic backend?

slaren (Collaborator, Author) commented Dec 12, 2024:

That could happen if you have a static backend and the same backend loaded dynamically, but that does not normally happen, because backends built without GGML_BACKEND_DL enabled cannot be loaded dynamically.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* ggml : add support for dynamic loading of backends

---------

Co-authored-by: Georgi Gerganov <[email protected]>