ggml : add support for dynamic loading of backends #10469
Co-authored-by: Georgi Gerganov <[email protected]>
use MODULE target type for dl backend
set backend output directory to the runtime directory
ggml_backend_load_all searches backends in the system path first, then in the executable directory
ggml-ci
@@ -251,7 +251,7 @@ endif
 #

 # keep standard at C11 and C++11
-MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon
+MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -DGGML_USE_CPU
GGML_USE_CPU now needs to be defined to use the CPU backend with the backend registry. This is necessary because the CPU backend may now be loaded dynamically, so it cannot be assumed that it is linked in the build. This may break other build scripts.
On Linux, it may also be necessary to link to dl for dlopen.
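For context on why libdl is involved: a dynamically loaded backend has to be opened with dlopen and its entry point resolved with dlsym. The following is only a minimal sketch of that POSIX pattern, not the ggml implementation. The loaded_lib struct, the load_symbol helper, and whatever symbol name is passed in are hypothetical names introduced for illustration.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror (POSIX)
#include <cassert>
#include <string>

// Hypothetical result type for this sketch; not part of ggml.
struct loaded_lib {
    void * handle = nullptr;  // handle returned by dlopen, nullptr on failure
    void * entry  = nullptr;  // address of the requested symbol, nullptr on failure
    std::string error;        // non-empty when loading or lookup failed
};

// Open a shared library and look up one symbol in it.
static loaded_lib load_symbol(const char * path, const char * symbol) {
    assert(path != nullptr && symbol != nullptr);

    loaded_lib out;
    out.handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (out.handle == nullptr) {
        const char * msg = dlerror();
        out.error = msg ? msg : "dlopen failed";
        return out;
    }

    dlerror(); // clear any stale error before calling dlsym
    out.entry = dlsym(out.handle, symbol);
    if (out.entry == nullptr) {
        const char * msg = dlerror();
        out.error = msg ? msg : "symbol not found";
        dlclose(out.handle); // do not keep a library we cannot use
        out.handle = nullptr;
    }
    return out;
}
```

On older glibc versions a program using this needs -ldl on the link line, which is the extra link dependency mentioned above.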
May want to mention this change in: #9289
I spent a few hours scratching my head on why I had no devices.
On the side, when no devices are loaded, this causes a segfault due to cpu_dev being a nullptr:
Lines 7291 to 7292 in c9b00a7
auto * cpu_dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
auto * cpu_reg = ggml_backend_dev_backend_reg(cpu_dev);
We should probably assert here, or anywhere else that assumes at least one device is present, so the user is told that something is wrong.
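One possible shape for such a check, as a sketch only: a small helper that turns a null lookup result into a readable error instead of a later segfault. The require_non_null name and the error text are hypothetical and not part of ggml or llama.cpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: fail loudly when a lookup returned nullptr.
// Returns the pointer unchanged when it is valid.
template <typename T>
T * require_non_null(T * ptr, const char * what) {
    assert(what != nullptr);
    if (ptr == nullptr) {
        throw std::runtime_error(
            std::string(what) + " not found: no matching device is registered "
            "(is the backend linked or loaded?)");
    }
    return ptr;
}
```

Under this assumption, the lookup quoted above could be wrapped as require_non_null(ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU), "CPU device") before the result is passed on.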
Is |
It is ok to call |
It does, but allocates memory again, essentially duplicating total memory usage. I suppose it's a mistake on my end? It shouldn't behave like that on a combination of a static build with a dynamic backend?
It could happen if you have a static backend and the same backend as a dynamic backend, but that does not happen normally, because backends build without |
Adds support for loading backends dynamically at load time, without needing to link to them in the build.
- Backends are built as dynamically loadable libraries when GGML_BACKEND_DL is enabled
- ggml_backend_load(const char * path) to load a backend dynamically
- ggml_backend_load_all(void) to load all the known backends
- ggml_backend_unload(ggml_backend_reg_t reg) to unregister and unload a backend
- ggml_backend_get_features to obtain a list of flags of a backend. This replaces the calls to the ggml_cpu_has_xx functions from the CPU backend in llama.cpp
- The CUDA backend implements ggml_backend_get_features, which returns the list of archs included in the build and the build flags used, such as GGML_CUDA_FORCE_MMQ. Other backends should also implement this function to report compile-time flags and features.

TODO
- ggml_backend_load_all search paths
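
As a rough illustration of the search order described in the commit notes (system path first, then the executable directory), the candidate paths for one backend could be built as below. This is a sketch under assumptions, not the actual ggml_backend_load_all logic: the backend_candidates helper is hypothetical, and the "libggml-<name>.so" file naming is assumed for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: list the paths to try for one backend, in order.
// The "libggml-<name>.so" naming is an assumption made for this sketch.
static std::vector<std::string> backend_candidates(const std::string & name,
                                                   const std::string & exe_dir) {
    assert(!name.empty());
    const std::string file = "libggml-" + name + ".so";

    std::vector<std::string> paths;
    // 1. Bare file name: the dynamic loader resolves it via the system search path.
    paths.push_back(file);
    // 2. The same file next to the executable, if that directory is known.
    if (!exe_dir.empty()) {
        paths.push_back(exe_dir + "/" + file);
    }
    return paths;
}
```

A loader would try each candidate in turn and stop at the first one that opens successfully.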