
llama : refactor src/llama.cpp #10902

Open · ggerganov wants to merge 24 commits into master from gg/llama-refactor-0
Conversation

ggerganov (Owner) commented Dec 19, 2024

Attempting to split src/llama.cpp into a few separate modules. This is very much a work in progress; I am mainly opening this PR so people can keep track and suggest improvements as we move along. This part does not involve functional changes, just code reorganization and decoupling to make the codebase easier to work with. The batch and KV cache abstractions and reimplementations will be done in follow-up PRs.

```mermaid
graph TD;
    chat;
    model_loader;
    model   --> arch[<b>arch</b>];
    model   --> hparams[<b>hparams</b>];
    model   ----> mmap[<b>mmap</b> <br><br> llama_file <br> llama_mmap <br> llama_mlock];
    model   -.-> model_loader;
    model   --> vocab;
    vocab   --> unicode;
    adapter -.-> model;
    kv_cache -.-> batch;
    kv_cache -.-> cparams;
    kv_cache -.-> model;
    context --> adapter[<b>adapter</b> <br><br> llama_adapter_cvec <br> llama_adapter_lora];
    context -.-> batch;
    context --> cparams;
    context --> kv_cache;
    context --> model;

    style adapter fill:green
    style arch fill:green
    style batch fill:green
    style chat fill:green
    style cparams fill:green
    style hparams fill:green
    style kv_cache fill:green
    style mmap fill:green
    style model fill:green
    style model_loader fill:green
    style unicode fill:green
    style vocab fill:green
```
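The solid and dotted arrows in the diagram can be read as plain C++ composition: `context` owns or references the decoupled pieces, which in turn depend only on what sits below them. A minimal sketch of that ownership, using simplified stand-in types rather than the actual llama.cpp definitions (field layouts here are illustrative):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the decoupled modules; the real types live in
// llama-model.*, llama-kv-cache.*, etc. after the split. Illustrative only.
struct llama_hparams { int n_embd = 0; };
struct llama_vocab   { std::vector<std::string> tokens; };

struct llama_model {
    llama_hparams hparams;   // model --> hparams
    llama_vocab   vocab;     // model --> vocab
};

struct llama_kv_cache {
    const llama_model * model; // kv_cache -.-> model (weak dependency)
};

struct llama_context {
    const llama_model * model; // context --> model
    llama_kv_cache      kv;    // context --> kv_cache
};

// The context is constructed against a model it does not own.
llama_context make_context(const llama_model & m) {
    return { &m, { &m } };
}
```

The point of the split is exactly this shape: each module sees only the headers of the modules below it, so touching e.g. the vocab no longer forces a rebuild of everything.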

TODO

  • move the llama_mmaps and llama_mlocks from llama_model to llama_context? (no)
  • change _internal suffix to _impl (next PR)
  • add llama_tensor_loader ?
  • model loading
  • quantization

Conflicts

ngxson (Collaborator) commented Dec 19, 2024

I think the control_vector and lora related stuff should be re-grouped into one module, maybe called adapters (if someone has a better name, feel free to comment). That's because they work in much the same way, by "adding things" on top of the original cgraph.
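The grouping described here could be sketched as a common interface that both adapter types implement by adding nodes to the compute graph. The base class and node counts below are hypothetical illustrations, not the actual llama.cpp API (the real types are llama_adapter_cvec and llama_adapter_lora):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a compute graph: we only track the node count.
struct cgraph { int n_nodes = 0; };

// Hypothetical common interface: both LoRA and control vectors work by
// "adding things" on top of the original graph.
struct llama_adapter {
    virtual ~llama_adapter() = default;
    virtual void apply(cgraph & g) const = 0;
};

struct llama_adapter_lora : llama_adapter {
    // e.g. two extra mul_mat nodes for the A and B matrices
    void apply(cgraph & g) const override { g.n_nodes += 2; }
};

struct llama_adapter_cvec : llama_adapter {
    // e.g. one extra add node for the control vector
    void apply(cgraph & g) const override { g.n_nodes += 1; }
};

// Apply every loaded adapter to the graph; returns the final node count.
int apply_all(cgraph & g, const std::vector<std::unique_ptr<llama_adapter>> & adapters) {
    for (const auto & a : adapters) {
        a->apply(g);
    }
    return g.n_nodes;
}
```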

ggerganov force-pushed the gg/llama-refactor-0 branch 8 times, most recently from 524886b to 7ab08d5 (December 22, 2024 16:24)
github-actions bot added the "examples" and "devops (improvements to build systems and github actions)" labels (December 22, 2024)
ggerganov force-pushed the branch 2 times, most recently from be8f568 to dcbfda1 (December 22, 2024 20:30)
ggerganov force-pushed the branch 7 times, most recently from ba48e37 to 0ccae21 (December 23, 2024 17:22)
ggerganov force-pushed the branch from 1e7e338 to 597ae05 (January 2, 2025 10:39)
ggerganov force-pushed the branch from 1521f9e to c16630e (January 2, 2025 15:29)
ggerganov force-pushed the branch 2 times, most recently from 391a111 to 089cf4a (January 2, 2025 19:37)
ggerganov force-pushed the branch from 089cf4a to e06d267 (January 2, 2025 19:40)
ggerganov marked this pull request as ready for review (January 2, 2025 20:02)
ggerganov requested a review from ngxson as a code owner (January 2, 2025 20:02)
ggerganov (Owner, Author) commented:

I think this is a good place to merge this change. The project builds faster now and hopefully the code is organized a bit better. Will continue refactoring in follow-up PRs and any suggestions and recommendations are welcome. I've left some TODOs around the code and will try to address those next. After that will be looking for ways to separate the KV cache from the llama_context and enable support for multiple KV cache implementations.
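Supporting multiple KV cache implementations presumably means putting an abstract interface between llama_context and the cache. A rough sketch under that assumption; the interface and both type names below are hypothetical, since the actual design is deferred to the follow-up PRs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical abstract KV cache interface. The real follow-up PRs will
// define the actual contract; this only illustrates the decoupling idea.
struct llama_kv_cache_i {
    virtual ~llama_kv_cache_i() = default;
    // try to reserve room for n_tokens of sequence seq_id
    virtual bool seq_add(int32_t seq_id, int32_t n_tokens) = 0;
    virtual int32_t n_used() const = 0;
};

// One possible implementation: a fixed-capacity unified cache shared by
// all sequences. Alternative implementations could plug in behind the
// same interface without llama_context changing.
struct llama_kv_cache_unified : llama_kv_cache_i {
    int32_t capacity;
    int32_t used = 0;

    explicit llama_kv_cache_unified(int32_t cap) : capacity(cap) {}

    bool seq_add(int32_t /*seq_id*/, int32_t n_tokens) override {
        if (used + n_tokens > capacity) {
            return false; // caller must evict or defragment first
        }
        used += n_tokens;
        return true;
    }

    int32_t n_used() const override { return used; }
};
```

With llama_context holding only a `llama_kv_cache_i *`, swapping in a different cache strategy becomes a construction-time choice rather than a rewrite of the context code.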

ngxson (Collaborator) left a review:

LGTM overall, thanks for taking time during the holidays to finish this. Happy new year btw 🎉

```diff
@@ -24,13 +24,12 @@

 #define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

 // TODO: "lora_adapter" is tautology
```
ngxson (Collaborator):
Not sure what you mean by this. I think "lora_adapter" is not a tautology, because there can be multiple types of adapter, and there can also be "lora_a", "lora_b", "lora_scale".

ggerganov (Owner, Author):

I thought that "lora" already implies "adapter", since it comes from "LOw-Rank Adapter". So it seems to me that common_lora_adapter_info should simply be called common_lora_info.

ngxson (Collaborator) commented Jan 2, 2025:

Hmm no, the "A" means "adaptation", not "adapter". Quoting from this article:

LoRA, which stands for “Low-Rank Adaptation”, distinguishes itself by training and storing the additional weight changes in a matrix while freezing all the pre-trained model weights. LoRA is not called an “adapter” because it does not add adapters. Instead, it is referred to as “adaptation” to describe the process of fine-tuning the domain data and tasks.

Funny enough, I've just found out that "adapter" is technically a different technique from LoRA, first introduced in this paper. But the way they work is quite similar, adding nodes to the existing cgraph. So I guess the term "adapter" is being used correctly in our context in llama.cpp, since both LoRA and cvector are just additions on top of the model's cgraph.
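Whatever the naming, the mechanism both comments describe is the same low-rank update: the frozen weight W is augmented as W' = W + scale * B * A, where A is r x n, B is m x r, and the rank r is much smaller than m and n. A toy numeric sketch with r = 1 (function name and values are illustrative, not llama.cpp code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a rank-1 LoRA-style update: W' = W + scale * B * A,
// where A is 1 x n and B is m x 1, so B*A is the m x n outer product.
std::vector<std::vector<float>> lora_apply(
        std::vector<std::vector<float>> W,   // m x n frozen weight (copied)
        const std::vector<float> & A,        // 1 x n
        const std::vector<float> & B,        // m x 1
        float scale) {                       // typically alpha / r
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < W[i].size(); ++j) {
            W[i][j] += scale * B[i] * A[j];
        }
    }
    return W;
}
```

Only A and B (m*r + r*n values) are trained and stored, while W itself stays frozen, which is why the adapter can be loaded and unloaded as an addition on top of the existing graph.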

Labels: devops (improvements to build systems and github actions), examples, server