
llama : refactor src/llama.cpp #10902

Open · ggerganov wants to merge 24 commits into master from gg/llama-refactor-0
Conversation

ggerganov (Owner) commented Dec 19, 2024

Attempting to split src/llama.cpp into a few separate modules. This is very much a work in progress; I am mainly opening this PR so people can keep track and suggest improvements as we move along. This part does not involve functional changes, just code reorganization and decoupling to make the codebase easier to work with. The batch and KV cache abstractions and reimplementations will be done in follow-up PRs.

```mermaid
graph TD;
    chat;
    model_loader;
    model   --> arch[<b>arch</b>];
    model   --> hparams[<b>hparams</b>];
    model   ----> mmap[<b>mmap</b> <br><br> llama_file <br> llama_mmap <br> llama_mlock];
    model   -.-> model_loader;
    model   --> vocab;
    vocab   --> unicode;
    adapter -.-> model;
    kv_cache -.-> batch;
    kv_cache -.-> cparams;
    kv_cache -.-> model;
    context --> adapter[<b>adapter</b> <br><br> llama_adapter_cvec <br> llama_adapter_lora];
    context -.-> batch;
    context --> cparams;
    context --> kv_cache;
    context --> model;

    style adapter fill:green
    style arch fill:green
    style batch fill:green
    style chat fill:green
    style cparams fill:green
    style hparams fill:green
    style kv_cache fill:green
    style mmap fill:green
    style model fill:green
    style model_loader fill:green
    style unicode fill:green
    style vocab fill:green
```
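The solid and dotted arrows in the diagram can be read as plain C++ composition: `context` owns or references the decoupled pieces, which in turn depend only on what sits below them. A minimal sketch of that ownership, using simplified stand-in types rather than the actual llama.cpp definitions (field layouts here are illustrative):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the decoupled modules; the real types live in
// llama-model.*, llama-kv-cache.*, etc. after the split. Illustrative only.
struct llama_hparams { int n_embd = 0; };
struct llama_vocab   { std::vector<std::string> tokens; };

struct llama_model {
    llama_hparams hparams;   // model --> hparams
    llama_vocab   vocab;     // model --> vocab
};

struct llama_kv_cache {
    const llama_model * model; // kv_cache -.-> model (weak dependency)
};

struct llama_context {
    const llama_model * model; // context --> model
    llama_kv_cache      kv;    // context --> kv_cache
};

// The context is constructed against a model it does not own.
llama_context make_context(const llama_model & m) {
    return { &m, { &m } };
}
```

The point of the split is exactly this shape: each module sees only the headers of the modules below it, so touching e.g. the vocab no longer forces a rebuild of everything.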

TODO

  • move the llama_mmaps and llama_mlocks from llama_model to llama_context? (no)
  • change _internal suffix to _impl (next PR)
  • add llama_tensor_loader ?
  • model loading
  • quantization

Conflicts

ngxson (Collaborator) commented Dec 19, 2024

I think the control_vector and lora related stuff should be re-grouped into one module, maybe called adapters (if someone has a better name, feel free to comment). That's because they work in much the same way, by "adding things" on top of the original cgraph.
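The grouping described here could be sketched as a common interface that both adapter types implement by adding nodes to the compute graph. The base class and node counts below are hypothetical illustrations, not the actual llama.cpp API (the real types are llama_adapter_cvec and llama_adapter_lora):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a compute graph: we only track the node count.
struct cgraph { int n_nodes = 0; };

// Hypothetical common interface: both LoRA and control vectors work by
// "adding things" on top of the original graph.
struct llama_adapter {
    virtual ~llama_adapter() = default;
    virtual void apply(cgraph & g) const = 0;
};

struct llama_adapter_lora : llama_adapter {
    // e.g. two extra mul_mat nodes for the A and B matrices
    void apply(cgraph & g) const override { g.n_nodes += 2; }
};

struct llama_adapter_cvec : llama_adapter {
    // e.g. one extra add node for the control vector
    void apply(cgraph & g) const override { g.n_nodes += 1; }
};

// Apply every loaded adapter to the graph; returns the final node count.
int apply_all(cgraph & g, const std::vector<std::unique_ptr<llama_adapter>> & adapters) {
    for (const auto & a : adapters) {
        a->apply(g);
    }
    return g.n_nodes;
}
```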

ggerganov force-pushed the gg/llama-refactor-0 branch 8 times, most recently from 524886b to 7ab08d5 (December 22, 2024 16:24)
github-actions bot added the "examples" and "devops (improvements to build systems and github actions)" labels (December 22, 2024)
ggerganov force-pushed the branch 2 times, most recently from be8f568 to dcbfda1 (December 22, 2024 20:30)
ggerganov force-pushed the branch 7 times, most recently from ba48e37 to 0ccae21 (December 23, 2024 17:22)
ggerganov force-pushed the branch from 1e7e338 to 597ae05 (January 2, 2025 10:39)
ggerganov force-pushed the branch from 1521f9e to c16630e (January 2, 2025 15:29)
ggerganov force-pushed the branch 2 times, most recently from 391a111 to 089cf4a (January 2, 2025 19:37)
ggerganov force-pushed the branch from 089cf4a to e06d267 (January 2, 2025 19:40)
ggerganov marked this pull request as ready for review (January 2, 2025 20:02)
ggerganov requested a review from ngxson as a code owner (January 2, 2025 20:02)
ggerganov (Owner, Author) commented:

I think this is a good place to merge this change. The project builds faster now and hopefully the code is organized a bit better. Will continue refactoring in follow-up PRs and any suggestions and recommendations are welcome. I've left some TODOs around the code and will try to address those next. After that will be looking for ways to separate the KV cache from the llama_context and enable support for multiple KV cache implementations.
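Supporting multiple KV cache implementations presumably means putting an abstract interface between llama_context and the cache. A rough sketch under that assumption; the interface and both type names below are hypothetical, since the actual design is deferred to the follow-up PRs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical abstract KV cache interface. The real follow-up PRs will
// define the actual contract; this only illustrates the decoupling idea.
struct llama_kv_cache_i {
    virtual ~llama_kv_cache_i() = default;
    // try to reserve room for n_tokens of sequence seq_id
    virtual bool seq_add(int32_t seq_id, int32_t n_tokens) = 0;
    virtual int32_t n_used() const = 0;
};

// One possible implementation: a fixed-capacity unified cache shared by
// all sequences. Alternative implementations could plug in behind the
// same interface without llama_context changing.
struct llama_kv_cache_unified : llama_kv_cache_i {
    int32_t capacity;
    int32_t used = 0;

    explicit llama_kv_cache_unified(int32_t cap) : capacity(cap) {}

    bool seq_add(int32_t /*seq_id*/, int32_t n_tokens) override {
        if (used + n_tokens > capacity) {
            return false; // caller must evict or defragment first
        }
        used += n_tokens;
        return true;
    }

    int32_t n_used() const override { return used; }
};
```

With llama_context holding only a `llama_kv_cache_i *`, swapping in a different cache strategy becomes a construction-time choice rather than a rewrite of the context code.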

ngxson (Collaborator) left a review:

LGTM overall, thanks for taking time during the holidays to finish this. Happy new year btw 🎉

```diff
@@ -24,13 +24,12 @@

 #define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

 // TODO: "lora_adapter" is tautology
```
ngxson (Collaborator):
Not sure what you mean by this. I think "lora_adapter" is not a tautology, because there can be multiple types of adapter, and there can also be "lora_a", "lora_b", "lora_scale".

ggerganov (Owner, Author):

I thought that "lora" already implies "adapter", since it comes from "LOw-Rank Adapter". So it seems to me that common_lora_adapter_info should simply be called common_lora_info.

ngxson (Collaborator) commented Jan 2, 2025:

Hmm no, the "A" means "adaptation", not "adapter". Quoting from this article:

LoRA, which stands for “Low-Rank Adaptation”, distinguishes itself by training and storing the additional weight changes in a matrix while freezing all the pre-trained model weights. LoRA is not called an “adapter” because it does not add adapters. Instead, it is referred to as “adaptation” to describe the process of fine-tuning the domain data and tasks.

Funny enough, I've just found out that "adapter" is technically a different technique from LoRA, first introduced in this paper. But the way they work is quite similar, adding nodes to the existing cgraph. So I guess the term "adapter" is being used correctly in our context in llama.cpp, since both LoRA and cvector are just additions on top of the model's cgraph.
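Whatever the naming, the mechanism both comments describe is the same low-rank update: the frozen weight W is augmented as W' = W + scale * B * A, where A is r x n, B is m x r, and the rank r is much smaller than m and n. A toy numeric sketch with r = 1 (function name and values are illustrative, not llama.cpp code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a rank-1 LoRA-style update: W' = W + scale * B * A,
// where A is 1 x n and B is m x 1, so B*A is the m x n outer product.
std::vector<std::vector<float>> lora_apply(
        std::vector<std::vector<float>> W,   // m x n frozen weight (copied)
        const std::vector<float> & A,        // 1 x n
        const std::vector<float> & B,        // m x 1
        float scale) {                       // typically alpha / r
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < W[i].size(); ++j) {
            W[i][j] += scale * B[i] * A[j];
        }
    }
    return W;
}
```

Only A and B (m*r + r*n values) are trained and stored, while W itself stays frozen, which is why the adapter can be loaded and unloaded as an addition on top of the existing graph.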

Labels: devops (improvements to build systems and github actions), examples, server