Add support for Microsoft Phi-4 model #10817
Conversation
…4 model
llama : use regular (not a sliding window) attention mask for Phi-4 model
src/llama.cpp (Outdated)

```cpp
@@ -12839,7 +12839,13 @@ struct llm_build_context {
        struct ggml_tensor * inp_pos = build_inp_pos();

        // KQ_mask (mask for 1 head, it will be broadcasted to all heads)
        struct ggml_tensor * KQ_mask_swa = build_inp_KQ_mask_swa();
        struct ggml_tensor * KQ_mask = nullptr;
        if (model.name == "Phi 4") {
```
I think a better solution would be to check if `hparams.n_swa != 0`.
I modified my patch to explicitly store a zero `sliding_window` in case it's null in config.json, and use the zero value to distinguish Phi-4 from other PHI3-based models.
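A minimal sketch of that conversion-side idea, assuming a Hugging Face style config.json; `read_sliding_window` is a hypothetical helper, not the actual convert script code:

```python
import json

def read_sliding_window(config_path: str) -> int:
    """Map a null sliding_window in config.json to an explicit 0."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    value = config.get("sliding_window")  # Phi-4 ships "sliding_window": null
    return 0 if value is None else int(value)
```

Downstream, `n_swa == 0` then marks a Phi-4 style model, while any positive value keeps the old Phi-3 sliding window behavior.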
convert_hf_to_gguf.py (Outdated)

```python
if self.metadata.name == "Phi 4":
    return self._set_vocab_gpt2()
```
Alternatively, `self._set_vocab_gpt2()` could be called when `tokenizer.model` is missing here, regardless of the model name.
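A hedged sketch of that alternative, with a hypothetical class and stub loaders standing in for the real `convert_hf_to_gguf.py` methods:

```python
from pathlib import Path

class VocabFallbackSketch:
    def __init__(self, dir_model: str):
        self.dir_model = Path(dir_model)

    def set_vocab(self):
        # Prefer SentencePiece when its model file exists (Phi-3 style),
        # otherwise fall back to the BPE vocab regardless of the model name.
        if (self.dir_model / "tokenizer.model").is_file():
            self._set_vocab_sentencepiece()
        else:
            self._set_vocab_gpt2()

    def _set_vocab_sentencepiece(self):
        print("loading SentencePiece vocab")  # stub for the real loader

    def _set_vocab_gpt2(self):
        print("loading GPT-2 style BPE vocab")  # stub for the real loader
```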
I modified the solution to check the value of `tokenizer_class` from tokenizer_config.json and call `self._set_vocab_gpt2()` if it's `GPT2Tokenizer`.
I tested with https://huggingface.co/JackCloudman/Phi-4-jackterated and it works.
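A sketch of that check, assuming tokenizer_config.json sits in the model directory; `uses_gpt2_tokenizer` is an illustrative helper, not the exact convert script logic:

```python
import json
from pathlib import Path

def uses_gpt2_tokenizer(dir_model: Path) -> bool:
    """True when tokenizer_config.json declares tokenizer_class GPT2Tokenizer."""
    cfg_path = dir_model / "tokenizer_config.json"
    if not cfg_path.is_file():
        return False
    with cfg_path.open("r", encoding="utf-8") as f:
        tok_cfg = json.load(f)
    return tok_cfg.get("tokenizer_class") == "GPT2Tokenizer"
```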
I tried and failed using the latest master.
How does your free RAM look during conversion? Do you have enough RAM? The convert script requires enough RAM for at least the biggest tensor in memory. The biggest tensor is usually the token embeddings tensor, which is usually the first one read and written. For Phi-4, the shape of that tensor is […].

Since the model files for Phi-4 are in safetensors format, the convert script can memory-map them and load tensors lazily, so the whole model should not have to fit in RAM at once. That is, assuming memory mapping works correctly on Windows (hopefully it does, I don't know). If it doesn't, then you would need at least 64GB of RAM.

Also, if you do have enough RAM, make sure your Python interpreter is a 64-bit build.
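As a rough illustration of the kind of arithmetic involved (the shape below is a made-up placeholder, not Phi-4's actual dimensions):

```python
def tensor_bytes(rows: int, cols: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to hold one rows x cols tensor at the given element width."""
    return rows * cols * bytes_per_element

# A hypothetical 100000 x 5000 tensor at 2 bytes per element (F16/BF16):
print(f"{tensor_bytes(100_000, 5_000) / 1024**3:.2f} GiB")  # ~0.93 GiB
```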
My RX 6900 XT usually does fine converting these things. I also have 32GB system RAM. I tried converting the jackterated model and it gets up to 19GB/29GB before it hangs, with plenty of space on the NVMe. I can run the GGUF provided by matteogeniaccio. I just prefer converting them myself.
* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
This PR adds support for the Microsoft Phi-4 model. Fixes #10814.
Current solution is to:

1. Use the GPT2 vocab during model conversion when `tokenizer_class` in tokenizer_config.json is `GPT2Tokenizer`.
2. Store a zero `sliding_window` hparam if it's null. This allows the old Phi-3 `n_swa` validation logic to work without any changes. If `n_swa` is 0, a regular KQ mask is used instead of the sliding window KQ mask in `build_phi3()`.

Previously, a model name value from general.name ("Phi 4") was used to trigger behavior specific to the Phi-4 model:

1. Using GPT2 vocab during model conversion
2. Ignoring the `sliding_window` hparam during model conversion
3. Skipping the sliding window length value check (`n_swa == 0`) in `build_phi3()`
4. Creating a regular KQ mask instead of the sliding window KQ mask in `build_phi3()`

Let me know if there is any better way to differentiate Phi-4 from other models based on the PHI3 architecture.
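For readers following the mask logic, here is a toy Python model of the selection described above; the real code is C++ in `build_phi3()`, and its exact window boundary convention may differ by one:

```python
def allowed_to_attend(q_pos: int, k_pos: int, n_swa: int) -> bool:
    """True when the query token at q_pos may attend to the key token at k_pos."""
    if k_pos > q_pos:
        return False              # causal: never attend to future tokens
    if n_swa == 0:
        return True               # regular mask: all past tokens visible (Phi-4)
    return q_pos - k_pos < n_swa  # sliding window: only recent tokens (Phi-3)
```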