
Add support for Microsoft Phi-4 model #10817

Merged
fairydreaming merged 4 commits into ggerganov:master on Dec 19, 2024

Conversation

@fairydreaming (Collaborator) commented Dec 13, 2024

This PR adds support for the Microsoft Phi-4 model. Fixes #10814.

The current solution is to:

  • Use the tokenizer_class value from tokenizer_config.json as the condition for using the GPT2 vocab during model conversion.
  • Store an explicit 0 value for the sliding_window hparam if it is null. This allows the old Phi-3 n_swa validation logic to work without any changes. If n_swa is 0, a regular KQ mask is used instead of a sliding window KQ mask in build_phi3().

Initially, the model name value from general.name ("Phi 4") was used to trigger behavior specific to the Phi-4 model:

1. Using the GPT2 vocab during model conversion
2. Ignoring the sliding_window hparam during model conversion
3. Skipping the sliding window length check (n_swa == 0) in build_phi3()
4. Creating a regular KQ mask instead of a sliding window KQ mask in build_phi3()

Let me know if there is a better way to differentiate Phi-4 from other models based on the PHI3 architecture.
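
For illustration, a minimal sketch of the sliding_window handling on the conversion side (assumed structure and helper names, not the exact PR code):

# Sketch: write an explicit 0 when sliding_window is null in config.json,
# so that n_swa == 0 identifies Phi-4 at graph-build time.
import json
from pathlib import Path

def sliding_window_from_config(model_dir: Path) -> int:
    # Phi-4 sets "sliding_window": null, while Phi-3 models store an integer window length.
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    value = config.get("sliding_window")
    return 0 if value is None else int(value)

# The converter would then always write the value, e.g. (assumed call):
#     self.gguf_writer.add_sliding_window(sliding_window_from_config(self.dir_model))

With n_swa stored as 0, build_phi3() can keep its existing validation and build a regular KQ mask instead of the sliding window mask.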


llama : use regular (not a sliding window) attention mask for Phi-4 model
@github-actions bot added the python (python script changes) label on Dec 13, 2024
src/llama.cpp (Outdated)
@@ -12839,7 +12839,13 @@ struct llm_build_context {
        struct ggml_tensor * inp_pos = build_inp_pos();

        // KQ_mask (mask for 1 head, it will be broadcasted to all heads)
        struct ggml_tensor * KQ_mask_swa = build_inp_KQ_mask_swa();
        struct ggml_tensor * KQ_mask = nullptr;
        if (model.name == "Phi 4") {
Collaborator

I think a better solution would be to check if hparams.n_swa != 0.

Collaborator Author

I modified my patch to explicitly store a zero sliding_window value when it is null in config.json and to use the zero value to distinguish Phi-4 from other PHI3-based models.

convert_hf_to_gguf.py (Outdated, resolved)
Comment on lines 2132 to 2133:

    if self.metadata.name == "Phi 4":
        return self._set_vocab_gpt2()
Collaborator

Alternatively, self._set_vocab_gpt2() could be called when tokenizer.model is missing here, regardless of the model name.

Collaborator Author

I modified the solution to check the value of tokenizer_class from tokenizer_config.json and call self._set_vocab_gpt2() if it is GPT2Tokenizer.
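
A minimal sketch of that check (assuming the existing _set_vocab_gpt2() and _set_vocab_sentencepiece() helpers in convert_hf_to_gguf.py; the standalone function shape is illustrative):

# Illustrative version of the tokenizer_class-based vocab selection.
import json
from pathlib import Path

def choose_vocab(model_dir: Path, converter) -> None:
    # Phi-4 ships a GPT2-style tokenizer, while earlier Phi-3 checkpoints use
    # SentencePiece; tokenizer_config.json tells the two apart.
    cfg = json.loads((model_dir / "tokenizer_config.json").read_text(encoding="utf-8"))
    if cfg.get("tokenizer_class") == "GPT2Tokenizer":
        converter._set_vocab_gpt2()
    else:
        converter._set_vocab_sentencepiece()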

@JackCloudman

I tested with https://huggingface.co/JackCloudman/Phi-4-jackterated and it works.

@fairydreaming merged commit 7585edb into ggerganov:master on Dec 19, 2024
51 checks passed
@3Simplex

I tried and failed using the latest master.

INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:C:\Users\3simplex\Llama.Cpp-Toolbox\Converted\Microsoft_Phi-4-f16.gguf: n_tensors = 243, total_size = 29.3G
Writing:   0%|                                                                          | 0.00/29.3G [00:00<?, ?byte/s]
Traceback (most recent call last):
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\convert_hf_to_gguf.py", line 4682, in <module>
    main()
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\convert_hf_to_gguf.py", line 4676, in main
    model_instance.write()
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\convert_hf_to_gguf.py", line 442, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\gguf_writer.py", line 453, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\lazy.py", line 210, in tofile
    eager = LazyNumpyTensor.to_eager(self)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\lazy.py", line 169, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
           ^^^^^
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\lazy.py", line 160, in simple_to_eager
    _t._data = _t._func(*_t._args, **_t._kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\3simplex\Llama.Cpp-Toolbox\llama.cpp\gguf-py\gguf\lazy.py", line 207, in <lambda>
    return type(self)(meta=meta, args=full_args, kwargs=kwargs, func=(lambda a, *args, **kwargs: a.astype(*args, **kwargs)))
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 980. MiB for an array with shape (100352, 5120) and data type float16

@compilade (Collaborator)

@3Simplex

How does your free RAM look during conversion? Do you have enough RAM?
Do you have enough disk space too?

The convert script needs enough RAM to hold at least the biggest tensor in memory. The biggest tensor is usually the token embeddings tensor, which is usually the first one read and written. For Phi-4, the shape of that tensor is (100352, 5120), around 514M elements.

Since the model files for Phi-4 are in BF16, and Numpy doesn't support that type, the tensors are losslessly converted to F32 before being converted to F16 (because that's the target type in your case). This means at least 4GB of free RAM is required to convert that 29.3GB model.

That is, assuming memory mapping works correctly on Windows (hopefully it does, I don't know). If it doesn't, then you would need at least 64GB of RAM.

Also, if you do have enough RAM, make sure your Python interpreter is a 64-bit build.
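
As a rough back-of-the-envelope check of those numbers (a sketch, not output of the convert script):

# Rough size estimate for the Phi-4 token embeddings tensor during conversion.
rows, cols = 100352, 5120                # shape from the traceback above
elements = rows * cols                   # ~514M elements
f16_bytes = elements * 2                 # ~980 MiB, matching the failed allocation above
f32_bytes = elements * 4                 # ~1.9 GiB for the intermediate F32 copy
print(f"{elements / 1e6:.0f}M elements, "
      f"F16 ~{f16_bytes / 2**20:.0f} MiB, F32 ~{f32_bytes / 2**30:.2f} GiB")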

@3Simplex

My RX 6900 XT usually does fine converting these things. I also have 32 GB of system RAM.
My Python is 3.11.9 (64-bit) and I use pyenv-win to manage versions.
I recently updated transformers to 4.47.1, if that matters.
numpy is 1.26.4.

I tried converting the jackterated model and it gets up to 19 GB / 29 GB before it hangs; there is plenty of space on the NVMe drive.

I can run the GGUF provided by matteogeniaccio, I just prefer converting them myself.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* convert : use GPT2 vocab for Phi-4 model

* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models

* llama : do not use sliding window attention mask for Phi-4 model

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
Labels: python (python script changes)
Successfully merging this pull request may close these issues:
Feature Request: Add support for Phi-4 model (#10814)

6 participants