model: Add support for PhiMoE arch #11003
base: master
Conversation
I am not particularly good at coding, but I can try running your gguf and check if I notice something. No time today, but tomorrow I can do so.
Thanks, no hurry as the model is quite old and phi4 has been released already. Will see if it gains enthusiasm; I am having a look at the Vision model in parallel.
Co-authored-by: ThiloteE <[email protected]>
The Q4_0 with 4096 context does not fit into 32 GB of RAM on Windows 10. Output is reasonable, though I have occasionally seen typos.
Successful run with 32768 allocated tokens for context (prompt was 16883 tokens)
PhiMoE
Overview
Phi-3.5-MoE is a lightweight, open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data.
The model is multilingual and comes with a 128K-token context length.
The PhiMoE model was proposed in Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Microsoft.
The model is very similar to Mixtral, with the main difference of Phi3LongRoPEScaledRotaryEmbedding, which is used to extend the context of the rotary embeddings. The query, key and values are fused, and the MLP's up and gate projection layers are also fused. The tokenizer is identical to LlamaTokenizer, with additional tokens.
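For intuition, here is a simplified, hypothetical sketch of the long rope scaling idea: one set of per-dimension rescale factors for short contexts and another for long ones. Function and parameter names are made up, and the selection rule is an assumption based on how LongRoPE-style embeddings are typically described, not code from this PR.

```python
import math

# Simplified, hypothetical sketch of LongRoPE-style factor selection.
# This is NOT the Phi3LongRoPEScaledRotaryEmbedding implementation, just the idea.
def longrope_inv_freq(head_dim, base, seq_len,
                      original_max_pos, max_pos,
                      short_factor, long_factor):
    # Switch to the long-context factors only once the sequence exceeds the
    # originally trained context window.
    factors = long_factor if seq_len > original_max_pos else short_factor
    inv_freq = [
        1.0 / (factors[i // 2] * base ** (i / head_dim))
        for i in range(0, head_dim, 2)
    ]
    # Additional scaling applied to cos/sin so attention magnitudes stay
    # reasonable when the context window is extended.
    scale = max_pos / original_max_pos
    attn_scale = 1.0 if scale <= 1.0 else math.sqrt(
        1.0 + math.log(scale) / math.log(original_max_pos)
    )
    return inv_freq, attn_scale
```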
License
MIT
Implementation details
The convert script reuses the Phi3MiniModel class, as the parameter names and the long rope scaling logic are the same. The MoE branch is included in the phi3 model graph implementation, along with the bias tensors that were missing there.
It would be possible to merge phi3 and phimoe into a single arch, but I kept the spirit of a separate MoE arch, as was done recently for granite. Also, since Microsoft introduced a dedicated architecture, it can evolve independently in the future.
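For illustration, a minimal sketch of what the converter reuse could look like, assuming the usual registration pattern of convert_hf_to_gguf.py. The class name, the PHIMOE enum, and the hparam keys are assumptions, not necessarily what this PR actually adds.

```python
# Hypothetical sketch only: piggybacking on Phi3MiniModel in convert_hf_to_gguf.py.
# Names (PhiMoEModel, MODEL_ARCH.PHIMOE, hparam keys) are assumptions.
import gguf
from convert_hf_to_gguf import Model, Phi3MiniModel


@Model.register("PhiMoEForCausalLM")
class PhiMoEModel(Phi3MiniModel):
    model_arch = gguf.MODEL_ARCH.PHIMOE  # assumed to be added alongside this PR

    def set_gguf_parameters(self):
        # Parameter names and the long rope scaling logic match phi3-mini,
        # so the parent class handles them; only the MoE metadata is new here.
        super().set_gguf_parameters()
        self.gguf_writer.add_expert_count(self.hparams["num_local_experts"])
        self.gguf_writer.add_expert_used_count(self.hparams["num_experts_per_tok"])
```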
Testing
full output
Check that phi3 is still working
full output
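As a companion to the checks above, a tiny smoke-test sketch; the model paths and prompt are placeholders, and it assumes llama-cli is already built in the working directory.

```python
# Minimal smoke test: run both the new PhiMoE GGUF and an existing phi3 GGUF
# through llama-cli and make sure generation still succeeds.
import subprocess

MODELS = [
    "models/phi-3.5-moe-instruct-q4_0.gguf",    # new arch from this PR (placeholder path)
    "models/phi-3-mini-4k-instruct-q4_0.gguf",  # existing phi3, must not regress (placeholder path)
]

for model in MODELS:
    result = subprocess.run(
        ["./llama-cli", "-m", model, "-p", "Hello, my name is", "-n", "32"],
        capture_output=True, text=True,
    )
    print(model, "->", "OK" if result.returncode == 0 else "FAILED")
```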
Links