Consolidate support for Phi-1, Phi-1.5, and Phi-2 models #4552
Conversation
- Created the `initialize_writer` function to set up the GGUF writer with model metadata
- Included validation for file type and architecture
- Default hyperparameter values sourced from MixFormerSequentialConfig
- Function annotations and documentation added for clarity
- Prepared groundwork for MixFormer architecture integration
Signed-off-by: teleprint-me <[email protected]>
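As a rough illustration of the commit above, an initializer along these lines would validate its inputs and assemble metadata before handing off to the GGUF writer. The function signature, metadata key names, and default values below are a hypothetical sketch, not the actual PR code:

```python
# Hypothetical sketch of initialize_writer; key names and default values
# are illustrative, not taken from the actual PR.
SUPPORTED_ARCHS = {"phi"}
SUPPORTED_FTYPES = {0: "f32", 1: "f16"}

def initialize_writer(arch_name: str, ftype: int, hparams: dict) -> dict:
    """Validate architecture/file type and assemble GGUF-style metadata."""
    if arch_name not in SUPPORTED_ARCHS:
        raise ValueError(f"unsupported architecture: {arch_name}")
    if ftype not in SUPPORTED_FTYPES:
        raise ValueError(f"unsupported file type: {ftype}")
    # Defaults in the spirit of MixFormerSequentialConfig (illustrative values).
    return {
        "general.architecture": arch_name,
        "general.file_type": ftype,
        f"{arch_name}.context_length": hparams.get("n_positions", 2048),
        f"{arch_name}.embedding_length": hparams.get("n_embd", 2048),
        f"{arch_name}.block_count": hparams.get("n_layer", 24),
        f"{arch_name}.attention.head_count": hparams.get("n_head", 32),
    }
```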
- Replaced LLM_ARCH_PHI2 with LLM_ARCH_PHI to unify the handling of different Phi model variants (Phi-1, Phi-1.5, Phi-2).
- Updated the architecture names map to reflect the consolidated architecture name, from "phi2" to "phi".
- Adjusted the tensor names mapping to use the new architecture name "phi" for consistent tensor loading and processing.
- Modified hyperparameter loading to include a case for 24 layers under LLM_ARCH_PHI, classifying it as MODEL_1B. This change accommodates different layer counts for various Phi model variants.
- Updated tensor loading sections to use the new architecture enum, ensuring proper tensor creation based on the model architecture.
- Renamed build_phi2() to build_phi() in the graph building section, aligning with the new architecture name and ensuring correct computational graph construction for Phi models.
- Adjusted graph construction calls to use the renamed build_phi() function, ensuring seamless integration and functionality for different Phi model variants.

These changes aim to streamline the handling of various Phi models within `llama.cpp`, enhancing the application's capability to work effectively with these models while maintaining code clarity and consistency.
Signed-off-by: teleprint-me <[email protected]>
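The layer-count-based classification described in the commit above can be sketched as follows. Only the 24-layer MODEL_1B case is stated in the commit message; the 32-layer case (Phi-2) is an assumption added here for illustration:

```python
# Sketch of per-layer-count model classification for the unified Phi arch.
# 24 layers -> MODEL_1B is from the commit message (Phi-1 / Phi-1.5);
# 32 layers -> MODEL_3B (Phi-2) is an illustrative assumption.
def classify_phi_model(n_layer: int) -> str:
    sizes = {24: "MODEL_1B", 32: "MODEL_3B"}
    return sizes.get(n_layer, "MODEL_UNKNOWN")
```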
@slaren @ebeyabraham Do you know why the warning appears?
I think it means that some of the tokens cannot be tokenized, but are not tagged as special. It's probably not a big deal, but @staviq may know more about this.
This change seems like it would break existing models due to "phi2" -> "phi". Is it worth it?
Yes, it's true that this change would break existing conversions and quants. However, I'd like to highlight why I believe it's a valuable modification. All three models (Phi-1, Phi-1.5, and Phi-2) share identical architectures, differing primarily in the number of layers they possess. While these architectural differences may seem subtle, the shared design offers significant advantages: since the architectures are identical, any models created in the future with the same architecture can be supported with minimal additional work. By accommodating Phi-1, Phi-1.5, and Phi-2, we establish a unified implementation that can seamlessly adapt both to future Microsoft models and to other models built on the same architecture. Adding support for Phi-1, Phi-1.5, and Phi-2 enhances llama.cpp's usability, accessibility, and adaptability. It's a worthwhile enhancement that promotes diversity in hardware usage and fosters innovation in AI research. This change not only benefits current users but also sets a foundation for accommodating potential future models with greater ease.
llama.cpp (outdated):

```cpp
// backwards compatibility with pre-#4552
// TODO: remove after Mar 2024
if (arch_name == "phi2") {
    arch_name = "phi";
}
```
@teleprint-me Could you give this a test and see if it solves backwards compatibility with "phi2"?
I'm a bit worried that if we don't handle the old name we'll get lots of complaints and issues.
Ah nvm, it's actually not going to work because there are other parameters like `phi2.context_length`.
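A fuller remapping would have to rewrite every `phi2.*`-prefixed key, not just the architecture name. Sketched in Python (hypothetical illustration of the idea, not code from this PR):

```python
# Hypothetical sketch: rewrite legacy "phi2"-prefixed GGUF metadata keys
# (e.g. phi2.context_length) to the unified "phi" prefix, and update the
# general.architecture value itself.
def remap_phi2_keys(metadata: dict) -> dict:
    remapped = {}
    for key, value in metadata.items():
        if key == "general.architecture" and value == "phi2":
            value = "phi"
        elif key.startswith("phi2."):
            key = "phi." + key[len("phi2."):]
        remapped[key] = value
    return remapped
```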
@ggerganov Yeah, that's why I broke it. I wish I had caught this sooner, but I've been preoccupied with a bunch of other stuff and I'm multitasking the best I can.
I actually started working on the conversion scripts too and I still have a bunch of other stuff, but this seemed like it needed attention sooner than later.
If you have a better idea, I'm open to it. I went with this because it seemed like the most pragmatic approach. I prefer simplicity and if making a simple choice will break something, then that's what I'll go with.
Force-pushed from 44ac215 to b0583f7.
I've given this some more thought and prefer not to merge the change. It's more likely to cause issues with broken support for existing models than anything else, so I don't think it is worth it. Thanks for the effort though.
@ggerganov I think there's another way to do this without breaking the existing models. I can adapt the code accordingly. That is, if you're open to it?
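One non-breaking approach (a hypothetical sketch of the idea, not code from this PR) is to keep both architecture names registered and resolve them to the same internal architecture id at load time, so old "phi2" files and new "phi" files take the same code path:

```python
# Hypothetical: alias both the legacy and the unified architecture name
# onto one internal arch identifier, so existing "phi2" GGUF files keep
# loading while new conversions use "phi".
ARCH_ALIASES = {"phi": "LLM_ARCH_PHI", "phi2": "LLM_ARCH_PHI"}

def resolve_arch(name: str) -> str:
    try:
        return ARCH_ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown architecture: {name}")
```

Metadata lookups would still use the file's original name as the key prefix, so `phi2.context_length` keeps working for old files.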
I think this is quite important. Phi 1.5 and Phi 2 are the best small models, and right now it's not simple to convert them to GGUF. Phi 1.5 is quite useful on embedded systems because it strikes a good balance between quality and performance on a small 4-core CPU. @teleprint-me are you considering another way to make these changes?
@walter-cavinaw It was accepted in #4847.
Overview

This Pull Request introduces changes to `llama.cpp` for unified handling of different Phi model variants (Phi-1, Phi-1.5, Phi-2). The modifications aim to simplify the architecture handling, tensor mapping, and computational graph construction for these models.

Changes

- Replaced `LLM_ARCH_PHI2` with `LLM_ARCH_PHI` across the codebase to create a singular reference for all Phi models.
- Modified hyperparameter loading to include a case for 24 layers under `LLM_ARCH_PHI`, classifying it as `MODEL_1B`. This addition caters to the different layer counts found in Phi model variants.
- Renamed the `build_phi2()` function to `build_phi()`, aligning it with the unified architecture name and ensuring appropriate computational graph construction for all Phi models.
- Adjusted graph construction calls to use the renamed `build_phi()` function, maintaining functionality and integration across different Phi model variants.

Impact

These changes enhance `llama.cpp`'s flexibility and adaptability in working with various Phi models. By consolidating the handling of these models under a single architecture enumeration and updating relevant sections of the code, we improve the maintainability and clarity of the codebase. This unified approach also facilitates future extensions or modifications related to Phi models.

Testing
The changes have been tested with Phi-1 and Phi-1.5 models, successfully converting and running inference. The results indicate that the unified handling approach is effective and does not introduce any regressions in the functionality for these models.
Looking forward to your feedback and suggestions on these changes.