Refactor convert.py and add support for Meta's official Llama 3 model #6819
I'm a newcomer to the project, so I can't comment about past design decisions before #6144, I think. One note about conversion of the native Llama 3 tokenizer: it is indeed based on tiktoken, which should just be a fast BPE implementation. However, there are some quirks in the implementation that make it a bit tricky to export to pure BPE.
The first 2 points (extracting the full list of merges) should be easy once the transformers conversion script is published, which should happen soon. The third point (shortcut BPE for subwords in the vocabulary) should be explored. If these details are not in place, tokenization will result in slightly different sequences of token ids, especially for a few words in some languages. |
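For reference, a minimal sketch (not llama.cpp code) of reading the ranks such an export would start from, assuming the Llama 3 tokenizer.model follows tiktoken's plaintext format of one base64-encoded token plus its rank per line:

```python
# Minimal sketch: parse a tiktoken-style plaintext BPE rank file into
# {token_bytes: rank}. The path in the usage comment is illustrative.
import base64

def load_tiktoken_ranks(path: str) -> dict[bytes, int]:
    ranks: dict[bytes, int] = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            token_b64, rank = line.split()
            ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks

# ranks = load_tiktoken_ranks("Meta-Llama-3-8B-Instruct/tokenizer.model")
# A pure-BPE export would still need to derive the merge list from these ranks.
```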
It already works as far as I can tell:
Add |
@pcuenca Thanks! I really appreciate your input and feedback. I'll check it out when I have some time. @ryao That's the Huggingface model created with the |
I confirm: I downloaded the full model directly from Meta and I can't convert it with llama.cpp to GGUF. I'm on Mac, but I guess it doesn't really matter. I tried convert and convert-hf-to-gguf, with and without vocab-type and pad-vocab, f16/f32, different Python envs, etc. :)
Since you're not in a hurry, I guess I'll just get the HF version. It will probably also be easier to use later in mergekit (?). |
What makes the |
Use huggingface-cli to download it. |
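For example (illustrative invocation; it requires the huggingface_hub CLI and accepting the gated model license on the Hub first, and the target directory name is just a placeholder):

```
huggingface-cli login
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir Meta-Llama-3-8B-Instruct
```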
Aren't the weights the same? |
The answer to that question is that it's nuanced. The issue stems from the file formats and tokenizers (aka vocabularies). So while the model weights themselves might be the same (I would expect some differences), the file formats are completely different; e.g. a pickled file is not the same as a safetensors file. It should be noted that the HuggingFace repository contains both formats, so I can understand the confusion. The issue I'm addressing is related to the models' vocabularies. This is what Meta distributes directly:

/mnt/scsm/models/facebook/llama-3/Meta-Llama-3-8B-Instruct
├── checklist.chk # Checkpoint
├── consolidated.00.pth # Model
├── params.json # Hyperparameters
└── tokenizer.model # Plaintext BPE depends on tiktoken
1 directory, 4 files

What the HuggingFace repository looks like:

/mnt/valerie/models/meta-llama/Meta-Llama-3-8B-Instruct-HF
├── config.json # transformers hyperparameters
├── generation_config.json # Generation hyperparameters
├── LICENSE
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── original # Ignored
│ ├── consolidated.00.pth
│ ├── params.json
│ └── tokenizer.model
├── README.md
├── special_tokens_map.json # Added tokens
├── tokenizer_config.json # Tokenizer configuration
├── tokenizer.json # tokenizers vocabulary
└── USE_POLICY.md
2 directories, 16 files

The vocabulary and the language model are separate entities in other libraries and frameworks, while GGML unifies the vocabulary and language model into a single file, with metadata relevant to the model included alongside it. The GGUF file format is standardized, so it's easier to follow and comprehend as a result. |
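To make the practical difference concrete, a hedged example of which llama.cpp script targets which layout (paths are placeholders and flags are approximate; check each script's --help in your checkout):

```
# HuggingFace layout (safetensors + tokenizer.json): use the HF converter.
python convert-hf-to-gguf.py /path/to/Meta-Llama-3-8B-Instruct-HF --outtype f16

# Raw Meta layout (consolidated.00.pth + tiktoken tokenizer.model): this is
# what convert.py would need to learn to handle, which is the subject of
# this issue; it currently fails for Llama 3.
python convert.py /path/to/Meta-Llama-3-8B-Instruct --vocab-type bpe
```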
Confirming convert.py doesn't work for Llama 3 70B Instruct direct from Meta (not HF), on macOS (14.4.1) with the latest llama.cpp using --vocab-type=bpe:
edit: clarified that I'm referring to the model directly from Meta, not HF |
Did you try the latest version? It works on my macOS (14.2.1).
I did try the latest version; it does not work on my macOS 14.4.1. It works fine, without any problems and very fast, if you downloaded the weights from Huggingface. It does not work at all if you got them from Meta. |
Yep I used HF as well. I haven't tried the original from Meta. |
EDIT: everything works now, just had to reset the repo, had some old version...
this works for me |
Regarding the conversion of the original tokenizer to pure BPE, the transformers implementation is now available as a PR. If anyone decides to tackle this, keep in mind the tiktoken deviation from BPE that was mentioned in a previous comment, which should be replicated if you want to ensure 100% compatibility. |
@pcuenca Yeah, that makes me glad I'm waiting. Patience usually wins out. I can implement it later on once I have some more time. I did the vocab factory for the original convert script, so I already understand how it works. Something to keep in mind about the huggingface conversion scripts is that they're not part of the standard hf api. They're compatibility based cli tools intended to convert external formats from outside sources to the desired huggingface format. So it's unwise to rely on them. This is why I explicitly recommended migrating any hf reliant code from Last time I tried to leverage those tools, it did not go as planned and they were limited as a result. I never got around to fixing it because I only have so much time and bandwidth. I already prototyped a few iterations, but I don't like any of them.
I still need time to think about how to go about this. Input and feedback is welcome. It's easier to iterate over ideas than it is to iterate over code. |
Given the importance of Llama 3, shouldn't this be considered a serious bug, not an enhancement? |
A bug is when you have code and it isn't working as intended; a bug can also be something that is working as intended but can be exploited in an unintended way. An enhancement can be the application of a pattern or some form of improvement or new addition to the code base. I suppose this could be both? 🤔😅 I would say it's more of a design flaw as well as a common misunderstanding. Everyone always tries to use the convert.py script to convert huggingface models and then they're confused as to why it doesn't work. It doesn't work because it was originally designed to convert a raw torch model. I labeled it as an enhancement because the code base needs to be rethought, since using tiktoken became a wrench thrown into the pipeline. Technically, I could just include llama3 or tiktoken and call it a day, but I would prefer not to pile onto the technical debt. If momentum is preferred, then it's probably better to just include the llama3 tokenizer directly and call it a day. This would be the path of least resistance. It would take me (I think) about a day, so probably (double the time I think it would take) two days to include tiktoken, or a few hours if I used the llama3 tokenizer directly. This is a band-aid on something that needs more attention and is a growing issue as more and more models are added over time. The earlier it's handled, the better off everyone in the future will be as a result. |
Any update on converting raw Meta models to HF? |
You can use the conversion script that was merged yesterday to transformers @ |
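For anyone looking for it, a hedged example invocation of that transformers conversion script (path is relative to a transformers source checkout, and the flag names are approximate; check the script's --help before running):

```
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/Meta-Llama-3-8B-Instruct \
    --model_size 8B \
    --output_dir /path/to/Meta-Llama-3-8B-Instruct-HF \
    --llama_version 3
```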
This is going to get closed after 14 days of inactivity. I'm waiting on some PRs to get merged and I'll work on this when I have some more time. I'm spread too thin at the moment, unfortunately. I'll eventually open a PR once the major changes settle down, because a bunch of breaking changes were introduced due to the Llama 3 tokenizer. I appreciate everyone who participated, as it added value to this thread. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
Support the Official Llama 3 PyTorch model distributed by Meta.
Motivation
The convert.py script supports converting the raw Llama 1 and 2 torch models distributed by Facebook Research Labs, but not the raw Llama 3 torch models. PR #6745 implemented the conversion process for Huggingface's transformers and tokenizers framework implementations, but not the raw torch models themselves.

There are issues conflicting with the current convert.py implementation due to feature creep based on the desire to support Huggingface's formats. These features are now blocking and interfering with the implementation for Llama 3.

Possible Implementation
The official Llama 3 is distributed with a plaintext BPE tokenizer.model file which utilizes the GPT-2 tokenizer format distributed by OpenAI. This means it requires the use of tiktoken in order to convert the model to a compatible GGUF format.

We would need to integrate this into the BpeVocab class, which solely supports HuggingFace's tokenizers at the moment.

We already have the implementation details given to us by Meta, which released the official source code in their meta-llama org repo. See https://github.com/meta-llama/llama3 for more information.
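A rough sketch, based on the public meta-llama/llama3 repository, of how the official tokenizer wraps tiktoken; the split regex and the special-token list below are abbreviated placeholders for illustration, not Meta's exact values:

```python
# Sketch only: builds a tiktoken Encoding from Llama 3's plaintext rank file.
# The real tokenizer in meta-llama/llama3 registers 256 special tokens and a
# much more elaborate pre-tokenization regex than the placeholders below.
import tiktoken
from tiktoken.load import load_tiktoken_bpe

def build_llama3_encoding(model_path: str) -> tiktoken.Encoding:
    mergeable_ranks = load_tiktoken_bpe(model_path)  # {token_bytes: rank}
    num_base = len(mergeable_ranks)
    special_tokens = {  # abbreviated placeholders
        "<|begin_of_text|>": num_base,
        "<|end_of_text|>": num_base + 1,
    }
    return tiktoken.Encoding(
        name="llama3_sketch",
        pat_str=r"\S+|\s+",  # placeholder pattern, not Meta's
        mergeable_ranks=mergeable_ranks,
        special_tokens=special_tokens,
    )
```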
The Tokenizer class implementation is already fleshed out, but it needs to be refactored and integrated into the Vocab factory in a reasonable way. This is no small feat, because it breaks the currently existing pattern and deviates from the previous releases as a result.

We already have support for most of these models, and new vocabularies are few and far between, but there are enough abstractions as well as implementations that the complexity is increasing over time.

Some ideas I'm currently considering follow a series of steps over time to reduce the complexity, maintenance burden, and extension cost of the convert.py script. This means removing any unnecessary and unrelated code from the convert.py script and migrating all HuggingFace source code to the convert-hf-to-gguf.py script. This is a long-term proposal that requires everyone to be on the same page in order to effectively and efficiently pull this off.

I outlined my rationale in the link above referencing PR #6745. A potentially related issue is ISS #6690.

I'm open to any feedback and suggestions here. I'm in no rush to implement this, and I believe it's wise we don't rush to implement this, as enough technical debt has piled up. It might be better to discuss this first and determine the best steps to take before progressing forward.
@cebtenzzre @ngxson @pcuenca