Failed to convert Llama-v2 models #4493
Comments
Same here @HighTemplar-wjiang. I found an alternative by reverting to commit f4d973c and running the README instructions. It did not solve the problem, but it can give us some hints about what happened. I've attached an image below of the quantized model being tested.
I am running into this same issue. Can you post if you find a resolution?
As a temporary fix, you can simply check out an older commit, such as:
This is the latest commit before this bug appears.
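For example, the workaround looks roughly like this (the commit hash below is a placeholder; substitute whichever known-good commit you prefer, such as the one referenced later in this thread):

git checkout <known-good-commit>
python convert.py models/xxx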
The same error here; it worked before.
The SPM and BPE vocabularies were removed, so if you're using a non-Hugging-Face model, you'll get this error. If you use Facebook's HF model, it will work.
This is simply due to the way the vocabularies are handled. I'll be looking into it over the next few days to see if I can fix the regression.
@teleprint-me Thanks for looking into this. Pinging also @strutive07 for any extra insight on this.
Here is how I have worked around it over the last two weeks: basically, you can try to convert your model to the Hugging Face format to get the config.json file.
One trick is that you don't need to install
python convert.py ./models/34B/
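For reference, the Transformers repository ships a conversion script for the original Meta checkpoints; a rough invocation looks like the following (the paths are illustrative, and the exact flags and expected directory layout can vary between transformers versions):

python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir ./models/llama-2-7b --model_size 7B --output_dir ./models/llama-2-7b-hf

After that, convert.py can be pointed at the resulting Hugging Face directory.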
Between work, holidays, and helping out my cousins, I got sick over the holidays, so that's why I went MIA. I spent the last few days fighting off the fever and am starting to feel better now. Wondering if any progress was made; if not, I can pick up where I left off. I'm skipping work over the weekend to recoup, so I'll need something to keep me busy. I did start on a new project named to_gguf, which was supposed to isolate and reorganize a lot of the client-facing code. It was mostly experimental so I could play around with it without messing with the upstream source code. Any progress I make there, I could push upstream if there was any interest in it. If not, it would be educational for me regardless.
I've been reviewing the relevant code. My efforts were aimed at maintaining backward compatibility while looking forward. This approach may have seemed like a digression, but it was strategic, anticipating a more integrated solution which unfortunately didn't materialize as expected.

I have a script that follows what I envisioned for PR #3633. It's currently non-functional but fixable. Integrating all three vocab classes (BPE, SentencePiece, and the Hugging Face fast tokenizer) is still the plan.

My goal is to have a working fix in the next few days, though realistically, it might take a week of focused effort. I appreciate everyone's patience and welcome any suggestions or collaborative efforts.
I attempted to simplify the process by trying "the easy way," but unfortunately, it didn't behave as expected. As a result, I opened issue #28370 in the Transformers repository to address the underlying problem with the previously utilized implementation.

Here's a snippet that showcases the idea:

# `Params` comes from convert.py; `Path` is pathlib.Path.
class VocabLoader:
    def __init__(self, params: Params, fname_tokenizer: Path) -> None:
        try:
            from transformers import AutoTokenizer
        except ImportError as e:
            raise ImportError(
                "To use VocabLoader, please install the `transformers` package. "
                "You can install it with `pip install transformers`."
            ) from e

        try:
            # Prefer the fast tokenizer when it can be built from the files on disk.
            self.tokenizer = AutoTokenizer.from_pretrained(str(fname_tokenizer), trust_remote_code=True)
        except ValueError:
            # Fall back to the slow tokenizer if a fast one cannot be constructed.
            self.tokenizer = AutoTokenizer.from_pretrained(str(fname_tokenizer), use_fast=False, trust_remote_code=True)
        except OSError:
            # Handle SPM model here
            ...

Exploring the convert_slow_tokenizer.py script from the Transformers repository allowed me to capture the differences and learn more about the problem. I'm uncertain whether it's best to wait for a resolution, revert to the previous code in convert.py for now, or explore other potential solutions. Nevertheless, the exploration was worthwhile, and I appreciate the learning experience it provided. What's convenient about this approach is that we can convert to the fast tokenizer automatically. Just exploring all of my options before jumping in and changing anything.
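For what it's worth, here is a minimal sketch of the slow-to-fast conversion idea, assuming transformers, tokenizers, sentencepiece, and protobuf are installed; the directory path is illustrative:

from pathlib import Path
from transformers import LlamaTokenizerFast

# Directory containing only the original SentencePiece tokenizer.model (illustrative path).
model_dir = Path("local/models/facebook/llama-2-7b-chat")

# Loading through the fast class converts the slow SentencePiece model on the fly.
tokenizer = LlamaTokenizerFast.from_pretrained(str(model_dir))

# save_pretrained writes out tokenizer.json (plus tokenizer_config.json and
# special_tokens_map.json), i.e. the files the fast-tokenizer path expects.
tokenizer.save_pretrained(str(model_dir))

Going through the Llama-specific class directly sidesteps AutoTokenizer needing a config.json to decide which tokenizer class to build.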
@teleprint-me Thank you for looking into this; I appreciate the effort. I'm honestly a bit lost myself here - I didn't expect this to be such a difficult change, and I really don't understand what the issue is now. In any case, I don't want you to spend too much effort looking into this.
@ggerganov I appreciate your feedback and understanding. I believe there's a middle ground that can address this issue effectively without compromising the existing setup. The core problem here is straightforward:

User Expectation: Users expect the original Meta models (with the SentencePiece tokenizer.model) to convert with convert.py, as before.

Current Behavior: Previously, the converter read the SentencePiece vocabulary directly; it now expects the Transformers (Hugging Face) layout instead.

Consequently, the original models fail to convert now simply because they were initially created with the SentencePiece tokenizer rather than the Transformers format.

Here's a glimpse of the directory structure highlighting the difference:

Original Models:

(.venv) git:(convert.py | Δ) λ tree local/models/facebook/llama-2-7b-chat
local/models/facebook/llama-2-7b-chat
├── checklist.chk
├── consolidated.00.pth # original model
├── params.json # original hyperparameters
├── tokenizer_checklist.chk
└── tokenizer.model # Llama's vocab: the SPM model

Transformers-Supported Variation:

(.venv) git:(convert.py | Δ) λ tree local/models/facebook/llama-2-7b-chat-hf
local/models/facebook/llama-2-7b-chat-hf
├── config.json # transformers hyperparameters
├── generation_config.json
├── ggml-model-f16.gguf # successfully created with convert.py
├── LICENSE.txt
├── model-00001-of-00002.safetensors # safetensors model part
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── pytorch_model-00001-of-00002.bin # torch model part
├── pytorch_model-00002-of-00002.bin
├── pytorch_model.bin.index.json
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json # Llama's vocab: the Transformers model
├── tokenizer.model # original sentencepiece model
└── USE_POLICY.md

I'm actively working on resolving this issue over the next couple of days. Let's aim for a solution that allows users to continue utilizing their existing models seamlessly while also aligning with the direction of the project. I appreciate your patience and support.
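A rough sketch of the kind of vocabulary detection a fix could perform, based on the two directory layouts above (this is not the actual convert.py logic; the labels are illustrative):

from pathlib import Path

def detect_vocab_type(model_dir: Path) -> str:
    """Guess which vocabulary format a model directory provides."""
    if (model_dir / "tokenizer.json").exists():
        return "hfft"  # Hugging Face fast tokenizer
    if (model_dir / "tokenizer.model").exists():
        return "spm"   # original SentencePiece model
    if (model_dir / "vocab.json").exists():
        return "bpe"   # BPE-style vocabulary
    raise FileNotFoundError(f"No known vocabulary file found in {model_dir}")

# detect_vocab_type(Path("local/models/facebook/llama-2-7b-chat"))    -> "spm"
# detect_vocab_type(Path("local/models/facebook/llama-2-7b-chat-hf")) -> "hfft"

Detection along these lines would let the converter pick a sensible default for either layout while still allowing an explicit override.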
The
Of course, I just don't want to have invalid instructions in the README, so a temporary solution is to go back to the working version until we figure out the proper way to implement this.
I completely understand. The integration in PR #3633 was primarily aimed at enhancing support for multilingual models. While the outcomes weren't as anticipated, the thoughtfulness and forward-thinking behind these efforts are invaluable.

I'm in the process of restoring the original functionality of convert.py. Technical debt is a natural part of any rapidly evolving project, and I believe adopting a more collaborative strategy for managing these updates could be beneficial as you continue to make progress. It's about finding the right balance and ensuring that no single person bears the whole weight of this dynamic environment. Your leadership and the community's passion are the driving forces behind the success of llama.cpp.

I'm working towards a preliminary PR. It should be up soon. I'll keep you in the loop.
Hi
Screencast.from.2024-01-07.15-30-21.webm

It's working! I was also able to preserve support for transformers! 🥳
@ggerganov It's up: #4818
@HighTemplar-wjiang and all, please confirm that conversion works with the latest master.
Note on vocab_size: In the original params.json, I observed that in some instances vocab_size is set to -1. To safeguard against such uncertainties and to maintain the integrity and reliability of the conversion process, I decided to raise a ValueError in this case. I recognize that this decision may necessitate an additional step for users, but I think it is a necessary measure to ensure the correctness and stability of the converted models. We can further investigate the impact and significance of the vocab_size field separately.

Thanks @ggerganov.
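A minimal sketch of the kind of guard being described, assuming the vocab size from params.json and from the tokenizer are both available (the names here are illustrative, not the exact convert.py code):

def check_vocab_size(params_vocab_size: int, tokenizer_vocab_size: int) -> None:
    # The original Meta params.json sometimes ships with "vocab_size": -1.
    if params_vocab_size == -1:
        raise ValueError(
            "The model's vocab size is set to -1 in params.json. "
            f"Please update it manually. Maybe {tokenizer_vocab_size}?"
        )
    if params_vocab_size != tokenizer_vocab_size:
        raise ValueError(
            f"params.json declares {params_vocab_size} tokens, but the tokenizer "
            f"provides {tokenizer_vocab_size}."
        )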
The conversion fails with the latest master. Reverting to 0353a18 allows convert.py to progress. The new backtrace with master is:
This is a known issue with the original params.json:

ValueError: The model's vocab size is set to -1 in params.json. Please update it manually. Maybe 32000?

You'll need to manually fix it by updating the value from -1 to 32000 in params.json.
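If you'd rather not edit the file by hand, a small helper along these lines can read the real vocab size from tokenizer.model and patch params.json (this assumes the sentencepiece package is installed; the path is illustrative):

import json
from pathlib import Path

import sentencepiece as spm

model_dir = Path("models/llama-2-7b")  # illustrative path
sp = spm.SentencePieceProcessor(model_file=str(model_dir / "tokenizer.model"))

params_path = model_dir / "params.json"
params = json.loads(params_path.read_text())
if params.get("vocab_size", -1) == -1:
    params["vocab_size"] = sp.vocab_size()  # 32000 for the stock Llama-2 tokenizer
    params_path.write_text(json.dumps(params, indent=2))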
Maybe we should issue a warning and revert to the fallback instead in the future? Commit f36a777 simply used a fallback without any warning. Maybe doing something similar would reduce the amount of false negatives reported? I'm hesitant to support this approach because it isn't as concise as the exception and would be more difficult to spot as a result.

Commit 0353a1 uses commit 799a1cb for convert.py. Please report here if you experience any issues during inference. I suspect you will, but they may range from subtle to obvious, which could be misconstrued as an issue with the model.

Personally, I'm not really a fan of the assertion as I consider this to be a brittle approach, but it's the only way to prevent false positives/negatives and avoid strange bugs throughout the pipeline. It's a time saver in the long run. Given the trade-offs, I made the decision to enforce it.
It's fine as it is since there is a workaround suggested. The old method created issues with other models where the vocab size did not match the tokenizer.
Was there a solution to this?
I'm getting the error:

I also tried two of the commits mentioned (0353a18, 799a1cb) and got the following error:
As for whether it was fixed or not, @cebtenzzre partially rolled back the patch I applied in PR #5041 due to conflicts with the code style and a preference for it over the actual functional utility of the script. I don't care about this; I just care about whether or not the script functions as advertised. Personally, I just use formatters for whatever language I'm working in and don't think about them beyond my initial preferred settings, which are very limited, because I'd rather put energy into the code than worry about the way it looks. My primary concerns are always readability, maintainability, and functionality.

I attempted to find a middle ground within the patch so that the work previously applied by both @strutive07 and @TheBloke would continue to function and allow all three vocabulary styles to operate depending on user preference: Byte-Pair Encoding, Google's SentencePiece, and the Hugging Face fast tokenizer used by the tokenizers library. The only way the

I never got a chance to update the docs, implement the complete changes, or add the other things it needed. It has been modified since, and I haven't had time to really look into it, as it requires a much more in-depth review and I have other projects that also need my attention.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Successfully converting Llama models using the following command:
python convert.py models/xxx
where xxx is the original trained Llama model downloaded from Facebook.
Current Behavior
Cannot convert with errors (detailed below).
I've found there was an update in the following commit; after checking out an older version, the conversion could be done:
873637afc7924f435ac44c067630a28e82eefa7b
It seems that after the above commit, convert.py no longer supports the BPE vocab format (the --vocabtype param has been removed), while the README does not reflect this change. This causes confusion.
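For context, earlier revisions of convert.py accepted the removed flag roughly like this (the model path is illustrative, and the accepted values depend on the revision):

python convert.py models/xxx --vocabtype bpe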
Environment and Context
MacBook M3 Max
macOS 14.1 (23B2073)
Python 3.10.13
transformers 4.36.1
Failure Information (for bugs)