
GGUFTokenizerSkeleton AttributeError during conversion #31553

Closed
1 of 4 tasks
mo-arvan opened this issue Jun 22, 2024 · 3 comments · Fixed by #31575
@mo-arvan

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.34
  • Python version: 3.8.19
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (False)
  • Tensorflow version (GPU?): 2.13.1 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

@ArthurZucker
@younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Dockerfile:

FROM python:3.8-slim

RUN apt update && \
    apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg build-essential

RUN python3 -m pip install --no-cache-dir --upgrade pip

COPY . /app
WORKDIR /app

RUN pip install -e ".[dev]"

RUN python3 -m pip install --no-cache-dir gguf

Code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# Loading the tokenizer directly from the GGUF file triggers the error below.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename, legacy=False)

Error:

Traceback (most recent call last):
  File "0m0/gguf_test.py", line 7, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename, legacy=False)
  File "/app/src/transformers/models/auto/tokenization_auto.py", line 899, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/app/src/transformers/tokenization_utils_base.py", line 2163, in from_pretrained
    return cls._from_pretrained(
  File "/app/src/transformers/tokenization_utils_base.py", line 2397, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/app/src/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/app/src/transformers/tokenization_utils_fast.py", line 124, in __init__
    fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)
  File "/app/src/transformers/integrations/ggml.py", line 692, in convert_gguf_tokenizer
    fast_tokenizer = converter.converted()
  File "/app/src/transformers/integrations/ggml.py", line 635, in converted
    tokenizer = super().converted()
  File "/app/src/transformers/convert_slow_tokenizer.py", line 628, in converted
    for token in [p.piece for p in self.proto.pieces if p.type == 4]
AttributeError: 'GGUFTokenizerSkeleton' object has no attribute 'pieces'

Expected behavior

It should load the tokenizer.
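
For context, a minimal sketch of what success would look like once the tokenizer loads (the sample string and the round-trip check are illustrative assumptions, not taken from this issue):

# Assumed follow-up to the reproduction script above: encode and decode a
# sample string to confirm the converted tokenizer actually works.
ids = tokenizer("Hello, world!").input_ids
print(ids)
print(tokenizer.decode(ids, skip_special_tokens=True))  # expected: Hello, world!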

@mo-arvan (Author)

I believe this is due to the changes introduced in a previously linked issue.

It should be a simple fix: we can use hasattr to check whether proto has pieces, as sketched below.
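
A minimal sketch of that guard, reusing the list comprehension from the traceback above; the helper name and its placement are assumptions for illustration, not the actual patch:

def user_defined_tokens(proto):
    """Collect user-defined sentencepiece tokens (p.type == 4), tolerating
    protos such as GGUFTokenizerSkeleton that define no `pieces` attribute.
    Hypothetical helper; the real fix would live in convert_slow_tokenizer.py."""
    if not hasattr(proto, "pieces"):
        return []
    return [p.piece for p in proto.pieces if p.type == 4]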

@amyeroberts (Collaborator)

cc @SunMarc too

@SunMarc (Member) commented Jun 24, 2024

Hi @mo-arvan, thanks for raising the issue. This should be fixed in #31575.
