Hi,
I have fine-tuned a Llama 2 model, and I am using the merge-and-upload code below to merge the adapter into the base model:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "meta-llama/Llama-2-13b-hf"
adapters_name = 'Llama-13b_17_10'

print(f"Starting to load the model {model_name} into memory")

# Load the base model in bfloat16 on GPU 0
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    # load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)

# Apply the LoRA adapter and merge its weights into the base model
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()

tok = LlamaTokenizer.from_pretrained(model_name)
tok.bos_token_id = 1
stop_token_ids = [0]
```
The merge succeeds, but when I launch text-generation-inference with the command below:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 -v /datadrive:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.3 \
    --model-id '/data/Azure_Backup/shrinath_merged_model_20_10' \
    --quantize bitsandbytes-nf4 --env --num-shard 1
```

it fails with this error:

```
ValueError: Non-consecutive added token '' found. Should have index 32000 but has index 0 in saved vocabulary.
```
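For context, this `ValueError` is raised when an entry in the saved tokenizer's `added_tokens.json` maps a token to an index that falls inside the base vocabulary (here index 0 rather than 32000, the Llama 2 base vocabulary size). A minimal sketch of that consistency check, where the helper name `check_added_tokens` is hypothetical and not part of any library:

```python
def check_added_tokens(added_tokens: dict, base_vocab_size: int) -> dict:
    """Return added-token entries whose index lies inside the base
    vocabulary; such entries trigger the 'Non-consecutive added token'
    ValueError when the tokenizer is reloaded."""
    return {t: i for t, i in added_tokens.items() if i < base_vocab_size}

# Mirrors the reported error: a token saved at index 0 instead of 32000.
bad = check_added_tokens({"": 0}, 32000)
```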
Can you help me?
Thanks
Try updating to the latest version of Transformers and repeating the merge. There was a recent PR that might fix this issue: huggingface/transformers#26570
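If updating Transformers does not help, one common workaround is to delete the offending entries from `added_tokens.json` in the merged model directory before launching text-generation-inference. A hedged sketch under that assumption; the function `drop_shadowed_added_tokens` and the temp-directory demo are illustrative, not part of any library:

```python
import json
import os
import tempfile

def drop_shadowed_added_tokens(model_dir: str, base_vocab_size: int) -> list:
    """Remove entries in added_tokens.json whose index lies inside the
    base vocabulary and rewrite the file; returns the removed tokens."""
    path = os.path.join(model_dir, "added_tokens.json")
    with open(path) as f:
        added = json.load(f)
    bad = [t for t, i in added.items() if i < base_vocab_size]
    for t in bad:
        del added[t]
    with open(path, "w") as f:
        json.dump(added, f)
    return bad

# Demo against a throwaway directory containing one good and one bad entry.
d = tempfile.mkdtemp()
with open(os.path.join(d, "added_tokens.json"), "w") as f:
    json.dump({"": 0, "<pad>": 32000}, f)
removed = drop_shadowed_added_tokens(d, 32000)
```

Back up the file before editing it, since dropping a token that the fine-tuned model actually uses would change tokenization.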