[`CodeLlamaTokenizer`] Nit, update __init__ to make sure the AddedTokens are not normalized because they are special #27359

ArthurZucker · 2023-11-08T07:45:01Z

What does this PR do?

Bridges the gap between the slow and fast versions of the tokenizer, following the updates in #26570 (similar updates were made to Llama).
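For readers unfamiliar with the pattern named in the title, here is a minimal sketch of wrapping string special tokens in `AddedToken` objects with `normalized=False`, so the normalizer does not touch them. It is not the code from this PR: the helper name `as_special_token` is hypothetical, the `special` keyword assumes a recent `tokenizers` release, and the dataclass fallback is only a stand-in so the snippet runs without `tokenizers` installed.

```python
try:
    # Real class from the `tokenizers` library (assumes a version that
    # accepts the `special` keyword).
    from tokenizers import AddedToken
except ImportError:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AddedToken:
        """Minimal stand-in, used only when `tokenizers` is not installed."""

        content: str
        single_word: bool = False
        lstrip: bool = False
        rstrip: bool = False
        normalized: bool = True
        special: bool = False


def as_special_token(token):
    """Hypothetical helper: wrap plain strings as non-normalized special tokens.

    Tokens that are already AddedToken objects are returned unchanged, so
    settings chosen by the caller are preserved.
    """
    if isinstance(token, str):
        return AddedToken(token, normalized=False, special=True)
    return token


# Plain strings become special tokens that skip normalization.
bos = as_special_token("<s>")
eos = as_special_token("</s>")

# An AddedToken passed in explicitly is left as the caller configured it.
custom = AddedToken("<custom>", normalized=True)
same_custom = as_special_token(custom)
```

Applying the same wrapping in the slow tokenizer's `__init__` as in the fast one is one way to keep the two from disagreeing about whether a special token's text should be normalized before matching.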

HuggingFaceDocBuilderDev · 2023-11-08T08:11:47Z

The documentation is not available anymore as the PR was closed or merged.

LysandreJik · 2023-11-09T08:35:12Z

src/transformers/models/code_llama/tokenization_code_llama.py

@@ -33,13 +33,17 @@
 PRETRAINED_VOCAB_FILES_MAP = {
    "vocab_file": {
        "hf-internal-testing/llama-code-tokenizer": "https://huggingface.co/hf-internal-testing/llama-tokenizer/resolve/main/tokenizer.model",
+        "codellama/CodeLlama-34b-Instruct-hf": "https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf/resolve/main/tokenizer.model",


remove these three lines and we can merge :)

(These are just here for backwards-compatibility)

LysandreJik

great

…ens are not normalized because they are special (huggingface#27359) * make sure tokens are properly initialized for codellama slow * add more pretrained models * style * test more tokenizers checkpoints

make sure tokens are properly initialized for codellama slow

d7f572c

ArthurZucker marked this pull request as ready for review November 9, 2023 08:24

ArthurZucker requested a review from LysandreJik November 9, 2023 08:25

ArthurZucker added 2 commits November 9, 2023 09:26

add more pretrained models

5b147d3

style

4d7ae8c

LysandreJik approved these changes Nov 9, 2023

test more tokenizers checkpoints

fea6f59

LysandreJik approved these changes Nov 9, 2023

ArthurZucker merged commit 085ea7e into main Nov 9, 2023
3 checks passed

ArthurZucker deleted the nit-codellama branch November 9, 2023 09:15
