Loading GGUF files support #30391
Changes from 1 commit
@@ -513,6 +513,9 @@ def __init__(self, dict_):
        else:
            self.merges = [tuple(merge.split(" ")) for merge in self.merges]

        if not hasattr(self, "added_tokens"):
            self.added_tokens = []


class GGUFLlamaConverter(LlamaConverter):
    def __init__(self, tokenizer_dict):

@@ -539,6 +542,12 @@ def tokenizer(self, proto):
                AddedToken("</s>", normalized=False, special=True),
            ]
        )

        if len(self.proto.added_tokens) != 0:
            tokenizer.add_special_tokens(
                [AddedToken(added_token, normalized=False, special=False) for added_token in self.added_tokens]
            )

        return tokenizer

    def decoder(self, replacement, add_prefix_space):

Is add_prefix_space defined in the GGUF file? It might not be good to always take it from the class (which is what happens now).

It is not defined, from what I read in the GGML docs and from inspecting various checkpoints on the Hub.

So it always adds a prefix space, I suppose?
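
One way to address the concern raised in this thread, sketched under assumptions: prefer a value stored with the checkpoint when one exists and fall back to the converter's class default otherwise. The helper name `resolve_add_prefix_space` and the metadata key `add_prefix_space` are hypothetical; per the reply above, no such field was found in the GGML docs or in the inspected checkpoints, so this only illustrates the fallback pattern. The default of `True` is also an assumption.

```python
def resolve_add_prefix_space(metadata, class_default=True):
    """Pick the prefix-space setting for a converted tokenizer.

    `metadata` is a plain dict standing in for parsed tokenizer fields.
    The key "add_prefix_space" is hypothetical, not a documented GGUF field.
    """
    value = metadata.get("add_prefix_space")
    if value is None:
        # Nothing stored with the checkpoint: use the class-level default.
        return class_default
    return bool(value)


# A checkpoint without the field keeps today's behaviour (class default).
from_class = resolve_add_prefix_space({})
# A checkpoint that carries the field overrides the class default.
from_file = resolve_add_prefix_space({"add_prefix_space": False})
```

With this shape, the converter would behave as it does now for existing checkpoints and would only change behaviour if a file explicitly carried the setting.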
Not all of them are special here. You can add them all as special.
@younesbelkada this just means that added tokens that are not special will be skipped when decoding.
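
The trade-off discussed in these two comments can be illustrated with a toy model. This is a sketch, not the `tokenizers` library: `ToyAddedToken`, `toy_decode`, and the `<tool>` token are made up for illustration, and the sketch assumes the decoder drops every token flagged as special when `skip_special_tokens` is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAddedToken:
    """Hypothetical stand-in for an added token with a `special` flag."""
    content: str
    special: bool = False


def toy_decode(tokens, added, skip_special_tokens=True):
    """Join tokens, dropping those flagged special when asked to."""
    special = {t.content for t in added if t.special}
    kept = [tok for tok in tokens if not (skip_special_tokens and tok in special)]
    return " ".join(kept)


sequence = ["hello", "<tool>", "world", "</s>"]

# Only </s> is flagged special; <tool> is an ordinary added token.
added_mixed = [ToyAddedToken("</s>", special=True), ToyAddedToken("<tool>")]
# Every added token is flagged special, as suggested in the review.
added_all_special = [ToyAddedToken(t.content, special=True) for t in added_mixed]

mixed_out = toy_decode(sequence, added_mixed)
all_special_out = toy_decode(sequence, added_all_special)
```

In this model, flagging every added token as special means tokens that are not really special also disappear from the decoded text whenever special tokens are skipped.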