Regarding the function `match_tokenized_to_untokenized` in `control_tasks/control_tasks/data.py` (line 325 at commit 2696af0), I found the following error case.
It may be a minor error, but I am reporting it just for your information.
```python
from collections import defaultdict

from transformers import AutoTokenizer

# preparing an example
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
untokenized_sent = 'pretrained language models prone to learn domain-specific spurious correlations between input and output .'.split()
tokenized_sent = tokenizer.tokenize(tokenizer.cls_token + ' '.join(untokenized_sent) + tokenizer.sep_token)

# exactly the same as `match_tokenized_to_untokenized` for generating `mapping`
mapping = defaultdict(list)
untokenized_sent_index = 0
tokenized_sent_index = 1  # start after [CLS]
while (untokenized_sent_index < len(untokenized_sent) and
       tokenized_sent_index < len(tokenized_sent)):
    # absorb '##'-prefixed continuation pieces into the current word
    while (tokenized_sent_index + 1 < len(tokenized_sent) and
           tokenized_sent[tokenized_sent_index + 1].startswith('##')):
        mapping[untokenized_sent_index].append(tokenized_sent_index)
        tokenized_sent_index += 1
    mapping[untokenized_sent_index].append(tokenized_sent_index)
    untokenized_sent_index += 1
    tokenized_sent_index += 1

# verify whether the mapping is correct
for i in mapping:
    j = mapping[i]
    print(untokenized_sent[i], tokenized_sent[j[0]:j[-1] + 1])
```
Result:

```
pretrained ['pre', '##train', '##ed']
language ['language']
models ['models']
prone ['prone']
to ['to']
learn ['learn']
domain-specific ['domain']      <-- mapping starts to break here
spurious ['-']
correlations ['specific']
between ['spur', '##ious']
input ['correlation', '##s']
and ['between']
output ['input']
. ['and']
```
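The root cause is that the mapping loop treats only `'##'`-prefixed pieces as continuations of the current word. BERT's tokenizer also splits on punctuation without adding the `'##'` prefix, so `domain-specific` becomes `['domain', '-', 'specific']` and every subsequent word is shifted by two positions. A possible workaround (just a sketch, not the repository's code; the function name is mine) is to re-tokenize each untokenized word on its own and count how many pieces it produces:

```python
from collections import defaultdict

def match_by_retokenizing(tokenizer, untokenized_sent):
    """Sketch: map each untokenized word to its word-piece indices by
    re-tokenizing the word in isolation.

    Assumes the sentence was tokenized as [CLS] + words + [SEP], so word
    pieces start at index 1. For BERT-style tokenizers this matches the
    in-context tokenization, since pre-tokenization splits on whitespace
    before WordPiece runs on each word.
    """
    mapping = defaultdict(list)
    tokenized_sent_index = 1  # skip [CLS]
    for untokenized_sent_index, word in enumerate(untokenized_sent):
        for _ in tokenizer.tokenize(word):
            mapping[untokenized_sent_index].append(tokenized_sent_index)
            tokenized_sent_index += 1
    return mapping
```

With this version, `domain-specific` maps to `['domain', '-', 'specific']` in the example above, and the words after it line up again.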