Issue about CharFeaturizer #277

Open
yiqiaoc11 opened this issue Dec 10, 2022 · 1 comment

yiqiaoc11 commented Dec 10, 2022

Can anyone explain why the code below works? It seems to just extract the first letter of each token. Thanks.

    class CharFeaturizer(TextFeaturizer):
        def __init_vocabulary(self):
            lines = []
            if self.decoder_config.vocabulary is not None:
                with codecs.open(self.decoder_config.vocabulary, "r") as fin:
                    lines.extend(fin.readlines())
            else:
                lines = ENGLISH_CHARACTERS
            self.blank = 0 if self.decoder_config.blank_at_zero else None
            self.tokens2indices = {}
            self.tokens = []
            index = 1 if self.blank == 0 else 0
            for line in lines:
                line = self.preprocess_text(line)
                if line.startswith("#") or not line:
                    continue
                self.tokens2indices[line[0]] = index
                self.tokens.append(line[0])
                index += 1
            if self.blank is None:
                self.blank = len(self.tokens)  # blank not at zero
            self.non_blank_tokens = self.tokens.copy()
            self.tokens.insert(self.blank, "")  # add blank token to tokens
            self.num_classes = len(self.tokens)
            self.tokens = tf.convert_to_tensor(self.tokens, dtype=tf.string)
            self.upoints = tf.strings.unicode_decode(self.tokens, "UTF-8").to_tensor(shape=[None, 1])

Hassan-Zaib commented

@yiqiaoc11 Yes, it extracts the first letter.
Have you found a solution to it?
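
For anyone landing here, this is a minimal sketch of the loop in plain Python, not the library's actual code. It assumes the vocabulary lists one character per line, in which case `line[0]` is the whole token and nothing is lost. The function name `build_char_vocab` and the `rstrip("\n")` stand-in for `preprocess_text` are hypothetical, for illustration only.

```python
def build_char_vocab(lines, blank_at_zero=True):
    """Hypothetical stand-in for the vocabulary loop quoted in the issue."""
    blank = 0 if blank_at_zero else None
    tokens2indices = {}
    tokens = []
    # Index 0 is reserved for the blank token when blank_at_zero is set.
    index = 1 if blank == 0 else 0
    for line in lines:
        # Assumption: only the newline is stripped, so a " " entry survives.
        line = line.rstrip("\n")
        if line.startswith("#") or not line:
            continue  # skip comment lines and empty lines
        # Only the first character is kept; the rest of the line is ignored.
        tokens2indices[line[0]] = index
        tokens.append(line[0])
        index += 1
    if blank is None:
        blank = len(tokens)  # blank goes after the last real token
    tokens.insert(blank, "")  # the blank token is the empty string
    return tokens, tokens2indices, blank


example_lines = ["# comment\n", " \n", "a\n", "b\n", "ch\n"]
tokens, tokens2indices, blank = build_char_vocab(example_lines)
print(tokens)          # ['', ' ', 'a', 'b', 'c']
print(tokens2indices)  # {' ': 1, 'a': 2, 'b': 3, 'c': 4}
```

With one character per line the result is the full character set. A multi-character line such as `ch` is silently cut down to `c`, so under this reading the class only makes sense for character-level vocabularies.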
