
Issue about CharFeaturizer #277


@yiqiaoc11

Can anyone explain why the code below works? It seems to extract just the first letter of each token. Thanks.

    class CharFeaturizer(TextFeaturizer):
        def __init_vocabulary(self):
            lines = []
            if self.decoder_config.vocabulary is not None:
                with codecs.open(self.decoder_config.vocabulary, "r") as fin:
                    lines.extend(fin.readlines())
            else:
                lines = ENGLISH_CHARACTERS
            self.blank = 0 if self.decoder_config.blank_at_zero else None
            self.tokens2indices = {}
            self.tokens = []
            index = 1 if self.blank == 0 else 0

            for line in lines:
                line = self.preprocess_text(line)
                if line.startswith("#") or not line:
                    continue
                self.tokens2indices[line[0]] = index
                self.tokens.append(line[0])
                index += 1
            if self.blank is None:
                self.blank = len(self.tokens)  # blank not at zero
            self.non_blank_tokens = self.tokens.copy()
            self.tokens.insert(self.blank, "")  # add blank token to tokens
            self.num_classes = len(self.tokens)
            self.tokens = tf.convert_to_tensor(self.tokens, dtype=tf.string)
            self.upoints = tf.strings.unicode_decode(self.tokens, "UTF-8").to_tensor(shape=[None, 1])
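
To make the question concrete, here is a minimal sketch of what the loop above does, written in plain Python without TensorFlow. It rests on several assumptions that are not confirmed by the snippet: that the vocabulary holds one character per line, that `preprocess_text` only lowercases and strips the trailing newline, and that the sample lines look like the ones below. `build_vocabulary` and `sample_lines` are hypothetical names used only for illustration.

```python
def preprocess_text(text):
    # Hypothetical stand-in for self.preprocess_text (an assumption):
    # lowercase the line and strip the trailing newline only.
    return text.lower().strip("\n")


def build_vocabulary(lines, blank_at_zero=True):
    # Mirrors the loop in __init_vocabulary, minus the tensor conversion.
    blank = 0 if blank_at_zero else None
    tokens2indices = {}
    tokens = []
    index = 1 if blank == 0 else 0
    for line in lines:
        line = preprocess_text(line)
        if line.startswith("#") or not line:
            continue  # skip comment lines and empty lines
        # line[0] is the first character of the *line*; if each line
        # holds exactly one character, that is the whole token.
        tokens2indices[line[0]] = index
        tokens.append(line[0])
        index += 1
    if blank is None:
        blank = len(tokens)  # blank goes after the last token
    tokens.insert(blank, "")  # the blank token is the empty string
    return tokens, tokens2indices, blank


# Assumed vocabulary content: a comment, a space, then single characters.
sample_lines = ["# comment line\n", " \n", "a\n", "b\n", "'\n"]
tokens, tokens2indices, blank = build_vocabulary(sample_lines)
print(tokens)          # ['', ' ', 'a', 'b', "'"]
print(tokens2indices)  # {' ': 1, 'a': 2, 'b': 3, "'": 4}
print(blank)           # 0
```

Under these assumptions, `line[0]` does not truncate anything, because every non-comment line is a single character. It would only drop information if a vocabulary line contained more than one character.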
