Many [UNK] tokens when converting Japanese text to tokens while training document understanding: how can this be fixed? #214

@c-avan

Description

When I load the default tokenizer to convert text to tokens, some characters come out as [UNK]:
from transformers import BertTokenizer
# bert-base-uncased ships an English-focused WordPiece vocabulary
TOKENIZER = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
tokenizer = TOKENIZER
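
One way to narrow this down is to measure how often a tokenizer falls back to its unknown token on the Japanese text, then compare against a checkpoint whose vocabulary covers Japanese. The sketch below is only an illustration: `unk_ratio` and `compare_checkpoints` are hypothetical helper names, and `bert-base-multilingual-cased` is an assumed alternative checkpoint, not something this repository is known to support. Swapping the tokenizer also means the model's embedding table has to match the new vocabulary.

```python
from typing import Iterable


def unk_ratio(tokenizer, texts: Iterable[str]) -> float:
    """Fraction of produced tokens that equal the tokenizer's unknown token."""
    total = 0
    unknown = 0
    for text in texts:
        tokens = tokenizer.tokenize(text)
        total += len(tokens)
        unknown += sum(1 for tok in tokens if tok == tokenizer.unk_token)
    return unknown / total if total else 0.0


def compare_checkpoints(
    texts,
    names=("bert-base-uncased", "bert-base-multilingual-cased"),  # assumed candidates
):
    """Print the [UNK] ratio per checkpoint (downloads from the Hugging Face Hub)."""
    # Imported lazily so unk_ratio stays usable without `transformers` installed.
    from transformers import AutoTokenizer

    for name in names:
        tok = AutoTokenizer.from_pretrained(name)
        print(f"{name}: {unk_ratio(tok, texts):.1%} [UNK]")
```

Calling `compare_checkpoints(["日本語の文書を理解する"])` on a sample of the training text should show whether the multilingual vocabulary reduces the [UNK] rate enough to be worth switching.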
