How does MITIE handle segmentation of OOV words? #205

@rookiebird

Description

Expected Behavior

Hi, I would like to know how MITIE handles segmentation of out-of-vocabulary (OOV) words.
For example, two of my training examples look like this:
1. The daily active users of [League of Legends](name) on November 10 (Chinese: [英雄联盟](name)11.10的日活)
2. The daily active users of [Tomb Raider 3](name) on November 10 (Chinese: [古墓丽影3](name)11.10的日活)
My training data is in Chinese and contains many entities that are game names. Some game names contain digits and some do not, for example "古墓丽影3" and "英雄联盟". In the examples above, I want MITIE to extract "古墓丽影3" and "英雄联盟" as the entities. "11.10" is a short form of the date and should not be included.
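Converting the `[entity text](label)` annotation into plain text plus character spans makes one detail of the second example visible: once the markup is stripped, the "3" at the end of the game name runs straight into the date, giving "古墓丽影311.10". The sketch below is illustrative only; `parse_markup` is a hypothetical helper, not part of MITIE.

```python
import re
from typing import List, Tuple

# Matches the "[entity text](label)" annotation used in the training examples.
MARKUP_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_markup(annotated: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Strip [text](label) markup; return plain text and (start, end, label) char spans."""
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    pos = 0     # read position in the annotated string
    length = 0  # length of the plain text emitted so far
    for m in MARKUP_RE.finditer(annotated):
        before = annotated[pos:m.start()]
        parts.append(before)
        length += len(before)
        entity_text, label = m.group(1), m.group(2)
        spans.append((length, length + len(entity_text), label))
        parts.append(entity_text)
        length += len(entity_text)
        pos = m.end()
    parts.append(annotated[pos:])  # text after the last annotation
    return "".join(parts), spans


text, spans = parse_markup("[古墓丽影3](name)11.10的日活")
print(text, spans)  # 古墓丽影311.10的日活 [(0, 5, 'name')]
```

Character spans are independent of any tokenizer, which makes them a convenient intermediate form before deciding how the Chinese text is segmented.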

Current Behavior

I labeled the entities correctly. However, the first example is often extracted as "英雄联盟11" rather than "英雄联盟". How can I fix this? I tried adding some more training examples, but it did not help. Should I add more data?
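One factor worth checking, offered as an assumption rather than a diagnosis: MITIE's NER labels whole tokens (its Python training example attaches entities to a `ner_training_instance` as token ranges via `add_entity`), so an entity can only start and end on token boundaries, and the way the Chinese text is segmented decides which spans are possible at all. The sketch below uses a hypothetical regex tokenizer, `toy_tokenize` (not MITIE's tokenizer), that keeps a dotted date such as "11.10" as one token, plus a helper that maps a character span to a token range or reports that it does not align.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, int, int]  # (token text, start char, end char)

# Hypothetical toy tokenizer, NOT MITIE's: a dotted number such as "11.10" stays
# one token; digit runs, CJK runs and ASCII words are separate tokens; any other
# non-space character becomes a token of its own.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[\u4e00-\u9fff]+|[A-Za-z]+|\S")


def toy_tokenize(text: str) -> List[Token]:
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def char_span_to_token_range(tokens: List[Token], start: int, end: int) -> Optional[range]:
    """Map a character span to token indices, or None if a boundary falls inside a token."""
    first = next((i for i, (_, s, _) in enumerate(tokens) if s == start), None)
    last = next((i for i, (_, _, e) in enumerate(tokens) if e == end), None)
    if first is None or last is None or last < first:
        return None
    return range(first, last + 1)


for text, (start, end) in [("英雄联盟11.10的日活", (0, 4)), ("古墓丽影311.10的日活", (0, 5))]:
    tokens = toy_tokenize(text)
    print([t[0] for t in tokens], char_span_to_token_range(tokens, start, end))
# ['英雄联盟', '11.10', '的日活'] range(0, 1)
# ['古墓丽影', '311.10', '的日活'] None
```

With this tokenizer "英雄联盟11" can no longer be predicted, because "11.10" is a single token. However, "古墓丽影3" stops aligning too, since the "3" merges with the date into "311.10". That ambiguity is in the raw text itself, so a segmentation rule alone cannot fully resolve it.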

  • Version: 0.7.0
  • Where did you get MITIE: pip install
  • Platform: windows64 and linux64
