The downside of ExtPos in the features #1252

@AngledLuffa

So, I've found what I think is a meaningful downside to having ExtPos in the features instead of the Misc column: it's really difficult for a model to learn this in imbalanced situations. For example, in Spanish, "a" occurs 13,000 times in AnCora, 1,000 of them in an MWT. That is actually not an unreasonable ratio to learn. But "de" occurs about 40,000 times, only 400 of which are in an MWT. Not surprisingly, Stanza's models completely punt on this.
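To make the imbalance concrete, here is a rough sketch of the arithmetic. It uses only the approximate counts quoted above and assumes each MWT occurrence counts as one positive example for the ExtPos feature. The `positive_rate` helper is hypothetical and not part of Stanza.

```python
# Approximate AnCora counts quoted in this issue (not recomputed from the treebank).
counts = {
    "a": {"total": 13_000, "in_mwt": 1_000},
    "de": {"total": 40_000, "in_mwt": 400},
}


def positive_rate(total: int, in_mwt: int) -> float:
    """Fraction of a token's occurrences that would carry ExtPos (assumed 1 per MWT)."""
    return in_mwt / total


rates = {token: positive_rate(**c) for token, c in counts.items()}

for token, rate in rates.items():
    # A tagger that never predicts ExtPos for this token is right (1 - rate) of the time.
    print(f"{token}: {rate:.1%} positive, never-predict accuracy {1 - rate:.1%}")
```

Under these assumptions "a" is positive in roughly 1 of 13 occurrences, while "de" is positive in 1 of 100, so a model that never predicts ExtPos on "de" is already right about 99% of the time for that token.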

@dan-zeman - I remember you saying you didn't like ExtPos in the features either...
