"X and Y" names are getting tagged as a single entity #142

@alexahaushalter

Priority Level

Medium (Annoying but has workaround)

Describe the bug

I notice that, most of the time in the demo dataset, the kids' names (Aria and Leo) get tagged as a single first name. Occasionally they are correctly identified as two separate first names.

In either case, the replace step works well (i.e. "Aria and Leo", even when tagged as one name, still becomes "Claire and Joe").

However, when the two are tagged as one name, a later standalone reference to just "Aria" (separate from Leo) won't be replaced properly. We aren't sure whether this happens in the GLiNER phase or the LLM phase of detection. It would be great to add an instruction to the base prompt so that values like this get tagged separately.
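To illustrate the failure mode, here is a minimal sketch, assuming the replace step behaves like an exact-match lookup from tagged entity text to its surrogate. The `apply_replacements` helper and the sample sentence are hypothetical and are not the project's actual code or data.

```python
import re


def apply_replacements(text: str, mapping: dict[str, str]) -> str:
    """Replace each tagged entity string with its surrogate (whole-word matches only)."""
    # Longest keys first so a multi-word entity is replaced before its parts.
    for original in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(original) + r"\b"
        # A function replacement avoids backslash interpretation in the surrogate.
        text = re.sub(pattern, lambda _m, s=mapping[original]: s, text)
    return text


# Hypothetical sample text, not taken from the demo dataset.
text = "Aria and Leo went to the park. Later, Aria went home."

# Tagged as one entity: the standalone "Aria" has no entry in the mapping.
merged = {"Aria and Leo": "Claire and Joe"}
# Tagged separately: every later mention is covered.
separate = {"Aria": "Claire", "Leo": "Joe"}

print(apply_replacements(text, merged))
# Claire and Joe went to the park. Later, Aria went home.
print(apply_replacements(text, separate))
# Claire and Joe went to the park. Later, Claire went home.
```

With the merged tag, the original name "Aria" leaks through in the second sentence, which is the behavior described above.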

Steps/Code to reproduce bug

Run either the Replace notebook or the Your First Anonymization notebook on the first example of the demo dataset (the one about Bobby Watford).

Observe that "Aria and Leo" usually gets tagged as a single entity. Rerun if they happen to get tagged separately.

Expected behavior

"Aria" should get tagged separately from "Leo".
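Besides a prompt change, one possible stopgap is to split conjoined first-name entities after detection. The sketch below is only an assumption about how that could look: `split_conjoined_names` and `expand_mapping` are hypothetical helpers, not part of GLiNER or of this project, and the separator pattern would wrongly split values such as organization names, so it should only be applied to first-name entities.

```python
import re

# Hypothetical separators between first names: commas, "&", or the word "and".
_SEPARATORS = re.compile(r"\s*(?:,|&|\band\b)\s*", flags=re.IGNORECASE)


def split_conjoined_names(entity_text: str) -> list[str]:
    """Split a tagged span such as "Aria and Leo" into its individual names."""
    parts = [part.strip() for part in _SEPARATORS.split(entity_text)]
    return [part for part in parts if part]


def expand_mapping(original: str, surrogate: str) -> dict[str, str]:
    """Build per-name replacements from a merged entity and its merged surrogate.

    Falls back to the merged pair alone when the two sides do not split
    into the same number of names.
    """
    mapping = {original: surrogate}
    originals = split_conjoined_names(original)
    surrogates = split_conjoined_names(surrogate)
    if len(originals) == len(surrogates):
        mapping.update(zip(originals, surrogates))
    return mapping


print(split_conjoined_names("Aria and Leo"))
# ['Aria', 'Leo']
print(expand_mapping("Aria and Leo", "Claire and Joe"))
# {'Aria and Leo': 'Claire and Joe', 'Aria': 'Claire', 'Leo': 'Joe'}
```

Keeping the merged pair in the mapping preserves the current behavior for the joint mention, while the per-name entries cover later standalone references.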

Additional context

Image


Labels: bug
