Priority Level
Medium (Annoying but has workaround)
Describe the bug
Most of the time on the demo dataset, the kids' names, Aria and Leo, are tagged as a single first name. Occasionally they are correctly identified as two separate first names.
In either case, the replace step works well (i.e. "Aria and Leo", even when tagged as one name, still becomes "Claire and Joe").
However, when the two are tagged as one name, a standalone reference to just "Aria" later in the text would not be replaced properly. We aren't sure whether this happens in the GLiNER phase or the LLM phase of detection. It would be great to add an instruction to the base prompt so that values like this get tagged separately.
Steps/Code to reproduce bug
Run the Replace or Your First Anonymization notebook on the first example of the demo dataset about Bobby Watford.
Observe that "Aria and Leo" usually gets tagged as a single entity. Rerun if they happen to get tagged separately.
Expected behavior
"Aria" should get tagged separately from "Leo".
Additional context
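As a possible complement to the prompt change, a post-processing step could split a conjoined span into one span per name. The sketch below is only an illustration of that idea: `Span`, `split_conjoined`, the `"first_name"` label string, and the separator list are all hypothetical and are not part of the library's API.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical entity span; not the library's own type."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


# Assumed separators between conjoined names: commas, "&", and the word "and".
_SEPARATOR = re.compile(r"\s*(?:,|&|\band\b)\s*")


def split_conjoined(text: str, span: Span) -> list[Span]:
    """Split one tagged span into one span per name, keeping the label."""
    pieces = []
    cursor = span.start
    segment = text[span.start:span.end]
    for match in _SEPARATOR.finditer(segment):
        sep_start = span.start + match.start()
        if sep_start > cursor:  # skip empty pieces between adjacent separators
            pieces.append(Span(cursor, sep_start, span.label))
        cursor = span.start + match.end()
    if cursor < span.end:
        pieces.append(Span(cursor, span.end, span.label))
    return pieces


# Example with a made-up sentence (not taken from the demo dataset).
text = "Their kids Aria and Leo came too."
start = text.index("Aria and Leo")
span = Span(start, start + len("Aria and Leo"), "first_name")
parts = split_conjoined(text, span)
print([text[p.start:p.end] for p in parts])
```

With each name tagged on its own, a later standalone "Aria" could be matched against the same entity as the first mention. Splitting on "and" would be wrong for entity types where the conjunction is part of the value, so a step like this would probably need to be limited to name labels.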
