LLM validator response rejected when numeric entity values get returned as JSON numbers instead of strings #129

@andreatgretel

Priority Level

Medium (Annoying but has workaround)

Describe the bug

The entity validator's output schema declares value: str, so DataDesigner's (DD's) pre-validation demands type: "string" for every decision's value field. Some LLMs (observed with GPT-5.4-mini) occasionally strip the quotes from numeric-looking entity values when filling in the skeleton, returning "value": 42 instead of "value": "42". DD then rejects the whole response as non-retryable, and the record is dropped from the dataset with Record missing from workflow output.

This happens probabilistically — in a small smoke-test run against real endpoints, ~1 in 6 records containing an age entity (or any other numeric-looking value) got dropped this way. It affects both the sync and async engines identically.

Steps/Code to reproduce bug

Run detection on any record containing a numeric quasi-identifier like age, using GPT-5.4-mini (or a similarly liberal model) as entity_validator:

from anonymizer import Anonymizer
from anonymizer.config.anonymizer_config import AnonymizerConfig, AnonymizerInput, Detect
from anonymizer.config.replace_strategies import Redact

# `anonymizer` is an Anonymizer instance; its construction and the model
# config wiring GPT-5.4-mini as validator are elided.

anonymizer.run(
    config=AnonymizerConfig(detect=Detect(), replace=Redact()),
    data=AnonymizerInput(
        source="input.csv",  # single row: "Patient Bob Smith, 42, was admitted..."
        text_column="text",
    ),
)

Representative DD warning (from a real run):

Non-retryable failure on _validation_decisions[rg=0, row=1]:
  | Cause: The model output from 'openai/openai/gpt-5.4-mini' could not be
  |   parsed into the requested format while running generation for column
  |   '_validation_decisions'. Validation detail: Response doesn't match
  |   requested <response_schema> 42 is not of type 'string' Failed
  |   validating 'type' in schema['properties']['decisions']['items']
  |   ['properties']['value']: {'default': '', 'description': 'Entity value
  |   (echoed from skeleton)', 'title': 'Value', 'type': 'string'}
  |   On instance['decisions'][2]['value']: 42.
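
For triage, the offending decisions can be located in a raw response without going through DD at all. The following is a minimal stdlib sketch, not DD's pre-validation code: `non_string_values` is a hypothetical helper, and the two response strings are made-up examples that keep only the `value` field.

```python
import json


def non_string_values(response_text: str) -> list[tuple[int, object]]:
    """Return (index, value) for each decision whose `value` is not a JSON string."""
    decisions = json.loads(response_text).get("decisions", [])
    return [
        (index, decision["value"])
        for index, decision in enumerate(decisions)
        if "value" in decision and not isinstance(decision["value"], str)
    ]


# Hypothetical validator responses; real ones carry more fields per decision.
quoted = '{"decisions": [{"value": "Bob Smith"}, {"value": "42"}]}'
unquoted = '{"decisions": [{"value": "Bob Smith"}, {"value": 42}]}'

print(non_string_values(quoted))    # []
print(non_string_values(unquoted))  # [(1, 42)]
```

The second call flags the same kind of instance the warning above reports: a decision whose value came back as a JSON number.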

Expected behavior

Records should survive the validator regardless of whether the LLM echoes back a numeric value as a JSON string ("42") or a JSON number (42). The value field is purely echoed context — enrich_validation_decisions in src/anonymizer/engine/detection/custom_columns.py overwrites it from the candidate lookup before any downstream consumer reads it — so its type shouldn't gate record survival.
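
To illustrate why the echoed type is irrelevant once enrichment has run, here is a rough sketch of the overwrite step. It is not the real enrich_validation_decisions: the function name `overwrite_echoed_values`, the `id` and `decision` keys, and the shape of the candidate lookup are all assumptions made for the example.

```python
def overwrite_echoed_values(decisions: list[dict], candidates_by_id: dict) -> list[dict]:
    """Replace each decision's echoed `value` with the trusted candidate value.

    Hypothetical stand-in for the enrichment step; assumes each decision
    carries an `id` that keys into the candidate lookup.
    """
    return [
        {**decision, "value": candidates_by_id[decision["id"]]["value"]}
        for decision in decisions
    ]


# Candidate lookup built by detection (values are always strings here).
candidates_by_id = {
    "e1": {"value": "Bob Smith"},
    "e2": {"value": "42"},
}

# LLM output where the age came back as a JSON number.
llm_decisions = [
    {"id": "e1", "value": "Bob Smith", "decision": "keep"},
    {"id": "e2", "value": 42, "decision": "keep"},
]

enriched = overwrite_echoed_values(llm_decisions, candidates_by_id)
print([d["value"] for d in enriched])  # ['Bob Smith', '42']
```

Whatever the model echoed, downstream consumers only ever see the candidate's own string.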

Additional context

Suggested fix (not attempted in the PR that surfaced this):

Drop value and label from ValidationDecisionSchema entirely. They're never read downstream, and asking the LLM to echo them is pure cost and failure surface. Concretely:

  • src/anonymizer/engine/schemas/detection.py: remove value and label fields from ValidationDecisionSchema.
  • src/anonymizer/engine/detection/detection_workflow.py::_get_validation_prompt: update the few-shot Output: line so the example no longer includes value/label — the skeleton (Template: line) still carries them as context for the LLM, just not the output.
  • Add a regression test covering a numeric-looking entity value.
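
The regression test could take roughly the following shape. This is a sketch under stated assumptions: `type_errors` is a simplified stdlib stand-in for DD's JSON-schema pre-validation, the `id` and `decision` properties are placeholders for whatever fields ValidationDecisionSchema keeps, and the schema is assumed not to set additionalProperties to false.

```python
JSON_TYPES = {"string": str}


def type_errors(decision: dict, schema: dict) -> list[str]:
    """List declared properties whose value has the wrong JSON type.

    Properties that the schema does not declare are ignored.
    """
    errors = []
    for name, spec in schema["properties"].items():
        if name in decision and not isinstance(decision[name], JSON_TYPES[spec["type"]]):
            errors.append(f"{name}: {decision[name]!r} is not of type {spec['type']!r}")
    return errors


# Placeholder schemas: today's shape versus the shape after the suggested fix.
CURRENT_SCHEMA = {"properties": {
    "id": {"type": "string"}, "decision": {"type": "string"},
    "value": {"type": "string"}, "label": {"type": "string"},
}}
TRIMMED_SCHEMA = {"properties": {
    "id": {"type": "string"}, "decision": {"type": "string"},
}}

NUMERIC_ECHO = {"id": "e2", "decision": "keep", "value": 42, "label": "age"}


def test_current_schema_rejects_numeric_value():
    assert type_errors(NUMERIC_ECHO, CURRENT_SCHEMA) == ["value: 42 is not of type 'string'"]


def test_trimmed_schema_accepts_stray_numeric_echo():
    assert type_errors(NUMERIC_ECHO, TRIMMED_SCHEMA) == []


def test_trimmed_schema_accepts_trimmed_output():
    assert type_errors({"id": "e2", "decision": "keep"}, TRIMMED_SCHEMA) == []
```

The second test covers a model that keeps echoing value out of habit after the prompt changes. A real regression test would exercise the actual schema and DD's validation path rather than this stand-in.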

Approach that does NOT work (flagging it so we don't repeat the mistake):

Loosening the pydantic field to str | int | float + a coercion validator passes DD's pre-validation, but DD stores the raw LLM dict (not the pydantic-validated object) in the dataframe. Once some records have "42" (string) and others have 42 (int) for the same column, PyArrow can't pick a single Arrow dtype at parquet checkpoint time and the whole batch fails with Could not convert 'Alice' with type str: tried to convert to int64.
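
The mixed-type condition can be detected before the checkpoint by counting the Python types stored under the field. The helper below is a hypothetical diagnostic, not part of DD or PyArrow, and the record layout (a `decisions` list of dicts per record) is assumed from the warning shown earlier.

```python
from collections import Counter


def value_type_counts(records: list[dict], field: str = "value") -> dict[str, int]:
    """Count the Python type names found under `field` across all decisions."""
    counts = Counter()
    for record in records:
        for decision in record.get("decisions", []):
            if field in decision:
                counts[type(decision[field]).__name__] += 1
    return dict(counts)


# Made-up raw LLM dicts: one record quoted the age, the other did not.
records = [
    {"decisions": [{"value": "Alice"}, {"value": "42"}]},
    {"decisions": [{"value": "Bob Smith"}, {"value": 42}]},
]

counts = value_type_counts(records)
print(counts)  # {'str': 3, 'int': 1}
```

More than one key in the result means the column holds mixed types, which is the situation that trips the parquet write described above.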

Environment:

  • data-designer >= 0.5.7
  • Observed on gpt-5.4-mini served via an internal API gateway. Likely reproducible on any LLM that isn't strictly JSON-schema compliant on numeric-string fields.

Labels: bug
