
Schema validation rejects small-model output before coercion can normalize it #569

@andreatgretel


Summary

Anonymizer PR NVIDIA-NeMo/Anonymizer#130 identified a pattern where jsonschema.validate() runs on raw LLM output before Pydantic's before-validators can normalize drift from small models. The same root cause exists in DD, though the impact is lower because DD has a correction loop and most of its LLM-facing schemas are user-defined rather than internal.

Root cause

Two validation paths are affected:

LLM-Structured columns (StructuredResponseRecipe): gsonschema.validate() validates raw LLM JSON against the user's schema at response_recipes.py:139. Any strict enum, required, type, or minLength constraint in the JSON schema becomes an un-coercible gate for small-model drift. The user's Pydantic BaseModel is converted to JSON schema via model_json_schema() (column_configs.py:317), preserving all strict constraints.

LLM-Judge columns (PydanticResponseRecipe): judge_score_factory.py:38 creates a strict DynamicScaleEnum with no before-validators. If the model returns 1 (int) instead of "1" (string), or a slightly different casing, validation fails at model_validate().
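To illustrate the first path, here is a minimal sketch of how strict constraints on a user's model carry over into the generated JSON schema. The `Review` model and its fields are hypothetical and are not taken from DD; only `model_json_schema()` and `Field(min_length=...)` are real Pydantic v2 APIs.

```python
from typing import Literal

from pydantic import BaseModel, Field


class Review(BaseModel):
    # Hypothetical user-defined schema for an LLM-Structured column.
    label: Literal["positive", "negative"]
    text: str = Field(min_length=1)


schema = Review.model_json_schema()

# The Literal becomes a strict `enum`, the Field constraint becomes
# `minLength`, and both fields are listed under `required`.
print(schema["properties"]["label"]["enum"])
print(schema["properties"]["text"]["minLength"])
print(schema["required"])
```

Once the schema is in this form, a raw response such as `{"label": "Positive", "text": "..."}` fails JSON schema validation on casing alone, before any Pydantic before-validator on `Review` could run.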

Impact

Lower than Anonymizer's case (60% -> 100% pass rate fix) because:

  • DD has a correction loop (max_correction_steps + max_conversation_restarts) that gives the model multiple chances
  • Judge enums are typically simple (1-5 scale), so drift is less frequent than Anonymizer's 16-24 value enums
  • Structured column schemas are user-defined, not a fixed internal pipeline

Still, the correction loop wastes tokens/latency, and users running DD on DGX Spark or Ollama with small local models will see unnecessary failures.

Suggested fix

Judge columns (low effort, clear win): Add a @model_validator(mode="before") to BaseJudgeResponse that coerces int/float to string and does fuzzy enum matching.
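A rough sketch of what such a before-validator could look like, assuming Pydantic v2. The `Score` enum, the `reasoning` and `score` field names, and both response classes are hypothetical stand-ins, not the actual `DynamicScaleEnum` or `BaseJudgeResponse` definitions. The "fuzzy" matching here is limited to whitespace and case folding.

```python
from enum import Enum
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class Score(str, Enum):
    # Hypothetical stand-in for a dynamically built judge scale.
    ONE = "1"
    TWO = "2"
    THREE = "3"
    NA = "N/A"


class StrictJudgeResponse(BaseModel):
    # Hypothetical shape of a judge response with no before-validators.
    reasoning: str
    score: Score


class LenientJudgeResponse(StrictJudgeResponse):
    @model_validator(mode="before")
    @classmethod
    def _normalize_score(cls, data: Any) -> Any:
        # Only touch dict input that actually carries a score.
        if not isinstance(data, dict) or "score" not in data:
            return data
        raw = data["score"]
        # bool is a subclass of int; leave it for normal validation to reject.
        if isinstance(raw, bool):
            return data
        # 3.0 -> 3 so that it stringifies to "3" rather than "3.0".
        if isinstance(raw, float) and raw.is_integer():
            raw = int(raw)
        if isinstance(raw, (int, float)):
            raw = str(raw)
        if isinstance(raw, str):
            # Case- and whitespace-insensitive match against enum values.
            by_folded = {m.value.casefold(): m.value for m in Score}
            raw = by_folded.get(raw.strip().casefold(), raw)
        # Return a copy so the caller's dict is not mutated.
        return {**data, "score": raw}


try:
    StrictJudgeResponse.model_validate({"reasoning": "ok", "score": 1})
except ValidationError:
    print("strict model rejects int 1")

print(LenientJudgeResponse.model_validate({"reasoning": "ok", "score": 1}).score)
print(LenientJudgeResponse.model_validate({"reasoning": "ok", "score": " n/a "}).score)
```

Values that still do not match after normalization (for example `2.5`) are passed through unchanged, so they fail enum validation as before and remain visible to the correction loop.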

Structured columns (larger scope): Extend gsonschema validators (same pattern as the existing pruning extension) to add lenient enum/type coercion when strict validation fails. Consider gating behind a config flag.
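As a starting point for discussion, the coercion rules could be prototyped as a plain pre-normalization pass over the parsed JSON, independent of any validator library. This is not the gsonschema extension pattern itself. The function names and the `lenient` flag below are hypothetical, and only a small subset of JSON schema (`enum`, `type`, `properties`, `items`) is handled.

```python
from typing import Any


def _coerce_scalar(value: Any, schema: dict) -> Any:
    enum = schema.get("enum")
    if enum is not None and value not in enum:
        # Match enum options ignoring case and surrounding whitespace.
        folded = {str(option).strip().casefold(): option for option in enum}
        return folded.get(str(value).strip().casefold(), value)
    expected = schema.get("type")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "string" and is_number:
        return str(value)
    if expected == "integer" and isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return value
    if expected == "number" and isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return value
    return value


def lenient_coerce(instance: Any, schema: dict, lenient: bool = False) -> Any:
    """Return a normalized copy of `instance`; a no-op unless `lenient` is set."""
    if not lenient:
        return instance
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        # Keys without a matching property schema are passed through unchanged.
        return {
            key: lenient_coerce(val, props[key], lenient) if key in props else val
            for key, val in instance.items()
        }
    if isinstance(instance, list) and isinstance(schema.get("items"), dict):
        return [lenient_coerce(item, schema["items"], lenient) for item in instance]
    return _coerce_scalar(instance, schema)


example_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative"]},
        "count": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
raw_output = {"sentiment": " Positive", "count": "3", "tags": [1, "a"]}
print(lenient_coerce(raw_output, example_schema, lenient=True))
```

Running strict validation first and applying this pass only on failure would keep the default behavior unchanged for outputs that already conform. Values that cannot be coerced are left as-is, so the subsequent strict validation still reports them.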
