Summary
Anonymizer PR NVIDIA-NeMo/Anonymizer#130 identified a pattern where jsonschema.validate() runs on raw LLM output before pydantic's before-validators can normalize drift from small models. The same root cause exists in DD, though the impact is lower due to DD's correction loop and the fact that most LLM-facing schemas are user-defined rather than internal.
Root cause
Two validation paths are affected:
LLM-Structured columns (StructuredResponseRecipe): jsonschema.validate() validates the raw LLM JSON against the user's schema at response_recipes.py:139. Any strict enum, required, type, or minLength constraint in the JSON schema becomes an un-coercible gate for small-model drift. The user's Pydantic BaseModel is converted to JSON schema via model_json_schema() (column_configs.py:317), preserving all strict constraints.
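A minimal illustration of that gate (the schema and payload below are made up for this sketch, not DD code; the real call at response_recipes.py:139 validates the user's converted schema against the raw model output):

```python
import jsonschema

# Hypothetical user schema, as produced by model_json_schema() from a user's
# Pydantic model; the enum/required constraints are the un-coercible gate.
user_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["sentiment"],
}

# Small-model drift: wrong casing (or an int where a string is expected)
# fails strict validation before any pydantic before-validator can run.
jsonschema.validate({"sentiment": "Positive"}, user_schema)  # raises ValidationError
```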
LLM-Judge columns (PydanticResponseRecipe): judge_score_factory.py:38 creates a strict DynamicScaleEnum with no before-validators. If the model returns 1 (int) instead of "1" (string), or returns the value with slightly different casing, validation fails at model_validate().
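The judge path fails the same way; below is a stand-in for the dynamically built enum (the real DynamicScaleEnum comes from judge_score_factory.py:38, so the shape here is an assumption for illustration):

```python
from enum import Enum

from pydantic import BaseModel


class DynamicScaleEnum(str, Enum):
    ONE = "1"
    TWO = "2"
    THREE = "3"
    FOUR = "4"
    FIVE = "5"


class JudgeResponse(BaseModel):
    reasoning: str
    score: DynamicScaleEnum


# With no before-validator, numeric drift (1 instead of "1") fails at
# model_validate(), as described above.
JudgeResponse.model_validate({"reasoning": "accurate and concise", "score": 1})
```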
Impact
Lower than in Anonymizer's case (where the fix took pass rates from 60% to 100%) because:
- DD has a correction loop (max_correction_steps + max_conversation_restarts) that gives the model multiple chances
- Judge enums are typically simple (a 1-5 scale), so drift is less frequent than with Anonymizer's 16-24 value enums
- Structured column schemas are user-defined, not part of a fixed internal pipeline
Still, every trip through the correction loop wastes tokens and adds latency, and users running DD on DGX Spark or Ollama with small local models will see unnecessary failures.
Suggested fix
Judge columns (low effort, clear win): Add a @model_validator(mode="before") to BaseJudgeResponse that coerces int/float to string and does fuzzy enum matching.
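A minimal sketch of that validator, assuming the judge's rating lives in a "score" field typed as a string enum (the field name and class shape are assumptions, not the actual DD code):

```python
from enum import Enum
from typing import Any

from pydantic import BaseModel, model_validator


class BaseJudgeResponse(BaseModel):
    @model_validator(mode="before")
    @classmethod
    def _coerce_score(cls, data: Any) -> Any:
        if not isinstance(data, dict) or "score" not in data:
            return data

        raw = data["score"]
        # Numeric drift: 1 or 1.0 -> "1".
        if isinstance(raw, (int, float)) and not isinstance(raw, bool):
            data["score"] = str(int(raw))
            return data

        # Casing/whitespace drift: fuzzy-match against the enum values.
        field = cls.model_fields.get("score")
        enum_cls = field.annotation if field is not None else None
        if isinstance(raw, str) and isinstance(enum_cls, type) and issubclass(enum_cls, Enum):
            lookup = {str(m.value).strip().lower(): m.value for m in enum_cls}
            data["score"] = lookup.get(raw.strip().lower(), raw)
        return data
```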
Structured columns (larger scope): Extend the jsonschema validators (same pattern as the existing pruning extension) to add lenient enum/type coercion when strict validation fails. Consider gating this behind a config flag.
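One possible shape for the lenient pass, using jsonschema.validators.extend (presumably the same extension mechanism as the pruning extension); this sketch only relaxes the enum check rather than rewriting the payload, and the config-flag gating is left out:

```python
from jsonschema import Draft202012Validator, ValidationError, validators


def _lenient_enum(validator, enums, instance, schema):
    # Accept exact matches first, then string-normalized matches
    # ("1" vs 1, "Positive" vs "positive") before reporting an error.
    if instance in enums:
        return
    if str(instance).strip().lower() in {str(e).strip().lower() for e in enums}:
        return
    yield ValidationError(f"{instance!r} is not one of {enums!r}")


LenientValidator = validators.extend(Draft202012Validator, {"enum": _lenient_enum})

# Intended use: run the lenient validator only after the strict pass fails,
# so well-behaved models keep the strict guarantees.
# LenientValidator(user_schema).validate(raw_llm_output)
```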
References