The Format Tax — Schemas and Validators

Schemas are a separate layer from serialization.

That sounds obvious, but many format debates blur the two:

a format is blamed for generation failures that are really schema failures
a validator library is credited for generation quality when the real win came from JSON Schema or constrained decoding
teams compare Zod, Pydantic, and Valibot as if they were serialization formats

This document keeps those layers separate.

The Three Jobs Schemas Do

1. Contract definition

What shape is allowed?

Examples:

JSON Schema
Pydantic models
Zod schemas
TypeBox schemas
Protobuf .proto

2. Runtime validation

Did the actual payload conform?

Examples:

Zod parse / safeParse
Pydantic validation
AJV against JSON Schema

3. Generation guidance

Can the model be steered toward the contract at generation time?

Examples:

JSON Schema passed to structured-output systems
schema text embedded in the prompt
constrained decoding

The third job is the one most directly relevant to generation quality.
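The three jobs can be kept apart in code as well as in discussion. The following is a minimal sketch using only the Python standard library. The ticket shape, the `validate` helper, and the `build_prompt` helper are hypothetical names introduced for illustration, and the hand-written checker covers only the few JSON Schema keywords used here; it is not a substitute for a real validator such as AJV or Pydantic.

```python
import json

# Job 1: contract definition. A JSON Schema document written as a plain dict.
# The "ticket" shape and its field names are hypothetical.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}

# Maps the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "object": dict}


def validate(payload, schema):
    """Job 2: runtime validation. Returns a list of error strings.

    Handles only the keywords that appear in TICKET_SCHEMA.
    """
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    for key, value in payload.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        rule = props[key]
        if not isinstance(value, _PY_TYPES[rule["type"]]):
            errors.append(f"{key}: expected {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: not one of {rule['enum']}")
    return errors


def build_prompt(task, schema):
    """Job 3: generation guidance, here as schema text embedded in the prompt."""
    return (
        f"{task}\n\n"
        "Respond with a single JSON object that conforms to this JSON Schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )
```

The same `TICKET_SCHEMA` object feeds both `validate` and `build_prompt`, which is the point: one contract, two different jobs, and only the second function touches generation.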

What We Know Today

Strongest current claim

Schema-guided output strategies improve structural reliability more clearly than they improve semantic correctness.

This is the core distinction the project now enforces.

What follows from that

if you need valid structured output, schema guidance matters
if you need correct answers, schema guidance addresses only part of the problem
choosing a validator library is not the same thing as choosing a generation strategy
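The gap between structural reliability and semantic correctness is easy to show with a toy case. In this sketch the line items, the model response, and both helper functions are hypothetical; the response passes every shape check and is still the wrong answer.

```python
import json


def is_structurally_valid(raw: str) -> bool:
    """Shape check only: parses as JSON and has an integer 'total' field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    total = data.get("total")
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(total, int) and not isinstance(total, bool)


def is_semantically_correct(raw: str, line_items: list[int]) -> bool:
    """Answer check: 'total' must equal the sum of the source line items."""
    return json.loads(raw)["total"] == sum(line_items)


line_items = [120, 30, 50]       # hypothetical source data, sums to 200
model_output = '{"total": 190}'  # hypothetical model response

print(is_structurally_valid(model_output))                # True
print(is_semantically_correct(model_output, line_items))  # False
```

A schema, however it is enforced, can only ever implement the first function. The second needs task-specific ground truth.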

Practical Taxonomy

JSON Schema

Best treated as the canonical interchange schema for model output.

Why it matters:

it is the most important bridge between application schemas and structured-output tooling
it is the common target that other validators often export toward
it is a schema language that many constrained-decoding tools accept as input, which makes it the cleanest shared vocabulary for the topic today

Zod

Best treated as a TypeScript-first source-of-truth schema that can feed JSON Schema-oriented workflows.

Why it matters:

good developer ergonomics
widely used in TypeScript app stacks
pairs well with JSON output workflows, because the same schema can validate parsed model output at runtime

TypeBox

Best treated as the most direct TypeScript-to-JSON-Schema path.

Why it matters:

if your generation stack wants JSON Schema, TypeBox keeps the translation gap small

Pydantic

Best treated as the Python source-of-truth model layer with JSON Schema export.

Why it matters:

strong fit for Python-first LLM systems
natural bridge from application model to validation and structured output

Valibot and Effect Schema

Best treated as validator choices whose relevance to generation depends on whether they cleanly interoperate with JSON Schema-oriented workflows.

Why it matters:

runtime ergonomics and bundle/runtime tradeoffs may be excellent
direct evidence that either library independently improves model generation remains limited

What We Should Not Claim

The project should not claim:

“Zod improves generation quality”
“Valibot beats Pydantic for LLM output”
“Effect Schema produces more semantically correct JSON than TypeBox”

unless those claims are benchmarked directly.

At present, the safer claim is:

the strongest generation benefit comes from the schema strategy
library choice matters mostly through:
- export quality
- runtime ergonomics
- integration cost

Current Recommendation by Boundary

Boundary	Best schema layer	Why
LLM output to software	JSON Schema	strongest structured-output target
TypeScript app with model output	Zod or TypeBox -> JSON Schema	developer ergonomics plus output contract
Python app with model output	Pydantic -> JSON Schema	one model layer for validation and export
Human-authored config	parse first, then validate	schema is downstream of the file format
Model-facing input serialization	pre-validate source data	validators protect the source before conversion

Format Pairing Guidance

JSON output

Preferred schema layer: JSON Schema
Typical source schema: Zod, TypeBox, or Pydantic
Why: strongest operational support for structured output and downstream validation

YAML input or config

Preferred schema layer: validate after parsing
Typical validators: JSON Schema, Zod, Pydantic
Why: YAML itself is not the validator; it is the transport syntax

Markdown + Frontmatter

Preferred schema layer: validate the frontmatter only
Why: the prose body is documentation, not a single typed payload
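A minimal sketch of "validate the frontmatter only", using the Python standard library. It assumes frontmatter is delimited by `---` lines at the top of the file and contains flat `key: value` pairs; real frontmatter is usually YAML and should be parsed with a YAML parser. The function names and the required field names are hypothetical.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter fields, prose body).

    Simplified on purpose: flat 'key: value' lines only, no nesting or lists.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter, skipping the opening one at index 0.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
        None,
    )
    if end is None:
        return {}, text
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


REQUIRED_FIELDS = ("title", "status")  # hypothetical contract


def validate_frontmatter(fields: dict[str, str]) -> list[str]:
    """Check the typed part of the document; the body is never inspected."""
    return [
        f"missing frontmatter field: {name}"
        for name in REQUIRED_FIELDS
        if name not in fields
    ]
```

Note that a `status: draft` string appearing in the prose body would not satisfy the check, since only the delimited block is treated as a typed payload.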

Markdown-KV

Preferred schema layer: template checks or source validation
Why: this format is useful precisely because it is lightweight and model-readable, not because it plugs into a rich validation ecosystem
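The two options named above, source validation and template checks, might look like the following sketch. It assumes a Markdown-KV record is rendered as one `key: value` line per field, which may not match the exact layout this project uses; the record fields and all three function names are hypothetical.

```python
EXPECTED_KEYS = ("id", "name", "status")  # hypothetical record fields


def validate_source(record: dict) -> list[str]:
    """Source validation: check the record before it is serialized."""
    errors = [f"missing key: {k}" for k in EXPECTED_KEYS if k not in record]
    errors += [
        f"empty value: {k}"
        for k in EXPECTED_KEYS
        if k in record and str(record[k]).strip() == ""
    ]
    return errors


def to_markdown_kv(record: dict) -> str:
    """Render one record as 'key: value' lines (an assumed layout)."""
    return "\n".join(f"{k}: {record[k]}" for k in EXPECTED_KEYS)


def template_check(rendered: str) -> bool:
    """Template check: the rendered lines carry the expected keys, in order."""
    keys = [line.split(":", 1)[0] for line in rendered.splitlines()]
    return keys == list(EXPECTED_KEYS)
```

Usage under these assumptions: call `validate_source(record)` first, render with `to_markdown_kv` only when it returns an empty list, and keep `template_check` as a cheap guard on the rendered text.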

Benchmark Implications

The schema question should be tested in layers:

no schema guidance
schema in prompt
post-parse validation
native constrained decoding

That is why the benchmark suite now includes an experimental schema-guidance track.
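One way to keep the four layers distinct in a harness is to make each condition change exactly one thing about the request. This is a sketch under that assumption, not a description of the project's actual benchmark code; the `SchemaGuidance` enum, the `plan_request` function, and the returned dictionary keys are all hypothetical and do not correspond to any provider's request format.

```python
import json
from enum import Enum


class SchemaGuidance(Enum):
    NONE = "no schema guidance"
    IN_PROMPT = "schema in prompt"
    POST_PARSE = "post-parse validation"
    CONSTRAINED = "native constrained decoding"


def plan_request(task: str, schema: dict, mode: SchemaGuidance) -> dict:
    """Describe how one benchmark request would be set up for a condition."""
    prompt = task
    if mode is SchemaGuidance.IN_PROMPT:
        # Only this condition exposes the schema text to the model.
        prompt = f"{task}\n\nJSON Schema:\n{json.dumps(schema)}"
    return {
        "prompt": prompt,
        # Only this condition validates the parsed output against the schema.
        "validate_after_parse": mode is SchemaGuidance.POST_PARSE,
        # Only this condition hands the schema to the decoder.
        "decoder_schema": schema if mode is SchemaGuidance.CONSTRAINED else None,
    }
```

Keeping the conditions mutually exclusive makes it possible to attribute a difference in structural reliability to a single layer rather than to a mix of them.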

High-Confidence Operational Rules

If the output must be machine-consumable, think in terms of JSON Schema even if your app authoring layer starts in Zod or Pydantic.
A validator affects generation quality only when its schema shapes the generation strategy (schema text in the prompt or constrained decoding), not when it merely checks the parsed output after the fact.
For input serialization, validate the source data before conversion; the model-facing format and the schema layer solve different problems.

Open Questions

These remain benchmark questions rather than settled doctrine:

Does schema-in-prompt materially improve semantic correctness, or mostly syntax?
How much does native constrained decoding outperform prompt-only schema guidance for real tasks?
Does library choice matter after normalizing to the same JSON Schema contract?
Are there use cases where validator strictness harms reasoning quality by overconstraining the response space?

Those are exactly the questions the project should keep open until the benchmark data exists.
