The Format Tax — Recommendation Policy

This document defines the policy behind the homepage recommendation engine.

It is intentionally not the paper. The paper stays descriptive and conservative. The homepage is allowed to recommend a primary format, but every recommendation must expose:

  • a primary choice
  • a fallback
  • a confidence level
  • an evidence basis
  • the schema or validator pairing that makes the recommendation hold
  • the parser pairing where streaming matters
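
Concretely, that contract can be pictured as a small record type. A minimal TypeScript sketch with hypothetical field names (this is not the engine's actual API); the label values anticipate the Evidence Ladder and Confidence Levels sections below:

```ts
// Illustrative only: hypothetical field names, not the engine's real API.
type Confidence = "high" | "medium" | "low";
type Evidence = "benchmark-backed" | "literature-backed" | "operational-heuristic";

interface Recommendation {
  primary: string;         // e.g. "JSON + constrained decoding"
  fallback: string;        // e.g. "YAML"
  confidence: Confidence;
  evidence: Evidence;
  schemaPairing: string;   // the schema/validator pairing that makes the pick hold
  parserPairing?: string;  // only set where streaming matters
}
```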

Design Principle

Users do not want a pile of formats. They want:

  1. the most practical answer for their use case
  2. why it wins
  3. what would change the answer

That means the old category tree is replaced with a decision policy.

Evidence Ladder

Homepage recommendations are labeled with one of three evidence bases:

benchmark-backed

Used when the project has locally reproduced benchmark evidence for the relevant decision boundary.

literature-backed

Used when the strongest available support comes from published work, official project benchmarks, or official platform documentation, but local reproduction is incomplete.

operational-heuristic

Used when the recommendation is mainly an engineering judgment based on ecosystem maturity, ergonomics, and deployment reality rather than a clean benchmark win.

Confidence Levels

high

  • strong ecosystem support
  • direct operational fit
  • little disagreement in the available evidence

medium

  • plausible winner
  • evidence is incomplete, contested, or not yet locally reproduced

low

  • speculative
  • emerging format or narrow benchmark support

Low-confidence winners should be avoided on the homepage unless the user explicitly opts into experimental paths.

Inputs

The engine ranks candidates from four inputs:

  1. Boundary
    • LLM input
    • LLM output
    • streaming UI
    • config
    • backend transport
    • storage
  2. Data shape
    • uniform tabular
    • nested structured
    • mixed prose + data
    • state / memory
    • simple key-value
  3. Priority
    • accuracy
    • token cost
    • generation validity
    • maintainability
    • latency
  4. Hard constraints
    • constrained decoding
    • progressive streaming
    • comments / embedded docs
    • broad interoperability
    • schema evolution

Hard Rules

These are non-negotiable overrides.

If the boundary is LLM output and the target is software

  • Primary: JSON + constrained decoding
  • Fallback: YAML
  • Confidence: High
  • Evidence: Literature-backed

Reason:

  • JSON Schema-backed structured output tooling is the strongest current ecosystem.
  • The relevant advantage is not “JSON alone”; it is JSON + schema-aware generation.
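
A minimal sketch of what "JSON + schema-aware generation" looks like in practice, using an OpenAI-style structured-output call. The model name, request shape, and schema are illustrative; verify the exact fields against current provider documentation before relying on them:

```ts
// Hedged sketch: the schema, not the format, does the heavy lifting.
import OpenAI from "openai";

const client = new OpenAI();

// The schema constrains decoding: keys and values outside it cannot be emitted.
const taskSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    priority: { type: "string", enum: ["low", "medium", "high"] },
  },
  required: ["name", "priority"],
  additionalProperties: false,
};

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini", // illustrative model name
  messages: [{ role: "user", content: "Extract the task: ship the report, it's urgent" }],
  response_format: {
    type: "json_schema",
    json_schema: { name: "task", schema: taskSchema, strict: true },
  },
});

// With strict schema-guided decoding, this parse should not fail on shape.
const task = JSON.parse(completion.choices[0].message.content ?? "{}");
```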

If the boundary is streaming UI and the granularity is property-level

  • Primary: YAML
  • Fallback: JSONL
  • Confidence: High
  • Evidence: Literature-backed

Reason:

  • meaningful partial prefixes matter more than universal interchange
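
A hedged sketch of why this holds: in a flat YAML mapping, every completed line is a meaningful prefix, so a UI can render each property the moment its line closes. The parser below is hand-rolled for illustration only; a production path would use an incremental YAML parser:

```ts
// Minimal property-level stream consumer for a flat YAML mapping.
function makePropertyStream(onProperty: (key: string, value: string) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      // Flat `key: value` lines only; nested structures need a real parser.
      const match = /^(\w+):\s*(.*)$/.exec(line);
      if (match) onProperty(match[1], match[2]); // render this field now
    }
  };
}

// Usage: feed raw model tokens as they stream in.
const feed = makePropertyStream((k, v) => console.log(`render ${k} = ${v}`));
feed("title: Quarterly repo");
feed("rt\nsummary: Revenue up 4%\n"); // "title" renders once its line completes
```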

If the boundary is config and the shape is simple key-value

  • Primary: TOML
  • Fallback: YAML
  • Confidence: High
  • Evidence: Operational heuristic

Reason:

  • maintainability beats universality for human-authored simple config
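
The "parse then validate" pairing referenced in the table below, sketched with hypothetical library choices (the smol-toml parser and Zod); any TOML parser and validator pair works the same way:

```ts
// Hedged sketch: step 1 parses (syntax errors surface here),
// step 2 validates (shape errors surface here, with useful messages).
import { parse } from "smol-toml"; // assumed parser choice
import { z } from "zod";

const ConfigSchema = z.object({
  name: z.string(),
  port: z.number().int().positive(),
  debug: z.boolean().default(false),
});

const raw = `
name = "format-tax"
port = 8080
`;

const parsed = parse(raw);
const config = ConfigSchema.parse(parsed);
```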

If the boundary is config and the artifact mixes prose with structure

  • Primary: Markdown + Frontmatter
  • Fallback: YAML
  • Confidence: High
  • Evidence: Operational heuristic

Reason:

  • the artifact is partly documentation, not just a payload
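
A sketch of this pairing, assuming the gray-matter package: the YAML header gets validated, and the Markdown body passes through as prose:

```ts
// Hedged sketch of "Zod on frontmatter": structure is checked,
// documentation is left alone.
import matter from "gray-matter";
import { z } from "zod";

const Frontmatter = z.object({
  title: z.string(),
  model: z.string(),
  temperature: z.number().min(0).max(2),
});

const doc = `---
title: Summarizer prompt
model: gpt-4o-mini
temperature: 0.2
---
Summarize the input in three bullet points.`;

const { data, content } = matter(doc);
const meta = Frontmatter.parse(data); // the structured half is validated
// `content` is the prose/documentation half of the artifact
```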

Current Recommendation Table

| Use case | Primary | Fallback | Confidence | Evidence | Schema / Parser pairing |
| --- | --- | --- | --- | --- | --- |
| LLM output to software pipeline | JSON + constrained decoding | YAML | High | Literature-backed | JSON Schema |
| Agent state / memory for model retrieval | Markdown-KV | YAML | Medium | Literature-backed | Template checks or pre-validation |
| Mixed prose + structured instructions | Markdown + Frontmatter | YAML | High | Operational heuristic | Zod or Pydantic on frontmatter |
| Simple model-facing key-value input | YAML | TOML | Medium | Operational heuristic | Parse then validate |
| Flat repeated records, optimize for token cost | TOON | CSV | Medium | Literature-backed | Validate source data before conversion |
| Flat repeated records, optimize for retrieval accuracy | CSV | TOON | Medium | Literature-backed | External validation before conversion |
| Property-level streaming UI | YAML | JSONL | High | Literature-backed | json-render-style YAML streaming |
| Element-level streaming UI | JSONL | YAML | High | Benchmark-backed | Line-by-line parser |
| Simple human-maintained config | TOML | YAML | High | Operational heuristic | Parse then validate |
| Nested operational config | YAML | TOML | High | Operational heuristic | JSON Schema or Pydantic after parse |
| Low-latency backend transport | FlatBuffers | Protobuf | Medium | Literature-backed | .fbs + codegen |
| General inter-service transport | Protobuf | JSON | High | Operational heuristic | .proto + compatibility rules |
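
For the element-level streaming row, the pairing is worth spelling out: each complete JSONL line is a full, valid JSON element, so a line-by-line parser can render elements as they arrive. A minimal hand-rolled sketch, for illustration:

```ts
// Element-level JSONL stream consumer: buffer the trailing partial line,
// emit every completed line as a parsed element.
function makeElementStream<T>(onElement: (el: T) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete final line
    for (const line of lines) {
      if (line.trim()) onElement(JSON.parse(line) as T);
    }
  };
}

const feed = makeElementStream<{ id: number }>((el) => console.log("render", el.id));
feed('{"id": 1}\n{"id"');
feed(': 2}\n'); // the second element renders only when its line closes
```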

Tiebreakers

When more than one candidate survives the hard rules, rank them in this order:

  1. hard constraint satisfaction
  2. operational maturity
  3. evidence strength
  4. fit to the selected priority
  5. migration cost

This ordering is deliberate. A slightly more efficient format does not win if it forces a fragile or poorly supported workflow.
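
One way to picture the ordering is a lexicographic comparator: an earlier criterion always dominates the later ones, so efficiency can never outrank maturity. The scores below are hypothetical inputs the engine would compute per candidate:

```ts
// Hedged sketch of the tiebreaker as a lexicographic sort.
interface Candidate {
  format: string;
  constraintsSatisfied: number; // hard constraints met (higher wins)
  maturity: number;             // operational maturity (higher wins)
  evidence: number;             // evidence strength (higher wins)
  priorityFit: number;          // fit to the selected priority (higher wins)
  migrationCost: number;        // lower wins
}

function rank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) =>
      b.constraintsSatisfied - a.constraintsSatisfied ||
      b.maturity - a.maturity ||
      b.evidence - a.evidence ||
      b.priorityFit - a.priorityFit ||
      a.migrationCost - b.migrationCost
  );
}
```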

Why Not JSON?

The recommendation engine should explain this explicitly whenever JSON is not the winner.

Common reasons:

  • repeated keys waste tokens on model-facing input
  • monolithic documents are poor for progressive streaming
  • JSON is hostile to comments and embedded explanation
  • binary protocols are better for hot service-to-service paths

Important:

  • JSON often is the winner for model output
  • JSON is not the villain; uniform JSON usage across all stages is

Schema Policy

The recommendation engine treats schemas as a separate layer.

  • Validators do not automatically make a format better for a model
  • The strongest current generation benefit comes from schema-guided output strategies
  • In practice, this usually means:
    • emit JSON Schema directly, or
    • generate JSON Schema from Zod, TypeBox, or Pydantic

The homepage should not imply that Zod, Valibot, Effect Schema, or Pydantic each independently improve generation quality by virtue of being those libraries. That is a benchmark question, not an assumption.
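
A sketch of the recommended layering, assuming the zod-to-json-schema package: the validator library is an authoring convenience, and the artifact the model actually sees is JSON Schema (Pydantic's model_json_schema() plays the same role in Python):

```ts
// Hedged sketch of "generate JSON Schema from Zod".
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const Task = z.object({
  name: z.string(),
  priority: z.enum(["low", "medium", "high"]),
});

// This JSON Schema, not the Zod object, is what schema-guided decoding consumes.
const jsonSchema = zodToJsonSchema(Task, "task");
```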

Non-Goals

The engine should not:

  • return five equally weighted answers
  • rank a format highly just because its own repository claims it wins
  • merge local results with external claims into one synthetic score
  • hide uncertainty behind confident copy

Relationship To The Paper

The paper asks:

  • what is measured
  • what has been reproduced
  • what remains uncertain

The homepage answers:

  • what should I use right now?

Those are different products and should remain different.