The Format Tax

Benchmark suite, paper draft, and recommendation tool for serialization choices in human-AI systems.

formattax.dev | Paper draft | Recommendation policy | Benchmarks | Schemas


Thesis

Serialization is not just an implementation detail.

The same system may cross several boundaries with incompatible needs:

  • model-facing input
  • model-facing output
  • progressive UI streaming
  • human-maintained config
  • inter-service transport
  • storage

Using one format for all of them imposes a Format Tax: unnecessary structural tokens, weaker streaming behavior, poorer ergonomics, or a less reliable output path than the stage actually requires.
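As a toy illustration of the tax (not a measured benchmark — byte counts stand in for tokens here, and real token costs depend on the tokenizer), the same flat records carry very different structural overhead in different serializations:

```typescript
// Toy illustration only: byte length as a rough proxy for token cost.
const rows = [
  { id: 1, name: "alpha", score: 0.91 },
  { id: 2, name: "beta", score: 0.87 },
];

// Verbose: pretty-printed JSON, as often pasted into prompts.
const pretty = JSON.stringify(rows, null, 2);

// Compact: identical structure, no whitespace.
const compact = JSON.stringify(rows);

// Tabular: keys stated once in a header row -- cheap for flat, uniform rows.
const header = Object.keys(rows[0]).join(",");
const tabular = [header, ...rows.map(r => Object.values(r).join(","))].join("\n");

console.log(pretty.length, compact.length, tabular.length);
```

The ordering (pretty > compact > tabular) holds for uniform flat rows; it does not generalize to deeply nested or sparse data, which is exactly why the choice is boundary-dependent.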

What This Repo Is

This repository contains three related artifacts:

  1. A paper draft: an academic-facing document that stays conservative about evidence and separates local results from external claims.
  2. A benchmark suite: canonical datasets, question sets, encoders, and runner tracks for measuring format behavior.
  3. A homepage and recommendation engine: a practitioner-facing tool that returns a primary format, fallback, confidence, and schema/parser pairing.

Evidence Policy

Every claim in the project should fit one of these buckets:

  • Locally measured — produced by this repository’s benchmark code
  • Externally reported — from papers, official project benchmarks, or official docs
  • Open hypothesis — worth testing, not yet proven here
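One way to make these buckets machine-checkable (an illustrative sketch, not the repository's actual types) is a tagged claim record, so a claim can never be rendered without its evidence basis:

```typescript
// Illustrative sketch only: the repo's real types may differ.
type Evidence =
  | { kind: "local"; runId: string }      // produced by this repo's benchmark code
  | { kind: "external"; source: string }  // paper, official benchmark, or docs
  | { kind: "hypothesis" };               // worth testing, not yet proven here

interface Claim {
  text: string;
  evidence: Evidence;
}

// A claim backed by nothing stronger than a hypothesis should never
// be presented as a settled result.
function isSettled(claim: Claim): boolean {
  return claim.evidence.kind !== "hypothesis";
}
```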

This matters because the public serialization discourse is noisy:

  • some results are academic
  • some are official format-project benchmarks
  • some are independent reproductions
  • some are placeholder numbers used for site development

The project is being rewritten to keep those apart.

Current Benchmark Coverage

Implemented or in-progress runner tracks:

  • Token efficiency
  • Retrieval accuracy
  • Streaming readiness
  • Generation quality
  • Schema guidance

Planned but not yet implemented as a local runner track:

  • Binary throughput / transport benchmarking

Recommendation Layer

The homepage is intentionally more operational than the paper.

It answers:

  • what boundary am I at?
  • what shape is the data?
  • what is the main optimization target?
  • what hard constraints apply?

It then returns:

  • a primary recommendation
  • a fallback
  • confidence
  • evidence basis
  • schema pairing
  • parser pairing if relevant
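Those returned fields might map onto a result shape like the following sketch (field names are hypothetical, not the engine's actual API):

```typescript
// Hypothetical result shape; names are illustrative, not the engine's API.
interface Recommendation {
  primary: string;              // e.g. "json"
  fallback: string;             // e.g. "yaml"
  confidence: "low" | "medium" | "high";
  evidenceBasis: "local" | "external" | "hypothesis";
  schemaPairing?: string;       // e.g. "json-schema"
  parserPairing?: string;       // present only when a parser choice matters
}

const example: Recommendation = {
  primary: "json",
  fallback: "yaml",
  confidence: "medium",
  evidenceBasis: "external",
  schemaPairing: "json-schema",
};
```

Carrying `evidenceBasis` on every recommendation is what lets the homepage stay decisive without the paper inheriting those decisions as research results.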

The homepage is allowed to be decisive. The paper is not allowed to smuggle those decisions in as settled research results.

Schema Position

Validators and schemas are treated as a separate layer.

Current project stance:

  • validator libraries are not serialization formats
  • the strongest generation benefit usually comes from schema strategy, especially JSON Schema-oriented workflows
  • library choice mostly matters through:
    • JSON Schema export quality
    • runtime ergonomics
    • ecosystem fit
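Concretely, a schema-guided workflow usually centers on a plain JSON Schema document rather than on any particular validator library; the schema below is a hypothetical example of that library-agnostic artifact:

```typescript
// A plain JSON Schema object: the library-agnostic artifact that
// schema-guided generation centers on. Hypothetical example schema.
const resultSchema = {
  type: "object",
  properties: {
    verdict: { type: "string", enum: ["pass", "fail"] },
    score: { type: "number", minimum: 0, maximum: 1 },
  },
  required: ["verdict", "score"],
  additionalProperties: false,
} as const;

// Any validator library that can import or export this document is
// interchangeable at the serialization layer; library choice then
// reduces to export quality, runtime ergonomics, and ecosystem fit.
```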

Important Status Note

Some benchmark content under site/src/content/benchmarks/ and some format metadata still exist primarily to support site UI development. Those placeholders should not be read as final, reproduced benchmark results unless the corresponding run metadata is populated and its dummy flag is false.
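A guard along these lines (hypothetical field names, matching the dummy flag described above) would keep placeholder content out of anything presented as measured output:

```typescript
// Hypothetical run-metadata shape; the real frontmatter may differ.
interface RunMetadata {
  dummy: boolean;
  runDate?: string;
  harnessVersion?: string;
}

// Only entries with populated metadata and dummy === false should be
// treated as measured benchmark results.
function isMeasured(meta: RunMetadata | undefined): boolean {
  return meta !== undefined && meta.dummy === false && meta.runDate !== undefined;
}
```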

Repository Layout

the-format-tax/
├── README.md
├── docs/
│   ├── paper.md
│   ├── decision-tree.md
│   ├── benchmarks.md
│   ├── schemas.md
│   └── superpowers/
├── benchmarks/
│   ├── datasets/
│   ├── questions/
│   ├── references/
│   └── runner/
├── paper/
└── site/

Development

Site

```shell
cd site
bun install
bun run dev
```

Benchmarks

```shell
cd benchmarks/runner
bun run src/index.ts --track=1
```

The runner currently expects Bun.

Contributing

Useful contributions:

  • independent reproductions of public format claims
  • better harness normalization
  • schema-guidance experiments
  • binary transport benchmark implementation
  • clearer separation between dummy site content and measured benchmark output

License

MIT for code. Documentation and paper material follow the project’s existing documentation license conventions.
