Add regression benchmark gate for context compression quality #879

@gltanaka

Parent: #1411

Goal

Add a benchmark/test gate proving compression reduces context size without reducing pass rate on representative PDD tasks.

Prototype links

Acceptance criteria

  • Benchmark compares five context variants: full tests, AST tests, AST+contracts, full few-shot, and compressed few-shot.
  • Reports pass rate, token counts, wall-clock time, output churn, and missing-contract failures for each variant.
  • Fails CI or the benchmark gate if compression drops required contract symbols or regresses the pass rate on frozen fixtures.
  • Includes at least one fixture that previously failed without contract-source preservation.
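
As a starting point for discussion, the gate check described in the criteria above might be sketched as below. This is only an illustration, not the PDD implementation: `VariantResult`, its fields, and `gate` are hypothetical names, and the real benchmark would also need to carry wall-clock time and output churn.

```python
from dataclasses import dataclass, field


@dataclass
class VariantResult:
    """Aggregated result for one context variant (hypothetical schema)."""
    name: str
    passed: int          # fixtures that passed
    total: int           # fixtures that were run
    tokens: int          # total context tokens sent for this variant
    missing_contracts: list[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        # Treat an empty run as a 0.0 pass rate rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0


def gate(baseline: VariantResult, candidate: VariantResult) -> list[str]:
    """Compare a compressed variant to its uncompressed baseline.

    Returns a list of human-readable failures; an empty list means the
    gate passes.
    """
    failures = []
    if candidate.missing_contracts:
        symbols = ", ".join(candidate.missing_contracts)
        failures.append(f"{candidate.name}: missing contract symbols: {symbols}")
    if candidate.pass_rate < baseline.pass_rate:
        failures.append(
            f"{candidate.name}: pass rate {candidate.pass_rate:.2%} "
            f"is below baseline {baseline.pass_rate:.2%}"
        )
    if candidate.tokens >= baseline.tokens:
        failures.append(
            f"{candidate.name}: {candidate.tokens} tokens does not improve "
            f"on baseline {baseline.tokens}"
        )
    return failures
```

A CI job could call `gate(full_few_shot, compressed_few_shot)` on results from the frozen fixtures and exit non-zero when the returned list is non-empty. Requiring a strict token reduction and allowing no pass-rate drop are assumptions here; the issue does not specify tolerances.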

Parent epic: #873.


Migrated from gltanaka/pdd#1418 (originally filed 2026-05-08 by gltanaka). gltanaka/pdd is deprecated.

Metadata

Labels

enhancement: New feature or request

Projects

Status
In progress
