Landing Copy

Hero

Title

Harness Engineering for Coding Agents

One-line verdict

Your model is not the bottleneck. Your harness is.

Side code block

if your agent only works when AGENTS.md grows:
  you do not have alignment
  you have prompt debt

fix:
  smaller root router
  harder verification gates
  repo-owned evals

Subhead

Stop shipping giant AGENTS files and calling it alignment. This Codex skill turns repo chaos into an enforceable operating model: short root router, hard proof, migration governance, and review-loop measurement grounded in real repo history.

CTA labels

Read the skill
See the benchmark
Read the release page

Proof Strip

OpenClaw benchmark: 92.0 review-loop vs 47.3 single-pass
Root router reduced from 229 lines to 66
Migration verification now runs cross-platform, and remote D1 confirms that 0091 / 0135 are historical registry exceptions
Optional CI guard can verify remote registry state when Cloudflare secrets are present

Five Breakdown Modes

1. Context Debt

The root context stops routing and starts narrating everything.

2. Verification Theater

The diff looks good, but nothing meaningful actually ran.

3. Review Collapse

Humans stop reviewing code and start reviewing confidence.

4. Migration Drift

SQL files, remote registry, docs, and prod truth stop matching.

5. Benchmark Cosplay

Teams compare models on public toy tasks while their own repo keeps failing locally.

Seven Mandatory Checks

Root context acts as a router
Real commands are explicit
One truth source is named
Legacy paths are classified
Proof actually runs
A reusable guardrail is encoded
The next run inherits less ambiguity
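
To make the first two checks concrete, here is a minimal sketch of how a repo might encode them as a reusable guardrail. It is not the skill's actual implementation. The function name `check_root_router`, the `RouterReport` type, the default budget of 80 lines, and the convention that explicit commands start with `$ ` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RouterReport:
    line_count: int      # non-blank lines in the root context file
    within_budget: bool  # True when line_count <= max_lines
    has_commands: bool   # True when at least one explicit command line exists


def check_root_router(text: str, max_lines: int = 80) -> RouterReport:
    """Check root context text against a line budget and for explicit commands.

    Hypothetical conventions: blank lines are ignored, and a line that
    starts with "$ " counts as an explicit command.
    """
    non_blank = [line.strip() for line in text.splitlines() if line.strip()]
    has_commands = any(line.startswith("$ ") for line in non_blank)
    return RouterReport(
        line_count=len(non_blank),
        within_budget=len(non_blank) <= max_lines,
        has_commands=has_commands,
    )
```

Run against the contents of a root AGENTS.md in CI, a check like this would fail the build when the router grows past its budget or stops naming real commands, so the guardrail does not depend on anyone remembering it.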

Scoreboard Block

Section title

Measured on a real repo, not benchmark theater

Metrics

92.0 historical-review-loop
47.3 historical-single-pass
229 -> 66 root router shrink
0 critical findings in the migration audit after documented exceptions and gaps

Short Body Copy

Most teams still respond to agent failure with more prompt.

That is not a scaling strategy.

The durable pattern is:

smaller root
harder proof
narrower autonomy
repo-specific evals

This skill packages that pattern into something a repo can actually enforce.

GitHub Description

Harness engineering for coding agents: smaller root routers, hard verification gates, migration governance, and repo-specific evals that make review-loop discipline measurable.

Social Preview Text

If your agent only works when AGENTS.md keeps getting larger, you do not have alignment. You have prompt debt. This skill turns that debt into machinery: smaller root context, explicit repo maps, hard proof, migration governance, and benchmark runs grounded in your own git history.

Launch Post

Most teams do not have a model problem.

They have a harness problem.

When quality drops, they add more words to AGENTS.md, ship larger diffs, and hope review catches it.

We turned the opposite pattern into a Codex skill:

root router instead of prompt sprawl
hard proof instead of diff vibes
repo-specific evals instead of benchmark theater
migration governance instead of production folklore

On OpenClaw's first historical benchmark:

review-loop governance: 92.0
single-pass shipping: 47.3

That gap is the product.

HN Titles

Show HN: Your model is not the bottleneck. Your harness is.
Show HN: Harness Engineering for Coding Agents
Show HN: We benchmarked review-loop governance vs single-pass agent shipping on a real repo

Reddit Titles

Your model is not the bottleneck. Your harness is. I turned that into a Codex skill.
I built a Codex skill for harness engineering and benchmarked it on real repo history.
Short root router plus repo-specific evals beat giant AGENTS files on our coding-agent benchmark.

One-line Taglines

Your model is not the bottleneck. Your harness is.
Smaller root. Harder proof. Better agents.
Stop scaling prompt debt.
Measure the harness, not just the model.
