feat(optimize): enforce CodSpeed CLI as single source of truth for all measurements
The optimize skill now explicitly requires all benchmarks to run through
the CodSpeed CLI, including walltime. Never fall back to raw benchmark
execution — ask the user for help if CodSpeed can't run. This ensures
all results are comparable, trackable, and analyzable with flamegraphs.
Also fix the plugin name casing in plugin.json.
Co-Authored-By: Claude <noreply@anthropic.com>
skills/optimize/SKILL.md (4 additions, 1 deletion)
@@ -7,6 +7,8 @@ description: "Autonomously optimize code for performance using CodSpeed benchmar
 
 You are an autonomous performance engineer. Your job is to iteratively optimize code using CodSpeed benchmarks and flamegraph analysis. You work in a loop: measure, analyze, change, re-measure, compare — and you keep going until there's nothing left to gain or the user tells you to stop.
 
+**All measurements must go through CodSpeed.** Always use the CodSpeed CLI (`codspeed run`, `codspeed exec`) to run benchmarks — never run benchmarks directly (e.g., `cargo bench`, `pytest-benchmark`, `go test -bench`) outside of CodSpeed. The CodSpeed CLI and MCP tools are your single source of truth for all performance data. If you're unable to run benchmarks through CodSpeed (missing auth, unsupported setup, CLI errors), ask the user for help rather than falling back to raw benchmark execution. Results outside CodSpeed cannot be compared, tracked, or analyzed with flamegraphs.
+
 ## Before you start
 
 1. **Understand the target**: What code does the user want to optimize? A specific function, a whole module, a benchmark suite? If unclear, ask.
@@ -213,9 +215,10 @@ You have access to these CodSpeed MCP tools:
 
 ## Guiding principles
 
+- **Everything goes through CodSpeed.** Never run benchmarks outside of the CodSpeed CLI. Never quote timing numbers from raw benchmark output. The CodSpeed MCP tools (`compare_runs`, `query_flamegraph`, `list_runs`) are your source of truth — use them to read results, not terminal output. If CodSpeed can't run, ask the user to fix the setup rather than working around it.
 - **Measure first, optimize second.** Never optimize based on intuition alone — the flamegraph tells you where the time actually goes, and it's often not where you'd guess.
 - **One change at a time.** Isolated changes make it clear what helped and what didn't.
 - **Correctness over speed.** Always run tests. A fast but broken program is useless.
-- **Simulation for iteration, walltime for validation.** Simulation is deterministic and fast for feedback. Walltime is the ground truth.
+- **Simulation for iteration, walltime for validation.** Simulation is deterministic and fast for feedback. Walltime is the ground truth. Both run through CodSpeed.
 - **Know when to stop.** Diminishing returns are real. When gains drop below 1-2%, you're usually done unless the user has a specific target.
 - **Be transparent.** Show the user your reasoning, the numbers, and the tradeoffs. Performance optimization involves judgment calls — the user should be informed.