diff --git a/retro/SKILL.md b/retro/SKILL.md
index 1d87ce9c79..b95b9d1bbd 100644
--- a/retro/SKILL.md
+++ b/retro/SKILL.md
@@ -1224,6 +1224,22 @@ Deep sessions: 3 → 5 ↑2
 
 **If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
 
+**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:
+
+- `addressed` — clear evidence in commits/files/metrics (cite it),
+- `partial` — some movement but not done,
+- `open` — no evidence this window.
+
+Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
+```
+Recommendation follow-through (vs last retro):
+  [x] testing — E2E fixture reliability pass  addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
+  [~] security — gitleaks on a schedule  partial: pre-commit hook added, no scheduled run yet
+  [ ] architecture — extract html_generator  open
+2 of 3 prior recommendations addressed.
+```
+When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.
+
 ### Step 13: Save Retro History
 
 After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -1273,6 +1289,11 @@ Use the Write tool to save the JSON file with this schema:
   "version_range": ["1.16.0.0", "1.16.1.0"],
   "streak_days": 47,
   "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
+  "recommendations": [
+    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
+    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
+    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
+  ],
   "greptile": {
     "fixes": 3,
     "fps": 1,
@@ -1284,6 +1305,8 @@ Use the Write tool to save the JSON file with this schema:
 
 **Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
 
+**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.
+
 Include test health data in the JSON when test files exist:
 ```json
   "test_health": {
@@ -1420,6 +1443,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
 ### 3 Things to Improve
 Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."
 
+Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.
+
 ### 3 Habits for Next Week
 Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").
 
diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl
index b0819c8a6b..a549816b23 100644
--- a/retro/SKILL.md.tmpl
+++ b/retro/SKILL.md.tmpl
@@ -431,6 +431,22 @@ Deep sessions: 3 → 5 ↑2
 
 **If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
 
+**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:
+
+- `addressed` — clear evidence in commits/files/metrics (cite it),
+- `partial` — some movement but not done,
+- `open` — no evidence this window.
+
+Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
+```
+Recommendation follow-through (vs last retro):
+  [x] testing — E2E fixture reliability pass  addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
+  [~] security — gitleaks on a schedule  partial: pre-commit hook added, no scheduled run yet
+  [ ] architecture — extract html_generator  open
+2 of 3 prior recommendations addressed.
+```
+When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.
+
 ### Step 13: Save Retro History
 
 After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -480,6 +496,11 @@ Use the Write tool to save the JSON file with this schema:
   "version_range": ["1.16.0.0", "1.16.1.0"],
   "streak_days": 47,
   "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
+  "recommendations": [
+    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
+    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
+    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
+  ],
   "greptile": {
     "fixes": 3,
     "fps": 1,
@@ -491,6 +512,8 @@ Use the Write tool to save the JSON file with this schema:
 
 **Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
 
+**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.
+
 Include test health data in the JSON when test files exist:
 ```json
   "test_health": {
@@ -627,6 +650,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
 ### 3 Things to Improve
 Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."
 
+Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.
+
 ### 3 Habits for Next Week
 Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").
 
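Reviewer note: the rules the template states for the `recommendations` array (one lowercase word for `category`, an actionable `text`, no more than 3 items) could also be checked mechanically on a saved snapshot. The sketch below is only an illustration of that idea. `checkRecommendations` and the `Recommendation` interface are hypothetical names and are not part of this change, which relies on the skill prose alone.

```typescript
// Hypothetical shape of one persisted item, mirroring the schema example in Step 13.
interface Recommendation {
  category: string;
  text: string;
}

// Returns a list of problems; an empty list means the value follows the stated rules.
function checkRecommendations(value: unknown): string[] {
  if (!Array.isArray(value)) return ["recommendations must be an array"];
  const problems: string[] = [];
  if (value.length > 3) problems.push("expected at most 3 items");
  value.forEach((item, i) => {
    const rec = item as Partial<Recommendation> | null;
    // "One-word lowercase" is read here as letters a-z only (an assumption).
    if (typeof rec?.category !== "string" || !/^[a-z]+$/.test(rec.category)) {
      problems.push(`item ${i}: category must be one lowercase word`);
    }
    if (typeof rec?.text !== "string" || rec.text.trim() === "") {
      problems.push(`item ${i}: text must be a non-empty string`);
    }
  });
  return problems;
}
```

An array with fewer than 3 items passes, which matches the instruction to record what exists instead of padding.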
diff --git a/test/regression-1834-retro-recommendations.test.ts b/test/regression-1834-retro-recommendations.test.ts
new file mode 100644
index 0000000000..5ebc3974a0
--- /dev/null
+++ b/test/regression-1834-retro-recommendations.test.ts
@@ -0,0 +1,108 @@
+/**
+ * Regression tests for #1834 — /retro generated "3 Things to Improve" as
+ * throwaway prose and never persisted them, so the next run had no structured
+ * record of what it recommended and could not detect follow-through. A week
+ * spent acting on retro feedback got mischaracterized as a generic metric swing.
+ *
+ * The fix lives in retro/SKILL.md.tmpl:
+ *   - Step 13 snapshot schema gains a `recommendations` array (always written).
+ *   - Step 13 prose mandates populating it from the "3 Things to Improve" items.
+ *   - Step 12 reads a prior snapshot's `recommendations` back and classifies each
+ *     as addressed / partial / open, with a backward-compat skip for older
+ *     snapshots that predate the field.
+ *   - The "3 Things to Improve" narrative is wired to the persisted array.
+ *
+ * These are static invariants against the template body (and the regenerated
+ * SKILL.md). They fail the build if any leg of the persist → read-back loop is
+ * dropped, so the follow-through capability can't silently regress.
+ */
+import { describe, expect, test } from "bun:test";
+import * as fs from "node:fs";
+import * as path from "node:path";
+
+const ROOT = path.resolve(import.meta.dir, "..");
+const RETRO_TMPL = path.join(ROOT, "retro", "SKILL.md.tmpl");
+const RETRO_MD = path.join(ROOT, "retro", "SKILL.md");
+
+function readTmpl(): string {
+  return fs.readFileSync(RETRO_TMPL, "utf-8");
+}
+
+function readMd(): string {
+  return fs.readFileSync(RETRO_MD, "utf-8");
+}
+
+describe("#1834 retro recommendations — Step 13 snapshot persists them", () => {
+  test("schema block carries a recommendations array with category + text shape", () => {
+    const body = readTmpl();
+    const schemaStart = body.indexOf("Use the Write tool to save the JSON file with this schema:");
+    expect(schemaStart).toBeGreaterThan(-1);
+    // The recommendations array must live inside the Step 13 JSON schema, not
+    // somewhere else in the doc.
+    const schema = body.slice(schemaStart, schemaStart + 2000);
+    expect(schema).toMatch(/"recommendations"\s*:\s*\[/);
+    expect(schema).toMatch(/"category"\s*:/);
+    expect(schema).toMatch(/"text"\s*:/);
+  });
+
+  test("recommendations are mandatory, not optional like greptile/backlog/test_health", () => {
+    const body = readTmpl();
+    expect(body).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
+  });
+
+  test("Step 13 ties the array to the 3 Things to Improve items", () => {
+    const body = readTmpl();
+    const anchor = body.indexOf("**Always include the `recommendations` array.**");
+    expect(anchor).toBeGreaterThan(-1);
+    const para = body.slice(anchor, anchor + 700);
+    expect(para).toMatch(/3 Things to Improve/);
+    expect(para).toMatch(/follow-through/i);
+  });
+});
+
+describe("#1834 retro recommendations — Step 12 reads them back and scores follow-through", () => {
+  test("Step 12 has a Recommendation follow-through block before Step 13", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const step13 = body.indexOf("### Step 13: Save Retro History");
+    expect(follow).toBeGreaterThan(-1);
+    expect(step13).toBeGreaterThan(-1);
+    expect(follow).toBeLessThan(step13);
+  });
+
+  test("follow-through classifies each prior recommendation addressed/partial/open", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const block = body.slice(follow, follow + 1400);
+    expect(block).toMatch(/`addressed`/);
+    expect(block).toMatch(/`partial`/);
+    expect(block).toMatch(/`open`/);
+  });
+
+  test("follow-through is backward compatible with snapshots that predate the field", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const block = body.slice(follow, follow + 600);
+    // Older snapshots have no recommendations array — the block must skip, not crash.
+    expect(block).toMatch(/if it's absent, skip this block/i);
+  });
+});
+
+describe("#1834 retro recommendations — 3 Things to Improve narrative persists the loop", () => {
+  test("narrative instructs recording the 3 items into the snapshot array", () => {
+    const body = readTmpl();
+    const heading = body.indexOf("### 3 Things to Improve");
+    expect(heading).toBeGreaterThan(-1);
+    const section = body.slice(heading, heading + 900);
+    expect(section).toMatch(/recommendations` array/);
+  });
+});
+
+describe("#1834 retro recommendations — regenerated SKILL.md carries the loop", () => {
+  test("generated SKILL.md is not stale relative to the template", () => {
+    const md = readMd();
+    expect(md).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
+    expect(md).toMatch(/\*\*Recommendation follow-through\.\*\*/);
+    expect(md).toMatch(/"recommendations"\s*:\s*\[/);
+  });
+});
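Reviewer note: the tests above pin the wording of the template, not the behaviour it describes. As a rough illustration of the Step 12 read-back, here is a minimal sketch of how a prior snapshot's recommendations might be turned into the follow-through listing, including the silent skip for older snapshots. All names (`Snapshot`, `Verdict`, `renderFollowThrough`, the `classify` callback) are hypothetical. In the skill itself the classification is done by the agent reading commits, changed files, and metrics, so the callback is only a stand-in, and the line format is an approximation of the template's example.

```typescript
type Status = "addressed" | "partial" | "open";

interface Recommendation {
  category: string;
  text: string;
}

// Snapshots written before the field existed have no `recommendations` key.
interface Snapshot {
  recommendations?: Recommendation[];
}

interface Verdict {
  status: Status;
  evidence?: string; // e.g. "4 commits under test/fixtures/"
}

const MARKS: Record<Status, string> = {
  addressed: "[x]",
  partial: "[~]",
  open: "[ ]",
};

// Returns null when the prior snapshot predates the field, so the caller
// can omit the whole section instead of failing.
function renderFollowThrough(
  prior: Snapshot,
  classify: (rec: Recommendation) => Verdict,
): string | null {
  if (!Array.isArray(prior.recommendations)) return null;
  const lines = prior.recommendations.map((rec) => {
    const { status, evidence } = classify(rec);
    const tail = evidence ? `${status}: ${evidence}` : status;
    return `  ${MARKS[status]} ${rec.category} - ${rec.text}  ${tail}`;
  });
  return ["Recommendation follow-through (vs last retro):", ...lines].join("\n");
}
```

Returning `null` rather than an empty string keeps "no field in the old snapshot" distinct from "field present but empty", which is the backward-compatibility case the third Step 12 test guards.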