garrytan · jbetala7 · Jun 3, 2026 · Jun 3, 2026
diff --git a/retro/SKILL.md b/retro/SKILL.md
@@ -1224,6 +1224,22 @@ Deep sessions:      3      →    5           ↑2
 
 **If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
 
+**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:
+
+- `addressed` — clear evidence in commits/files/metrics (cite it),
+- `partial` — some movement but not done,
+- `open` — no evidence this window.
+
+Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
+```
+Recommendation follow-through (vs last retro):
+  [x] testing — E2E fixture reliability pass        addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
+  [~] security — gitleaks on a schedule             partial: pre-commit hook added, no scheduled run yet
+  [ ] architecture — extract html_generator         open
+2 of 3 prior recommendations addressed.
+```
+When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.
+
 ### Step 13: Save Retro History
 
 After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -1273,6 +1289,11 @@ Use the Write tool to save the JSON file with this schema:
   "version_range": ["1.16.0.0", "1.16.1.0"],
   "streak_days": 47,
   "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
+  "recommendations": [
+    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
+    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
+    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
+  ],
   "greptile": {
     "fixes": 3,
     "fps": 1,
@@ -1284,6 +1305,8 @@ Use the Write tool to save the JSON file with this schema:
 
 **Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
 
+**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.
+
 Include test health data in the JSON when test files exist:
 ```json
   "test_health": {
@@ -1420,6 +1443,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
 ### 3 Things to Improve
 Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."
 
+Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.
+
 ### 3 Habits for Next Week
 Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").
 

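Step 12 leaves the addressed / partial / open call to the agent's judgment; the skill prescribes no algorithm. The sketch below is only a model of the inputs and outputs. It buckets one prior recommendation with a naive keyword-overlap heuristic in TypeScript. Everything in it is hypothetical: `tokens`, `classify`, and `WindowEvidence` are invented names, the evidence is assumed to arrive as plain arrays of commit subjects (from the Step 1 git log) and changed paths, and the 0 / 1 / 2+ hit thresholds are arbitrary.

```typescript
// Hypothetical input shapes; the snapshot schema only defines `category` and `text`.
interface Recommendation {
  category: string;
  text: string;
}

interface WindowEvidence {
  commitSubjects: string[]; // assumed: subjects from the Step 1 git log
  changedFiles: string[]; // assumed: paths touched in this window
}

type Verdict = "addressed" | "partial" | "open";

// Lowercase tokens of 4+ characters, so shorter words ("a", "on", "run") never count.
function tokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((t) => t.length >= 4);
}

// Collect every commit subject or changed path that shares a token with the
// recommendation, then bucket by hit count. The 0 / 1 / 2+ thresholds are an
// arbitrary assumption, not something the skill specifies.
function classify(rec: Recommendation, ev: WindowEvidence): { verdict: Verdict; hits: string[] } {
  const keys = new Set(tokens(rec.text));
  const hits = [...ev.commitSubjects, ...ev.changedFiles].filter((line) =>
    tokens(line).some((t) => keys.has(t)),
  );
  const verdict: Verdict = hits.length === 0 ? "open" : hits.length === 1 ? "partial" : "addressed";
  // `hits` doubles as the evidence to cite next to the verdict.
  return { verdict, hits };
}
```

Suppose the only relevant commit is `ci: run gitleaks nightly` and no changed path matches. In that case, the gitleaks recommendation lands at `partial`, and that subject is cited as evidence. Keyword overlap is a deliberately crude stand-in. Four-letter filler such as "into" or "just" can still produce a false hit. The metrics leg of the check (a flake ratio moving, say) has no textual form at all, which is why the skill asks for judgment rather than string matching.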
diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl
@@ -431,6 +431,22 @@ Deep sessions:      3      →    5           ↑2
 
 **If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
 
+**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:
+
+- `addressed` — clear evidence in commits/files/metrics (cite it),
+- `partial` — some movement but not done,
+- `open` — no evidence this window.
+
+Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
+```
+Recommendation follow-through (vs last retro):
+  [x] testing — E2E fixture reliability pass        addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
+  [~] security — gitleaks on a schedule             partial: pre-commit hook added, no scheduled run yet
+  [ ] architecture — extract html_generator         open
+2 of 3 prior recommendations addressed.
+```
+When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.
+
 ### Step 13: Save Retro History
 
 After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -480,6 +496,11 @@ Use the Write tool to save the JSON file with this schema:
   "version_range": ["1.16.0.0", "1.16.1.0"],
   "streak_days": 47,
   "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
+  "recommendations": [
+    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
+    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
+    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
+  ],
   "greptile": {
     "fixes": 3,
     "fps": 1,
@@ -491,6 +512,8 @@ Use the Write tool to save the JSON file with this schema:
 
 **Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
 
+**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.
+
 Include test health data in the JSON when test files exist:
 ```json
   "test_health": {
@@ -627,6 +650,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
 ### 3 Things to Improve
 Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."
 
+Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.
+
 ### 3 Habits for Next Week
 Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").
 

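The sample output in Step 12 follows a fixed layout: one marker per verdict, a padded label column, and the verdict with any cited evidence, followed by a headline count. Its headline reads "2 of 3" although only one item is `[x]`, so it counts `partial` items as acted on. The hypothetical TypeScript renderer below makes that counting explicit by reporting both numbers. `FollowThrough`, `renderFollowThrough`, and the exact headline wording are assumptions, not part of the skill.

```typescript
type Verdict = "addressed" | "partial" | "open";

// Hypothetical per-recommendation result. `label` is a short form of the
// recommendation text; `evidence` is whatever was cited for it, if anything.
interface FollowThrough {
  category: string;
  label: string;
  verdict: Verdict;
  evidence?: string;
}

const MARK: Record<Verdict, string> = { addressed: "[x]", partial: "[~]", open: "[ ]" };

function renderFollowThrough(items: FollowThrough[]): string {
  // e.g. "[x] testing \u2014 E2E fixture reliability pass" (\u2014 is the em dash the sample uses).
  const heads = items.map((i) => `${MARK[i.verdict]} ${i.category} \u2014 ${i.label}`);
  // Pad every head to one shared width so the verdict column lines up.
  const width = Math.max(0, ...heads.map((h) => h.length)) + 4;
  const rows = items.map((item, idx) => {
    const verdict = item.evidence ? `${item.verdict}: ${item.evidence}` : item.verdict;
    return `  ${heads[idx].padEnd(width)}${verdict}`;
  });
  const addressed = items.filter((i) => i.verdict === "addressed").length;
  const partial = items.filter((i) => i.verdict === "partial").length;
  // Headline counts partial items as acted on, but reports both numbers.
  const summary =
    `${addressed + partial} of ${items.length} prior recommendations acted on ` +
    `(${addressed} addressed, ${partial} partial).`;
  return ["Recommendation follow-through (vs last retro):", ...rows, summary].join("\n");
}
```

Keeping the two counts separate lets the narrative credit follow-through without implying that a partially done item is finished.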
diff --git a/test/regression-1834-retro-recommendations.test.ts b/test/regression-1834-retro-recommendations.test.ts
@@ -0,0 +1,108 @@
+/**
+ * Regression tests for #1834 — /retro generated "3 Things to Improve" as
+ * throwaway prose and never persisted them, so the next run had no structured
+ * record of what it recommended and could not detect follow-through. A week
+ * spent acting on retro feedback got mischaracterized as a generic metric swing.
+ *
+ * The fix lives in retro/SKILL.md.tmpl:
+ *   - Step 13 snapshot schema gains a `recommendations` array (always written).
+ *   - Step 13 prose mandates populating it from the "3 Things to Improve" items.
+ *   - Step 12 reads a prior snapshot's `recommendations` back and classifies each
+ *     as addressed / partial / open, with a backward-compat skip for older
+ *     snapshots that predate the field.
+ *   - The "3 Things to Improve" narrative is wired to the persisted array.
+ *
+ * These are static invariants against the template body (and the regenerated
+ * SKILL.md). They fail the build if any leg of the persist → read-back loop is
+ * dropped, so the follow-through capability can't silently regress.
+ */
+import { describe, expect, test } from "bun:test";
+import * as fs from "node:fs";
+import * as path from "node:path";
+
+const ROOT = path.resolve(import.meta.dir, "..");
+const RETRO_TMPL = path.join(ROOT, "retro", "SKILL.md.tmpl");
+const RETRO_MD = path.join(ROOT, "retro", "SKILL.md");
+
+function readTmpl(): string {
+  return fs.readFileSync(RETRO_TMPL, "utf-8");
+}
+
+function readMd(): string {
+  return fs.readFileSync(RETRO_MD, "utf-8");
+}
+
+describe("#1834 retro recommendations — Step 13 snapshot persists them", () => {
+  test("schema block carries a recommendations array with category + text shape", () => {
+    const body = readTmpl();
+    const schemaStart = body.indexOf("Use the Write tool to save the JSON file with this schema:");
+    expect(schemaStart).toBeGreaterThan(-1);
+    // The recommendations array must live inside the Step 13 JSON schema, not
+    // somewhere else in the doc.
+    const schema = body.slice(schemaStart, schemaStart + 2000);
+    expect(schema).toMatch(/"recommendations"\s*:\s*\[/);
+    expect(schema).toMatch(/"category"\s*:/);
+    expect(schema).toMatch(/"text"\s*:/);
+  });
+
+  test("recommendations are mandatory, not optional like greptile/backlog/test_health", () => {
+    const body = readTmpl();
+    expect(body).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
+  });
+
+  test("Step 13 ties the array to the 3 Things to Improve items", () => {
+    const body = readTmpl();
+    const anchor = body.indexOf("**Always include the `recommendations` array.**");
+    expect(anchor).toBeGreaterThan(-1);
+    const para = body.slice(anchor, anchor + 700);
+    expect(para).toMatch(/3 Things to Improve/);
+    expect(para).toMatch(/follow-through/i);
+  });
+});
+
+describe("#1834 retro recommendations — Step 12 reads them back and scores follow-through", () => {
+  test("Step 12 has a Recommendation follow-through block before Step 13", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const step13 = body.indexOf("### Step 13: Save Retro History");
+    expect(follow).toBeGreaterThan(-1);
+    expect(step13).toBeGreaterThan(-1);
+    expect(follow).toBeLessThan(step13);
+  });
+
+  test("follow-through classifies each prior recommendation addressed/partial/open", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const block = body.slice(follow, follow + 1400);
+    expect(block).toMatch(/`addressed`/);
+    expect(block).toMatch(/`partial`/);
+    expect(block).toMatch(/`open`/);
+  });
+
+  test("follow-through is backward compatible with snapshots that predate the field", () => {
+    const body = readTmpl();
+    const follow = body.indexOf("**Recommendation follow-through.**");
+    const block = body.slice(follow, follow + 600);
+    // Older snapshots have no recommendations array — the block must skip, not crash.
+    expect(block).toMatch(/if it's absent, skip this block/i);
+  });
+});
+
+describe("#1834 retro recommendations — 3 Things to Improve narrative persists the loop", () => {
+  test("narrative instructs recording the 3 items into the snapshot array", () => {
+    const body = readTmpl();
+    const heading = body.indexOf("### 3 Things to Improve");
+    expect(heading).toBeGreaterThan(-1);
+    const section = body.slice(heading, heading + 900);
+    expect(section).toMatch(/recommendations` array/);
+  });
+});
+
+describe("#1834 retro recommendations — regenerated SKILL.md carries the loop", () => {
+  test("generated SKILL.md carries the template's recommendation markers", () => {
+    const md = readMd();
+    expect(md).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
+    expect(md).toMatch(/\*\*Recommendation follow-through\.\*\*/);
+    expect(md).toMatch(/"recommendations"\s*:\s*\[/);
+  });
+});
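The backward-compatibility test above pins down the wording of the skip rule. Any tool that reads the prior snapshot programmatically has to honor the same rule. The sketch below assumes the snapshot is available as a JSON string. `priorRecommendations` and `isRecommendation` are hypothetical names. Rejecting entries whose `category` is not one lowercase word is an assumption drawn from the Step 13 prose, not something the skill enforces.

```typescript
// Hypothetical shape of one persisted entry, mirroring the schema example:
// a one-word lowercase `category` and a free-text `text`.
interface Recommendation {
  category: string;
  text: string;
}

// One lowercase word, e.g. "testing", "security", "architecture".
const CATEGORY_RE = /^[a-z]+$/;

function isRecommendation(value: unknown): value is Recommendation {
  if (typeof value !== "object" || value === null) return false;
  const { category, text } = value as { category?: unknown; text?: unknown };
  return (
    typeof category === "string" &&
    CATEGORY_RE.test(category) &&
    typeof text === "string" &&
    text.trim().length > 0
  );
}

// Returns the prior snapshot's recommendations, or null when there is nothing
// to compare against: unparseable JSON, or a snapshot written before the field
// existed. Null means "skip the follow-through block", never "fail the retro".
function priorRecommendations(snapshotJson: string): Recommendation[] | null {
  let snapshot: unknown;
  try {
    snapshot = JSON.parse(snapshotJson);
  } catch {
    return null;
  }
  if (typeof snapshot !== "object" || snapshot === null) return null;
  const recs = (snapshot as { recommendations?: unknown }).recommendations;
  if (!Array.isArray(recs)) return null;
  // Drop malformed entries rather than rejecting the whole snapshot.
  return recs.filter(isRecommendation);
}
```

The function returns `[]` rather than `null` for an empty array. This keeps two cases apart: a snapshot that predates the field (skip the block) and a prior retro that recorded no improvements (nothing to follow up on).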