25 changes: 25 additions & 0 deletions retro/SKILL.md
@@ -1224,6 +1224,22 @@ Deep sessions: 3 → 5 ↑2

**If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."

**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:

- `addressed` — clear evidence in commits/files/metrics (cite it),
- `partial` — some movement but not done,
- `open` — no evidence this window.

Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
```
Recommendation follow-through (vs last retro):
[x] testing — E2E fixture reliability pass addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
[~] security — gitleaks on a schedule partial: pre-commit hook added, no scheduled run yet
[ ] architecture — extract html_generator open
1 of 3 prior recommendations addressed, 1 partial, 1 open.
```
When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.
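
The read-back step above can be sketched in TypeScript. This is a minimal illustration, not code from this PR: the skill leaves each verdict to the agent's reading of commits, changed files, and metrics, so the `judge` callback, the `Judgement` type, and the `|` separators in the rendered lines are all assumptions. Only the three verdict names, the checkbox markers, and the skip-when-absent rule come from the text.

```typescript
type Verdict = "addressed" | "partial" | "open";

interface Recommendation {
  category: string;
  text: string;
}

// Hypothetical shape for one classification result.
interface Judgement {
  verdict: Verdict;
  evidence?: string; // e.g. "4 commits under test/fixtures/"
}

// Checkbox markers as they appear in the example output of the skill.
const MARK: Record<Verdict, string> = {
  addressed: "[x]",
  partial: "[~]",
  open: "[ ]",
};

// `prior` is the most recent snapshot; `judge` stands in for the agent's
// judgement (an assumed callback, not an API of the skill).
function renderFollowThrough(
  prior: { recommendations?: Recommendation[] },
  judge: (rec: Recommendation) => Judgement,
): string | null {
  // Snapshots written before the field existed: skip the block silently.
  if (!Array.isArray(prior.recommendations)) return null;

  const scored = prior.recommendations.map((rec) => ({ rec, ...judge(rec) }));
  const lines = scored.map(({ rec, verdict, evidence }) => {
    const tail = evidence ? `${verdict}: ${evidence}` : verdict;
    return `${MARK[verdict]} ${rec.category} | ${rec.text} | ${tail}`;
  });
  // Only `addressed` verdicts count toward the summary line.
  const done = scored.filter((s) => s.verdict === "addressed").length;

  return [
    "Recommendation follow-through (vs last retro):",
    ...lines,
    `${done} of ${scored.length} prior recommendations addressed.`,
  ].join("\n");
}
```

Counting only `addressed` verdicts in the summary keeps `partial` items visible as unfinished work rather than folding them into the headline number.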

### Step 13: Save Retro History

After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -1273,6 +1289,11 @@ Use the Write tool to save the JSON file with this schema:
  "version_range": ["1.16.0.0", "1.16.1.0"],
  "streak_days": 47,
  "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
  "recommendations": [
    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
  ],
  "greptile": {
    "fixes": 3,
    "fps": 1,
@@ -1284,6 +1305,8 @@ Use the Write tool to save the JSON file with this schema:

**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.

**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.
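
The shape rules in this paragraph can be checked mechanically before the snapshot is written. The sketch below is an assumption for illustration only: `validateRecommendations` is a hypothetical helper, and restricting `category` to the letters a to z is one reading of "one-word lowercase". The cap of 3 items and the allowance for fewer come from the text above.

```typescript
// One lowercase word. Limiting it to a-z is an assumption, not a rule
// stated by the skill.
const CATEGORY_RE = /^[a-z]+$/;

// Returns a list of human-readable problems; an empty list means the value
// is an acceptable `recommendations` array.
function validateRecommendations(value: unknown): string[] {
  if (!Array.isArray(value)) {
    return ["recommendations must be an array (it is always written)"];
  }

  const problems: string[] = [];
  // Fewer than 3 is allowed, but the narrative section never has more.
  if (value.length > 3) {
    problems.push(`expected at most 3 items, got ${value.length}`);
  }

  value.forEach((item, i) => {
    const rec = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    const category = rec.category;
    const text = rec.text;
    if (typeof category !== "string" || !CATEGORY_RE.test(category)) {
      problems.push(`[${i}].category must be one lowercase word`);
    }
    if (typeof text !== "string" || text.trim() === "") {
      problems.push(`[${i}].text must be a non-empty string`);
    }
  });

  return problems;
}
```

A check like this cannot tell whether `text` names a concrete artifact; that part of the rule still depends on how the recommendation is written.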

Include test health data in the JSON when test files exist:
```json
"test_health": {
@@ -1420,6 +1443,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
### 3 Things to Improve
Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."

Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.

### 3 Habits for Next Week
Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").

25 changes: 25 additions & 0 deletions retro/SKILL.md.tmpl
@@ -431,6 +431,22 @@ Deep sessions: 3 → 5 ↑2

**If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."

**Recommendation follow-through.** If the most recent prior snapshot has a `recommendations` array (snapshots written before this field existed won't — if it's absent, skip this block silently), check whether this window acted on each one. For every prior recommendation, scan this window's commit subjects (Step 1 git log), changed files, and the metrics you just computed for evidence it was addressed. Classify each as:

- `addressed` — clear evidence in commits/files/metrics (cite it),
- `partial` — some movement but not done,
- `open` — no evidence this window.

Surface a **Recommendation follow-through** section and feed the verdict into the narrative:
```
Recommendation follow-through (vs last retro):
[x] testing — E2E fixture reliability pass addressed: 4 commits under test/fixtures/, flake ratio 1/4 → 0/12
[~] security — gitleaks on a schedule partial: pre-commit hook added, no scheduled run yet
[ ] architecture — extract html_generator open
1 of 3 prior recommendations addressed, 1 partial, 1 open.
```
When recommendations were acted on, say so explicitly in the narrative ("2 of 3 prior recommendations addressed") instead of attributing the same work to a generic metric like "fix ratio is high." This is the whole point of persisting recommendations: a week spent on retro feedback should read as follow-through, not noise.

### Step 13: Save Retro History

After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
@@ -480,6 +496,11 @@ Use the Write tool to save the JSON file with this schema:
  "version_range": ["1.16.0.0", "1.16.1.0"],
  "streak_days": 47,
  "tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
  "recommendations": [
    { "category": "testing", "text": "E2E fixture reliability pass — the auth setup flakes ~1 run in 4" },
    { "category": "security", "text": "Run gitleaks on a schedule, not just pre-commit" },
    { "category": "architecture", "text": "Extract html_generator rendering into per-format modules" }
  ],
  "greptile": {
    "fixes": 3,
    "fps": 1,
@@ -491,6 +512,8 @@ Use the Write tool to save the JSON file with this schema:

**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.

**Always include the `recommendations` array.** Populate it with the exact 3 items you write in the "3 Things to Improve" narrative section (Step 14), one object per item. Each object has a one-word lowercase `category` (e.g. `testing`, `security`, `architecture`, `process`, `docs`, `performance`) and a `text` field carrying the actionable suggestion verbatim. This is what the *next* retro reads back in Step 12 to measure follow-through, so write `text` so it can be matched against future commit subjects and changed files — name the concrete artifact (a file, a script, a check), not a vague aspiration. If you genuinely have fewer than 3 improvements, record what you have; never pad with filler just to reach 3.

Include test health data in the JSON when test files exist:
```json
"test_health": {
@@ -627,6 +650,8 @@ Identify the 3 highest-impact things shipped in the window across the whole team
### 3 Things to Improve
Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."

Record these exact 3 items into the `recommendations` array of the Step 13 snapshot (one object each, with a one-word `category`). They are not just prose for this run — next week's retro reads them back in Step 12 to measure follow-through, so keep each one concrete enough to match against future commits and changed files.

### 3 Habits for Next Week
Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").

108 changes: 108 additions & 0 deletions test/regression-1834-retro-recommendations.test.ts
@@ -0,0 +1,108 @@
/**
 * Regression tests for #1834 — /retro generated "3 Things to Improve" as
 * throwaway prose and never persisted them, so the next run had no structured
 * record of what it recommended and could not detect follow-through. A week
 * spent acting on retro feedback got mischaracterized as a generic metric swing.
 *
 * The fix lives in retro/SKILL.md.tmpl:
 *   - Step 13 snapshot schema gains a `recommendations` array (always written).
 *   - Step 13 prose mandates populating it from the "3 Things to Improve" items.
 *   - Step 12 reads a prior snapshot's `recommendations` back and classifies each
 *     as addressed / partial / open, with a backward-compat skip for older
 *     snapshots that predate the field.
 *   - The "3 Things to Improve" narrative is wired to the persisted array.
 *
 * These are static invariants against the template body (and the regenerated
 * SKILL.md). They fail the build if any leg of the persist → read-back loop is
 * dropped, so the follow-through capability can't silently regress.
 */
import { describe, expect, test } from "bun:test";
import * as fs from "node:fs";
import * as path from "node:path";

const ROOT = path.resolve(import.meta.dir, "..");
const RETRO_TMPL = path.join(ROOT, "retro", "SKILL.md.tmpl");
const RETRO_MD = path.join(ROOT, "retro", "SKILL.md");

function readTmpl(): string {
  return fs.readFileSync(RETRO_TMPL, "utf-8");
}

function readMd(): string {
  return fs.readFileSync(RETRO_MD, "utf-8");
}

describe("#1834 retro recommendations — Step 13 snapshot persists them", () => {
  test("schema block carries a recommendations array with category + text shape", () => {
    const body = readTmpl();
    const schemaStart = body.indexOf("Use the Write tool to save the JSON file with this schema:");
    expect(schemaStart).toBeGreaterThan(-1);
    // The recommendations array must live inside the Step 13 JSON schema, not
    // somewhere else in the doc.
    const schema = body.slice(schemaStart, schemaStart + 2000);
    expect(schema).toMatch(/"recommendations"\s*:\s*\[/);
    expect(schema).toMatch(/"category"\s*:/);
    expect(schema).toMatch(/"text"\s*:/);
  });

  test("recommendations are mandatory, not optional like greptile/backlog/test_health", () => {
    const body = readTmpl();
    expect(body).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
  });

  test("Step 13 ties the array to the 3 Things to Improve items", () => {
    const body = readTmpl();
    const anchor = body.indexOf("**Always include the `recommendations` array.**");
    expect(anchor).toBeGreaterThan(-1);
    const para = body.slice(anchor, anchor + 700);
    expect(para).toMatch(/3 Things to Improve/);
    expect(para).toMatch(/follow-through/i);
  });
});

describe("#1834 retro recommendations — Step 12 reads them back and scores follow-through", () => {
  test("Step 12 has a Recommendation follow-through block before Step 13", () => {
    const body = readTmpl();
    const follow = body.indexOf("**Recommendation follow-through.**");
    const step13 = body.indexOf("### Step 13: Save Retro History");
    expect(follow).toBeGreaterThan(-1);
    expect(step13).toBeGreaterThan(-1);
    expect(follow).toBeLessThan(step13);
  });

  test("follow-through classifies each prior recommendation addressed/partial/open", () => {
    const body = readTmpl();
    const follow = body.indexOf("**Recommendation follow-through.**");
    const block = body.slice(follow, follow + 1400);
    expect(block).toMatch(/`addressed`/);
    expect(block).toMatch(/`partial`/);
    expect(block).toMatch(/`open`/);
  });

  test("follow-through is backward compatible with snapshots that predate the field", () => {
    const body = readTmpl();
    const follow = body.indexOf("**Recommendation follow-through.**");
    const block = body.slice(follow, follow + 600);
    // Older snapshots have no recommendations array — the block must skip, not crash.
    expect(block).toMatch(/if it's absent, skip this block/i);
  });
});

describe("#1834 retro recommendations — 3 Things to Improve narrative persists the loop", () => {
  test("narrative instructs recording the 3 items into the snapshot array", () => {
    const body = readTmpl();
    const heading = body.indexOf("### 3 Things to Improve");
    expect(heading).toBeGreaterThan(-1);
    const section = body.slice(heading, heading + 900);
    expect(section).toMatch(/recommendations` array/);
  });
});

describe("#1834 retro recommendations — regenerated SKILL.md carries the loop", () => {
  test("generated SKILL.md is not stale relative to the template", () => {
    const md = readMd();
    expect(md).toMatch(/\*\*Always include the `recommendations` array\.\*\*/);
    expect(md).toMatch(/\*\*Recommendation follow-through\.\*\*/);
    expect(md).toMatch(/"recommendations"\s*:\s*\[/);
  });
});
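
The tests above bound each block with a fixed character count (600 to 2000), which is simple but can reach into the next section or cut a block short as the template grows. One alternative, sketched here as an assumption and not something this PR does, bounds the block by the next Markdown heading instead. `sliceSection` and its default `endPattern` are hypothetical names.

```typescript
// Returns the text from `anchor` up to (not including) the next line that
// matches `endPattern`, or to the end of the document if none follows.
// Returns null when the anchor is missing, so callers can assert on that.
function sliceSection(
  body: string,
  anchor: string,
  endPattern: RegExp = /^#{1,6} /m, // next Markdown heading at line start
): string | null {
  const start = body.indexOf(anchor);
  if (start === -1) return null;

  // Search only after the anchor so a heading used as the anchor
  // does not terminate its own section.
  const rest = body.slice(start + anchor.length);
  const end = rest.search(endPattern);
  return anchor + (end === -1 ? rest : rest.slice(0, end));
}
```

A test could then assert `expect(sliceSection(body, "**Recommendation follow-through.**")).toMatch(/`open`/)` without choosing a window size. The trade-off is that a heading inserted inside the block would shorten the slice, whereas the fixed-width version is indifferent to structure.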