feat(qa,qa-only): add `--evidence-per-finding` evidence layout by itstimwhite · Pull Request #1484 · garrytan/gstack

itstimwhite · 2026-05-14T03:16:52Z

Why

The default flat `screenshots/issue-001-step-1.png` layout works well for 1-5 findings but gets noisy past that, and handing a single finding to a developer is a chore: you have to pluck that finding's three screenshots out of a shared bucket and explain which lines of the report apply.

Companion PR #1483 adds `$B record` (video evidence at the browse layer). This PR adds the QA-side complement so that when an interactive bug is captured on video, the `.webm` lives next to the rest of the finding's evidence in its own folder.

The shape mirrors the report-folder pattern that ships in Vercel Labs' `agent-browser` `dogfood` skill — that skill exists as report-only; ours integrates the same evidence shape into both the report-only `/qa-only` and the fix-loop `/qa`, with our existing severity/health-score model on top.

What

Opt-in flag `--evidence-per-finding` (or natural-language: `evidence per finding`, `one folder per bug`). Writes one self-contained folder per finding under a per-run report dir:

```
.gstack/qa-reports/qa-report-{domain}-{date}/
├── REPORT.md
├── findings/
│   ├── 001-critical-checkout-500-on-submit/
│   │   ├── finding.md      # severity, repro, env, expected/actual
│   │   ├── step-1.png
│   │   ├── step-2.png
│   │   ├── result.png
│   │   └── repro.webm      # OPTIONAL: present iff `$B record` was active
│   └── 002-high-search-no-results/
│       └── ...
└── baseline.json
```
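The folder names in the tree above appear to follow an `{index}-{severity}-{slug}` pattern. Here is a minimal sketch of deriving one, assuming a three-digit zero-padded index and a lowercase hyphenated slug. `slugify` and `findingDirName` are hypothetical helpers for illustration, and the `medium`/`low` severity levels are assumed; none of this is taken from the gstack source.

```typescript
// Assumed severity levels; the PR text only names "critical" and "high".
type Severity = "critical" | "high" | "medium" | "low";

// Hypothetical helper: lowercase the title, collapse every run of
// non-alphanumeric characters into one hyphen, then trim edge hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper: build a folder name like the ones in the tree.
function findingDirName(index: number, severity: Severity, title: string): string {
  const padded = String(index).padStart(3, "0");
  return `${padded}-${severity}-${slugify(title)}`;
}

const dir = findingDirName(1, "critical", "Checkout 500 on submit");
console.log(`findings/${dir}/`);
```

Putting the index first keeps folders sorted in discovery order, and the severity segment makes triage possible from a directory listing alone.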

`finding.md` schema (defined in the shared methodology):

- Severity / Category / Page / Detected
- What's wrong (one paragraph)
- Repro steps (referencing the step-N.png files)
- Expected vs actual
- Environment (browser, viewport, auth)
- Evidence file index
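As a rough illustration of the schema above, this sketch renders a `finding.md` from a plain object. The `Finding` interface, its field names, the heading text, and the sample data are all assumptions made for this example. The actual template lives in the shared methodology and is not reproduced in this description.

```typescript
// Hypothetical shape mirroring the schema fields listed above.
interface Finding {
  title: string;
  severity: string;
  category: string;
  page: string;
  detected: string; // e.g. an ISO date
  summary: string; // the one-paragraph "what's wrong"
  steps: string[]; // steps[i] is evidenced by step-(i+1).png
  expected: string;
  actual: string;
  environment: { browser: string; viewport: string; auth: string };
  evidence: string[]; // file names inside the finding folder
}

function renderFindingMd(f: Finding): string {
  const steps = f.steps.map((step, i) => `${i + 1}. ${step} (step-${i + 1}.png)`);
  const files = f.evidence.map((name) => `- ${name}`);
  return [
    `# ${f.title}`,
    "",
    `Severity: ${f.severity} | Category: ${f.category} | Page: ${f.page} | Detected: ${f.detected}`,
    "",
    "## What's wrong",
    f.summary,
    "",
    "## Repro steps",
    ...steps,
    "",
    "## Expected vs actual",
    `Expected: ${f.expected}`,
    `Actual: ${f.actual}`,
    "",
    "## Environment",
    `Browser: ${f.environment.browser}, viewport: ${f.environment.viewport}, auth: ${f.environment.auth}`,
    "",
    "## Evidence",
    ...files,
  ].join("\n");
}

// Made-up sample data, for illustration only.
const example: Finding = {
  title: "Checkout 500 on submit",
  severity: "critical",
  category: "functional",
  page: "/checkout",
  detected: "2026-05-14",
  summary: "Submitting the checkout form returns a 500 error.",
  steps: ["Open /checkout", "Click Submit"],
  expected: "Order confirmation page",
  actual: "HTTP 500 error page",
  environment: { browser: "Chromium", viewport: "1280x720", auth: "logged in" },
  evidence: ["step-1.png", "step-2.png", "result.png", "repro.webm"],
};

console.log(renderFindingMd(example));
```

Because the evidence index is just the list of files present, the same renderer works whether or not `repro.webm` exists.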

When to use it (also defined in the methodology, so the LLM picks correctly):

- Run produces ≥5 findings
- Any critical or high severity finding
- An interactive bug has video evidence
- Findings handed off as Linear/Jira tickets (folder zips to one attachment)

When NOT to use it: quick smoke runs, 1-2 findings, regression-mode reruns. The default flat layout stays exactly as it was.
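The heuristic above can be written down as a small predicate. This is purely an illustration of the documented guidance: the PR keeps the flag explicit and does not auto-detect. `RunSummary`, its fields, the mode names, and the choice to let smoke and regression runs override the other triggers are all assumptions.

```typescript
// Hypothetical summary of a QA run; field and mode names are illustrative.
interface RunSummary {
  severities: string[]; // one entry per finding
  hasVideoEvidence: boolean;
  handoffAsTickets: boolean;
  mode: "full" | "smoke" | "regression";
}

function perFindingRecommended(run: RunSummary): boolean {
  // Assumed precedence: smoke and regression runs always keep the flat layout.
  if (run.mode === "smoke" || run.mode === "regression") {
    return false;
  }
  return (
    run.severities.length >= 5 ||
    run.severities.some((s) => s === "critical" || s === "high") ||
    run.hasVideoEvidence ||
    run.handoffAsTickets
  );
}

console.log(
  perFindingRecommended({
    severities: ["low", "medium"],
    hasVideoEvidence: false,
    handoffAsTickets: false,
    mode: "full",
  }),
);
```

A run with one or two low-severity findings and no other trigger falls through to `false`, which matches the "1-2 findings" guidance without a separate check.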

Implementation note (shape, not size)

Per gstack's prompt-size guidance, the structure and `finding.md` template go through the shared `generateQAMethodology()` resolver (loaded into both /qa and /qa-only via `{{QA_METHODOLOGY}}`). Each leaf `SKILL.md.tmpl` only adds one new Setup-table row and a one-line pointer in the Output Structure section. No copy-paste across leaves.
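To make the shared-resolver idea concrete, here is a minimal sketch of placeholder expansion. Only the names `generateQAMethodology` and `{{QA_METHODOLOGY}}` come from this PR. The function body, the `resolvers` map, `renderTemplate`, and the template strings are hypothetical stand-ins, not the actual gstack generator.

```typescript
// Hypothetical stand-in body; the real resolver's output is not shown here.
function generateQAMethodology(): string {
  return "## Evidence layout\n(shared per-finding layout and finding.md template)";
}

// Placeholder name -> resolver function (assumed registry shape).
const resolvers: Record<string, () => string> = {
  QA_METHODOLOGY: generateQAMethodology,
};

// Replace each {{NAME}} with its resolver output; leave unknown names untouched.
function renderTemplate(tmpl: string): string {
  return tmpl.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const resolve = resolvers[name];
    return resolve ? resolve() : match;
  });
}

// Both leaves pull in the same shared block, so there is nothing to keep in sync.
const qa = renderTemplate("# /qa\n{{QA_METHODOLOGY}}\n");
const qaOnly = renderTemplate("# /qa-only\n{{QA_METHODOLOGY}}\n");
console.log(qa.includes(generateQAMethodology()) && qaOnly.includes(generateQAMethodology()));
```

Keeping the layout text in one resolver means a later change to the `finding.md` template touches a single function, and regeneration propagates it to every leaf.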

Commits are bisect-friendly per the contributor guide:

- `feat(qa,qa-only): add --evidence-per-finding evidence layout`: resolver + both .tmpl source files (78 insertions).
- `docs(qa,qa-only): regenerate SKILL.md`: `bun run gen:skill-docs --host all` output (146 insertions, generated).

Verified

- `bun test test/gen-skill-docs.test.ts test/skill-validation.test.ts`: 712 pass, 0 fail. No regressions in resolver or template validation.
- `bun run skill:check`: all 10 host-output freshness checks green (one pre-existing `claude/SKILL.md` missing-generated warning reproduces on `main` without these changes; unrelated).
- Spot-checked rendered output:
  - `qa/SKILL.md` and `qa-only/SKILL.md` both contain the new Setup row, the Document-phase Evidence layout section, and the Output Structure pointer.
  - The `{{QA_METHODOLOGY}}` block expands identically in both files.

Out of scope

- VERSION bump / CHANGELOG entry: left for the merge so the entry stays in your voice. The flag carries its own discoverability via the Setup table.
- Telemetry on which layout users pick. Worth measuring after this ships if you care.
- Auto-detection ("pick per-finding when ≥5 findings"). Kept explicit for now: the heuristic is documented in the methodology so the LLM can choose, but the flag itself stays opt-in.

Pairs with

PR #1483 (`feat(browse): add record command for video evidence of interactive bug repros`). The `record` primitive is what produces the `.webm` that lands in each finding folder. Each PR is reviewable independently; the QA flag works without `record` (the .webm just won't be there).

Adds an opt-in per-finding evidence layout to /qa and /qa-only. When the user passes --evidence-per-finding (or natural-language variants like "evidence per finding" / "one folder per bug"), the run writes one self-contained folder per finding instead of the flat shared-screenshots layout:

    .gstack/qa-reports/qa-report-{domain}-{date}/
    ├── REPORT.md
    ├── findings/
    │   ├── 001-critical-checkout-500-on-submit/
    │   │   ├── finding.md   (severity, repro, env, expected/actual)
    │   │   ├── step-1.png
    │   │   ├── step-2.png
    │   │   ├── result.png
    │   │   └── repro.webm   (optional; present iff $B record was active)
    │   └── 002-high-search-no-results/
    │       └── ...
    └── baseline.json

The default flat layout is unchanged.

When per-finding is the right call (now in the shared methodology):

- Run produces ≥5 findings: the flat layout gets noisy past that.
- Any finding is critical or high: those tickets travel further and need self-contained evidence.
- An interactive bug needs video evidence: pairs with $B record (a separate PR adds the recording primitive at the browse layer).
- Findings will be handed off as Linear/Jira tickets: each folder zips into a single attachment.

Skip per-finding for quick smoke runs, 1-2 findings, or regression-mode reruns where baseline.json is the canonical artifact.

Why a shared resolver: the structure and finding.md template are identical for /qa and /qa-only. Per gstack's "no copy-paste across leaves" prompt-size guidance, the shared content goes through generateQAMethodology() (loaded into both via {{QA_METHODOLOGY}}). Each leaf SKILL.md.tmpl only gets one new Setup-table row and a one-line Output-Structure pointer to the shared section.

712 existing tests in test/gen-skill-docs.test.ts and test/skill-validation.test.ts still pass.

Tim White added 2 commits May 13, 2026 20:16

docs(qa,qa-only): regenerate SKILL.md for --evidence-per-finding

07ca46b

Output of `bun run gen:skill-docs --host all` after the prior commit. Picks up the new Setup-table row + the shared Document-phase Evidence Layout section in both qa/SKILL.md and qa-only/SKILL.md.

itstimwhite wants to merge 2 commits into garrytan:main from itstimwhite:feat/qa-evidence-per-finding
