feat(qa,qa-only): add --evidence-per-finding evidence layout #1484
Open
itstimwhite wants to merge 2 commits into
added 2 commits
May 13, 2026 20:16
Adds an opt-in per-finding evidence layout to /qa and /qa-only. When the
user passes --evidence-per-finding (or natural-language variants like
"evidence per finding" / "one folder per bug"), the run writes one
self-contained folder per finding instead of the flat shared-screenshots
layout:
.gstack/qa-reports/qa-report-{domain}-{date}/
├── REPORT.md
├── findings/
│ ├── 001-critical-checkout-500-on-submit/
│ │ ├── finding.md (severity, repro, env, expected/actual)
│ │ ├── step-1.png
│ │ ├── step-2.png
│ │ ├── result.png
│ │ └── repro.webm (optional — present iff $B record was active)
│ └── 002-high-search-no-results/
│ └── ...
└── baseline.json
The default flat layout is unchanged.
When per-finding is the right call (now in the shared methodology):
- Run produces ≥5 findings — the flat layout gets noisy past that.
- Any finding is critical or high — those tickets travel further and need
self-contained evidence.
- An interactive bug needs video evidence — pairs with $B record (a
separate PR adds the recording primitive at the browse layer).
- Findings will be handed off as Linear/Jira tickets — each folder zips
into a single attachment.
Skip per-finding for quick smoke runs, 1-2 findings, or regression-mode
reruns where baseline.json is the canonical artifact.
Why a shared resolver: the structure and finding.md template are
identical for /qa and /qa-only. Per gstack's "no copy-paste across
leaves" prompt-size guidance, the shared content goes through
generateQAMethodology() (loaded into both via {{QA_METHODOLOGY}}). Each
leaf SKILL.md.tmpl only gets one new Setup-table row and a one-line
Output-Structure pointer to the shared section.
712 existing tests in test/gen-skill-docs.test.ts and
test/skill-validation.test.ts still pass.
Output of `bun run gen:skill-docs --host all` after the prior commit. Picks up the new Setup-table row + the shared Document-phase Evidence Layout section in both qa/SKILL.md and qa-only/SKILL.md.
Why
The default flat `screenshots/issue-001-step-1.png` layout works well for up to about five findings but gets noisy past that, and handing a single finding to a developer is a chore: you have to pluck its three screenshots out of a shared bucket and explain which lines of the report apply.
Companion PR #1483 adds `$B record` (video evidence at the browse layer). This PR adds the QA-side complement so that when an interactive bug is captured on video, the `.webm` lives next to the rest of the finding's evidence in its own folder.
The shape mirrors the report-folder pattern that ships in Vercel Labs' `agent-browser` `dogfood` skill. That skill is report-only; ours integrates the same evidence shape into both the report-only `/qa-only` and the fix-loop `/qa`, with our existing severity/health-score model on top.
What
Opt-in flag `--evidence-per-finding` (or natural-language: `evidence per finding`, `one folder per bug`). Writes one self-contained folder per finding under a per-run report dir:
```
.gstack/qa-reports/qa-report-{domain}-{date}/
├── REPORT.md
├── findings/
│ ├── 001-critical-checkout-500-on-submit/
│ │ ├── finding.md # severity, repro, env, expected/actual
│ │ ├── step-1.png
│ │ ├── step-2.png
│ │ ├── result.png
│ │ └── repro.webm # OPTIONAL — present iff `$B record` was active
│ └── 002-high-search-no-results/
│ └── ...
└── baseline.json
```
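As a rough sketch of how the folder names in the tree above could be derived, the snippet below builds `NNN-severity-slug` names and the full per-finding path. The helper names (`slugify`, `findingDirName`, `findingDirPath`) and the `medium`/`low` severity values are assumptions for illustration; the PR only shows `critical` and `high` and does not show how names are generated.

```typescript
// Hypothetical helpers: none of these names come from gstack itself.
// "medium" and "low" are assumed; the PR only shows "critical" and "high".
type Severity = "critical" | "high" | "medium" | "low";

// Turn a free-form finding title into a lowercase, hyphen-separated slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Build "<NNN>-<severity>-<slug>" with a zero-padded, three-digit index.
function findingDirName(index: number, severity: Severity, title: string): string {
  const id = String(index).padStart(3, "0");
  return `${id}-${severity}-${slugify(title)}`;
}

// Build the folder path for one finding under the per-run report dir.
function findingDirPath(
  domain: string,
  date: string,
  index: number,
  severity: Severity,
  title: string,
): string {
  const reportDir = `.gstack/qa-reports/qa-report-${domain}-${date}`;
  return `${reportDir}/findings/${findingDirName(index, severity, title)}`;
}
```

For example, `findingDirName(1, "critical", "Checkout 500 on submit")` yields `001-critical-checkout-500-on-submit`, matching the first folder in the tree.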
`finding.md` schema (defined in the shared methodology):
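As an illustration only, a `finding.md` carrying the fields named in the tree comment (severity, repro, env, expected/actual) could be rendered as below. The `Finding` interface, the `renderFindingMd` helper, and the markdown headings are assumptions, not the template defined in the shared methodology.

```typescript
// Hypothetical shape; field names are inferred from the tree comment
// "severity, repro, env, expected/actual" and are not the real schema.
interface Finding {
  title: string;
  severity: string;
  environment: string;
  reproSteps: string[];
  expected: string;
  actual: string;
}

// Render one finding as a small, self-contained markdown document.
function renderFindingMd(f: Finding): string {
  // Number the repro steps starting at 1.
  const steps = f.reproSteps.map((step, i) => `${i + 1}. ${step}`);
  return [
    `# ${f.title}`,
    "",
    `Severity: ${f.severity}`,
    `Environment: ${f.environment}`,
    "",
    "## Repro",
    ...steps,
    "",
    "## Expected",
    f.expected,
    "",
    "## Actual",
    f.actual,
    "",
  ].join("\n");
}
```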
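When to use it (also defined in the methodology, so the LLM picks correctly):
- The run produces ≥5 findings; the flat layout gets noisy past that.
- Any finding is critical or high; those tickets travel further and need self-contained evidence.
- An interactive bug needs video evidence; this pairs with `$B record` from PR #1483.
- Findings will be handed off as Linear/Jira tickets; each folder zips into a single attachment.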
When NOT to use it: quick smoke runs, 1-2 findings, regression-mode reruns. The default flat layout stays exactly as it was.
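The criteria from the commit message (≥5 findings, any critical or high finding, video evidence, ticket handoff) together with the skip cases above can be read as a simple predicate. The sketch below is only a way to make the rule concrete: in the PR the flag is opt-in and the choice is guided by the methodology text, not by code like this. The `RunSummary` shape, the mode names, and the choice to let skip cases win over the triggers are all assumptions.

```typescript
// Hypothetical summary of a QA run; not a gstack type.
interface RunSummary {
  findingSeverities: string[]; // one entry per finding, e.g. "critical", "high"
  needsVideoEvidence: boolean; // an interactive bug needs a recording
  handoffAsTickets: boolean; // findings go to Linear/Jira
  mode: "full" | "smoke" | "regression"; // assumed mode names
}

// Returns true when the per-finding layout looks like the better fit.
function suggestPerFinding(run: RunSummary): boolean {
  // Assumption: smoke runs and regression reruns always keep the flat layout.
  if (run.mode === "smoke" || run.mode === "regression") return false;

  const count = run.findingSeverities.length;
  const hasSevere = run.findingSeverities.some(
    (s) => s === "critical" || s === "high",
  );
  // Any single trigger is enough; 1-2 minor findings with no trigger stay flat.
  return count >= 5 || hasSevere || run.needsVideoEvidence || run.handoffAsTickets;
}
```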
Implementation note (shape, not size)
Per gstack's prompt-size guidance, the structure and `finding.md` template go through the shared `generateQAMethodology()` resolver (loaded into both /qa and /qa-only via `{{QA_METHODOLOGY}}`). Each leaf `SKILL.md.tmpl` only adds one new Setup-table row and a one-line pointer in the Output Structure section. No copy-paste across leaves.
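A minimal sketch of the shared-resolver idea, assuming placeholders of the form `{{NAME}}` are replaced at generation time. The real `generateQAMethodology()` body and the actual template engine are not shown in this PR description, so `generateQAMethodologySketch`, `resolvers`, and `renderTemplate` are stand-ins.

```typescript
type Resolver = () => string;

// Stand-in for the shared resolver; the returned text is placeholder content.
function generateQAMethodologySketch(): string {
  return "## Evidence Layout\n(shared per-finding structure and finding.md template)";
}

// Map from placeholder name to the function that produces its content.
const resolvers: Record<string, Resolver> = {
  QA_METHODOLOGY: generateQAMethodologySketch,
};

// Replace every {{NAME}} with its resolver output; unknown names are left as-is.
function renderTemplate(tmpl: string): string {
  return tmpl.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const resolve = resolvers[name];
    return resolve ? resolve() : match;
  });
}

// Two leaf templates share one section without copying its text.
const qaLeaf = "# /qa\n{{QA_METHODOLOGY}}\n(fix-loop specifics)";
const qaOnlyLeaf = "# /qa-only\n{{QA_METHODOLOGY}}\n(report-only specifics)";
```

Rendering both `qaLeaf` and `qaOnlyLeaf` pulls in the same section, so a change to the shared text is made in one place.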
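Commits are bisect-friendly per the contributor guide: the first commit adds the feature, and the second contains the regenerated output of `bun run gen:skill-docs --host all`.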
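Verified
- The 712 existing tests in `test/gen-skill-docs.test.ts` and `test/skill-validation.test.ts` still pass.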
Out of scope
Pairs with
PR #1483 (`feat(browse): add record command for video evidence of interactive bug repros`). The `record` primitive is what produces the `.webm` that lands in each finding folder. Each PR is reviewable independently; the QA flag works without `record` (the `.webm` just won't be there).
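Because the video is optional, anything that consumes a finding folder should tolerate its absence. A small sketch, assuming the consumer already has the list of file names in one finding folder; the `Evidence` shape and `collectEvidence` helper are hypothetical.

```typescript
// Hypothetical view of the evidence files in one finding folder.
interface Evidence {
  screenshots: string[]; // all .png files, sorted by name
  video: string | null; // the .webm if present, otherwise null
}

// Split a folder listing into screenshots and an optional video.
function collectEvidence(files: string[]): Evidence {
  const screenshots = files.filter((f) => f.endsWith(".png")).sort();
  const video = files.find((f) => f.endsWith(".webm")) ?? null;
  return { screenshots, video };
}
```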