|
16 | 16 | | Pre-commit | yes | Full suite locally; see Pre-commit Caveats | |
17 | 17 | | Code review | yes | Diff-mode of own tool: `diffctx --diff <range>` | |
18 | 18 | | CLI smoke | yes | See CLI Smoke Recipes | |
| 19 | +| Real-world test-repo sweep | yes | MANDATORY every round — see Real-World Test-Repo Sweep | |
19 | 20 | | SonarCloud | no | Project NOT registered on SonarCloud | |
20 | 21 | | autoqa pipeline | no | CLI tool, no HTTP API surface | |
21 | 22 | | K8s logs / ArgoCD | no | No deployment to a cluster | |
@@ -121,6 +122,47 @@ over-selection), not recall misses. Fixing it is a benchmark-validated |
121 | 122 | recalibration coupled to the research paper — track on #65, do not blind-edit |
122 | 123 | edge weights mid-QA. |
123 | 124 |
|
| 125 | +## Real-World Test-Repo Sweep (every QA round, MANDATORY) |
| 126 | + |
| 127 | +`test-repos/` (git-ignored, local-only) holds ~15 real upstream clones across |
| 128 | +many languages — `linux`, `pytorch`, `react-native`, `sentry`, `gitpod`, |
| 129 | +`elasticsearch`, `numpy`, `ocaml`, `llama.cpp`, `onnxruntime`, `libc`, |
| 130 | +`luajit2`, `monitoring-observability`, `builder`, `vision`. `TOANALYZE.md` is |
| 131 | +the curated commit "todo"; the clones also carry live upstream HEADs. This is |
| 132 | +the dogfood that synthetic `yaml_cases` can't replace: real diffs, real over- |
| 133 | +dump, real crashes. |
| 134 | + |
| 135 | +**Every QA round, sweep the test repos against a NEW commit each** (one not |
| 136 | +exercised before: `git -C test-repos/<repo> pull --ff-only` to fetch fresh |
| 137 | +upstream history, then diff its newest commit). Run the **pipx** binary, not a |
| 138 | +venv shadow: |
| 139 | + |
| 140 | +```bash |
| 141 | +cd test-repos/<repo> |
| 142 | +git pull --ff-only |
| 143 | +/Users/nikolay/.local/bin/diffctx . --diff HEAD~1 # or <prev-tested>..HEAD |
| 144 | +``` |
| 145 | + |
| 146 | +**Per repo, judge:** did it (a) finish without panic / non-zero exit / hang, |
| 147 | +(b) honor the range — `changed_files` matches `git diff HEAD~1 --stat`, not a |
| 148 | +whole-tree dump, (c) avoid gross over-selection (mostly `role: changed` plus |
| 149 | +tight context, not 100+ unrelated files — measure as in Diff-Mode Self-Eat), |
| 150 | +(d) leak no secrets/garbage, (e) return in reasonable time. A failure on any |
| 151 | +of these counts as an issue. |
| 152 | + |
| 153 | +**Stopping condition (the whole point):** sweep repos one at a time **until the |
| 154 | +first issue**. |
| 155 | + |
| 156 | +- On the first issue → `gh issue create -R nikolay-e/diffctx` with a generic, |
| 157 | + reproducible report (repo name, commit SHA / range, observed vs expected, |
| 158 | + token/fragment/file counts; **no sensitive repo contents**) → **stop the |
| 159 | + sweep** and move on to the rest of the QA round. |
| 160 | +- If you get through **all** repos clean → also move on. |
| 161 | + |
| 162 | +Either outcome — **all clean OR ≥1 issue filed** — is a valid completion of |
| 163 | +this step; the QA round proceeds to its remaining items either way. Do not let |
| 164 | +a found issue halt the round, and do not keep sweeping after one is filed. |
| 165 | + |
124 | 166 | ## Local `which diffctx` Trap — diffctx specifics |
125 | 167 |
|
126 | 168 | See `/qa` skill: Packaging QA (`which`-vs-pipx). Concretely: when this project's |
|
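The stopping condition in the Real-World Test-Repo Sweep section (sweep one repo at a time, stop at the first issue, treat "all clean" as equally valid) might be sketched as the loop below. This is only an illustration, not part of the change: `sweep_until_first_issue` and `check_repo` are hypothetical names, and the per-repo check is left to the caller because judging criteria (b) through (e) depends on output details not shown here.

```bash
#!/usr/bin/env bash
# Sketch: sweep repos in order and stop at the first one whose check fails.
# check_cmd is a hypothetical caller-supplied command or function. It receives
# the repo path as its only argument and returns non-zero when it finds an issue.
sweep_until_first_issue() {
  local check_cmd=$1
  shift
  local repo
  for repo in "$@"; do
    if ! "$check_cmd" "$repo"; then
      # First issue found: name the repo and stop; later repos are not swept.
      printf 'issue: %s\n' "$repo"
      return 1
    fi
  done
  # Every repo passed; this is also a valid completion of the step.
  printf 'all clean\n'
  return 0
}

# Possible usage (assumption: exit status alone only covers criterion (a)):
#   check_repo() {
#     (cd "$1" && git pull --ff-only &&
#       /Users/nikolay/.local/bin/diffctx . --diff HEAD~1 >/dev/null)
#   }
#   sweep_until_first_issue check_repo test-repos/*/
```

Either return value lets the QA round continue. The non-zero status only signals that an issue should be filed with `gh issue create` before moving on.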