| 1 | +# twd-cli Test Summary Output — Design Spec |
| 2 | + |
| 3 | +**Date:** 2026-05-20 |
| 4 | +**Status:** Proposed |
| 5 | + |
| 6 | +## Purpose |
| 7 | + |
| 8 | +Make the final output of `twd-cli run` self-describing: at a glance, a developer (or an AI agent piping the output through `grep`) should be able to tell **how many tests passed, how many failed, how many were skipped** — without parsing per-test lines or running the suite again. |
| 9 | + |
| 10 | +Today the run ends with a mock-validation summary like: |
| 11 | + |
| 12 | +``` |
| 13 | +Mocks validated: 128 | Errors: 7 | Warnings: 0 | Skipped: 80 |
| 14 | +``` |
| 15 | + |
| 16 | +That line is about *mocks*, not *tests*. There is no equivalent line for test results. Users reading the tail of the log have to scroll back and visually count `✓ should ...` lines, and they may confuse the yellow `✗ … mock "fetchCart"` contract-warning lines with failing tests (same glyph, similar position). |
| 17 | + |
| 18 | +## Problem (real session) |
| 19 | + |
| 20 | +While running a long suite headless via `npm run test:ci`, the consuming agent re-ran the suite ~5 times trying to confirm "did all tests pass?" because: |
| 21 | + |
| 22 | +1. No final `Tests: N passed, M failed, K skipped` line exists. |
| 23 | +2. The yellow `✗` glyph used for *mock contract validation failures* looks identical to a failed test marker. |
| 24 | +3. ANSI color codes broke naive `grep "✓ should"` patterns, so attempts to count from the log returned 0. |
| 25 | + |
| 26 | +Each re-run took ~1:23, so the roughly five re-runs cost ~7 minutes of wall time (5 × 1:23 ≈ 6:55) just to answer "did it pass?". |
| 27 | + |
| 28 | +## Scope |
| 29 | + |
| 30 | +**In scope:** |
| 31 | +- A final, single-line test summary printed after all tests complete. |
| 32 | +- Visual disambiguation between *test result* lines and *mock contract validation* lines. |
| 33 | +- A machine-friendly summary line (stable format, easy to grep without ANSI gymnastics). |
| 34 | + |
| 35 | +**Out of scope:** |
| 36 | +- Changing the per-test output format itself. |
| 37 | +- Reworking the mock-validation summary line (the line that exists today is fine — it just needs to not be the *only* summary). |
| 38 | +- A `--summary` / quiet reporter mode — deferred to a follow-up. |
| 39 | +- JUnit XML / JSON reporter output — deferred to a follow-up. |
| 40 | + |
| 41 | +## Proposed Solution |
| 42 | + |
| 43 | +### 1. Add a final test summary line |
| 44 | + |
| 45 | +After all tests finish (and after the mock-validation summary), print: |
| 46 | + |
| 47 | +``` |
| 48 | +Tests: 74 passed, 0 failed, 0 skipped (74 total) in 1:23.193 |
| 49 | +``` |
| 50 | + |
| 51 | +Format requirements: |
| 52 | +- One line. |
| 53 | +- Stable label `Tests:` at the start so it's grep-friendly. |
| 54 | +- Colors only on the count digits (green for passed, red for failed if > 0, yellow for skipped if > 0). The label `Tests:` and the words `passed` / `failed` / `skipped` stay uncolored so `grep "^Tests:"` works regardless of ANSI handling. |
| 55 | +- Duration in the same `m:ss.SSS` format the runner shows today. |
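
The format requirements above could be met by a small pure formatter. This is a minimal sketch, not the actual twd-cli implementation: `formatTestsLine`, `colorCount`, and the `{ status }` result shape (`'pass'` / `'fail'` / `'skip'`) are hypothetical names chosen for illustration, and the duration is assumed to arrive already formatted as `m:ss.SSS`.

```javascript
// ASSUMPTION: hypothetical helper names and result shape; not taken from twd-cli.
const ANSI = { green: '\x1b[32m', red: '\x1b[31m', yellow: '\x1b[33m', reset: '\x1b[0m' };

// Wrap a count in a color only when a color is given and colors are enabled.
function colorCount(count, color, useColor) {
  return useColor && color ? `${color}${count}${ANSI.reset}` : String(count);
}

// Build the single-line summary. Only the digits are colored; the label
// `Tests:` and the words passed/failed/skipped stay plain text.
function formatTestsLine(results, duration, useColor = true) {
  const count = (status) => results.filter((r) => r.status === status).length;
  const passed = count('pass');
  const failed = count('fail');
  const skipped = count('skip');
  const total = passed + failed + skipped;
  return (
    `Tests: ${colorCount(passed, ANSI.green, useColor)} passed, ` +
    `${colorCount(failed, failed > 0 ? ANSI.red : null, useColor)} failed, ` +
    `${colorCount(skipped, skipped > 0 ? ANSI.yellow : null, useColor)} skipped ` +
    `(${total} total) in ${duration}`
  );
}
```

Because the escape sequences wrap only the digits, the line always begins with the literal bytes `Tests:`, which is what keeps an anchored `grep "^Tests:"` working on raw output.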
| 56 | + |
| 57 | +**Duration source.** Today `src/index.js` uses `console.time('Total Test Time')` / `console.timeEnd(...)` to print `Total Test Time: 1:23.193` as its own line. That call's output is not capturable as a value. Replace it with a manual `Date.now()` delta captured around the same span (start before `page.goto`, end after `runner.runAll()` returns), formatted to the same `m:ss.SSS` string. The standalone `Total Test Time:` line is removed; the duration appears only on the `Tests:` line. This keeps the log to one canonical timing line. |
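
A sketch of the manual timing described above, assuming nothing about the runner beyond what the paragraph states. `formatDuration` and `timeRun` are hypothetical names; `run` stands in for the span from `page.goto` to the point where `runner.runAll()` returns.

```javascript
// Format a millisecond delta as m:ss.SSS, e.g. 83193 -> "1:23.193".
function formatDuration(ms) {
  const minutes = Math.floor(ms / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = Math.floor(ms % 1000);
  return `${minutes}:${String(seconds).padStart(2, '0')}.${String(millis).padStart(3, '0')}`;
}

// ASSUMPTION: hypothetical wrapper. `run` is any async function covering the
// span that console.time / console.timeEnd measure today.
async function timeRun(run) {
  const startedAt = Date.now();
  await run();
  return formatDuration(Date.now() - startedAt);
}
```

Keeping `formatDuration` separate from the clock read means the formatting can be unit tested with fixed millisecond values.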
| 58 | + |
| 59 | +When there are failures, also print a `Failed tests:` block with just the test names (no stack traces — those already appear inline above), so the developer can see the names at the end of the log without scrolling. |
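
One way the `Failed tests:` block might be produced is sketched below. The spec does not fix the per-line layout, so the two-space indent and `- ` bullet are assumptions, as are the function name and the `{ name, status }` result shape (in twd-cli the names may need to be resolved through the `handlers` array first).

```javascript
// ASSUMPTION: hypothetical helper; results are taken to be { name, status }
// objects with the test name already resolved.
function formatFailedTestsBlock(results) {
  const failures = results.filter((r) => r.status === 'fail');
  // Print nothing at all when every test passed or was skipped.
  if (failures.length === 0) return '';
  // Names only, in suite order; stack traces already appear inline above.
  return ['Failed tests:', ...failures.map((r) => `  - ${r.name}`)].join('\n');
}
```

`Array.prototype.filter` preserves input order, so the names come out in the order the suite produced them.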
| 60 | + |
| 61 | +### 2. Disambiguate mock-validation lines from test result lines |
| 62 | + |
| 63 | +The current mock contract output (`src/contractReport.js`) uses `✓` for passing mocks, `✗` for failing ones, `⚠` for warnings, and `ℹ` for skipped ones. The `✗` glyph collides visually with the `✗` used for failed tests in the suite tree printed by `reportResults` (`twd-js/runner-ci`). Color separates the two only for warn-mode contract failures (yellow); error-mode contract failures are red, the same as test failures, and color is fragile under `grep` and CI log viewers anyway. |
| 64 | + |
| 65 | +**Decision:** add a `MOCK ` prefix to every line that comes out of `contractReport.js`. The existing glyph assignments stay (`✓` pass, `✗` fail, `⚠` warning, `ℹ` skipped); they are correct *within* the contract report, and the prefix is what distinguishes contract lines from test-result lines. |
| 66 | + |
| 67 | +Example before: |
| 68 | +``` |
| 69 | + ✗ GET /v1/carts/{cart_id} (200) — mock "fetchCart" — in "Checkout New — Redis ID Flow > ..." |
| 70 | +``` |
| 71 | + |
| 72 | +Example after: |
| 73 | +``` |
| 74 | + MOCK ✗ GET /v1/carts/{cart_id} (200) — mock "fetchCart" — in "Checkout New — Redis ID Flow > ..." |
| 75 | +``` |
| 76 | + |
| 77 | +Apply the prefix uniformly to all four line kinds the report can emit: pass (`✓`), fail (`✗`), warning (`⚠`), and skipped (`ℹ`). Indentation already exists; the prefix sits between the indentation and the glyph. |
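
The placement rule (prefix between the indentation and the glyph) can be sketched as a single string transform. `prefixMockLine` is a hypothetical name, and how `contractReport.js` actually assembles its lines is not shown in this spec, so treat this as an illustration of the rule only.

```javascript
// ASSUMPTION: hypothetical helper illustrating the placement rule.
// Inserts "MOCK " between a line's leading whitespace and its glyph.
function prefixMockLine(line) {
  // Leave blank separator lines untouched.
  if (line.trim() === '') return line;
  const indent = line.match(/^\s*/)[0];
  return `${indent}MOCK ${line.slice(indent.length)}`;
}

// The same transform applies to all four line kinds.
const examples = [
  '  ✓ GET /v1/carts/{cart_id} (200)',
  '  ✗ GET /v1/carts/{cart_id} (200)',
  '  ⚠ GET /v1/carts/{cart_id} (200)',
  '  ℹ GET /v1/carts/{cart_id} (200)',
].map(prefixMockLine);
```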
| 78 | + |
| 79 | +## Exit Code Behavior |
| 80 | + |
| 81 | +No change. The exit code already reflects test failures plus `mode: "error"` contract failures (`src/index.js:101,119`). |
| 82 | + |
| 83 | +**Interplay with the `Tests:` line.** The new `Tests:` summary counts test outcomes *only* (pass/fail/skip from `testStatus`). A run can legitimately exit non-zero while `Tests:` reads `0 failed` — that means every test passed but at least one mock failed contract validation in `error` mode. The mock summary line (`Mocks validated: … | Errors: N | …`) and the contract report block above it are the canonical place to see contract failures; the `Tests:` line is not retroactively edited to fold them in. |
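
The interplay can be stated as a small decision function. This is only a model of the behavior described above, not the code at `src/index.js:101,119`: the function name, the parameter names, the `"warn"` mode string, and the use of `1` as the non-zero exit code are all assumptions.

```javascript
// ASSUMPTION: hypothetical model of the existing exit-code rule.
// failedTests and contractErrors are counts; mode is the contract validation mode.
function exitCodeFor({ failedTests, contractErrors, mode }) {
  // Contract failures affect the exit code only in "error" mode.
  const contractFailure = mode === 'error' && contractErrors > 0;
  return failedTests > 0 || contractFailure ? 1 : 0;
}
```

Under this model, `exitCodeFor({ failedTests: 0, contractErrors: 7, mode: 'error' })` is non-zero even though the `Tests:` line would read `0 failed`, which is the case the paragraph above calls out.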
| 84 | + |
| 85 | +## Testing Strategy |
| 86 | + |
| 87 | +- Unit test the summary formatter directly: given a `testStatus` array with a known mix (e.g. 3 pass, 1 fail, 1 skip) and a duration value, assert the `Tests:` line matches the expected format. Keep this layer pure (no Puppeteer) so the format is easy to lock down. |
| 88 | +- Unit test the failed-tests block: given a `testStatus` array with two failures and a `handlers` array, assert both names appear under `Failed tests:` in the order the suite produced them. |
| 89 | +- Extend the existing `contractReport.test.js` to assert every emitted line starts with `MOCK ` (after any leading whitespace). Cover all four line kinds: pass, fail, warning, skipped. |
| 90 | +- Verify `grep "^Tests:"` against a raw run (ANSI included) returns exactly one line — i.e. the label is not wrapped in escape sequences. (The count digits themselves may carry color codes; the label must not.) |
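
The last check can be approximated in a unit test without spawning `grep`. The sketch below mimics an anchored `^Tests:` match over raw output; `grepTestsLines` and the sample strings are illustrative, not captured twd-cli output.

```javascript
// Mimics `grep "^Tests:"` over raw (ANSI-included) output.
function grepTestsLines(rawOutput) {
  return rawOutput.split('\n').filter((line) => line.startsWith('Tests:'));
}

const GREEN = '\x1b[32m';
const RESET = '\x1b[0m';

// Label uncolored, digits colored: the anchored match finds exactly one line.
const goodOutput = [
  'Mocks validated: 128 | Errors: 7 | Warnings: 0 | Skipped: 80',
  `Tests: ${GREEN}74${RESET} passed, 0 failed, 0 skipped (74 total) in 1:23.193`,
].join('\n');

// Counter-example: coloring the label puts an escape sequence at column 0,
// so the anchored match finds nothing.
const badOutput = `${GREEN}Tests:${RESET} 74 passed, 0 failed, 0 skipped (74 total) in 1:23.193`;
```

The counter-example is the regression this test guards against: a later change that colors the whole line would silently break the grep contract.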
| 91 | + |
| 92 | +## Benefits |
| 93 | + |
| 94 | +- **Faster developer feedback:** one line at the end answers "did it pass?" — no scrolling, no counting. |
| 95 | +- **AI-agent friendly:** stable, grep-able summary line. Avoids re-running long suites just to confirm a result. |
| 96 | +- **Less confusion between mocks and tests:** the `MOCK ` prefix removes the "is that a test failure or a mock warning?" question. |
| 97 | + |
| 98 | +## Notes / Open Questions |
| 99 | + |
| 100 | +- Should the failed-test block at the end include the file path + line number for each failure, or just the test name? (Stack traces already appear inline above.) Default for the implementation plan: **just the test name**, mirroring what the per-test line shows. Revisit if it proves too thin. |
| 101 | + |
| 102 | +## Follow-up Work (Out of Scope Here) |
| 103 | + |
| 104 | +- **`--summary` / quiet reporter.** A mode that suppresses per-request mock log lines (which dominate output for large suites) and prints only RUN/PASS/FAIL per test, the `Tests:` line, the mock-validation summary line, and the contract report path. Likely shaped as a `twd.config.json` field (`reporter: "summary"`) for consistency with how other twd-cli behavior is configured, not a CLI flag. |
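| 105 | +- **`--json` reporter** for CI dashboards. The pass/fail/skip counts and duration computed for the `Tests:` line in this spec are the same values a JSON reporter would need, so this should be a small addition later. |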