Phase 5 of 7 · Status: 🔬 In progress. 62 Vitest unit/component/integration tests + an axe-core a11y suite, three model-integrity guards, and a Playwright real-browser E2E suite (smoke, share deep-link, structural a11y, keyboard) — all in CI, which now also gates a bundle-size budget and a production-dependency audit. Real-browser a11y gates full WCAG A/AA including color-contrast in both themes. Open: the UAT script is written but not yet run with participants. This document is the test strategy, the current inventory, the acceptance-criteria traceability matrix, and the honest gap list.
Primary references: Build Spec Section 14 (acceptance criteria), the SRS (FR/NFR), and the charter Section 11 (Definition of Done & quality gates).
Goal: prove the tool is correct, transparent, reproducible, and accessible — not merely that it renders. The differentiator of Architecture Advisor is an auditable scoring model, so the model is tested to the number, and the documentation is machine-checked against the implementation.
In scope: the scoring engine, anti-pattern detection, exporters (ADR/report/CSV/JSON/share), i18n completeness, the model↔docs↔config consistency, the four-step UI flow, accessibility, and client-side security/performance.
Out of scope: backend/API/database testing (there is none — the app is pure client-side), load testing of servers (static hosting), and cross-browser matrices beyond evergreen browsers.
A deliberately bottom-heavy pyramid: the model logic is pure and deterministic, so most assurance lives in fast unit tests and the cross-document guards; UI and human-judgement checks sit on top.
| Layer | Tooling | What it protects | State |
|---|---|---|---|
| L0 · Model guards | Node scripts (no deps), CI | The docs, the reference model, and src/config cannot drift apart | ✅ Done |
| L1 · Unit | Vitest | Scoring math, anti-patterns, exporters, i18n | ✅ Done |
| L2 · Component/Integration | Vitest + Testing Library | The 4-step flow, reactivity, override panel + redistribution, radar, command palette, manual & A/B compare overlays | ✅ Mostly done |
| L3 · System / E2E | Playwright (chromium) | Full journeys in a real browser: smoke, share-URL deep-link, structural a11y, keyboard | ✅ Done |
| L4 · Accessibility | vitest-axe + Playwright + @axe-core/playwright | Names/roles/ARIA (jsdom + real browser), keyboard, and full color-contrast (real browser, both themes); all automated | ✅ Done |
| L5 · UAT | Scripted scenarios — uat-script.md | Real architects/newcomers confirm usefulness & clarity | ⏳ Script ready (not run) |
| L6 · Security | npm audit --omit=dev in CI (Section 8) | Client-side injection, storage, dependencies | ✅ Gated in CI |
| L7 · Performance | Bundle-size budget guard in CI (Section 9) | Bundle budget, first paint, instant recompute | ✅ Gated in CI |
62 tests across 11 files, all green:
| File | Cases | Covers |
|---|---|---|
| src/lib/scoring.test.ts | 22 | Fixtures A–C, equal-weight fallback, 500 seeded random invariants, requirement scenarios (AC-6/AC-7), contribution reconciliation (FR-REC-4), expert override & lock, all 25 preset targets (SRS Section 5.3), qaFit defaulting |
| src/lib/antiPatternEngine.test.ts | 8 | Distributed monolith, premature microservices, and the other rules (Model Data Sheet Section 5) |
| src/lib/exports.test.ts | 8 | generateAdr (MADR), generateReport, buildC4, scenario JSON round-trip, share-URL round-trip (AC-14) |
| src/App.test.tsx | 4 | Integration: preset & single-factor reactivity (AC-2), language toggle (AC-13), and weight-override redistribution end-to-end |
| src/components/RadarPanel.test.tsx | 3 | Component: D1 ranking + single top pick, option toggle, dimension switch (AC-12) |
| src/components/SensitivityCard.test.tsx | 3 | Component: flip sentence + robust fallback, max-3 flips (AC-11) |
| src/components/QaOverridePanel.test.tsx | 4 | Component: edit → lock, clamp 0–100, unlock, clear-all |
| src/components/CommandPalette.test.tsx | 3 | Component: closed renders nothing; filter by query; run on click / Enter |
| src/components/overlays.test.tsx | 4 | Component: ManualBook + ScenarioCompare (A/B): hidden when closed, labelled dialog + close when open |
| src/a11y.test.tsx | 2 | Accessibility (AC-15): axe-core WCAG A/AA on the composed app + Expert/override panel; caught & fixed an unlabeled file input |
| src/i18n/dict.test.ts | 1 | Dictionary completeness: every key has EN and ID |
Component/integration tests render via a small `src/test/render.tsx` helper that wraps the unit under test in the i18n provider with the language pinned. (Vitest runs with `css: false`, so assertions target accessible names, roles, and unique strings, not CSS-driven `guided`/`expert` visibility.)
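The "500 seeded random invariants" entry for scoring.test.ts can be pictured with the sketch below. It is an illustration only: the real scoring engine's API is not shown in this plan, so `normalize`, `composite`, the six-quality-attribute shape, and the 0 to 5 fit scale are all assumptions. The point is the technique: a seeded PRNG makes every random case replayable, and the invariants (weights sum to 1, composite stays on the fit scale) hold for any input.

```typescript
// mulberry32: a small deterministic PRNG, so a failing case can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for the engine: normalize raw influence into weights.
function normalize(raw: number[]): number[] {
  const total = raw.reduce((s, x) => s + x, 0);
  if (total === 0) return raw.map(() => 1 / raw.length); // equal-weight fallback
  return raw.map((x) => x / total);
}

// Hypothetical composite: weighted sum of per-attribute fit scores.
function composite(weights: number[], fits: number[]): number {
  return weights.reduce((s, w, i) => s + w * (fits[i] ?? 0), 0);
}

// Runs `iterations` random cases and counts invariant violations.
function countViolations(seed: number, iterations: number): number {
  const rand = mulberry32(seed);
  let violations = 0;
  for (let n = 0; n < iterations; n++) {
    const raw = Array.from({ length: 6 }, () => Math.floor(rand() * 3)); // levels 0..2
    const fits = Array.from({ length: 6 }, () => rand() * 5); // assumed 0..5 scale
    const w = normalize(raw);
    const sum = w.reduce((s, x) => s + x, 0);
    const score = composite(w, fits);
    const ok = Math.abs(sum - 1) < 1e-9 && score >= 0 && score <= 5 + 1e-9;
    if (!ok) violations++;
  }
  return violations;
}
```

In a Vitest case this would reduce to a single assertion that `countViolations(seed, 500)` is zero, with the seed printed on failure so the case can be reproduced.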
| Guard | Asserts |
|---|---|
| scripts/verify-model.mjs | The reference model reproduces the math, the fixtures, and all 25 preset targets |
| scripts/cross-check-docs.mjs | The docs agree with each other and with the prototype (qaFit vectors, influence matrix, presets, option names, EN/ID parity); 12 checks |
| scripts/check-app-config.mjs | src/config/* mirrors the documented model (no app↔doc drift) |
Real-browser journeys against the dev server at the /architecture-advisor/ sub-path. 6 pass
(all gating):
| Spec | Covers |
|---|---|
| e2e/smoke.spec.ts | The 4-step flow loads; a preset recomputes the recommendation (AC-2); the primary export downloads a .md (MADR) |
| e2e/share.spec.ts | AC-14 end to end: Share copies a #s=… deep link to the clipboard; opening it restores the exact recommendation |
| e2e/a11y.spec.ts | Full WCAG A/AA incl. color-contrast (axe, real engine) in Guided/dark + Expert/light + override panel; keyboard operability |
- `ci.yml`: on push/PR, runs `check-app-config` → `lint` → `test` → `build` → `size` (bundle budget, L7) → `audit:prod` (production-dependency audit, L6).
- `e2e.yml`: installs the chromium browser and runs `test:e2e` (L3).
- `docs-integrity.yml`: runs `verify-model` + `cross-check-docs` on doc/model changes.
- `deploy.yml`: build + publish to GitHub Pages on `main`.
The model guards are intentionally dependency-free Node scripts so they run identically on a laptop and in CI, and never rot behind a test framework upgrade.
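As a rough picture of what a drift guard does, the sketch below compares a documented table against the configured one and lists every mismatch. It is not the project's script: the real guards are `.mjs` files that read the docs and src/config, while `ModelTable` and `findDrift` here are hypothetical names operating on plain objects.

```typescript
// Hypothetical shape: an id (e.g. an option name) mapped to its numeric vector.
type ModelTable = Record<string, number[]>;

// Returns one human-readable line per mismatch; an empty array means no drift.
function findDrift(documented: ModelTable, configured: ModelTable): string[] {
  const problems: string[] = [];
  const keys = new Set([...Object.keys(documented), ...Object.keys(configured)]);
  for (const key of Array.from(keys).sort()) {
    const doc = documented[key];
    const cfg = configured[key];
    if (doc === undefined) {
      problems.push(`${key}: in config but not documented`);
      continue;
    }
    if (cfg === undefined) {
      problems.push(`${key}: documented but missing from config`);
      continue;
    }
    if (doc.length !== cfg.length || doc.some((v, i) => v !== cfg[i])) {
      problems.push(`${key}: documented [${doc.join(", ")}] but config has [${cfg.join(", ")}]`);
    }
  }
  return problems;
}
```

A guard built this way would print the returned lines and exit non-zero when the array is non-empty, which is all CI needs to fail the build.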
Each criterion maps to its verification method. Automated = covered by a test/guard that fails the build on regression; Manual = on the release checklist (Section 6) until L2–L4 land.
| # | Acceptance criterion (abridged) | Verified by | Status |
|---|---|---|---|
| 1 | install/dev/test/build clean; CI present | ci.yml | ✅ Automated |
| 2 | Any factor change instantly updates weights/rankings/charts/analyses | App.test (preset + single factor → verdict recomputes) | ✅ Automated |
| 3 | Defaults → D1 top Monolith; timeToMarket highest | scoring.test Fixture A | ✅ Automated |
| 4 | team2/dist2/scale2/devops2/ttm0 → D1 top Microservices | scoring.test Fixture B | ✅ Automated |
| 5 | domain2/team0/ttm0 → Modular Monolith; D4 Hexagonal/Clean = 5.0 | scoring.test Fixture C | ✅ Automated |
| 6 | async2/realtime2 → D2 Event-driven/Streaming; scalability+perf lead | scoring.test (AC-6) | ✅ Automated |
| 7 | consistency2 → dataConsistency dominates; D3 Single shared DB | scoring.test (AC-7) | ✅ Automated |
| 8 | Microservices + Single shared DB → distributed monolith warning | antiPatternEngine.test | ✅ Automated |
| 9 | team0/devops0 + Microservices → premature microservices warning | antiPatternEngine.test | ✅ Automated |
| 10 | Contribution table reconciles exactly to the composite | scoring.test (FR-REC-4) | ✅ Automated |
| 11 | Sensitivity names a flipping factor or correctly says "robust" | SensitivityCard.test | ✅ Automated |
| 12 | Radar overlays top options; compare 2–3 options | RadarPanel.test (toggle + dimension switch) | ✅ Automated |
| 13 | Language toggle updates all strings; dark mode fully styled | dict.test (keys) + App.test (toggle); dark mode manual | 🟡 Partial |
| 14 | Share link round-trips; Export ADR = valid MADR | exports.test + e2e/share.spec (deep-link) + e2e/smoke.spec (ADR download) | ✅ Automated |
| 15 | Keyboard-operable; accessible names; AA contrast both themes | a11y.test + e2e/a11y.spec (axe incl. color-contrast + keyboard, both themes, real browser) | ✅ Automated |
| 16 | Every QA/factor/option/rule/template in config + documented | check-app-config + cross-check-docs | ✅ Automated |
Summary: 15/16 fully automated, 1 partial (AC-13 — dark-mode styling completeness is still eyeballed; the dark theme is otherwise axe-clean). Nothing fully manual.
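Criteria 8 and 9 describe rules that are simple predicates over the current selection and factor levels. A minimal sketch of that idea follows; the types, rule ids, and field names are hypothetical (the real rules live in the app's config and the Model Data Sheet), and only the two conditions quoted in the matrix are encoded.

```typescript
// Hypothetical shapes: chosen option per dimension, and factor levels 0..2.
interface Selection { d1: string; d3: string }
interface Factors { team: 0 | 1 | 2; devops: 0 | 1 | 2 }
interface Rule { id: string; applies: (s: Selection, f: Factors) => boolean }

const rules: Rule[] = [
  {
    // AC-8: Microservices sharing one database.
    id: "distributed-monolith",
    applies: (s) => s.d1 === "Microservices" && s.d3 === "Single shared DB",
  },
  {
    // AC-9: Microservices with the lowest team and DevOps levels.
    id: "premature-microservices",
    applies: (s, f) => s.d1 === "Microservices" && f.team === 0 && f.devops === 0,
  },
];

// Returns the ids of every rule that fires for this selection.
function detect(s: Selection, f: Factors): string[] {
  return rules.filter((r) => r.applies(s, f)).map((r) => r.id);
}
```

Keeping each rule as data plus a pure predicate is what makes the one-case-per-rule testing in antiPatternEngine.test practical: a test supplies a selection and asserts on the returned ids.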

    npm run test                      # Vitest unit/component/a11y suite (watch: npm run test:watch)
    node scripts/verify-model.mjs     # model math + fixtures + 25 preset targets
    node scripts/cross-check-docs.mjs # docs agree with each other + the prototype
    node scripts/check-app-config.mjs # src/config mirrors the documented model
    npm run lint && npm run build     # types + lint + production build
    npm run size                      # bundle-size budget (after build), L7
    npm run audit:prod                # production-dependency audit (high/critical), L6
    # E2E (real browser): one-time browser download, then run:
    npx playwright install chromium
    npm run test:e2e                  # smoke, share deep-link, structural a11y, keyboard (L3)

CI runs the equivalent on every push/PR (ci.yml + e2e.yml + docs-integrity.yml); a green checkmark is the merge gate, and deploy.yml re-runs the unit tests + build before publishing.
Run before tagging a release, in both light and dark themes and at 360 px width:
- Change a factor → weights, rankings, radar, and analyses update with no reload (AC-2).
- Open the sensitivity card → it names a flipping factor or says "robust," and is correct (AC-11).
- Toggle radar options and the dimension selector → overlays and ranking update (AC-12).
- Switch EN↔ID → no untranslated string anywhere; switch theme → everything styled (AC-13).
- Tab through the whole flow → every control reachable, visible focus, sensible order (AC-15).
- Export ADR / report / CSV / JSON, Print/PDF, Share link, Import setup → all succeed (AC-14).
- `npm run build` size is within budget (Section 9); no console errors on load.
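The sensitivity item in the checklist ("names a flipping factor or says robust") can be checked by hand against a simple mental model: change one factor at a time and see whether the top pick changes. The sketch below expresses that model. It is an assumption about the approach, not the app's implementation; `findFlips`, `describeSensitivity`, and the `topOption` callback are hypothetical, and the limit of three flips mirrors the "max-3 flips" case listed for SensitivityCard.test.

```typescript
type Levels = Record<string, number>; // factor id -> level 0..2

// Tries every other level of each factor and records the first one that changes the winner.
function findFlips(
  base: Levels,
  topOption: (levels: Levels) => string,
  maxFlips = 3,
): string[] {
  const baseline = topOption(base);
  const flips: string[] = [];
  for (const factor of Object.keys(base)) {
    for (const level of [0, 1, 2]) {
      if (level === base[factor]) continue;
      const winner = topOption({ ...base, [factor]: level });
      if (winner !== baseline) {
        flips.push(`${factor} -> ${level} flips ${baseline} to ${winner}`);
        break; // one sentence per factor is enough
      }
    }
    if (flips.length === maxFlips) break;
  }
  return flips;
}

// No flipping factor means the recommendation is robust to single-factor changes.
function describeSensitivity(flips: string[]): string {
  return flips.length > 0 ? flips.join("; ") : "robust";
}
```

When verifying the card manually, setting the named factor to the named level should reproduce the flip; if it does not, the card is wrong even though it rendered.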
Accessibility — WCAG 2.1 AA (NFR + AC-15), automated: names/roles/ARIA via vitest-axe
(axe-core) in src/a11y.test.tsx (jsdom), plus full color-contrast +
keyboard in a real browser via Playwright + @axe-core/playwright in
e2e/a11y.spec.ts, across both themes. The axe run caught and fixed an
unlabeled file input; the contrast pass drove the tertiary-token, light-success-green, and
dimmed-opacity (off chips / hidden rows) adjustments to clear AA.
- Keyboard: every interactive control operable, logical tab order, visible focus (already styled via `:focus-visible` in `index.css`), no traps; overlays (palette, manual) are escapable.
- Names/roles: segmented controls use `role="radiogroup"`/`radio`; icon-only buttons have `aria-label`; the radar `<svg>` has `role="img"` + label.
- Contrast: AA for text in both themes (design tokens chosen for this; spot-check after token edits).
UAT — scripted scenarios for two personas, ≥3 participants each, success = task completed unaided + self-reported clarity ≥4/5:
- Newcomer (Guided): "You're building a small internal tool — what should you use and why?" Expectations: reaches a recommendation, can explain the top driver in their own words.
- Architect (Expert): "Justify a Modular Monolith over Microservices for a regulated, high-scale product." Expectations: uses the contribution bars + sensitivity + close-call, exports an ADR.
Pure client-side, no backend/accounts/secrets — the surface is the browser and the dependencies.
- Injection: all user/derived text rendered via React (escaped). Audit any `dangerouslySet…`/`innerHTML` (the prototype mockup uses `innerHTML` with non-user template strings only; the React app must not interpolate user input into HTML).
- Persisted/URL state: `localStorage` + URL-hash state is validated on read (corrupt/stale snapshots are treated as empty; see the `isScenario()` guard) so a hostile hash can't crash or mislead the app.
- Dependencies: CI-gated. `npm run audit:prod` (`npm audit --omit=dev --audit-level=high`) runs after the build; production deps (React + fontsource + tabler icons) report 0 vulnerabilities. The known dev-only Vite/esbuild dev-server advisory does not ship and is excluded; track it and bump Vite/Vitest when a non-breaking fix lands.
- Supply chain: `package-lock.json` committed; CI uses `npm ci`.
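The validate-on-read rule for persisted and URL state can be sketched as a type guard plus a decoder that returns `null` for anything it does not recognise. Everything specific here is an assumption: the `Scenario` fields, the 0 to 2 level range, and the percent-encoded JSON payload are illustrative, and the app's real `isScenario()` and hash encoding may differ. Only the `#s=` prefix comes from this plan.

```typescript
// Hypothetical scenario shape for illustration.
interface Scenario { version: number; factors: Record<string, number> }

// Accepts only objects whose fields have the expected types and ranges.
function isScenario(value: unknown): value is Scenario {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.version !== "number") return false;
  const f = v.factors;
  if (typeof f !== "object" || f === null || Array.isArray(f)) return false;
  const levels = f as Record<string, unknown>;
  return Object.keys(levels).every((k) => {
    const x = levels[k];
    return typeof x === "number" && Number.isInteger(x) && x >= 0 && x <= 2;
  });
}

// Assumed encoding: JSON, percent-encoded, behind the "#s=" prefix.
function toHash(s: Scenario): string {
  return "#s=" + encodeURIComponent(JSON.stringify(s));
}

// Returns null for a missing prefix, malformed encoding, bad JSON, or a failed guard.
function fromHash(hash: string): Scenario | null {
  if (!hash.startsWith("#s=")) return null;
  try {
    const parsed: unknown = JSON.parse(decodeURIComponent(hash.slice(3)));
    return isScenario(parsed) ? parsed : null;
  } catch {
    return null; // treat corrupt input as empty rather than crashing
  }
}
```

The design choice worth keeping, whatever the real shape is, is that the decoder has exactly one failure value, so callers fall back to defaults without inspecting why the input was rejected.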
- Bundle budget: CI-gated. `npm run size` (`scripts/check-bundle-size.mjs`) asserts gzip JS ≤120kB / CSS ≤25kB (currently ~110 / ~19 with React 19). No chart/diagram library ships; all visuals are hand-built SVG (see DECISIONS.md).
- Recompute: changing a factor recomputes the full model synchronously (pure functions, no async), so it is perceptibly instant; verified by the 500-iteration invariant test running in ms.
- First paint: dark theme applied pre-paint (inline script); fonts use `font-display: swap`.
- No layout thrash: SVGs are static markup, not re-laid-out per frame.
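The budget comparison itself is small enough to show. The sketch below is not `scripts/check-bundle-size.mjs`; `checkBudget` and its types are hypothetical, the gzip sizes are assumed to be measured elsewhere, and treating 1 kB as 1000 bytes is an assumption, since the plan does not say which unit the script uses. The default limits are the ones stated above.

```typescript
interface Budget { jsKb: number; cssKb: number }
interface Measured { jsBytes: number; cssBytes: number } // gzip sizes, measured elsewhere

// Returns one message per exceeded budget; an empty array means the build is within budget.
function checkBudget(m: Measured, b: Budget = { jsKb: 120, cssKb: 25 }): string[] {
  const failures: string[] = [];
  const jsKb = m.jsBytes / 1000; // assumption: 1 kB = 1000 bytes
  const cssKb = m.cssBytes / 1000;
  if (jsKb > b.jsKb) failures.push(`JS ${jsKb.toFixed(1)} kB exceeds ${b.jsKb} kB`);
  if (cssKb > b.cssKb) failures.push(`CSS ${cssKb.toFixed(1)} kB exceeds ${b.cssKb} kB`);
  return failures;
}
```

With the current figures of roughly 110 kB and 19 kB the check passes with about 10 kB of JS headroom, so a single added dependency could plausibly trip it, which is the intent of gating it in CI.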
| Gap | Impact | Plan |
|---|---|---|
| UAT not yet executed (L5) | Real-user clarity unproven | Run uat-script.md with ≥3 per persona before v1.1 |
| Component/integration (L2) is broad but not exhaustive | A few minor affordances still ride the release checklist | Add cases opportunistically as components change |
A change is "done" when: the unit suite and all three guards pass; lint and build are clean;
new model/config/doc changes keep cross-check-docs and check-app-config green; any new UI
affordance is covered by the release checklist (Section 6); and no acceptance criterion regresses.
| Version | Date | Notes |
|---|---|---|
| 0.1 | 2026-06-18 | Initial test plan: strategy, current inventory (39 tests + 3 guards + CI), AC traceability, manual checklist, security/perf/UAT/a11y approach, and the L2–L7 gap roadmap. |
| 0.2 | 2026-06-20 | First L2 component/integration tests landed (9 cases via src/test/render.tsx): App reactivity (AC-2) + language (AC-13), RadarPanel (AC-12), SensitivityCard (AC-11). Inventory 39→48; automated AC 11→14 of 16. |
| 0.3 | 2026-06-20 | L4 accessibility automated with vitest-axe (axe-core), WCAG A/AA, on the composed app + Expert/override panel (src/a11y.test.tsx, 2 cases) — caught & fixed an unlabeled file input (Toolbar). Inventory 48→50; AC-15 manual→partial (contrast/keyboard still manual). |
| 0.4 | 2026-06-20 | Extended L2 to the override panel + redistribution, command palette, and the manual / A/B-compare overlays. Inventory 50→62. |
| 0.5 | 2026-06-20 | L3 E2E (Playwright: smoke, share deep-link, structural a11y, keyboard) + real-browser keyboard for AC-15; L6/L7 gated in CI (audit:prod, bundle-size budget); L5 UAT script added (uat-script.md). Full color-contrast AA tracked as a test.fixme. |
| 0.6 | 2026-06-20 | Full color-contrast AA remediated and gated in the real browser (tertiary tokens, light success green, off-chip/hidden-row opacity) — e2e/a11y.spec now asserts color-contrast in both themes (no fixme); L4 ✅; AC-15 ✅ automated (15/16 AC automated). |