
Commit e150217

CreatmanCEO and claude committed
M3: functional gaps closed (Supabase auth, onboarding overlay, severity annotation)
BLOCKERS resolved (5):
- B1 Onboarding overlay handler: seedOnboardingFlags() in auth.setup.ts.tmpl flips localStorage keys matching *-features-discovered, *-onboarding-complete, *-tour-seen, *-hints-seen, *-welcome-dismissed. Override via the TEST_ONBOARDING_FLAGS JSON env var. Closes Lingua SK-07 (100% spec failure on apps with feature-tour overlays).
- B2 Supabase Auth Pattern 1.5: auto-detect SUPABASE_URL + SUPABASE_ANON_KEY, POST to /auth/v1/token?grant_type=password with an apikey header, store the session under sb-<project_ref>-auth-token, and poll localStorage for up to 45 s (no URL-change assumption). Closes Lingua SK-03.
- B3 Doc drift: SKILL.md step 10 now matches the generate_report.py --run-dir CLI.
- B4 Severity annotation: three mechanisms, namely a [severity:S0] inline tag, the same tag in the test title, or a // @severity: S0 comment in the spec file. Adds the severity_overrides_from_spec_file() parser and a --project-root flag on fingerprint_bugs.py.
- B5 Spec generation contract: SKILL.md mandates console + network listeners before goto, an axe scan, an issues[] collector, and a hard expect at the end, even when Claude writes specs from scratch.

IMPORTANT (5):
- I1 Anchored regex /^(sign in|log in|войти|вход)$/i in auth template
- I2 Tabs-vs-buttons note in playwright-patterns.md
- I3 Pydantic, Next.js 15 Turbopack, Supabase realtime, browser-extension, ResizeObserver, AbortError patterns added to triage_console.py + reference
- I4 WebSocket DOM-fallback strategy in stack-specific.md
- I5 detect_state.py: skillDir resolves from __file__ when env unset

NICE-TO-HAVE (4):
- N1 install.sh MCP preflight (already present, verified)
- N2 Image-budget rule wording: distinguish inline returns vs on-disk captures
- N3 run_suite.py prints all artefacts + next-step commands at end
- N4 New scripts/preflight.py: HEAD check on TEST_BASE_URL, auth env validation

113 tests passing (was 94), ruff + mypy clean. Bump to 0.2.0-beta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent cc034ff commit e150217

16 files changed

Lines changed: 746 additions & 51 deletions

CHANGELOG.md

Lines changed: 67 additions & 9 deletions
@@ -6,14 +6,71 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

-### Planned for `0.2.0`
-- Supabase Auth pattern (`auth.setup.ts.tmpl` branch on `SUPABASE_URL`)
-- Onboarding-overlay state-seeding hook in auth setup template
-- Severity annotation: `// @severity: S0` parsing in spec files
-- Spec generation contract: console listeners + axe scan + issues collector required
-- `--bugs/--diff/--out` accepted as aliases on `generate_report.py`
-- Anchored regex in auth template
-- Pydantic / Next.js 15 patterns in `console-noise-patterns.md`
+### Planned for `0.3.0`
+- Vision-classifier auto-loop (`vision_dispatch.py`)
+- Console LLM auto-triage (`console_llm_triage.py` with batched subagent)
+- Performance / Lighthouse audit script
+- Tracker integration CLI (`file_bugs.py --linear / --github / --jira`)
+- Regression watchlist mechanism (sticky fixed → escalate on regression)
+- Layout integrity assertions (max-width, icon grouping patterns)
+
+## [0.2.0-beta] - 2026-04-29
+
+Functional gaps closed based on dogfooding feedback from two real apps. Pre-OSS
+v1 hardening — zero false positives in skill core, 113 passing tests, green CI
+on Linux/macOS/Windows.
+
+### Added
+- **Supabase Auth Pattern 1.5**: `auth.setup.ts.tmpl` auto-detects `SUPABASE_URL`
++ `SUPABASE_ANON_KEY`, hits `/auth/v1/token?grant_type=password` with `apikey`
+header, polls localStorage 45s for `sb-<ref>-auth-token` (no assumption of
+URL change post-login). Documented in `auth-strategies.md` Pattern 1.
+- **Onboarding overlay state-seeding**: `seedOnboardingFlags()` helper in
+`auth.setup.ts.tmpl` auto-flips `localStorage` keys matching common patterns
+(`*-features-discovered`, `*-onboarding-complete`, `*-tour-seen`,
+`*-hints-seen`, `*-welcome-dismissed`). Override via `TEST_ONBOARDING_FLAGS`
+JSON env var. Without this, apps with feature-tour overlays fail every spec.
+- **Severity annotation mechanism** — three ways to override the heuristic:
+1. `[severity:S0]` inline tag in `issues.push(...)` lines
+2. `[severity:S0]` in spec test name
+3. `// @severity: S0` comment preceding `test('...')` in spec file
+`fingerprint_bugs.py --project-root` flag controls where to look for spec files.
+- **Spec generation contract** in SKILL.md — non-negotiable list of elements
+every generated spec MUST contain (console + network listeners before goto,
+axe scan, issues[] collector, hard `expect(issues).toEqual([])` at end).
+Closes the gap where Claude wrote specs from scratch and silently skipped
+the audit features.
+- **`scripts/preflight.py`** — quick env + base-URL HEAD check before scaffolding.
+Fails fast with actionable hints if `TEST_BASE_URL` is unreachable, auth env
+missing, or Supabase key looks malformed.
+- **Tabs-vs-buttons reference note** in `playwright-patterns.md` — handles SPAs
+with visual tabs but no `role="tab"` (logs as a11y soft finding instead of failing).
+- **WebSocket DOM-fallback strategy** in `stack-specific.md` — when WS frames are
+binary/encrypted/proprietary, assert on DOM mutations instead of frames.
+- **Pydantic, Next.js 15 Turbopack, Supabase realtime, browser-extension,
+ResizeObserver, AbortError** patterns added to `console-noise-patterns.md`
+default ignore-list.
+- **Run artefact summary** at end of `run_suite.py` — prints all generated paths
++ next-step commands so users don't have to `ls reports/`.
+
+### Fixed
+- **`generate_report.py` doc drift** — SKILL.md step 10 now matches the actual
+`--run-dir` CLI signature.
+- **Anchored regex** in auth UI fallback (`/^(sign in|log in|войти|вход)$/i`) —
+no longer matches "Sign up" / "Sign in with Google".
+- **Skill-dir resolution** in `detect_state.py` — populates `skillDir` from
+`__file__` when `CLAUDE_SKILL_DIR` env is unset, instead of returning `null`.
+Fixes `Isolation verified: false` mismatch reported by Lingua tester.
+- **Image-budget rule wording** in SKILL.md — clarifies that on-disk
+auto-captures (which nobody `Read`s) are FREE; the cost is inline returns.
+- **Fingerprint regex** for node counts now matches both ASCII `(3x nodes)` and
+Unicode `(3× nodes)`.
+
+### Tests
+- 19 new tests covering severity overrides, skill-dir resolution, preflight
+module. Total: 113 passing across 10 scripts.
+
+## [0.1.0-beta] - 2026-04-29

 ## [0.1.0-beta] - 2026-04-29

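The three override mechanisms listed under **Severity annotation mechanism** can be pictured as a small line-based parser. This is a TypeScript sketch for illustration only: the real parser is the Python `severity_overrides_from_spec_file()` named in the commit message, and the helper names, the `S<digit>` severity format, the title-over-comment precedence, and the rule that a comment applies only to the next `test(...)` call are all assumptions.

```typescript
// Sketch of the three severity-override mechanisms (hypothetical names).
// Assumption: severities are written as S0, S1, S2, ...
const INLINE_TAG = /\[severity:(S\d)\]/i;
const COMMENT_TAG = /^\s*\/\/\s*@severity:\s*(S\d)\s*$/i;
const TEST_CALL = /^\s*test\(\s*(['"`])(.*?)\1/;

// Mechanism 1: `[severity:S0]` inside an issue string pushed into issues[].
function severityFromText(text: string): string | null {
  const match = INLINE_TAG.exec(text);
  return match ? match[1].toUpperCase() : null;
}

// Mechanisms 2 and 3: map each test title to an explicit severity, taken
// from a tag in the title or from a `// @severity:` comment above test().
function severityOverridesFromSpec(source: string): Map<string, string> {
  const overrides = new Map<string, string>();
  let pending: string | null = null;

  for (const line of source.split(/\r?\n/)) {
    const comment = COMMENT_TAG.exec(line);
    if (comment) {
      pending = comment[1].toUpperCase();
      continue;
    }
    const call = TEST_CALL.exec(line);
    if (call) {
      const title = call[2];
      // Assumed precedence: a tag in the title beats a preceding comment.
      const severity = severityFromText(title) ?? pending;
      if (severity !== null) overrides.set(title, severity);
      pending = null;
    } else if (line.trim() !== "") {
      // Any other non-blank line detaches a comment from the next test().
      pending = null;
    }
  }
  return overrides;
}
```

Scanning line by line keeps the sketch simple; a parser that must handle multi-line `test(` calls or `test.describe` titles would need a real TypeScript AST.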
@@ -63,5 +120,6 @@ Initial public beta. Validated end-to-end on a real production app
 - macOS / Linux installers untested in CI; help wanted (see
 `os-compatibility-report` issue template).

-[Unreleased]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.1.0-beta...HEAD
+[Unreleased]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.2.0-beta...HEAD
+[0.2.0-beta]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.1.0-beta...v0.2.0-beta
 [0.1.0-beta]: https://github.com/CreatmanCEO/webtest-orch/releases/tag/v0.1.0-beta
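Both regex fixes under **### Fixed** in the changelog are easy to check in isolation. The sign-in pattern below is copied verbatim from the changelog; `NODE_COUNT`, `nodeCount()` and `isSignInLabel()` are hypothetical, since the commit states the fingerprint regex's intent (accept ASCII `x` and Unicode `×`) but does not show the pattern itself.

```typescript
// Anchored auth-button pattern, copied from the changelog entry.
const SIGN_IN = /^(sign in|log in|войти|вход)$/i;

// Hypothetical node-count pattern: accepts "(3x nodes)" and "(3× nodes)".
const NODE_COUNT = /\((\d+)\s*[x×]\s*nodes\)/;

function isSignInLabel(label: string): boolean {
  // Trim so surrounding whitespace in button text does not defeat ^...$.
  return SIGN_IN.test(label.trim());
}

function nodeCount(text: string): number | null {
  const match = NODE_COUNT.exec(text);
  return match ? Number(match[1]) : null;
}
```

Because of the `^...$` anchors, a label only matches when it is exactly one of the four phrases, which is why "Sign in with Google" is rejected.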

README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 [![CI](https://github.com/CreatmanCEO/webtest-orch/actions/workflows/ci.yml/badge.svg)](https://github.com/CreatmanCEO/webtest-orch/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
-[![Version](https://img.shields.io/badge/version-0.1.0--beta-orange.svg)](./CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.2.0--beta-orange.svg)](./CHANGELOG.md)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

 **Universal e2e testing skill for Claude Code.** Replaces ad-hoc Playwright MCP prompts with a single reusable entity for testing any web application (Next.js, FastAPI, static sites, Telegram WebApp, etc.).
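The version bump edits a badge URL rather than plain text. shields.io static badges use `-` as the field separator, so the dash in `0.2.0-beta` has to be written `--`. A small helper that reproduces the badge URLs in this hunk, assuming shields.io's documented escaping (`--` for a dash, `__` for an underscore, `_` for a space) plus standard percent-encoding, which is where `3.10%2B` comes from; `shieldsSegment` and `shieldsBadgeUrl` are illustrative names.

```typescript
// Escape one label/message segment of a shields.io static-badge path.
// Inside a segment: "-" becomes "--", "_" becomes "__", a space becomes "_";
// anything else that is not URL-safe (such as "+") is percent-encoded.
function shieldsSegment(text: string): string {
  const escaped = text.replace(/-/g, "--").replace(/_/g, "__").replace(/ /g, "_");
  return encodeURIComponent(escaped);
}

// Build /badge/<label>-<message>-<color>.svg from raw strings.
function shieldsBadgeUrl(label: string, message: string, color: string): string {
  return `https://img.shields.io/badge/${shieldsSegment(label)}-${shieldsSegment(message)}-${color}.svg`;
}
```

Generating the badge from the version string this way avoids hand-doubling the dash on every release.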

SKILL.md

Lines changed: 39 additions & 4 deletions
@@ -23,11 +23,25 @@ End-to-end testing orchestrator for web applications. Splits into **first-run ex

 ## Image budget protection — READ FIRST, MANDATORY

-**The problem:** screenshots burn Claude Code's parent-chat **image cap** (~50–100 inline image blocks per session) before they burn its text context. Standalone Playwright MCP usage hits this wall fast. Once hit, the user must `/compact` even at 20% text-context usage.
+**The problem:** Claude Code has two independent context limits — text tokens (large)
+and **inline-image blocks** (~50–100 per session). Screenshots returned inline burn
+the image budget far faster than the text budget; once exhausted, the user must
+`/compact` even at 20% text-context usage.
+
+**Distinction that matters:**
+- **Inline image returns to parent context** burn the budget. This includes
+`browser_take_screenshot` default output (image returned to caller),
+`Read` on a `.png/.jpg/.webp/.gif/.bmp/.svg`, markdown report with `![]()` shown
+to parent.
+- **On-disk artefacts that nobody Reads** are FREE. Playwright's failure
+screenshots go to `test-results/`, MCP browser tools may save `.png`s to a
+cache dir — none of these cost the parent context UNLESS you `Read` them.

 **The hard rule, enforced by you (not by frontmatter):**

-> **NEVER call `Playwright:browser_take_screenshot`, `chrome-devtools:take_screenshot`, or `Read` on `.png/.jpg/.webp` files from the parent skill context. ALWAYS dispatch a Task subagent (general-purpose) to do anything that produces or consumes images. Subagent returns ONLY text — paths, descriptions, verdicts.**
+> **NEVER return screenshots to the parent skill context. ALWAYS dispatch a Task
+> subagent (general-purpose) for anything that produces or consumes images.
+> Subagent returns ONLY text — paths, descriptions, verdicts.**

 This contract was attempted via `context: fork` frontmatter but Claude Code 2.1.x on Windows does not honor that field, so enforcement is delegated to *you reading these instructions*. Verified empirically 2026-04-28 (sub-agent isolation works; `context: fork` does not parse). See `${CLAUDE_SKILL_DIR}/.isolation-verified`.

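The inline-versus-on-disk distinction in the image-budget rule comes down to what the parent context reads back. A minimal sketch of a pre-`Read` check, assuming the orchestrator can inspect a path before reading it; `wouldBurnImageBudget` and `assertTextOnlyRead` are hypothetical, and the extension list is copied from the rule (extend it, for example with `.jpeg`, if your tooling emits other formats).

```typescript
// Image extensions named in the SKILL.md image-budget rule.
const IMAGE_EXTENSIONS = [".png", ".jpg", ".webp", ".gif", ".bmp", ".svg"];

// True if reading this path in the parent context would return an inline image.
// Note: markdown files that embed images via ![]() are not detected here.
function wouldBurnImageBudget(path: string): boolean {
  const lower = path.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Hypothetical pre-Read guard for the parent context. Writing images to disk
// is fine; only reading them back into the parent costs image-budget slots.
function assertTextOnlyRead(path: string): void {
  if (wouldBurnImageBudget(path)) {
    throw new Error(
      `Refusing to Read ${path} in the parent context: dispatch a Task subagent and return text only`,
    );
  }
}
```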
@@ -101,12 +115,33 @@ Copy this checklist into TodoWrite at session start; tick as you go.
 - First run → minimal critical-path: home + auth + one main flow
 - [ ] **4. Dev server up.** `python "${CLAUDE_SKILL_DIR}/scripts/with_server.py" --help`. Use it; **do not read its source unless `--help` doesn't cover the case.**
 - [ ] **5a. EXPLORATORY** (BOOTSTRAP / new flow in HYBRID): use **Playwright MCP** with `Playwright:browser_snapshot` (ARIA tree, text). Walk the flow, generate POM in `tests/pages/<Page>.ts`, generate spec in `tests/specs/<flow>.spec.ts`. **Generate locators from ARIA tree refs you actually saw** — do NOT use generic regex like `getByPlaceholder(/john doe|name|имя/i)`, they cause strict-mode violations on first run. Either use exact strings from the snapshot OR add `.first()` explicitly. Run the spec once to confirm green.
+
+**🔴 SPEC GENERATION CONTRACT — non-negotiable.** Even if you skip the template
+and write a spec from scratch (when product context is rich), every generated
+`*.spec.ts` MUST contain ALL of these:
+1. **Console listeners attached BEFORE `page.goto()`**: `consoleErrors[]` from
+`page.on('pageerror')` and `page.on('console', m => m.type() === 'error')`.
+2. **Network listeners attached BEFORE `page.goto()`**: `failedRequests[]` from
+`page.on('response', r => r.status() >= 400 && ...)` and `page.on('requestfailed')`.
+3. **`AxeBuilder` scan** with `withTags(['wcag2a','wcag2aa','wcag21aa','wcag22aa'])`
+whose violations are pushed into `issues[]`.
+4. **`issues[]` collector pattern** — every soft check pushes a structured tag
+into `issues` (e.g. `a11y[serious] color-contrast: ...`, `heading-jump: ...`,
+`touch-target: ...`, `overflow: ...`, `html-lang: ...`).
+5. **Single hard `expect(issues).toEqual([])`** at the end with
+`${count} issues found:\n - ...` message format so `run_suite.py` can split
+one test failure into one bug record per issue.
+6. **Trailing `expect(consoleErrors).toEqual([])` + `expect(failedRequests).toEqual([])`**.
+
+Skip ANY of these and the skill's console-audit / a11y / per-issue fingerprinting
+features stop working. Use `templates/spec.ts.tmpl` as the canonical reference —
+copy its skeleton into hand-written specs.
 - [ ] **5b. REPLAY**: `npx playwright test --reporter=list,json,html`. **No Playwright MCP, no LLM browser actions.**
 - [ ] **6. A11y** on each visited page: deterministic `@axe-core/playwright` (in spec) + qualitative checks via nested subagent if alt-text/heading/focus suspect. See `reference/a11y-patterns.md`.
 - [ ] **7. Console + network.** Listeners attach BEFORE `page.goto()` (mandatory). Pipe captured logs through `python "${CLAUDE_SKILL_DIR}/scripts/triage_console.py" --help`.
 - [ ] **8. Visual.** Default `toHaveScreenshot()` in spec. Diff fired → `python "${CLAUDE_SKILL_DIR}/scripts/visual_diff.py" --classify` spawns nested subagent on each failed image (text verdict only). Argos opt-in via `VISUAL_DIFF=argos`.
 - [ ] **9. Fingerprint + diff.** `python "${CLAUDE_SKILL_DIR}/scripts/fingerprint_bugs.py" --current reports/<curr>/raw.json --previous reports/<prev>/bugs.json --out reports/<curr>/bugs.json`.
-- [ ] **10. Report.** `python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" --bugs bugs.json --diff diff.json --out reports/<run-id>`. Print absolute path to `index.html`.
+- [ ] **10. Report.** `python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" --run-dir reports/<run-id> [--app-name "My App"]`. Reads `bugs.json` + `diff.json` from that dir, writes `report.md` + `index.html` next to them. Print absolute path to `index.html`.

 ## Decision tree

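Items 4 and 5 of the spec generation contract are what let one failing test fan out into several bug records. A sketch of both halves, assuming the failure message is exactly `${count} issues found:` followed by one ` - <issue>` line per issue; `formatIssues` and `splitIssues` are hypothetical names, and the real splitting happens in `run_suite.py`.

```typescript
// Build the failure message for the single hard expect on issues[].
function formatIssues(issues: string[]): string {
  const lines = issues.map((issue) => ` - ${issue}`);
  return [`${issues.length} issues found:`, ...lines].join("\n");
}

// Recover one entry per issue from such a message; anything else yields [].
function splitIssues(message: string): string[] {
  const [header, ...rest] = message.split("\n");
  if (!/^\d+ issues found:$/.test(header)) return [];
  return rest.filter((line) => line.startsWith(" - ")).map((line) => line.slice(3));
}
```

In a spec, the message would typically be passed as Playwright's custom expect message, for example `expect(issues, formatIssues(issues)).toEqual([])`, so the report shows every collected issue rather than only a generic array diff.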
@@ -155,7 +190,7 @@ detect_state.py → JSON
 - `scripts/triage_console.py --input console.json` — noise filter + LLM long tail
 - `scripts/visual_diff.py --classify reports/<run-id>` — pixel diff + nested-subagent classification
 - `scripts/fingerprint_bugs.py --current curr.json --previous prev.json` — bug dedup + diff
-- `scripts/generate_report.py --bugs bugs.json --diff diff.json --out reports/<run-id>` — markdown + html
+- `scripts/generate_report.py --run-dir reports/<run-id> [--app-name "X"]` — markdown + html (reads bugs.json/diff.json from --run-dir)
 - `scripts/_image_isolation_check.py --verify` — image budget contract self-check

 ## When to dispatch a Task subagent
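After this change, step 10 and the script list agree on a single required input: the run directory, with fixed file names inside it. A sketch of that contract as data, assuming forward-slash paths and the file names given in step 10; `reportJob` is a hypothetical helper, not part of the skill, and `argv` lists the arguments passed to `python` with the script path shortened.

```typescript
interface ReportJob {
  argv: string[]; // arguments after the python interpreter
  reads: string[]; // inputs expected inside the run directory
  writes: string[]; // outputs written next to them
}

function reportJob(runDir: string, appName?: string): ReportJob {
  const dir = runDir.replace(/[\\/]+$/, ""); // drop trailing separators
  const argv = ["generate_report.py", "--run-dir", dir];
  if (appName !== undefined) argv.push("--app-name", appName);
  return {
    argv,
    reads: [`${dir}/bugs.json`, `${dir}/diff.json`],
    writes: [`${dir}/report.md`, `${dir}/index.html`],
  };
}
```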

reference/auth-strategies.md

Lines changed: 67 additions & 6 deletions
@@ -10,7 +10,65 @@ The skill prefers **API-based login** over UI-driven login because API login is

 This means **every test runs already authenticated** without re-doing the login flow per test — saves time and reduces flake.

-## Pattern 1 — API login + JWT (preferred)
+## Pattern 0 — Onboarding overlays (run before any pattern below)
+
+Apps with feature-discovery tours, hint overlays, or "welcome" modals will
+**fail every spec** if those overlays intercept the first click. The
+`auth.setup.ts.tmpl` template ships a `seedOnboardingFlags()` helper that
+runs after authentication and before `storageState` is saved.
+
+It does two things:
+
+1. Auto-flips `localStorage` keys matching common patterns to `"true"`:
+`*-features-discovered`, `*-onboarding-complete`, `*-tour-seen`,
+`*-hints-seen`, `*-welcome-dismissed`.
+2. Reads `TEST_ONBOARDING_FLAGS` env (JSON array of exact key names) and sets
+each one to `"true"`.
+
+For app-specific keys not matching default patterns:
+
+```bash
+# In .env.test
+TEST_ONBOARDING_FLAGS=["myapp-tour-v2","myapp-pricing-banner-dismissed"]
+```
+
+Specs that explicitly test the onboarding tour from a fresh state should
+clear these flags themselves at test start (`page.evaluate(() => localStorage.clear())`)
+before navigating.
+
+## Pattern 1 — Supabase Auth (most common SaaS BaaS)
+
+Detected automatically when `SUPABASE_URL` and `SUPABASE_ANON_KEY` env are set.
+Skill skips other auth patterns and uses the Supabase REST endpoint:
+
+```ts
+POST {SUPABASE_URL}/auth/v1/token?grant_type=password
+Headers: { apikey: SUPABASE_ANON_KEY, Content-Type: application/json }
+Body: { email, password }
+```
+
+Token is stored in localStorage as `sb-<project_ref>-auth-token` (the
+`project_ref` is the subdomain from `SUPABASE_URL`).
+
+**Why polling instead of `Promise.race`:** Supabase clients sometimes rewrite
+the storage key on hydration (refresh-token rotation). The template polls
+localStorage with a 45-second timeout instead of racing a 15s URL-change
+expectation that may never fire (SPA hydrates in place).
+
+```bash
+# .env.test for a Supabase app
+TEST_BASE_URL=https://your-app.example.com
+TEST_USER_EMAIL=qa@example.com
+TEST_USER_PASSWORD=...
+SUPABASE_URL=https://abcdefgh.supabase.co
+SUPABASE_ANON_KEY=eyJhbGc...
+```
+
+OAuth providers configured in Supabase (Google, GitHub) are NOT covered by this
+pattern — they require browser flow. Use a dedicated test user with email/
+password sign-in for CI runs.
+
+## Pattern 2 — API login + JWT (preferred)

 For FastAPI, Express, NestJS, Django REST — backends with `/api/auth/login` returning JWT.

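The Supabase storage key and the polling wait from Pattern 1 can be sketched as plain functions. This assumes, as the pattern text says, that `project_ref` is the first label of the `SUPABASE_URL` host; `supabaseStorageKey` and `pollUntil` are hypothetical helpers, and the 45-second default mirrors the template's timeout rather than reproducing its code.

```typescript
// "https://abcdefgh.supabase.co" -> "sb-abcdefgh-auth-token"
function supabaseStorageKey(supabaseUrl: string): string {
  const projectRef = new URL(supabaseUrl).hostname.split(".")[0];
  return `sb-${projectRef}-auth-token`;
}

// Call `read` until it yields a non-null value or `timeoutMs` elapses.
// No assumption is made about the URL changing after login.
async function pollUntil(
  read: () => string | null | Promise<string | null>,
  timeoutMs = 45_000,
  intervalMs = 250,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (value !== null) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for a value`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Inside a Playwright setup, `read` could wrap `page.evaluate((key) => localStorage.getItem(key), storageKey)`, so the wait ends as soon as the token appears, whether or not the SPA navigates.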
@@ -81,11 +139,14 @@ If failures look like 401/403 storms, the auth file is the first thing to refres

 ```
 TEST_BASE_URL # required — e.g. https://your-app.example.com
-TEST_USER_EMAIL # required
-TEST_USER_PASSWORD # required
-TEST_API_LOGIN_PATH # optional — default /api/auth/login
-TEST_API_TOKEN_FIELD # optional — default access_token (also tried: token, jwt)
-TEST_ADMIN_TOKEN # optional — for Pattern 3
+TEST_USER_EMAIL # required (unless public-only site)
+TEST_USER_PASSWORD # required (unless public-only site)
+SUPABASE_URL # if set → Pattern 1 (Supabase Auth)
+SUPABASE_ANON_KEY # required with SUPABASE_URL
+TEST_API_LOGIN_PATH # Pattern 2 — default /api/auth/login
+TEST_API_TOKEN_FIELD # Pattern 2 — default access_token (also tried: token, jwt)
+TEST_ADMIN_TOKEN # Pattern 3 — server-issued admin token (Telegram WebApp etc.)
+TEST_ONBOARDING_FLAGS # JSON array of localStorage keys to flip to "true" post-auth
 TEST_USER_AGENT_KIND # optional — desktop|mobile|telegram, default desktop
 ```

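The env var list ends with `TEST_ONBOARDING_FLAGS`, which feeds the Pattern 0 helper. A sketch of how the set of keys to flip could be computed, assuming the helper flips keys already present in `localStorage` that match a default pattern, plus the explicit list from the env var; `onboardingKeysToFlip` and `parseOnboardingFlags` are hypothetical names, not the template's code.

```typescript
// Default suffixes from the Pattern 0 description.
const DEFAULT_SUFFIXES = [
  "-features-discovered",
  "-onboarding-complete",
  "-tour-seen",
  "-hints-seen",
  "-welcome-dismissed",
];

// Parse TEST_ONBOARDING_FLAGS (a JSON array of exact key names); empty if unset.
function parseOnboardingFlags(raw: string | undefined): string[] {
  if (raw === undefined || raw.trim() === "") return [];
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((key) => typeof key === "string")) {
    throw new Error("TEST_ONBOARDING_FLAGS must be a JSON array of strings");
  }
  return parsed;
}

// Keys to set to "true": matching existing keys first, then explicit ones, deduplicated.
function onboardingKeysToFlip(existingKeys: string[], envValue: string | undefined): string[] {
  const matched = existingKeys.filter((key) =>
    DEFAULT_SUFFIXES.some((suffix) => key.endsWith(suffix)),
  );
  return Array.from(new Set([...matched, ...parseOnboardingFlags(envValue)]));
}
```

The resulting keys could then be applied in the page with `page.evaluate((keys) => keys.forEach((k) => localStorage.setItem(k, "true")), keys)` before `storageState` is saved. If `.env.test` is ever sourced by a shell instead of a dotenv loader, wrap the JSON value in single quotes so its inner double quotes survive.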
reference/console-noise-patterns.md

Lines changed: 8 additions & 0 deletions
@@ -28,6 +28,14 @@ script (or pass `--ignore-extra` from the orchestrator) to add app-specific patt
 | `Blocked a frame with origin .* from accessing` | iframe sandbox | Expected in sandboxed iframes |
 | `\[HMR\]`, `\[vite\]`, `\[next\]: hot-update` | Hot module reload | Dev-only |
 | `Permission denied to access property .* on cross-origin` | Cross-origin frame | Expected |
+| `PydanticDeprecatedSince20`, `PydanticUserWarning` | FastAPI / Pydantic v2 backend warnings forwarded to client | Migration nag, not a runtime bug |
+| `[Turbopack] compiled\|building\|HMR` | Next.js 15 + Turbopack dev | Compile signals — dev-only |
+| `[next-auth][debug]` | next-auth debug mode | Dev-only when `NEXTAUTH_DEBUG=true` |
+| `[Fast Refresh]`, `fast-refresh` | Next.js HMR | Dev-only |
+| `[Supabase].*realtime`, `supabase.*subscribe` | Supabase realtime client | Channel subscription chatter |
+| `chrome-extension://`, `moz-extension://` | Browser extensions | Not the app; ignore |
+| `ResizeObserver loop limit exceeded` | Chromium quirk | Browser implementation detail, not the app |
+| `AbortError: signal is aborted` | React unmount during fetch | Cleanup race, not user-visible |

 ## Bug patterns (auto-report with severity)

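Rows in this table read as regular expressions, as the escaped `\[HMR\]` row shows. A TypeScript sketch of applying part of the ignore-list, assuming case-sensitive matching and one pattern per row (the real filter is `triage_console.py`, in Python, where these rows behave the same way). Literal brackets are escaped throughout: unescaped, `[Turbopack]` is a character class matching a single letter, and `[next-auth]` does not compile at all because `t-a` is an out-of-order range. Grouping the alternatives after `\[Turbopack\]` is an assumed reading of that row; `NOISE_PATTERNS` and `partitionConsole` are hypothetical names.

```typescript
// Subset of the ignore-list, with literal brackets escaped.
const NOISE_PATTERNS: RegExp[] = [
  /\[HMR\]|\[vite\]|\[next\]: hot-update/,
  /\[Turbopack\] (compiled|building|HMR)/,
  /\[next-auth\]\[debug\]/,
  /PydanticDeprecatedSince20|PydanticUserWarning/,
  /chrome-extension:\/\/|moz-extension:\/\//,
  /ResizeObserver loop limit exceeded/,
  /AbortError: signal is aborted/,
];

// Split captured console messages into known noise and candidates for triage.
function partitionConsole(messages: string[]): { noise: string[]; candidates: string[] } {
  const noise: string[] = [];
  const candidates: string[] = [];
  for (const message of messages) {
    const isNoise = NOISE_PATTERNS.some((pattern) => pattern.test(message));
    (isNoise ? noise : candidates).push(message);
  }
  return { noise, candidates };
}
```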
reference/playwright-patterns.md

Lines changed: 22 additions & 0 deletions
@@ -37,6 +37,28 @@ These poll automatically. Prefer them over manual loops or `waitForTimeout`.

 `expect(locator).toBeVisible()` waits up to the test/expect timeout; `locator.isVisible()` returns immediately and is non-retrying — correct for branching, wrong for assertions.

+## Tabs vs buttons (common a11y miss)
+
+Many SPAs (Tailwind / headless-UI / radix without `Tabs` primitive) render
+visual tab UIs using `<button>` elements WITHOUT `role="tab"`. Result:
+
+```ts
+// FAILS — locator returns 0 elements because there is no role=tab anywhere
+await page.getByRole('tab', { name: /Free Chat/i }).click();
+```
+
+When ARIA snapshot during exploration shows the element as `button [ref=eN]`
+but the visual treatment is a tab strip, generate the spec with
+`getByRole('button', ...)` AND log the missing `role="tab"` as a soft a11y
+finding:
+
+```ts
+issues.push(`a11y[moderate] aria-tabs: visual tab strip uses <button> without role="tab"`);
+```
+
+This way the test passes (button locator works) and the a11y bug is recorded
+(screen-reader users can't navigate the tabs as a tablist).
+
 ## Anti-flake patterns

 - **Listeners attach BEFORE `page.goto()`**: `page.on('console', ...)` registered after navigation misses early errors.
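The tabs-versus-buttons rule is a small decision made during exploration: locate by the role the ARIA snapshot actually reports, and record the missing `role="tab"` once per tab strip. A sketch under the assumption that the explorer marks entries that visually form a tab strip; `SnapshotEntry`, its `inTabStrip` field, and `roleForVisualTab` are hypothetical, while the issue text is the one given in the Tabs vs buttons note.

```typescript
// Hypothetical shape of one entry observed in the ARIA snapshot.
interface SnapshotEntry {
  role: string; // role reported by the snapshot, e.g. "button" or "tab"
  name: string; // accessible name, e.g. "Free Chat"
  inTabStrip: boolean; // explorer's judgement that it is drawn as a tab
}

const ARIA_TABS_ISSUE =
  'a11y[moderate] aria-tabs: visual tab strip uses <button> without role="tab"';

// Return the role to pass to getByRole(); log the soft a11y finding at most once.
function roleForVisualTab(entry: SnapshotEntry, issues: string[]): string {
  if (entry.inTabStrip && entry.role === "button" && !issues.includes(ARIA_TABS_ISSUE)) {
    issues.push(ARIA_TABS_ISSUE);
  }
  return entry.role;
}
```

Deduplicating the finding keeps a five-tab strip from producing five identical bug records after per-issue splitting.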
