CreatmanCEO
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 67 additions & 9 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/SKILL.md b/SKILL.md
Lines changed: 39 additions & 4 deletions
diff --git a/reference/auth-strategies.md b/reference/auth-strategies.md
Lines changed: 67 additions & 6 deletions
diff --git a/reference/console-noise-patterns.md b/reference/console-noise-patterns.md
Lines changed: 8 additions & 0 deletions
diff --git a/reference/playwright-patterns.md b/reference/playwright-patterns.md
Lines changed: 22 additions & 0 deletions
@@ -6,14 +6,71 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
-### Planned for `0.2.0`
-- Supabase Auth pattern (`auth.setup.ts.tmpl` branch on `SUPABASE_URL`)
-- Onboarding-overlay state-seeding hook in auth setup template
-- Severity annotation: `// @severity: S0` parsing in spec files
-- Spec generation contract: console listeners + axe scan + issues collector required
-- `--bugs/--diff/--out` accepted as aliases on `generate_report.py`
-- Anchored regex in auth template
-- Pydantic / Next.js 15 patterns in `console-noise-patterns.md`
+### Planned for `0.3.0`
+- Vision-classifier auto-loop (`vision_dispatch.py`)
+- Console LLM auto-triage (`console_llm_triage.py` with batched subagent)
+- Performance / Lighthouse audit script
+- Tracker integration CLI (`file_bugs.py --linear / --github / --jira`)
+- Regression watchlist mechanism (sticky fixed → escalate on regression)
+- Layout integrity assertions (max-width, icon grouping patterns)
+
+## [0.2.0-beta] - 2026-04-29
+
+Functional gaps closed based on dogfooding feedback from two real apps. Pre-OSS
+v1 hardening — zero false positives in skill core, 113 passing tests, green CI
+on Linux/macOS/Windows.
+
+### Added
+- **Supabase Auth Pattern 1.5** — `auth.setup.ts.tmpl` auto-detects `SUPABASE_URL`
+  + `SUPABASE_ANON_KEY`, hits `/auth/v1/token?grant_type=password` with `apikey`
+  header, then polls localStorage for `sb-<ref>-auth-token` for up to 45s (without
+  assuming a URL change post-login). Documented in `auth-strategies.md` Pattern 1.
+- **Onboarding overlay state-seeding** — `seedOnboardingFlags()` helper in
+  `auth.setup.ts.tmpl` auto-flips `localStorage` keys matching common patterns
+  (`*-features-discovered`, `*-onboarding-complete`, `*-tour-seen`,
+  `*-hints-seen`, `*-welcome-dismissed`). Override via `TEST_ONBOARDING_FLAGS`
+  JSON env var. Without this, apps with feature-tour overlays fail every spec.
+- **Severity annotation mechanism** — three ways to override the heuristic:
+  1. `[severity:S0]` inline tag in `issues.push(...)` lines
+  2. `[severity:S0]` in spec test name
+  3. `// @severity: S0` comment preceding `test('...')` in spec file
+  `fingerprint_bugs.py --project-root` flag controls where to look for spec files.
+- **Spec generation contract** in SKILL.md — non-negotiable list of elements
+  every generated spec MUST contain (console + network listeners before goto,
+  axe scan, issues[] collector, hard `expect(issues).toEqual([])` at end).
+  Closes the gap where Claude wrote specs from scratch and silently skipped
+  the audit features.
+- **`scripts/preflight.py`** — quick env + base-URL HEAD check before scaffolding.
+  Fails fast with actionable hints if `TEST_BASE_URL` is unreachable, auth env
+  missing, or Supabase key looks malformed.
+- **Tabs-vs-buttons reference note** in `playwright-patterns.md` — handles SPAs
+  with visual tabs but no `role="tab"` (logs as a11y soft finding instead of failing).
+- **WebSocket DOM-fallback strategy** in `stack-specific.md` — when WS frames are
+  binary/encrypted/proprietary, assert on DOM mutations instead of frames.
+- **Pydantic, Next.js 15 Turbopack, Supabase realtime, browser-extension,
+  ResizeObserver, AbortError** patterns added to `console-noise-patterns.md`
+  default ignore-list.
+- **Run artefact summary** at end of `run_suite.py` — prints all generated paths
+  + next-step commands so users don't have to `ls reports/`.
+
+### Fixed
+- **`generate_report.py` doc drift** — SKILL.md step 10 now matches the actual
+  `--run-dir` CLI signature.
+- **Anchored regex** in auth UI fallback (`/^(sign in|log in|войти|вход)$/i`) —
+  no longer matches "Sign up" / "Sign in with Google".
+- **Skill-dir resolution** in `detect_state.py` — populates `skillDir` from
+  `__file__` when `CLAUDE_SKILL_DIR` env is unset, instead of returning `null`.
+  Fixes `Isolation verified: false` mismatch reported by Lingua tester.
+- **Image-budget rule wording** in SKILL.md — clarifies that on-disk
+  auto-captures (which nobody `Read`s) are FREE; the cost is inline returns.
+- **Fingerprint regex** for node counts now matches both ASCII `(3x nodes)` and
+  Unicode `(3× nodes)`.
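
The two regex fixes in the list above can be illustrated with a minimal sketch. The anchored pattern is the one quoted in the changelog entry; `loosePattern` and `nodeCountPattern` are hypothetical stand-ins written for contrast, since the previous patterns and the actual fingerprint regex are not shown in this diff.

```typescript
// Hypothetical unanchored pattern, for contrast only (not taken from the template).
const loosePattern = /sign in|log in|войти|вход/i;
// Anchored pattern quoted in the changelog entry: the entire label must match.
const anchoredPattern = /^(sign in|log in|войти|вход)$/i;

const labels = ["Sign in", "Log in", "Войти", "Sign in with Google"];

// Substring match: all four labels, including the OAuth button.
const looseHits = labels.filter((label) => loosePattern.test(label));
// Whole-label match: only the three plain login labels.
const anchoredHits = labels.filter((label) => anchoredPattern.test(label));

// Hypothetical node-count pattern accepting ASCII "x" and Unicode "×".
const nodeCountPattern = /\((\d+)[x×] nodes\)/;

function nodeCount(text: string): number | null {
  const match = nodeCountPattern.exec(text);
  return match ? Number(match[1]) : null;
}
```

Anchoring with `^...$` is what keeps "Sign in with Google" out of the fallback; the character class `[x×]` is one simple way to accept both multiplication-sign spellings.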
+
+### Tests
+- 19 new tests covering severity overrides, skill-dir resolution, preflight
+  module. Total: 113 passing across 10 scripts.
+
+## [0.1.0-beta] - 2026-04-29
 
 ## [0.1.0-beta] - 2026-04-29
 
@@ -63,5 +120,6 @@ Initial public beta. Validated end-to-end on a real production app
 - macOS / Linux installers untested in CI; help wanted (see
   `os-compatibility-report` issue template).
 
-[Unreleased]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.1.0-beta...HEAD
+[Unreleased]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.2.0-beta...HEAD
+[0.2.0-beta]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.1.0-beta...v0.2.0-beta
 [0.1.0-beta]: https://github.com/CreatmanCEO/webtest-orch/releases/tag/v0.1.0-beta
@@ -2,7 +2,7 @@
 
 [![CI](https://github.com/CreatmanCEO/webtest-orch/actions/workflows/ci.yml/badge.svg)](https://github.com/CreatmanCEO/webtest-orch/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
-[![Version](https://img.shields.io/badge/version-0.1.0--beta-orange.svg)](./CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.2.0--beta-orange.svg)](./CHANGELOG.md)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
 
 **Universal e2e testing skill for Claude Code.** Replaces ad-hoc Playwright MCP prompts with a single reusable unit for testing any web application (Next.js, FastAPI, static sites, Telegram WebApp, etc.).
 
@@ -23,11 +23,25 @@ End-to-end testing orchestrator for web applications. Splits into **first-run ex
 
 ## Image budget protection — READ FIRST, MANDATORY
 
-**The problem:** screenshots burn Claude Code's parent-chat **image cap** (~50–100 inline image blocks per session) before they burn its text context. Standalone Playwright MCP usage hits this wall fast. Once hit, the user must `/compact` even at 20% text-context usage.
+**The problem:** Claude Code has two independent context limits — text tokens (large)
+and **inline-image blocks** (~50–100 per session). Screenshots returned inline burn
+the image budget far faster than the text budget; once exhausted, the user must
+`/compact` even at 20% text-context usage.
+
+**Distinction that matters:**
+- ❌ **Inline image returns to parent context** burn the budget. This includes
+  `browser_take_screenshot` default output (image returned to caller),
+  `Read` on a `.png/.jpg/.webp/.gif/.bmp/.svg`, markdown report with `![]()` shown
+  to parent.
+- ✅ **On-disk artefacts that nobody Reads** are FREE. Playwright's failure
+  screenshots go to `test-results/`, MCP browser tools may save `.png`s to a
+  cache dir — none of these cost the parent context UNLESS you `Read` them.
 
 **The hard rule, enforced by you (not by frontmatter):**
 
-> **NEVER call `Playwright:browser_take_screenshot`, `chrome-devtools:take_screenshot`, or `Read` on `.png/.jpg/.webp` files from the parent skill context. ALWAYS dispatch a Task subagent (general-purpose) to do anything that produces or consumes images. Subagent returns ONLY text — paths, descriptions, verdicts.**
+> **NEVER return screenshots to the parent skill context. ALWAYS dispatch a Task
+> subagent (general-purpose) for anything that produces or consumes images.
+> Subagent returns ONLY text — paths, descriptions, verdicts.**
 
 This contract was attempted via `context: fork` frontmatter but Claude Code 2.1.x on Windows does not honor that field, so enforcement is delegated to *you reading these instructions*. Verified empirically 2026-04-28 (sub-agent isolation works; `context: fork` does not parse). See `${CLAUDE_SKILL_DIR}/.isolation-verified`.
 
@@ -101,12 +115,33 @@ Copy this checklist into TodoWrite at session start; tick as you go.
    - First run → minimal critical-path: home + auth + one main flow
 - [ ] **4. Dev server up.** `python "${CLAUDE_SKILL_DIR}/scripts/with_server.py" --help`. Use it; **do not read its source unless `--help` doesn't cover the case.**
 - [ ] **5a. EXPLORATORY** (BOOTSTRAP / new flow in HYBRID): use **Playwright MCP** with `Playwright:browser_snapshot` (ARIA tree, text). Walk the flow, generate POM in `tests/pages/<Page>.ts`, generate spec in `tests/specs/<flow>.spec.ts`. **Generate locators from ARIA tree refs you actually saw** — do NOT use generic regex like `getByPlaceholder(/john doe|name|имя/i)`, they cause strict-mode violations on first run. Either use exact strings from the snapshot OR add `.first()` explicitly. Run the spec once to confirm green.
+
+  **🔴 SPEC GENERATION CONTRACT — non-negotiable.** Even if you skip the template
+  and write a spec from scratch (when product context is rich), every generated
+  `*.spec.ts` MUST contain ALL of these:
+   1. **Console listeners attached BEFORE `page.goto()`**: `consoleErrors[]` from
+      `page.on('pageerror')` and `page.on('console', m => m.type() === 'error')`.
+   2. **Network listeners attached BEFORE `page.goto()`**: `failedRequests[]` from
+      `page.on('response', r => r.status() >= 400 && ...)` and `page.on('requestfailed')`.
+   3. **`AxeBuilder` scan** with `withTags(['wcag2a','wcag2aa','wcag21aa','wcag22aa'])`
+      whose violations are pushed into `issues[]`.
+   4. **`issues[]` collector pattern** — every soft check pushes a structured tag
+      into `issues` (e.g. `a11y[serious] color-contrast: ...`, `heading-jump: ...`,
+      `touch-target: ...`, `overflow: ...`, `html-lang: ...`).
+   5. **Single hard `expect(issues).toEqual([])`** at the end with
+      `${count} issues found:\n  - ...` message format so `run_suite.py` can split
+      one test failure into one bug record per issue.
+   6. **Trailing `expect(consoleErrors).toEqual([])` + `expect(failedRequests).toEqual([])`**.
+
+  Skip ANY of these and the skill's console-audit / a11y / per-issue fingerprinting
+  features stop working. Use `templates/spec.ts.tmpl` as the canonical reference —
+  copy its skeleton into hand-written specs.
 - [ ] **5b. REPLAY**: `npx playwright test --reporter=list,json,html`. **No Playwright MCP, no LLM browser actions.**
 - [ ] **6. A11y** on each visited page: deterministic `@axe-core/playwright` (in spec) + qualitative checks via nested subagent if alt-text/heading/focus suspect. See `reference/a11y-patterns.md`.
 - [ ] **7. Console + network.** Listeners attach BEFORE `page.goto()` (mandatory). Pipe captured logs through `python "${CLAUDE_SKILL_DIR}/scripts/triage_console.py" --help`.
 - [ ] **8. Visual.** Default `toHaveScreenshot()` in spec. Diff fired → `python "${CLAUDE_SKILL_DIR}/scripts/visual_diff.py" --classify` spawns nested subagent on each failed image (text verdict only). Argos opt-in via `VISUAL_DIFF=argos`.
 - [ ] **9. Fingerprint + diff.** `python "${CLAUDE_SKILL_DIR}/scripts/fingerprint_bugs.py" --current reports/<curr>/raw.json --previous reports/<prev>/bugs.json --out reports/<curr>/bugs.json`.
-- [ ] **10. Report.** `python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" --bugs bugs.json --diff diff.json --out reports/<run-id>`. Print absolute path to `index.html`.
+- [ ] **10. Report.** `python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" --run-dir reports/<run-id> [--app-name "My App"]`. Reads `bugs.json` + `diff.json` from that dir, writes `report.md` + `index.html` next to them. Print absolute path to `index.html`.
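
Item 5 of the spec generation contract in step 5a relies on a fixed failure-message shape so that one failed test can be split into one bug record per issue. A minimal sketch of that shape follows; `formatIssuesMessage` and `splitIssuesMessage` are hypothetical names, the sample issue details are illustrative, and the real splitting happens in `run_suite.py`, whose implementation is not shown here.

```typescript
// Builds the failure message in the "${count} issues found:\n  - ..." shape.
function formatIssuesMessage(issues: string[]): string {
  const bullets = issues.map((issue) => `  - ${issue}`).join("\n");
  return `${issues.length} issues found:\n${bullets}`;
}

// Hypothetical inverse: recovers one entry per "  - " bullet line.
function splitIssuesMessage(message: string): string[] {
  return message
    .split("\n")
    .map((line) => /^  - (.*)$/.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1]);
}

// Illustrative issue tags in the style the contract describes.
const sampleIssues = [
  "a11y[serious] color-contrast: (3x nodes)",
  "heading-jump: h1 -> h3",
];
const sampleMessage = formatIssuesMessage(sampleIssues);
```

Keeping each issue on its own bullet line is what makes the message reversible: any consumer that understands the bullet prefix can rebuild the original `issues[]` array.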
 
 ## Decision tree
 
@@ -155,7 +190,7 @@ detect_state.py → JSON
 - `scripts/triage_console.py --input console.json` — noise filter + LLM long tail
 - `scripts/visual_diff.py --classify reports/<run-id>` — pixel diff + nested-subagent classification
 - `scripts/fingerprint_bugs.py --current curr.json --previous prev.json` — bug dedup + diff
-- `scripts/generate_report.py --bugs bugs.json --diff diff.json --out reports/<run-id>` — markdown + html
+- `scripts/generate_report.py --run-dir reports/<run-id> [--app-name "X"]` — markdown + html (reads bugs.json/diff.json from --run-dir)
 - `scripts/_image_isolation_check.py --verify` — image budget contract self-check
 
 ## When to dispatch a Task subagent
 
@@ -10,7 +10,65 @@ The skill prefers **API-based login** over UI-driven login because API login is
 
 This means **every test runs already authenticated** without re-doing the login flow per test — saves time and reduces flake.
 
-## Pattern 1 — API login + JWT (preferred)
+## Pattern 0 — Onboarding overlays (run before any pattern below)
+
+Apps with feature-discovery tours, hint overlays, or "welcome" modals will
+**fail every spec** if those overlays intercept the first click. The
+`auth.setup.ts.tmpl` template ships a `seedOnboardingFlags()` helper that
+runs after authentication and before `storageState` is saved.
+
+It does two things:
+
+1. Auto-flips `localStorage` keys matching common patterns to `"true"`:
+   `*-features-discovered`, `*-onboarding-complete`, `*-tour-seen`,
+   `*-hints-seen`, `*-welcome-dismissed`.
+2. Reads `TEST_ONBOARDING_FLAGS` env (JSON array of exact key names) and sets
+   each one to `"true"`.
+
+For app-specific keys not matching default patterns:
+
+```bash
+# In .env.test
+TEST_ONBOARDING_FLAGS='["myapp-tour-v2","myapp-pricing-banner-dismissed"]'
+```
+
+Specs that explicitly test the onboarding tour from a fresh state should
+clear these flags themselves at test start (`page.evaluate(() => localStorage.clear())`)
+before navigating.
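
The key-selection logic described above can be sketched as a pure function. This is an illustration under stated assumptions, not the template's code: `onboardingKeysToSeed` and `DEFAULT_FLAG_PATTERN` are hypothetical names, and the real `seedOnboardingFlags()` helper runs against the browser's `localStorage`.

```typescript
// Default suffix patterns listed above, expressed as one anchored regex.
const DEFAULT_FLAG_PATTERN =
  /-(features-discovered|onboarding-complete|tour-seen|hints-seen|welcome-dismissed)$/;

// Hypothetical helper: returns the localStorage keys that should be set to "true".
function onboardingKeysToSeed(existingKeys: string[], flagsEnv?: string): string[] {
  const matched = existingKeys.filter((key) => DEFAULT_FLAG_PATTERN.test(key));
  const parsed: unknown = flagsEnv ? JSON.parse(flagsEnv) : [];
  if (!Array.isArray(parsed)) {
    throw new Error("TEST_ONBOARDING_FLAGS must be a JSON array of key names");
  }
  const explicit = parsed.filter((key): key is string => typeof key === "string");
  // De-duplicate while keeping first-seen order.
  return [...matched, ...explicit].filter((key, i, all) => all.indexOf(key) === i);
}
```

Called with the keys currently in storage and the raw `TEST_ONBOARDING_FLAGS` value, it returns the union of pattern matches and explicit names, so an app-specific key can be seeded even if the app has not written it yet.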
+
+## Pattern 1 — Supabase Auth (most common SaaS BaaS)
+
+Detected automatically when the `SUPABASE_URL` and `SUPABASE_ANON_KEY` environment
+variables are set. The skill then skips the other auth patterns and uses the Supabase REST endpoint:
+
+```ts
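+// Same request expressed with fetch (sketch; the template may use a different client)
+const res = await fetch(`${SUPABASE_URL}/auth/v1/token?grant_type=password`, {
+  method: 'POST',
+  headers: { apikey: SUPABASE_ANON_KEY, 'Content-Type': 'application/json' },
+  body: JSON.stringify({ email, password }),
+});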
+```
+
+Token is stored in localStorage as `sb-<project_ref>-auth-token` (the
+`project_ref` is the subdomain from `SUPABASE_URL`).
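
Deriving that storage key from the URL can be sketched as follows. `supabaseStorageKey` is a hypothetical helper name, and the sketch assumes a hosted project URL of the form `https://<project_ref>.supabase.co`; a custom domain or self-hosted instance may not follow this rule.

```typescript
// Hypothetical helper: builds the localStorage key from the first hostname label.
function supabaseStorageKey(supabaseUrl: string): string {
  const match = /^https?:\/\/([^./]+)\./.exec(supabaseUrl);
  if (!match) {
    throw new Error(`Cannot derive project ref from ${supabaseUrl}`);
  }
  return `sb-${match[1]}-auth-token`;
}

// Using the placeholder URL from the .env.test example in this document.
const exampleKey = supabaseStorageKey("https://abcdefgh.supabase.co");
```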
+
+**Why polling instead of `Promise.race`:** Supabase clients sometimes rewrite
+the storage key on hydration (refresh-token rotation). The template polls
+localStorage with a 45-second timeout instead of racing a 15s URL-change
+expectation that may never fire (SPA hydrates in place).
+
+```bash
+# .env.test for a Supabase app
+TEST_BASE_URL=https://your-app.example.com
+TEST_USER_EMAIL=qa@example.com
+TEST_USER_PASSWORD=...
+SUPABASE_URL=https://abcdefgh.supabase.co
+SUPABASE_ANON_KEY=eyJhbGc...
+```
+
+OAuth providers configured in Supabase (Google, GitHub) are NOT covered by this
+pattern — they require browser flow. Use a dedicated test user with email/
+password sign-in for CI runs.
+
+## Pattern 2 — API login + JWT (preferred)
 
 For FastAPI, Express, NestJS, Django REST — backends with `/api/auth/login` returning JWT.
 
@@ -81,11 +139,14 @@ If failures look like 401/403 storms, the auth file is the first thing to refres
 
 ```
 TEST_BASE_URL              # required — e.g. https://your-app.example.com
-TEST_USER_EMAIL            # required
-TEST_USER_PASSWORD         # required
-TEST_API_LOGIN_PATH        # optional — default /api/auth/login
-TEST_API_TOKEN_FIELD       # optional — default access_token (also tried: token, jwt)
-TEST_ADMIN_TOKEN           # optional — for Pattern 3
+TEST_USER_EMAIL            # required (unless public-only site)
+TEST_USER_PASSWORD         # required (unless public-only site)
+SUPABASE_URL               # if set → Pattern 1 (Supabase Auth)
+SUPABASE_ANON_KEY          # required with SUPABASE_URL
+TEST_API_LOGIN_PATH        # Pattern 2 — default /api/auth/login
+TEST_API_TOKEN_FIELD       # Pattern 2 — default access_token (also tried: token, jwt)
+TEST_ADMIN_TOKEN           # Pattern 3 — server-issued admin token (Telegram WebApp etc.)
+TEST_ONBOARDING_FLAGS      # JSON array of localStorage keys to flip to "true" post-auth
 TEST_USER_AGENT_KIND       # optional — desktop|mobile|telegram, default desktop
 ```
 
 
@@ -28,6 +28,14 @@ script (or pass `--ignore-extra` from the orchestrator) to add app-specific patt
 | `Blocked a frame with origin .* from accessing` | iframe sandbox | Expected in sandboxed iframes |
 | `\[HMR\]`, `\[vite\]`, `\[next\]: hot-update` | Hot module reload | Dev-only |
 | `Permission denied to access property .* on cross-origin` | Cross-origin frame | Expected |
+| `PydanticDeprecatedSince20`, `PydanticUserWarning` | FastAPI / Pydantic v2 backend warnings forwarded to client | Migration nag, not a runtime bug |
+| `\[Turbopack\] (compiled\|building\|HMR)` | Next.js 15 + Turbopack dev | Compile signals, dev-only |
+| `\[next-auth\]\[debug\]` | next-auth debug mode | Dev-only when `NEXTAUTH_DEBUG=true` |
+| `\[Fast Refresh\]`, `fast-refresh` | Next.js HMR | Dev-only |
+| `\[Supabase\].*realtime`, `supabase.*subscribe` | Supabase realtime client | Channel subscription chatter |
+| `chrome-extension://`, `moz-extension://` | Browser extensions | Not the app; ignore |
+| `ResizeObserver loop limit exceeded` | Chromium quirk | Browser implementation detail, not the app |
+| `AbortError: signal is aborted` | React unmount during fetch | Cleanup race, not user-visible |
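
Assuming the patterns in this table are compiled as regular expressions (which the `\[HMR\]` and `.*` entries suggest), a literal bracketed prefix has to be escaped. The sketch below shows why; the sample log lines are made up for illustration.

```typescript
// Unescaped brackets form a character class: any single "H", "M" or "R".
const unescaped = /[HMR]/;
// Escaped brackets match the literal "[HMR]" prefix.
const escaped = /\[HMR\]/;

const hmrLine = "[HMR] connected";
const realError = "Request failed with status 500";

// The unescaped form would silence a genuine error because it contains an "R".
const unescapedMatchesRealError = unescaped.test(realError);
// The escaped form leaves the genuine error alone and still filters the HMR line.
const escapedMatchesRealError = escaped.test(realError);
const escapedMatchesHmrLine = escaped.test(hmrLine);
```

An over-broad ignore pattern is worse than a missing one here, because it hides real console errors from the audit instead of merely letting noise through.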
 
 ## Bug patterns (auto-report with severity)
 
 
@@ -37,6 +37,28 @@ These poll automatically. Prefer them over manual loops or `waitForTimeout`.
 
 `expect(locator).toBeVisible()` waits up to the test/expect timeout; `locator.isVisible()` returns immediately and is non-retrying — correct for branching, wrong for assertions.
 
+## Tabs vs buttons (common a11y miss)
+
+Many SPAs (Tailwind / Headless UI / Radix without the `Tabs` primitive) render
+visual tab UIs using `<button>` elements WITHOUT `role="tab"`. Result:
+
+```ts
+// FAILS: the locator matches nothing (no role=tab anywhere), so click() times out
+await page.getByRole('tab', { name: /Free Chat/i }).click();
+```
+
+When ARIA snapshot during exploration shows the element as `button [ref=eN]`
+but the visual treatment is a tab strip, generate the spec with
+`getByRole('button', ...)` AND log the missing `role="tab"` as a soft a11y
+finding:
+
+```ts
+issues.push(`a11y[moderate] aria-tabs: visual tab strip uses <button> without role="tab"`);
+```
+
+This way the test passes (button locator works) and the a11y bug is recorded
+(screen-reader users can't navigate the tabs as a tablist).
+
 ## Anti-flake patterns
 
 - **Listeners attach BEFORE `page.goto()`** — `page.on('console', ...)` registered after navigation misses early errors.