
Commit 4c75e09

CreatmanCEO and claude committed
Initial public beta — webtest-orch v0.1.0-beta
End-to-end web app testing skill for Claude Code. Splits into:
- BOOTSTRAP — LLM-driven exploration via Playwright MCP on first run
- REPLAY — deterministic `npx playwright test` on subsequent runs (~zero LLM tokens)
- HYBRID — replay existing + explore new flows

Image-budget protection: all browser work delegated to Task subagents, parent chat never receives inline images.

Includes 9 black-box scripts, 7 reference docs, 6 templates, 3 examples, plus standard OSS files (LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, CHANGELOG, issue/PR templates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0 parents  commit 4c75e09

37 files changed

Lines changed: 4292 additions & 0 deletions
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
name: Bug report
about: Skill misbehaves, scripts crash, generated tests are wrong
title: "[bug] "
labels: bug, needs-triage
---

## What happened

<!-- One paragraph. Include the command you ran and what the skill produced. -->

## What you expected

<!-- One paragraph. -->

## Reproduction

1. ...
2. ...
3. ...

## Environment

- OS: <!-- Windows / macOS / Linux + version -->
- Node version: `node --version`
- Python version: `python --version`
- Claude Code version: <!-- from `claude --version` -->
- Skill version: <!-- from CHANGELOG -->
- Target app stack: <!-- Next.js, FastAPI, Supabase, etc. -->

## Logs / artifacts

<details>
<summary>Console output</summary>

```
<!-- paste relevant output -->
```

</details>

<!-- If a `reports/<run-id>/` directory exists, attach `bugs.json` and `report.md`. -->
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an improvement
title: "[feature] "
labels: enhancement, needs-triage
---

## Problem

<!-- What can't you do today? Be concrete — point at a real workflow that hits friction. -->

## Proposed solution

<!-- Sketch of how the skill should behave. If it's a new script, describe its CLI. -->

## Alternatives considered

<!-- What did you try / think about? Why didn't it work? -->

## Out of scope

<!-- What this feature is explicitly NOT trying to solve. Helps reviewers calibrate. -->

## Stack / context

<!-- Your app's stack and what kind of testing you're doing. Helps prioritise. -->
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
---
name: OS compatibility report
about: Tested on a non-Windows OS — share the results (good or bad)
title: "[compat] <OS> + <shell>"
labels: compatibility, beta-feedback
---

We're actively looking for cross-platform feedback during the `0.1.x` beta. Even "everything works" reports help — they let us mark an OS as smoke-tested.

## Environment

- OS: <!-- Ubuntu 22.04, macOS 14.x, Fedora 40, ... -->
- Architecture: <!-- x86_64 / arm64 -->
- Shell: <!-- bash 5.2, zsh, fish, ... -->
- Node version:
- Python version:
- Claude Code version:

## Install path

- Method: `bash install.sh` / `bash install.sh --symlink` / cloned manually
- Result:
  - [ ] `install.sh` ran cleanly
  - [ ] `claude mcp list` shows `playwright` and `chrome-devtools`
  - [ ] Skill appears in Claude Code's available skills list after restart

## First run

- Target app: <!-- public site / authed app / Supabase / NextAuth / FastAPI / ... -->
- Bootstrap result:
  - [ ] `npm i -D @playwright/test @axe-core/playwright dotenv` succeeded
  - [ ] `npx playwright install chromium` succeeded
  - [ ] `auth.setup.ts` ran without manual edits
  - [ ] First spec passed
  - [ ] `report.md` and `bugs.json` were generated

## Friction points

<!-- What did you have to fix manually? Quote shell commands or error messages. -->

## Time to first green spec

<!-- From `git clone` to first passing assertion. -->

## Anything we should add to the docs?

<!-- Edge cases that surprised you. -->

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
## What this PR does

<!-- One paragraph. -->

## Why

<!-- Link to issue or describe the user-facing problem this fixes. -->

## Type

- [ ] Bug fix
- [ ] New feature / template / reference doc
- [ ] Breaking change
- [ ] Documentation only
- [ ] Test / CI / tooling

## Checklist

- [ ] CHANGELOG.md updated under `[Unreleased]`
- [ ] Touched scripts have `--help` and run on the smoke-test path
- [ ] No personal/internal data in code, comments, or examples
- [ ] Image-budget protection rules untouched (or the violation is documented and justified)
- [ ] If touching templates: placeholder list updated at file top

## Tested on

- OS:
- Target app stack:
- What you ran:

## Risk

<!-- Worst case if this PR is wrong. -->

.gitignore

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Skill state
.isolation-verified
fixtures/iso-test/

# Test artefacts (the skill writes these into target projects, not into the skill repo,
# but contributors may have them lying around when they hand-test)
node_modules/
test-results/
playwright-report/
playwright/.auth/
reports/
**/*.spec.ts.snap

# Python
__pycache__/
*.pyc
.pytest_cache/
.ruff_cache/
.mypy_cache/

# Editors / OS
.vscode/
.idea/
.DS_Store
Thumbs.db

# Local env
.env
.env.local
.env.test
*.env.local

# Build output (when we publish to npm)
dist/
*.tsbuildinfo

CHANGELOG.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Changelog

All notable changes to this project will be documented here. Format is based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Planned for `0.2.0`
- Supabase Auth pattern (`auth.setup.ts.tmpl` branch on `SUPABASE_URL`)
- Onboarding-overlay state-seeding hook in auth setup template
- Severity annotation: `// @severity: S0` parsing in spec files
- Spec generation contract: console listeners + axe scan + issues collector required
- `--bugs/--diff/--out` accepted as aliases on `generate_report.py`
- Anchored regex in auth template
- Pydantic / Next.js 15 patterns in `console-noise-patterns.md`
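
The planned `// @severity: S0` annotation could be read from a spec file along the following lines. This is only a sketch under stated assumptions: the annotation is taken to be a single-line comment, the `S0` to `S3` range is a guess (the changelog names only `S0` and `S2`), and `parse_severity` is a hypothetical name, not something this commit ships.

```python
import re
from typing import Optional

# Assumed annotation shape: a line comment such as `// @severity: S0`.
# The S0-S3 range is a guess; the changelog only mentions S0 and S2.
SEVERITY_RE = re.compile(r"^\s*//\s*@severity:\s*(S[0-3])\s*$", re.MULTILINE)


def parse_severity(spec_source: str) -> Optional[str]:
    """Return the first severity annotation in a spec file, or None."""
    match = SEVERITY_RE.search(spec_source)
    return match.group(1) if match else None


spec = """\
// @severity: S0
import { test, expect } from '@playwright/test';
test('checkout completes', async ({ page }) => {});
"""
print(parse_severity(spec))              # S0
print(parse_severity("test('x', ...)"))  # None
```

Returning `None` for unannotated specs would let the existing structural heuristic stay the fallback.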

## [0.1.0-beta] - 2026-04-29

Initial public beta. Validated end-to-end on real production apps
(a static Next.js portfolio and, via dogfooding, a SaaS chat app).

### Added
- `SKILL.md` — Claude Code skill workflow (181 lines)
- `README.md` — user-facing documentation
- `install.sh` — copy/symlink installer with MCP preflight check
- 9 black-box scripts:
  - `detect_state.py` — project state probe (JSON / human modes)
  - `with_server.py` — dev-server lifecycle (frontend + backend)
  - `_image_isolation_check.py` — image-budget contract self-test
  - `run_suite.py` — wraps `npx playwright test`, normalizes output, strips ANSI codes,
    extracts individual issues from the `issues[]` collector pattern
  - `fingerprint_bugs.py` — composite SHA-256 fingerprints, severity heuristics
    (a11y impact-aware), Linear/GitHub/Jira tracker mappings, run-diff
  - `triage_console.py` — default ignore-list (GTM, Stripe, Sentry, dev-mode
    React, source-map 404s); bug-pattern classifier (hydration, CORS, CSP, 5xx)
  - `visual_diff.py` — locates `toHaveScreenshot()` failures, prepares
    vision-classification tasks
  - `vision_classify.py` — validates verdict format from Task subagent
  - `generate_report.py` — emits `report.md` + `index.html` + diff section
- 7 reference docs: Playwright patterns, auth strategies, a11y patterns,
  responsive checklist, console noise patterns, stack-specific (Next.js,
  FastAPI, Telegram WebApp, WS/SSE, TTS), reporting (JSON schema + tracker
  mappings)
- 6 templates: `playwright.config.ts.tmpl` (with auth), `playwright.config.public.ts.tmpl`
  (no auth), `auth.setup.ts.tmpl` (API-first, UI fallback), `fixture.ts.tmpl`,
  `pom.ts.tmpl`, `spec.ts.tmpl` (issues[] collector pattern)
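
To illustrate what a composite SHA-256 fingerprint can look like, here is a minimal sketch. The fields that `fingerprint_bugs.py` actually hashes are not shown in this commit view, so the choice of `category`, `message`, and `location`, the timing normalization, and both function names are assumptions made for illustration.

```python
import hashlib
import re


def normalize(message: str) -> str:
    """Collapse volatile details so the same bug hashes identically across runs."""
    # Durations such as "1203 ms" change on every run; status codes are kept.
    message = re.sub(r"\b\d+(?:\.\d+)?\s*ms\b", "<ms>", message)
    return re.sub(r"\s+", " ", message).strip().lower()


def fingerprint(category: str, message: str, location: str) -> str:
    """Composite SHA-256 over the fields assumed to identify a bug."""
    # A separator unlikely to occur in the fields keeps ("ab", "c") != ("a", "bc").
    payload = "\x1f".join([category, normalize(message), location])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


run_1 = fingerprint("console", "GET /api/chat 500 after 1203 ms", "chat.spec.ts")
run_2 = fingerprint("console", "GET /api/chat 500 after  987 ms", "chat.spec.ts")
print(run_1 == run_2)  # True
```

A stable fingerprint of this kind is what makes a run-to-run diff possible: a bug counts as new only when its hash was absent from the previous run.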

### Image-budget protection
- All browser work is delegated to a Task subagent (not a frontmatter
  `context: fork` directive — that field is not honoured by all Claude Code
  builds yet). Parent chat never receives inline images.

### Known limitations
- Playwright MCP must be installed separately (`claude mcp add playwright`).
- Documentation drift: `generate_report.py` argument signature does not match
  `SKILL.md` step 10 wording; will be reconciled in `0.2.0`.
- Severity is structurally inferred (no LLM pass); P0 product regressions can
  be misclassified as S2 unless the spec is annotated; `0.2.0` adds annotation
  parsing.
- Onboarding overlays in target apps will fail every spec until the user adds
  state-seeding to `auth.setup.ts`; `0.2.0` adds an explicit hook block.
- macOS / Linux installers untested in CI; help wanted (see
  `os-compatibility-report` issue template).

[Unreleased]: https://github.com/CreatmanCEO/webtest-orch/compare/v0.1.0-beta...HEAD
[0.1.0-beta]: https://github.com/CreatmanCEO/webtest-orch/releases/tag/v0.1.0-beta

CODE_OF_CONDUCT.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Code of Conduct

## Our pledge

In the interest of fostering an open and welcoming environment, we pledge to make participation in this project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Standards

Examples of behavior that contributes to a positive environment:

- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

Examples of unacceptable behavior:

- The use of sexualized language or imagery, and unwelcome sexual attention
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information without explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement

Project maintainers are responsible for clarifying acceptable behavior and may take corrective action in response to any instances of unacceptable behavior, including warnings and temporary or permanent bans from the project's spaces.

Reports go through GitHub's [report-abuse flow](https://docs.github.com/en/communities/maintaining-your-safety-on-github/reporting-abuse-or-spam) or via private email to the repository owner listed on the GitHub profile.

## Attribution

Adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.

CONTRIBUTING.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Contributing to webtest-orch

Thanks for considering a contribution. This document is short on purpose — read it once, then move fast.

## Quick start

1. Fork the repo, clone your fork.
2. Install dependencies: Python 3.10+, Node.js 18+, Claude Code CLI.
3. Install the skill locally for testing:
   ```bash
   bash install.sh --symlink   # symlinks live source into ~/.claude/skills/
   ```
4. Make changes. Re-run the skill in any project to test.

## What we want

| Welcome | Less welcome |
|---|---|
| Bug reports with reproduction steps | "doesn't work for me" without context |
| New stack-specific patterns (Supabase, NextAuth, Clerk, etc.) | Speculative refactors of working code |
| OS compatibility reports (Linux/macOS/Windows variants) | Style-only changes without functional reason |
| New `console-noise-patterns.md` patterns from your apps | Re-litigating architectural decisions |
| Test cases for `scripts/*.py` | New dependencies without strong justification |

## Pull request workflow

1. **One PR = one concern.** Don't bundle a bug fix with a refactor.
2. **Update CHANGELOG.md** under `[Unreleased]`. The first line of your CHANGELOG entry is your PR description.
3. **Add tests if you touched a Python script.** Smoke-tests live in `tests/python/` (coming in `0.2.0`); for now, hand-verify and document what you ran.
4. **Don't add LLM calls inside scripts.** Skill scripts MUST stay deterministic. LLM work happens via Task subagents dispatched by the orchestrator, not inside `*.py`.
5. **Image-budget protection is non-negotiable.** Any change that lets screenshots leak into the parent chat context will be reverted. Read `SKILL.md` § "Image budget protection" first.
6. **Sign-off.** Add a `Co-authored-by:` trailer if multiple humans worked on the PR.

## Naming and structure

- New script: `scripts/<verb>_<noun>.py`. Must accept `--help` and print useful usage information.
- New reference doc: `reference/<topic>.md`. Add a one-line entry in `SKILL.md` § References.
- New template: `templates/<file>.<ext>.tmpl`. Document its placeholders at the top with `// PLACEHOLDERS: __NAME1__ __NAME2__`.
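
The `// PLACEHOLDERS:` convention lends itself to a mechanical check before opening a PR. The sketch below is not part of the repository; it assumes placeholders are `__UPPER_SNAKE__` tokens and that the declaration sits on a single line, and `check_placeholders` is a hypothetical helper name.

```python
import re

HEADER_RE = re.compile(r"^//\s*PLACEHOLDERS:\s*(.*)$", re.MULTILINE)
# Assumed token shape: __NAME__, upper-case words joined by single underscores.
TOKEN_RE = re.compile(r"__[A-Z0-9]+(?:_[A-Z0-9]+)*__")


def check_placeholders(template: str) -> tuple[set[str], set[str]]:
    """Return (undeclared, unused) placeholder names for one template."""
    header = HEADER_RE.search(template)
    declared = set(TOKEN_RE.findall(header.group(1))) if header else set()
    # Everything except the header line counts as the template body.
    body = HEADER_RE.sub("", template)
    used = set(TOKEN_RE.findall(body))
    return used - declared, declared - used


template = """\
// PLACEHOLDERS: __BASE_URL__ __LOGIN_PATH__
export const baseURL = '__BASE_URL__';
export const storage = '__STORAGE_STATE__';
"""
undeclared, unused = check_placeholders(template)
print(sorted(undeclared))  # ['__STORAGE_STATE__']
print(sorted(unused))      # ['__LOGIN_PATH__']
```

Both result sets being empty would mean the header and the body agree.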

## Review SLA

- **Bug reports:** triaged within 7 days.
- **PRs:** first review within 14 days. We're a small project; please be patient.

## Code of Conduct

By participating, you agree to abide by [`CODE_OF_CONDUCT.md`](./CODE_OF_CONDUCT.md).

## Releasing (maintainers only)

1. Bump version in `package.json` (when published).
2. Move `[Unreleased]` items into a new `[X.Y.Z] - YYYY-MM-DD` section in `CHANGELOG.md`.
3. `git tag vX.Y.Z && git push --tags`.
4. GitHub Actions creates the release.
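
Step 2 of the release checklist can be scripted. The following is a rough sketch, assuming the Keep a Changelog layout with a literal `## [Unreleased]` heading; `cut_release` is a hypothetical helper, and it deliberately leaves the link references at the bottom of the file for manual editing.

```python
import re
from datetime import date


def cut_release(changelog: str, version: str, released: date) -> str:
    """Insert a dated version heading directly below `## [Unreleased]`.

    Entries that were listed under [Unreleased] end up under the new heading,
    and [Unreleased] is left empty for the next cycle.
    """
    heading = f"## [{version}] - {released.isoformat()}"
    updated, count = re.subn(
        r"^## \[Unreleased\][ \t]*\n",
        lambda _match: f"## [Unreleased]\n\n{heading}\n",
        changelog,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no '## [Unreleased]' heading found")
    return updated


# Example values only; the version and date are made up for the demo.
before = "# Changelog\n\n## [Unreleased]\n\n### Added\n- New pattern\n"
print(cut_release(before, "0.2.0", date(2026, 5, 1)))
```

Raising on a missing heading, rather than returning the input unchanged, keeps a malformed changelog from slipping through a release.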

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 webtest-orch contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

0 commit comments
