Drive your live web app as a source-blind, first-time user. An AI agent hovers, clicks, and types in a real browser, watches what actually happens, and reports the gaps (bugs, broken expectations, UX surprises, accessibility issues) before real users hit them.
It fills a gap that three other approaches leave open:
- Real users find these problems, but only after you ship.
- You, testing manually, know how the app is built, so you cannot see it fresh.
- Scripted end-to-end tests assert flows you already know. They cannot discover the ones you do not.
The agent forms expectations from only two sources: what is on the screen and universal web conventions. It never reads your source code. It builds a knowledge base under `qa/naive-user/<app>/` that lives outside the code, the way a real user's mental model does, and it accumulates across runs:

    qa/naive-user/<app>/
    ├── mental-model.md      # what a naive user believes the app does; refined every run
    ├── findings/<date>.md   # dated gap reports (Expected / Did / Observed / Gap / Severity / Repro)
    └── screenshots/         # before/after evidence (gitignored)

Findings are ranked by severity: `bug`, `broken-expectation`, `ux-gap`, `surprise`, `a11y`.
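To make the ranking concrete, here is a minimal sketch of how a findings summary could be sorted. It assumes the five severities are listed from most to least severe, and the `Finding` class and `summary_table` function are hypothetical illustrations, not the agent's actual schema or code.

```python
from dataclasses import dataclass
from typing import List

# Severities in the order listed above; this sketch assumes earlier means more severe.
SEVERITY_ORDER = ["bug", "broken-expectation", "ux-gap", "surprise", "a11y"]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


@dataclass
class Finding:
    """One gap; hypothetical field names loosely mirroring a findings entry."""
    surface: str
    gap: str
    severity: str
    expected: str = ""
    did: str = ""
    observed: str = ""
    repro: str = ""

    def __post_init__(self) -> None:
        # Reject severities outside the five-level scale.
        if self.severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {self.severity!r}")


def summary_table(findings: List[Finding]) -> str:
    """Render a severity-sorted Markdown summary table."""
    # sorted() is stable, so equally severe findings keep their original order.
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
    rows = ["| # | Severity | Surface | Gap |", "|---|----------|---------|-----|"]
    for number, f in enumerate(ordered, start=1):
        rows.append(f"| {number} | {f.severity} | {f.surface} | {f.gap} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(summary_table([
        Finding("Sidebar", "Active nav item has no visible focus ring", "a11y"),
        Finding("Capture", "Pressing Enter in the title field reloads the page", "bug"),
    ]))
```

The `a11y` finding is discovered first but sorts below the `bug`, which matches the table shape of a real findings file.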
- Node.js 18+ is the only hard requirement. The Playwright MCP server provisions its browser on first use. If you ever hit a missing-browser error, run `npx playwright install chromium`.
- A running app, either already serving at a URL or startable from a `startCommand` in your config.
- One of the supported harnesses below. Each needs agentic tool-calling plus the Playwright MCP browser tools, which is why instruction-only IDE rule hosts are not targeted.
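If you want to confirm the Node.js requirement from a setup script, a minimal sketch follows. It only assumes that `node --version` prints something like `v18.19.0`; the helper names are hypothetical and not part of naive-user.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # from the requirements above


def node_major(version: str) -> int:
    """Parse the major version out of `node --version` output such as 'v18.19.0'."""
    match = re.match(r"v?(\d+)\.", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Node.js version string: {version!r}")
    return int(match.group(1))


def node_is_new_enough() -> bool:
    """Return True if a Node.js binary on PATH reports major version 18 or later."""
    node = shutil.which("node")
    if node is None:
        return False
    out = subprocess.run([node, "--version"], capture_output=True, text=True, check=True).stdout
    return node_major(out) >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Node.js OK" if node_is_new_enough() else "Install Node.js 18+")
```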
The same MCP server command, `npx @playwright/mcp@latest`, works in every harness. Only OpenCode and Copilot change the shape of the entry around it.
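The differences between the harness snippets below are small enough to express in one function. This is a hypothetical reading aid (`mcp_entry` is not shipped with the repo) that rebuilds each harness's `playwright` entry from the shared command, following the snippets in this README; Claude Code's `.mcp.json` is not shown here, so it is left out.

```python
import json

# The single server command every harness launches.
SERVER_COMMAND = ["npx", "@playwright/mcp@latest"]


def mcp_entry(harness: str) -> dict:
    """Return the `playwright` entry in the shape each snippet in this README uses."""
    command, *args = SERVER_COMMAND
    if harness in ("codex", "gemini"):
        # Plain command + args (Codex writes the same keys in TOML).
        return {"command": command, "args": args}
    if harness == "copilot":
        # Copilot adds a server type and a tool allow-list.
        return {"type": "local", "command": command, "args": args, "tools": ["*"]}
    if harness == "opencode":
        # OpenCode wants the whole command line as one array; -y skips npx's prompt.
        return {"type": "local", "command": [command, "-y", *args], "enabled": True}
    raise ValueError(f"unknown harness: {harness!r}")


if __name__ == "__main__":
    # Prints the same structure as the Gemini CLI settings.json snippet.
    print(json.dumps({"mcpServers": {"playwright": mcp_entry("gemini")}}, indent=2))
```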
Claude Code. Install from the marketplace. The plugin bundles the Playwright MCP, so there is nothing else to configure:

    /plugin marketplace add TechGardenCode/naive-user   # or a local path: ./naive-user
    /plugin install naive-user@naive-user

Not using the plugin? Copy this repo's `.mcp.json` into your app repo root and drop `skills/` and `commands/` into your project's `.claude/`.
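The manual install is just three copies. A minimal sketch, assuming you have a local checkout of this repo; the function name is hypothetical and the paths come straight from the sentence above.

```python
import shutil
from pathlib import Path


def install_manually(plugin_repo: Path, app_repo: Path) -> None:
    """Copy .mcp.json to the app root and skills/ + commands/ under .claude/."""
    shutil.copy2(plugin_repo / ".mcp.json", app_repo / ".mcp.json")
    claude_dir = app_repo / ".claude"
    for name in ("skills", "commands"):
        # copytree creates .claude/ if needed; dirs_exist_ok (Python 3.8+)
        # lets you re-run this to pick up plugin updates.
        shutil.copytree(plugin_repo / name, claude_dir / name, dirs_exist_ok=True)


if __name__ == "__main__":
    # Hypothetical layout: this repo cloned next to your app.
    install_manually(Path("../naive-user"), Path("."))
```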
Codex. Drop the plugin in, then add the MCP server to `~/.codex/config.toml`:

    [mcp_servers.playwright]
    command = "npx"
    args = ["@playwright/mcp@latest"]

Gemini CLI. Install as an extension (`gemini-extension.json` bundles both the MCP server and the skill as context), or add the server to `~/.gemini/settings.json`:

    {
      "mcpServers": {
        "playwright": { "command": "npx", "args": ["@playwright/mcp@latest"] }
      }
    }

GitHub Copilot CLI. The plugin lives at `.github/plugin/plugin.json` and bundles the MCP. To wire it manually, add to `~/.copilot/mcp-config.json`:

    {
      "mcpServers": {
        "playwright": { "type": "local", "command": "npx", "args": ["@playwright/mcp@latest"], "tools": ["*"] }
      }
    }

OpenCode. `opencode.json` bundles the MCP. Note that `command` is an array, and `-y` avoids the interactive npx prompt:

    {
      "mcp": {
        "playwright": { "type": "local", "command": ["npx", "-y", "@playwright/mcp@latest"], "enabled": true }
      }
    }

Copy `templates/naive-user.config.json` into your app repo's root and fill it in:

    {
      "app": "myapp",
      "baseUrl": "http://localhost:3000",
      "startCommand": null,
      "auth": { "steps": ["Go to /", "Type the dev username", "Submit"], "critiqueLoginPage": false },
      "coverageNotes": "Optional hints, not a script."
    }

`startCommand` replaces any "how do I start the app" step. Set it and the agent runs it. Leave it `null` and the agent assumes the app is already up at `baseUrl`. See `examples/notes-app/` for a fully worked config.
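The `startCommand` rule boils down to one branch. Here is a minimal sketch of loading the config and deciding whether to launch anything. It assumes `app`, `baseUrl`, and `auth` are required and that `startCommand`, when set, is a single command string such as `npm run dev`; neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import json
import shlex
from pathlib import Path
from typing import List, Optional

# Assumption: these keys are required; startCommand and coverageNotes are optional.
REQUIRED_KEYS = ("app", "baseUrl", "auth")


def load_config(path: Path) -> dict:
    """Read naive-user.config.json and fail early on missing keys."""
    config = json.loads(path.read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path}: missing keys {missing}")
    return config


def startup_argv(config: dict) -> Optional[List[str]]:
    """argv to launch the app, or None if it should already be serving at baseUrl."""
    command = config.get("startCommand")
    if not command:  # null (or empty): assume the app is already up
        return None
    # shlex.split handles quoting but not shell operators such as `&&`.
    return shlex.split(command)
```

Splitting into an argv list, rather than handing the string to a shell, is a design choice of this sketch; a command that chains steps with `&&` would need a shell instead.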
Commit the markdown, but add the screenshot evidence to your app repo's `.gitignore`:

    qa/naive-user/*/screenshots/

Then run:

    /naive-test [app]

With no argument it uses the app from `naive-user.config.json`. The agent loads the prior mental model, makes sure the app is running, signs in via the configured auth steps, explores the live UI source-blind, and writes an updated `mental-model.md` plus a dated findings report. Run it on demand while developing, or dispatch it as a subagent to run in parallel with other work. Any behavior that changed since the previous run is flagged as a regression.
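This README does not say how the agent decides that behavior changed, so the following is only an illustration of the idea. It assumes each run's observations are keyed by a hypothetical action label (for example `"Capture: press Enter in title"`) and compares them with the previous run; actions seen for the first time are not counted as regressions.

```python
from typing import Dict, List


def regressions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return action keys whose observed behavior differs from the previous run.

    Both dicts map a hypothetical action label to what the agent observed.
    New actions (absent from `previous`) are skipped.
    """
    return sorted(
        action
        for action, observed in current.items()
        if action in previous and previous[action] != observed
    )


if __name__ == "__main__":
    last_run = {"Capture: press Enter in title": "draft saved"}
    this_run = {"Capture: press Enter in title": "full page reload, draft lost"}
    print(regressions(last_run, this_run))
```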
A findings file leads with a one-line summary and a severity-sorted table, then one entry per gap:

    | # | Severity | Surface | Gap |
    |---|----------|---------|-----|
    | 1 | bug | Capture | Pressing Enter in the title field reloads the page |
    | 2 | a11y | Sidebar | Active nav item has no visible focus ring |

    ## 1. Pressing Enter reloads instead of saving
    - Expected: Enter submits the form (primary-button convention).
    - Did: Typed a title, pressed Enter.
    - Observed: Full page reload, draft lost.
    - Severity: bug
    - Repro: 1. Open /. 2. Type in title. 3. Press Enter.
    - Screenshot: screenshots/capture-enter-before.png

Core content lives once:

- `skills/naive-user/SKILL.md` holds the source-blind testing methodology (config-driven).
- `commands/naive-test.md` (plus a `.toml` for Codex and OpenCode) is the `/naive-test` entry point.
Each harness gets a thin manifest that points at those files and declares the Playwright MCP in that harness's native format. No content is duplicated:

| Harness | Manifest | MCP declared in |
|---|---|---|
| Claude Code | `.claude-plugin/plugin.json` (plus `marketplace.json`) | plugin `mcpServers` / `.mcp.json` |
| Codex | `.codex-plugin/plugin.json` | `~/.codex/config.toml` |
| Gemini CLI | `gemini-extension.json` | extension `mcpServers` / `settings.json` |
| Copilot CLI | `.github/plugin/plugin.json` | plugin `mcpServers` / `~/.copilot/mcp-config.json` |
| OpenCode | `opencode.json` | `opencode.json` `mcp` |
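Because every manifest only points at the shared files, a checkout is complete when the two content files and the five manifests from the table exist. A hypothetical sanity check (not shipped with the repo) could look like this; it deliberately skips `marketplace.json` and the `.toml` command variant, whose exact paths this README does not spell out.

```python
from pathlib import Path
from typing import Dict, List

# Shared content and per-harness manifests, copied from the text and table above.
SHARED_CONTENT = ["skills/naive-user/SKILL.md", "commands/naive-test.md"]
MANIFESTS: Dict[str, str] = {
    "Claude Code": ".claude-plugin/plugin.json",
    "Codex": ".codex-plugin/plugin.json",
    "Gemini CLI": "gemini-extension.json",
    "Copilot CLI": ".github/plugin/plugin.json",
    "OpenCode": "opencode.json",
}


def missing_files(repo: Path) -> List[str]:
    """List expected files (relative paths) that are absent from a checkout."""
    expected = SHARED_CONTENT + list(MANIFESTS.values())
    return [rel for rel in expected if not (repo / rel).is_file()]


if __name__ == "__main__":
    print(missing_files(Path(".")) or "all manifests present")
```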
Issues and pull requests are welcome. See CONTRIBUTING.md for the workflow, and SECURITY.md to report a vulnerability privately.
MIT. See LICENSE.