
Commit bd875c3

gewenyu99 and claude authored
feat(e2e-harness): drive and snapshot the real wizard TUI (#702)
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent b08f8c1 commit bd875c3

25 files changed

Lines changed: 2233 additions & 41 deletions
.claude/skills/exploring-the-wizard/SKILL.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
---
2+
name: exploring-the-wizard
3+
description: Run, drive, and explore the PostHog wizard headlessly against an app — boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end.
4+
compatibility: Designed for Claude Code working on the PostHog wizard codebase.
5+
metadata:
6+
author: posthog
7+
version: "3.0"
8+
---
9+
10+
# Exploring the wizard as an agent
11+
12+
Drive a real wizard run yourself: boot it on an app, read each screen, decide, act,
13+
snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound
14+
in this repo (registered in `.mcp.json`). For _how_ it works underneath, read
15+
[`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md).
16+
17+
If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server
18+
isn't approved yet — ask the user to approve `wizard-ci`, then retry.
19+
20+
## Set up
21+
22+
Ask the user for the absolute path to their PostHog key file — e.g. "What's the
23+
path to your phx key file?" — plus the project id and region if you don't have
24+
them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real
25+
fixture). Never print or commit the key.
26+
27+
## Drive
28+
29+
1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on
30+
the app and returns the first screen. `appDir` is the throwaway copy.
31+
2. **`read_state`** — current screen, run phase, secret-free session, tasks, and
32+
the actions legal right now. Call after every move.
33+
3. **`perform_action({ action, params? })`** — commit a decision: `confirm_setup`,
34+
`dismiss_outage`, `choose` (a setup question, e.g. `{ key, value }`),
35+
`dismiss_outro`, `set_mcp_outcome`, `dismiss_slack`, `keep_skills`.
36+
4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it.
37+
5. **`run_agent`** — kicks off the **real integration** in the background and
38+
returns immediately; it bootstraps credentials, so it's what advances `auth`
39+
and `run`. Then **poll `read_state`**: `runPhase` goes `running → completed`
40+
and the screen advances to `outro`.
41+
42+
A typical walk:
43+
44+
```
45+
open_app → intro → perform_action confirm_setup
46+
read_state → health-check → perform_action dismiss_outage
47+
read_state → auth → run_agent (returns at once; integration runs in background)
48+
read_state (poll) → runPhase running → completed, screen → outro
49+
outro → perform_action dismiss_outro → … → keep_skills
50+
```
51+
52+
Snapshot with `render_screen` at each key moment and save each frame to a numbered
53+
file — `/tmp/wz-explore-snaps/NN-<screen>.txt`, incrementing `NN` in visit order —
54+
so the run leaves a readable, ordered record you and the user can review afterward
55+
(the same shape the CI route's `.txt` frames take). Capture the run screen as it
56+
progresses, not just on screen changes.
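
The numbering scheme above can be sketched as a small helper. This is only an illustration, not harness code: `snapshotPath` and its parameters are hypothetical names, and only the `/tmp/wz-explore-snaps/NN-<screen>.txt` pattern comes from the text.

```typescript
// Hypothetical helper (not part of the harness): builds the numbered
// snapshot path described above.
// `index` is the 1-based visit order; `screen` is the screen name
// reported by read_state.
function snapshotPath(
  index: number,
  screen: string,
  dir: string = "/tmp/wz-explore-snaps",
): string {
  if (!Number.isInteger(index) || index < 1) {
    throw new RangeError(`index must be a positive integer, got ${index}`);
  }
  // Zero-pad to two digits so the files sort in visit order (01, 02, ..., 10).
  const nn = String(index).padStart(2, "0");
  return `${dir}/${nn}-${screen}.txt`;
}
```

Padding to two digits keeps a plain `ls` in visit order for runs of up to 99 frames; a longer run would need a wider pad.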
57+
58+
## Key facts
59+
60+
- **State → screen.** You never navigate; you commit a decision (an action) and the
61+
router re-derives the active screen. Name actions, not keys.
62+
- **`auth` and `run` advance only via `run_agent`.** They expose no action and
63+
don't self-advance. `run_agent` returns immediately and runs the integration in
64+
the background — poll `read_state` for `runPhase` (`running → completed`).
65+
Everything else is an instant commit.
66+
- **`run_agent` creates real PostHog resources** (a dashboard + insights) in the
67+
project; each run duplicates them.
68+
- **A green run ≠ a valid integration.** `runPhase=completed` means the flow
69+
finished, not that the wizard understood the framework (e.g. it'll treat a Wasp
70+
app as react-router). Read what it actually changed.
71+
- **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`.
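
The rules above can be read as a small decision function applied after every `read_state`. The sketch below is an assumption-laden illustration: `WizardState`, `legalActions`, `Step`, and `nextStep` are hypothetical names, the text only names the `runPhase` values `running` and `completed`, and "take the first legal action" is a placeholder for the agent's own judgment.

```typescript
// Hypothetical shape of a read_state result; only `screen`, `runPhase`
// and "the actions legal right now" are mentioned in the text, and the
// field names here are assumptions.
interface WizardState {
  screen: string; // e.g. "intro", "auth", "run", "outro"
  runPhase: string; // the text names "running" and "completed"
  legalActions: string[]; // assumed field name
}

type Step =
  | { kind: "perform_action"; action: string }
  | { kind: "run_agent" }
  | { kind: "poll" }
  | { kind: "stop" };

function nextStep(state: WizardState): Step {
  // auth and run expose no action: only run_agent moves them forward.
  if (state.screen === "auth" || state.screen === "run") {
    // Once the run has started (or just finished), keep reading state
    // until the router advances the screen.
    if (state.runPhase === "running" || state.runPhase === "completed") {
      return { kind: "poll" };
    }
    return { kind: "run_agent" };
  }
  // Every other screen is an instant commit of one legal action.
  // Placeholder policy: take the first one; a real agent decides.
  const first = state.legalActions[0];
  if (first !== undefined) {
    return { kind: "perform_action", action: first };
  }
  return { kind: "stop" };
}
```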

.mcp.json

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
{
2+
"mcpServers": {
3+
"wizard-ci": {
4+
"command": "npx",
5+
"args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"]
6+
}
7+
}
8+
}

AGENTS.md

Lines changed: 2 additions & 1 deletion
@@ -31,14 +31,15 @@ boundaries, screen resolution
3131

3232
## Skills available
3333

34-
Four skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:
34+
Five skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:
3535

3636
| Skill | When to use |
3737
|---|---|
3838
| `wizard-development` | Before any structural change. Design principles + decision framework. |
3939
| `adding-framework-support` | Adding a new framework integration (e.g. Ruby on Rails, Go, Angular). |
4040
| `adding-skill-program` | Adding a new skill-based program (e.g. a new product feature setup). |
4141
| `ink-tui` | Building or modifying TUI screens, layouts, and primitives. |
42+
| `exploring-the-wizard` | Running/driving/exploring the wizard headlessly (read_state/perform_action, TUI snapshots). |
4243

4344
## CLI command surface
4445

README.md

Lines changed: 35 additions & 1 deletion
@@ -398,7 +398,7 @@ wizard --integration=nextjs
398398
wizard --integration=nextjs --local-mcp
399399
```
400400

401-
## Testing
401+
### Testing
402402

403403
To run unit tests, run:
404404

@@ -415,6 +415,40 @@ bin/test-e2e
415415
E2E tests are a bit more complicated to create and adjust due to their mocked
416416
LLM calls. See the `e2e-tests/README.md` for more information.
417417

418+
#### Explore with an agent
419+
420+
You can hand the wizard to an AI agent and have it drive the real flow itself —
421+
deciding each screen and snapshotting the TUI to see what happened. The agent
422+
drives through the `wizard-ci` MCP tools (`open_app` / `read_state` /
423+
`perform_action` / `render_screen` / `run_agent`), which are registered in this
424+
repo's `.mcp.json` and bound in every session here — approve `wizard-ci` the first
425+
time you're prompted. The how-to is the `exploring-the-wizard` skill
426+
(`.claude/skills/exploring-the-wizard/SKILL.md`), which an agent discovers
427+
automatically.
428+
429+
Example prompt — explore against
430+
[open-saas](https://github.com/wasp-lang/open-saas):
431+
432+
> Explore the PostHog wizard against open-saas, following the
433+
> `exploring-the-wizard` skill. Ask me for my phx key file path and project id,
434+
> then clone `https://github.com/wasp-lang/open-saas` into a throwaway `/tmp`
435+
> copy. Drive the whole flow yourself through the `wizard-ci` MCP tools, deciding
436+
> each screen:
437+
>
438+
> 1. `open_app` on the `/tmp` copy, then `read_state` to see the screen and the
439+
> actions legal right now.
440+
> 2. At each key moment, `render_screen` and save the frame to
441+
> `/tmp/wz-explore-snaps/NN-<screen>.txt` (numbered in order) so we get a
442+
> readable record of the run.
443+
> 3. Act: `confirm_setup` at intro, `dismiss_outage` at health-check, `choose`
444+
> for any setup question, then `run_agent` at auth.
445+
> 4. Poll `read_state` until `integration` is `done` (or `failed` — then report
446+
> `integrationError`), snapshotting as the run screen progresses.
447+
> 5. Finish the tail: dismiss outro / mcp / slack, then `keep_skills`.
448+
>
449+
> Then show me the saved snapshots in order, the screen path, whether `posthog`
450+
> landed in the app, and anything that broke.
451+
418452
## Publishing your tool
419453

420454
To make your version of a tool usable with a one-line `npx` command:

e2e-harness/ARCHITECTURE.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
# e2e-harness — Headless e2e Control Plane
2+
3+
How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**,
4+
no browser, no keystrokes — and captures what it rendered. Both e2e routes share
5+
one idea: run the real `startTUI` (the real ink render) and drive its store by
6+
**state manipulation**, then capture the real rendered screen from a PTY.
7+
8+
> If you're an agent that just wants to run and explore the wizard, use the
9+
> `exploring-the-wizard` skill
10+
> ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)).
11+
> This doc is the _how it works_ underneath.
12+
13+
## The pieces
14+
15+
This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of
16+
`src/` so none of it is part of the wizard's production source (nothing in `src/`
17+
imports it; the tsdown bundle never includes it).
18+
19+
```
20+
e2e-harness/
21+
wizard-ci-driver.ts WizardCiDriver — read_state / perform_action over the store
22+
action-registry.ts screen → the actions legal on it (+ NO_ACTION_SCREENS)
23+
e2e-profile.ts WizardE2eProfile + decideE2eAction — the scripted walk policy
24+
profiles.ts per-program profiles + profileFor(programId)
25+
tui-capture.ts run a command in a PTY (node-pty) + read its real screen (@xterm/headless)
26+
scripts/
27+
tui-host.no-jest.ts the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve
28+
tui-snapshots.no-jest.ts CI route: host(fixed) in a PTY → per-screen real-TUI snapshots
29+
wizard-ci-mcp.no-jest.ts agent route: MCP server proxying host(serve)
30+
```
31+
32+
The driver reads and mutates the **real** `WizardStore` that the TUI renders from:
33+
the router resolves the active screen from session state, every action goes
34+
through a store setter, and the render is a pure projection of that state. So
35+
manipulating the store makes the real TUI react — the driver and the renderer
36+
share one store and never conflict; you never touch the TUI's input.
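
A toy model may make the "state → screen" idea concrete. Nothing below is the real `WizardStore` or router: `ToyStore`, `ToySession`, and `resolveScreen` are hypothetical stand-ins, and the three session fields are assumptions chosen only to mirror the first few screens of the walk.

```typescript
// Toy model only: the real WizardStore and router are not shown here.
interface ToySession {
  setupConfirmed: boolean;
  outageDismissed: boolean;
  credentials: string | null;
}

class ToyStore {
  private session: ToySession = {
    setupConfirmed: false,
    outageDismissed: false,
    credentials: null,
  };
  private listeners: Array<() => void> = [];

  getSession(): ToySession {
    return this.session;
  }

  // A renderer subscribes once and re-renders on every change.
  subscribe(listener: () => void): void {
    this.listeners.push(listener);
  }

  // Every mutation goes through a setter, which notifies subscribers.
  confirmSetup(): void {
    this.update({ setupConfirmed: true });
  }
  dismissOutage(): void {
    this.update({ outageDismissed: true });
  }
  setCredentials(token: string): void {
    this.update({ credentials: token });
  }

  private update(patch: Partial<ToySession>): void {
    this.session = { ...this.session, ...patch };
    for (const listener of this.listeners) listener();
  }
}

// The "router": a pure function from session state to the active screen.
function resolveScreen(session: ToySession): string {
  if (!session.setupConfirmed) return "intro";
  if (!session.outageDismissed) return "health-check";
  if (session.credentials === null) return "auth";
  return "run";
}
```

Because the screen is derived rather than stored, a driver that calls the setters and a renderer that subscribes never need to coordinate with each other.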
37+
38+
## Auth without a browser
39+
40+
The real TUI runs `ci: true`, and auth is satisfied by **state manipulation**:
41+
`getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into
42+
credentials, and `store.setCredentials(...)` sets them — the same bearer path an
43+
OAuth token takes, so the auth screen advances with no browser and no keystrokes.
44+
(`run_agent` does the same bootstrap as part of the real integration.)
45+
46+
## The two routes
47+
48+
- **CI snapshots**: `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`)
49+
in a PTY. The host self-drives the fixed profile (`decideE2eAction`) through the
50+
real agent run and signals each key moment; the parent writes the real rendered
51+
screen to `SNAP_OUT/NN-<screen>.txt` (including the run screen's progression).
52+
- **Agent**: `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns
53+
`tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` /
54+
`run_agent` forward over a unix socket; `render_screen` returns the real
55+
captured frame. The agent decides each screen itself.
56+
57+
## Things that bite
58+
59+
1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`,
60+
`CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host →
61+
`apiKeySource: none` → 401. The harness strips these for the child. A plain CI
62+
shell never has them.
63+
2. **A project-scoped key needs its project id.** Pass the team's `--project-id`
64+
(or `POSTHOG_WIZARD_PROJECT_ID`), or bootstrap 403s on project-data fetch.
65+
3. **Never run on a real fixture.** Always a throwaway copy.
66+
4. **`run_agent` is minutes long and creates real resources** (a dashboard +
67+
insights) each run; the agent log is one shared file — never run two at once.
68+
5. **node-pty's spawn-helper.** When the package is extracted without running its
69+
build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute
70+
bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores
71+
it best-effort on each spawn.
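
Item 1 above amounts to filtering the environment before spawning the child. A minimal sketch, assuming the three patterns listed there are the complete set; `stripAgentEnv` is a hypothetical name and the harness's actual implementation is not shown in this document.

```typescript
// Hypothetical helper: returns a copy of `env` without the host-agent
// variables named above (CLAUDECODE, ANTHROPIC_*, CLAUDE_CODE_*).
function stripAgentEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const cleaned: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue; // skip unset entries
    const isAgentVar =
      key === "CLAUDECODE" ||
      key.startsWith("ANTHROPIC_") ||
      key.startsWith("CLAUDE_CODE_");
    if (!isAgentVar) cleaned[key] = value;
  }
  return cleaned;
}
```

The result could be passed as the `env` option when spawning the child process, for example `stripAgentEnv(process.env)`, so the wizard's agent does not inherit the host session's auth settings.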
72+
73+
## Changing what the run does
74+
75+
Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) —
76+
not on the program config — so this machinery stays out of production source. Edit
77+
the program's entry (typed by `WizardE2eProfile`); the host asks
78+
`decideE2eAction(state, profile)` what to commit on each screen. The (screen →
79+
decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update).
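
To illustrate the (screen → decision) trace, here is a toy walk that reproduces the action names recorded in the Jest snapshot in this commit. It is not the real `decideE2eAction`, which takes `(state, profile)` and is not shown here; `toyDecide`, `traceWalk`, and the lookup table are hypothetical, and profile handling is omitted entirely.

```typescript
// Toy illustration only: maps each screen to the action name recorded
// in the snapshot trace. The real decideE2eAction also consults a profile.
const TOY_NO_ACTION_SCREENS = new Set<string>(["auth", "run"]);

const TOY_ACTION_BY_SCREEN: Record<string, string> = {
  intro: "confirm_setup",
  "health-check": "dismiss_outage",
  setup: "choose",
  outro: "dismiss_outro",
  mcp: "set_mcp_outcome",
  "slack-connect": "dismiss_slack",
  "keep-skills": "keep_skills",
};

function toyDecide(screen: string): string {
  // auth and run advance externally (via the real agent run), so the
  // trace records a placeholder instead of an action.
  if (TOY_NO_ACTION_SCREENS.has(screen)) return "(external)";
  const action: string | undefined = TOY_ACTION_BY_SCREEN[screen];
  if (action === undefined) {
    throw new Error(`no action known for screen "${screen}"`);
  }
  return action;
}

// Builds the same { screen, action } records the snapshot stores.
function traceWalk(
  screens: string[],
): Array<{ screen: string; action: string }> {
  return screens.map((screen) => ({ screen, action: toyDecide(screen) }));
}
```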
80+
81+
## Visual-regression snapshots (the workbench flow)
82+
83+
[wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route
84+
for real-run visual regression: each test definition runs `tui-snapshots`, the
85+
real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and
86+
run-to-run differences are surfaced for a human, not asserted away. See
87+
`services/wizard-ci/` there.
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
// Jest Snapshot v1, https://goo.gl/fbAQLP
2+
3+
exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = `
4+
{
5+
"profile": {
6+
"ask": "first",
7+
"healthCheck": "dismiss",
8+
"mcp": "skip",
9+
"setup": "first",
10+
"skills": "delete",
11+
"slack": "skip",
12+
},
13+
"program": "posthog-integration",
14+
"trace": [
15+
{
16+
"action": "confirm_setup",
17+
"screen": "intro",
18+
},
19+
{
20+
"action": "dismiss_outage",
21+
"screen": "health-check",
22+
},
23+
{
24+
"action": "choose",
25+
"screen": "setup",
26+
},
27+
{
28+
"action": "(external)",
29+
"screen": "auth",
30+
},
31+
{
32+
"action": "(external)",
33+
"screen": "run",
34+
},
35+
{
36+
"action": "dismiss_outro",
37+
"screen": "outro",
38+
},
39+
{
40+
"action": "set_mcp_outcome",
41+
"screen": "mcp",
42+
},
43+
{
44+
"action": "dismiss_slack",
45+
"screen": "slack-connect",
46+
},
47+
{
48+
"action": "keep_skills",
49+
"screen": "keep-skills",
50+
},
51+
],
52+
}
53+
`;
54+
55+
exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = `
56+
{
57+
"program": "posthog-integration",
58+
"trace": [
59+
{
60+
"action": "confirm_setup",
61+
"screen": "intro",
62+
},
63+
{
64+
"action": "dismiss_outage",
65+
"screen": "health-check",
66+
},
67+
{
68+
"action": "(external)",
69+
"screen": "auth",
70+
},
71+
{
72+
"action": "(external)",
73+
"screen": "run",
74+
},
75+
{
76+
"action": "dismiss_outro",
77+
"screen": "outro",
78+
},
79+
{
80+
"action": "set_mcp_outcome",
81+
"screen": "mcp",
82+
},
83+
{
84+
"action": "dismiss_slack",
85+
"screen": "slack-connect",
86+
},
87+
{
88+
"action": "keep_skills",
89+
"screen": "keep-skills",
90+
},
91+
],
92+
}
93+
`;
