feat(e2e-harness): drive and snapshot the real wizard TUI #702
Merged
43 commits
aa98b5b
feat(ci-driver): wizard-ci-tools control plane for headless e2e + rec…
gewenyu99 7713d1e
refactor(posthog-integration): extract e2e profile to its own file
gewenyu99 6bbead3
docs(ci-driver): point the agent guide at the extracted e2e profile file
gewenyu99 c705cbe
test(ci-driver): add offline sample-recording generator for replay
gewenyu99 afaafd2
docs(scripts): add README indexing the ci-driver/e2e scripts
gewenyu99 c35daf1
fix(ci-driver): classify warehouse-intro + self-driving-intro screens
gewenyu99 19e88f6
docs(ci-driver): rename agent guide to ARCHITECTURE.md, strip interna…
gewenyu99 eb81853
feat(ci-driver): render a recording to per-frame TUI snapshots
gewenyu99 3031d3e
refactor: move e2e/recording harness out of prod src into e2e-harness/
gewenyu99 66dd1e0
docs(e2e-harness): cross-link the workbench visual-snapshots flow + env
gewenyu99 d588d7d
docs(posthog-integration): describe the e2e test path next to the pro…
gewenyu99 de81fd8
refactor(e2e): make the test definition a readable JSON the harness l…
gewenyu99 62c5595
docs(e2e-harness): instrument the perform_action trace across the hops
gewenyu99 220891b
docs(e2e-harness): state the never-ships-to-prod guarantee in each mo…
gewenyu99 8b8f342
revert: drop the explanatory comments from source
gewenyu99 8fc7a74
chore(scripts): remove demo/proof scaffolding from the PR
gewenyu99 20309c4
docs(e2e-harness): add the agent exploration runbook
gewenyu99 ab0b970
docs: move agent-exploration to wizard README, trim comments to curre…
gewenyu99 6c5d887
feat(skills): promote the agent-exploration runbook to a skill
gewenyu99 31286a5
feat(e2e-harness): live MCP server so an agent drives the wizard turn…
gewenyu99 5e5bfcd
chore: align zod spec to ^3.25.76 (matches the pi stack #701)
gewenyu99 3a71eac
refactor(e2e-harness): drop redundant list_actions from the MCP server
gewenyu99 8fbc396
docs: revert prettier reflow of README + AGENTS, keep only the real c…
gewenyu99 3821e31
docs: fix dead link — point ARCHITECTURE at the skill, not the delete…
gewenyu99 25a32c6
fix(skill): correct the driving instructions — MCP tools bind at sess…
gewenyu99 9ded865
fix(e2e-harness): make the MCP server actually loadable; skill leads …
gewenyu99 15b5bfe
feat(e2e-harness): bind wizard-ci as committed MCP tools so an agent …
gewenyu99 423bd1d
docs(e2e-harness): drop "monorepo" wording from open_app guidance
gewenyu99 1981123
docs(e2e-harness): drop the app-dir hand-holding from open_app
gewenyu99 67bb510
fix(e2e-harness): point agents to run_agent on the auth screen
gewenyu99 648c3ee
fix(e2e-harness): make run_agent non-blocking so the MCP server survi…
gewenyu99 6195da2
refactor(e2e-harness): drive + snapshot the real TUI; one primitive f…
gewenyu99 f1f6f85
docs: drop stale e2e-full-run reference from a comment
gewenyu99 6728282
fix(tui-snapshots): capture the run screen's progression + every tran…
gewenyu99 27c6822
fix(tui-snapshots): drop the throttle; don't hang at exit on the park…
gewenyu99 e139ca6
refactor(tui-snapshots): always run the agent; drop the RUN_AGENT toggle
gewenyu99 f03297c
build: allow node-pty's build script (compiles pty.node on Linux CI)
gewenyu99 c506fea
fix(tui-capture): strip CI markers so ink renders the real TUI
gewenyu99 0574032
fix(e2e-harness): classify the source-maps-detect screen
gewenyu99 f001d64
build(e2e): keep the harness scripts out of the published package
gewenyu99 530ed80
refactor(e2e-harness): address review nits
gewenyu99 a45afa1
fix(e2e-harness): address review on open_app + run_agent
gewenyu99 5f972c1
docs(e2e-harness): concrete agent-exploration prompt with snapshotting
gewenyu99
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| --- | ||
| name: exploring-the-wizard | ||
| description: Run, drive, and explore the PostHog wizard headlessly against an app — boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end. | ||
| compatibility: Designed for Claude Code working on the PostHog wizard codebase. | ||
| metadata: | ||
| author: posthog | ||
| version: "3.0" | ||
| --- | ||
| | ||
| # Exploring the wizard as an agent | ||
| | ||
| Drive a real wizard run yourself: boot it on an app, read each screen, decide, act, | ||
| snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound | ||
| in this repo (registered in `.mcp.json`). For _how_ it works underneath, read | ||
| [`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md). | ||
| | ||
| If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server | ||
| isn't approved yet — ask the user to approve `wizard-ci`, then retry. | ||
| | ||
| ## Set up | ||
| | ||
| Ask the user for the absolute path to their PostHog key file — e.g. "What's the | ||
| path to your phx key file?" — plus the project id and region if you don't have | ||
| them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real | ||
| fixture). Never print or commit the key. | ||
| | ||
| ## Drive | ||
| | ||
| 1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on | ||
| the app and returns the first screen. `appDir` is the throwaway copy. | ||
| 2. **`read_state`** — current screen, run phase, secret-free session, tasks, and | ||
| the actions legal right now. Call after every move. | ||
| 3. **`perform_action({ action, params? })`** — commit a decision: `confirm_setup`, | ||
| `dismiss_outage`, `choose` (a setup question, e.g. `{ key, value }`), | ||
| `dismiss_outro`, `set_mcp_outcome`, `dismiss_slack`, `keep_skills`. | ||
| 4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it. | ||
| 5. **`run_agent`** — kicks off the **real integration** in the background and | ||
| returns immediately; it bootstraps credentials, so it's what advances `auth` | ||
| and `run`. Then **poll `read_state`** — `runPhase` goes `running → completed` | ||
| and the screen advances to `outro`. | ||
| | ||
| A typical walk: | ||
| | ||
| ``` | ||
| open_app → intro → perform_action confirm_setup | ||
| read_state → health-check → perform_action dismiss_outage | ||
| read_state → auth → run_agent (returns at once; integration runs in background) | ||
| read_state (poll) → runPhase running → completed, screen → outro | ||
| outro → perform_action dismiss_outro → … → keep_skills | ||
| ``` | ||
| | ||
| Snapshot with `render_screen` at each key moment and save each frame to a numbered | ||
| file — `/tmp/wz-explore-snaps/NN-<screen>.txt`, incrementing `NN` in visit order — | ||
| so the run leaves a readable, ordered record you and the user can review afterward | ||
| (the same shape the CI route's `.txt` frames take). Capture the run screen as it | ||
| progresses, not just on screen changes. | ||
| | ||
| ## Key facts | ||
| | ||
| - **State → screen.** You never navigate; you commit a decision (an action) and the | ||
| router re-derives the active screen. Name actions, not keys. | ||
| - **`auth` and `run` advance only via `run_agent`.** They expose no action and | ||
| don't self-advance. `run_agent` returns immediately and runs the integration in | ||
| the background — poll `read_state` for `runPhase` (`running → completed`). | ||
| Everything else is an instant commit. | ||
| - **`run_agent` creates real PostHog resources** (a dashboard + insights) in the | ||
| project; each run duplicates them. | ||
| - **A green run ≠ a valid integration.** `runPhase=completed` means the flow | ||
| finished, not that the wizard understood the framework (e.g. it'll treat a Wasp | ||
| app as react-router). Read what it actually changed. | ||
| - **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`. |
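
The driving rules in this skill can be sketched as a small decision helper. This is a minimal sketch and not the harness's code: the `WizardState` shape is an assumption about what `read_state` returns, the `idle` phase name and the `nextStep` and `snapshotPath` helpers are hypothetical, and only the screen names, the `running` and `completed` phases, and the snapshot path pattern come from the text above.

```typescript
// Hypothetical shape of a read_state result; the real payload may differ.
type RunPhase = "idle" | "running" | "completed";

interface WizardState {
  screen: string;
  runPhase: RunPhase;
  actions: string[]; // actions legal on the current screen
}

type Step =
  | { kind: "run_agent" }
  | { kind: "poll" }
  | { kind: "perform_action"; action: string }
  | { kind: "stuck" };

// Per the skill text, these screens expose no action and advance only via run_agent.
const EXTERNAL_SCREENS = new Set(["auth", "run"]);

function nextStep(state: WizardState): Step {
  // While the integration runs in the background, keep polling read_state.
  if (state.runPhase === "running") return { kind: "poll" };
  if (EXTERNAL_SCREENS.has(state.screen)) {
    // A completed run should move the screen on to outro; poll until it does.
    return state.runPhase === "completed" ? { kind: "poll" } : { kind: "run_agent" };
  }
  // Illustration only: a real agent chooses among the legal actions itself.
  const action = state.actions[0];
  return action === undefined ? { kind: "stuck" } : { kind: "perform_action", action };
}

// Numbered snapshot file in visit order, following /tmp/wz-explore-snaps/NN-<screen>.txt.
function snapshotPath(index: number, screen: string): string {
  return `/tmp/wz-explore-snaps/${String(index).padStart(2, "0")}-${screen}.txt`;
}

console.log(snapshotPath(1, "intro")); // /tmp/wz-explore-snaps/01-intro.txt
```

Keeping the decision pure (state in, step out) mirrors the "state → screen" idea: the helper never navigates, it only names the next commit or the next poll.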
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| { | ||
| "mcpServers": { | ||
| "wizard-ci": { | ||
| "command": "npx", | ||
| "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"] | ||
| } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # e2e-harness — Headless e2e Control Plane | ||
| | ||
| How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**, | ||
| no browser, no keystrokes — and captures what it rendered. Both e2e routes share | ||
| one idea: run the real `startTUI` (the real ink render) and drive its store by | ||
| **state manipulation**, then capture the real rendered screen from a PTY. | ||
| | ||
| > If you're an agent that just wants to run and explore the wizard, use the | ||
| > `exploring-the-wizard` skill | ||
| > ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)). | ||
| > This doc is the _how it works_ underneath. | ||
| | ||
| ## The pieces | ||
| | ||
| This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of | ||
| `src/` so none of it is part of the wizard's production source (nothing in `src/` | ||
| imports it; the tsdown bundle never includes it). | ||
| | ||
| ``` | ||
| e2e-harness/ | ||
| wizard-ci-driver.ts WizardCiDriver — read_state / perform_action over the store | ||
| action-registry.ts screen → the actions legal on it (+ NO_ACTION_SCREENS) | ||
| e2e-profile.ts WizardE2eProfile + decideE2eAction — the scripted walk policy | ||
| profiles.ts per-program profiles + profileFor(programId) | ||
| tui-capture.ts run a command in a PTY (node-pty) + read its real screen (@xterm/headless) | ||
| scripts/ | ||
| tui-host.no-jest.ts the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve | ||
| tui-snapshots.no-jest.ts CI route: host(fixed) in a PTY → per-screen real-TUI snapshots | ||
| wizard-ci-mcp.no-jest.ts agent route: MCP server proxying host(serve) | ||
| ``` | ||
| | ||
| The driver reads and mutates the **real** `WizardStore` that the TUI renders from: | ||
| the router resolves the active screen from session state, every action goes | ||
| through a store setter, and the render is a pure projection of that state. So | ||
| manipulating the store makes the real TUI react — the driver and the renderer | ||
| share one store and never conflict; you never touch the TUI's input. | ||
| | ||
| ## Auth without a browser | ||
| | ||
| The real TUI runs with `ci: true`, and auth is satisfied by **state manipulation**: | ||
| `getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into | ||
| credentials, and `store.setCredentials(...)` sets them — the same bearer path an | ||
| OAuth token takes, so the auth screen advances with no browser and no keystrokes. | ||
| (`run_agent` does the same bootstrap as part of the real integration.) | ||
| | ||
| ## The two routes | ||
| | ||
| - **CI snapshots** — `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`) | ||
| in a PTY. The host self-drives the fixed profile (`decideE2eAction`) through the | ||
| real agent run and signals each key moment; the parent writes the real rendered | ||
| screen to `SNAP_OUT/NN-<screen>.txt` (including the run screen's progression). | ||
| - **Agent** — `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns | ||
| `tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` / | ||
| `run_agent` forward over a unix socket; `render_screen` returns the real | ||
| captured frame. The agent decides each screen itself. | ||
| | ||
| ## Things that bite | ||
| | ||
| 1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`, | ||
| `CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host → | ||
| `apiKeySource: none` → 401. The harness strips these for the child. A plain CI | ||
| shell never has them. | ||
| 2. **A project-scoped key needs its project id.** Pass the team's `--project-id` | ||
| (or `POSTHOG_WIZARD_PROJECT_ID`), or the bootstrap fails with a 403 on the project-data fetch. | ||
| 3. **Never run on a real fixture.** Always a throwaway copy. | ||
| 4. **`run_agent` takes minutes and creates real resources** (a dashboard + | ||
| insights) each run; the agent log is one shared file, so never run two at once. | ||
| 5. **node-pty's spawn-helper.** When the package is extracted without running its | ||
| build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute | ||
| bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores | ||
| it best-effort on each spawn. | ||
| | ||
| ## Changing what the run does | ||
| | ||
| Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) — | ||
| not on the program config — so this machinery stays out of production source. Edit | ||
| the program's entry (typed by `WizardE2eProfile`); the host asks | ||
| `decideE2eAction(state, profile)` what to commit on each screen. The (screen → | ||
| decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update). | ||
| | ||
| ## Visual-regression snapshots (the workbench flow) | ||
| | ||
| [wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route | ||
| for real-run visual regression: each test definition runs `tui-snapshots`, the | ||
| real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and | ||
| run-to-run differences are surfaced for a human, not asserted away. See | ||
| `services/wizard-ci/` there. |
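
Item 1 under "Things that bite" says the harness strips the agent-session variables from the child's environment. A minimal sketch of that filtering follows, assuming only the three patterns named there (`CLAUDECODE`, `ANTHROPIC_*`, `CLAUDE_CODE_*`). The `stripAgentHostEnv` name and the plain-object `Env` type are hypothetical, and the harness's own implementation may differ.

```typescript
// Names and prefixes taken from "Things that bite" item 1; nothing else is filtered.
const EXACT_NAMES = new Set(["CLAUDECODE"]);
const PREFIXES = ["ANTHROPIC_", "CLAUDE_CODE_"];

// Plain-object stand-in for an environment map such as process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a copy of env without the agent-session variables.
function stripAgentHostEnv(env: Env): Env {
  const cleaned: Env = {};
  for (const [name, value] of Object.entries(env)) {
    const isHostVar =
      EXACT_NAMES.has(name) || PREFIXES.some((prefix) => name.startsWith(prefix));
    if (!isHostVar) cleaned[name] = value;
  }
  return cleaned;
}

// Example input with made-up values; only PATH and the project id should survive.
const sampleEnv: Env = {
  PATH: "/usr/bin",
  CLAUDECODE: "1",
  ANTHROPIC_API_KEY: "placeholder",
  CLAUDE_CODE_ENTRYPOINT: "cli",
  POSTHOG_WIZARD_PROJECT_ID: "12345",
};
console.log(Object.keys(stripAgentHostEnv(sampleEnv))); // [ 'PATH', 'POSTHOG_WIZARD_PROJECT_ID' ]
```

The cleaned map would be passed as the child's environment when spawning the host. Returning a copy rather than mutating the input leaves the parent session's own variables untouched.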
93 changes: 93 additions & 0 deletions
e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,93 @@ | ||
| // Jest Snapshot v1, https://goo.gl/fbAQLP | ||
| | ||
| exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = ` | ||
| { | ||
| "profile": { | ||
| "ask": "first", | ||
| "healthCheck": "dismiss", | ||
| "mcp": "skip", | ||
| "setup": "first", | ||
| "skills": "delete", | ||
| "slack": "skip", | ||
| }, | ||
| "program": "posthog-integration", | ||
| "trace": [ | ||
| { | ||
| "action": "confirm_setup", | ||
| "screen": "intro", | ||
| }, | ||
| { | ||
| "action": "dismiss_outage", | ||
| "screen": "health-check", | ||
| }, | ||
| { | ||
| "action": "choose", | ||
| "screen": "setup", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "auth", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "run", | ||
| }, | ||
| { | ||
| "action": "dismiss_outro", | ||
| "screen": "outro", | ||
| }, | ||
| { | ||
| "action": "set_mcp_outcome", | ||
| "screen": "mcp", | ||
| }, | ||
| { | ||
| "action": "dismiss_slack", | ||
| "screen": "slack-connect", | ||
| }, | ||
| { | ||
| "action": "keep_skills", | ||
| "screen": "keep-skills", | ||
| }, | ||
| ], | ||
| } | ||
| `; | ||
| | ||
| exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = ` | ||
| { | ||
| "program": "posthog-integration", | ||
| "trace": [ | ||
| { | ||
| "action": "confirm_setup", | ||
| "screen": "intro", | ||
| }, | ||
| { | ||
| "action": "dismiss_outage", | ||
| "screen": "health-check", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "auth", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "run", | ||
| }, | ||
| { | ||
| "action": "dismiss_outro", | ||
| "screen": "outro", | ||
| }, | ||
| { | ||
| "action": "set_mcp_outcome", | ||
| "screen": "mcp", | ||
| }, | ||
| { | ||
| "action": "dismiss_slack", | ||
| "screen": "slack-connect", | ||
| }, | ||
| { | ||
| "action": "keep_skills", | ||
| "screen": "keep-skills", | ||
| }, | ||
| ], | ||
| } | ||
| `; |
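
To make the shape of these traces concrete, here is a minimal sketch of how a (screen → action) trace like the ones above could be produced. It is an illustration, not the contents of `action-registry.ts` or `decideE2eAction`: the `ACTION_FOR_SCREEN` table, the local `NO_ACTION` set, and `traceWalk` are hypothetical, and the pairs are copied from the snapshot itself.

```typescript
// Screen → action pairs as they appear in the snapshot above.
const ACTION_FOR_SCREEN: Record<string, string> = {
  intro: "confirm_setup",
  "health-check": "dismiss_outage",
  setup: "choose",
  outro: "dismiss_outro",
  mcp: "set_mcp_outcome",
  "slack-connect": "dismiss_slack",
  "keep-skills": "keep_skills",
};

// Screens the snapshot records as "(external)": they advance via the agent run.
const NO_ACTION = new Set(["auth", "run"]);

interface TraceEntry {
  action: string;
  screen: string;
}

// Hypothetical helper: map an ordered list of visited screens to trace entries.
function traceWalk(screens: string[]): TraceEntry[] {
  return screens.map((screen) => {
    if (NO_ACTION.has(screen)) return { action: "(external)", screen };
    const action = ACTION_FOR_SCREEN[screen];
    // Fail loudly on a screen with no known action instead of skipping it.
    if (action === undefined) throw new Error(`unclassified screen: ${screen}`);
    return { action, screen };
  });
}

// The Node path from the second snapshot (no setup question).
const NODE_SCREENS = [
  "intro", "health-check", "auth", "run",
  "outro", "mcp", "slack-connect", "keep-skills",
];
console.log(JSON.stringify(traceWalk(NODE_SCREENS), null, 2));
```

Throwing on an unknown screen is a deliberate choice in this sketch: several commits in this PR classify screens that were previously missing, and a hard failure surfaces such a gap instead of silently dropping it from the trace.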