Commit 11b3b16

ThomasK33 and claude authored
feat(demo): record the hero + Codex/Claude demos beside a live dashboard (#116)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
1 parent a852c95 commit 11b3b16

31 files changed

Lines changed: 1554 additions & 1599 deletions

.oxfmtrc.json

Lines changed: 5 additions & 1 deletion
@@ -11,6 +11,10 @@
 "dist",
 "node_modules",
 "package-lock.json",
-"aube-lock.yaml"
+"aube-lock.yaml",
+"dogfood/agent-uses-agent-tty/README.md",
+"dogfood/agent-uses-agent-tty/promoted-run-summary.md",
+"dogfood/agent-uses-agent-tty/manifest.json",
+"dogfood/agent-uses-agent-tty/artifacts"
 ]
 }

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@

 ## Changed

+- The README hero GIF (`assets/hero.tape`) and the `dogfood/agent-uses-agent-tty/` Codex/Claude recordings now record inside a tmux two-pane split: the agent (or, in the hero, plain `agent-tty` CLI calls) drives a session on the left while `agent-tty dashboard` live-mirrors it on the right — showing the dashboard reacting as sessions are created and modified. Both panes share one `AGENT_TTY_HOME` so the dashboard auto-follows the newest session; the status bar is disabled so VHS's whole-screen `Wait+Screen` scrape stays unambiguous, and each run uses an isolated, reaped tmux server socket. The hero hides the tmux split plumbing and instead launches the dashboard on camera — typing `agent-tty dashboard` into the right pane and hopping back with the tmux prefix — and its panes/session run `bash --norc` with a minimal prompt so the live mirror stays free of personal shell-prompt clutter. It runs against this checkout's freshly-built CLI, since `agent-tty dashboard` is unreleased. A new `mise run demo:hero` task (which `depends` on `build`) regenerates the hero GIF, joining `mise run demo:agent-uses-agent-tty` for the agent recordings. `tmux` (`>= 3.1`, pinned to `3.6` in `mise`) is now a recorder prerequisite alongside `vhs`/`ttyd`/`ffmpeg`. The agent recordings now run concurrently via a bounded worker pool (`--concurrency`, default `2`) — each run is mostly an idle review-window sleep, so overlapping the two agents roughly halves wall-clock; raising the cap also overlaps an agent's own retry attempts at the cost of more CPU and shared-account load, while same-agent attempts stay serialized so two sessions of one account never record at once.
 - Spawned shells now default `PROMPT_EOL_MARK=` (empty) in the session environment, suppressing the inverse-video `%` end-of-partial-line marker that `zsh` prints when output lacks a trailing newline. agent-tty strips a hidden completion-marker postamble after each `run`, which desynced the rendered cursor and left that `%` in snapshots, screenshots, and recordings; the default keeps captures clean. The marker is zsh-only and inert in other shells. Opt back in per session with `agent-tty create --env PROMPT_EOL_MARK='%B%S%#%s%b' -- <shell>` to restore zsh's styled default (a lone `'%'` expands to nothing), or pass any explicit `--env PROMPT_EOL_MARK=...` value. The default is applied at PTY spawn time and is not written to the manifest, so `inspect`, `list`, and `create --json` env maps are unchanged ([#114](https://github.com/coder/agent-tty/pull/114)).
 - `inspect` collects renderer state and the session snapshot in a single synchronous tick before awaiting, so concurrent RPC handlers cannot interleave a mutated renderer state with a stale session snapshot ([#104](https://github.com/coder/agent-tty/pull/104)).

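The bounded worker pool mentioned in the first CHANGELOG entry can be pictured with a minimal sketch. This is not the recorder's actual code: `runBounded` and its parameters are hypothetical names, and the sketch only shows the general technique of capping the number of in-flight async tasks at a `--concurrency`-style limit. How the real recorder keeps same-agent attempts serialized is not shown in this diff; one possible approach (an assumption) would be to make each task a sequential chain of one agent's attempts.

```typescript
// Hypothetical helper, not taken from the repository: run async tasks with at
// most `limit` of them in flight, returning results in task order.
async function runBounded<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed index until the queue is drained.
  // Claiming is safe without locks because `next++` runs synchronously.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Never start more workers than there are tasks, and always at least one.
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a limit of 2 and two mostly idle recordings, both run side by side, which matches the "roughly halves wall-clock" reasoning in the entry.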
README.md

Lines changed: 3 additions & 3 deletions
@@ -7,7 +7,7 @@ Drive and inspect long-lived terminal sessions from the CLI, with reviewable sna
 [![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](./LICENSE)
 ![Node](https://img.shields.io/node/v/agent-tty)

-![agent-tty: drive a terminal session and inspect it as reviewable text](./assets/hero.gif)
+![agent-tty: drive a terminal session and watch it live in the dashboard](./assets/hero.gif)

 `agent-tty` keeps a real PTY-backed terminal session alive across separate CLI invocations. You `run` a command in it, `wait` for the screen to reach a condition instead of sleeping, then capture what happened as a semantic text snapshot, a PNG screenshot, an asciinema-compatible `.cast`, or a WebM. The recording is the point: a human — or an AI coding agent — can replay and verify exactly what the terminal did, instead of trusting a blind script.

@@ -101,8 +101,8 @@ Real Codex and Claude TUIs discovering the `agent-tty` skill, driving `nvim --cl
 <th width="50%">Claude</th>
 </tr>
 <tr>
-<td><video src="https://github.com/user-attachments/assets/27cc3b9b-9b91-4cd9-a3a5-1bbb61c33e19" controls width="100%"></video></td>
-<td><video src="https://github.com/user-attachments/assets/36221ef7-97c4-4b06-b673-21ac623a5f0a" controls width="100%"></video></td>
+<td><video src="https://github.com/user-attachments/assets/f1823164-330c-4962-8adf-2b825080e06f" controls width="100%"></video></td>
+<td><video src="https://github.com/user-attachments/assets/966bed35-9383-444e-b06a-1d103ccba49a" controls width="100%"></video></td>
 </tr>
 </table>

assets/hero.gif

53.5 KB

assets/hero.tape

Lines changed: 102 additions & 48 deletions
@@ -1,81 +1,135 @@
 # assets/hero.tape — generates the README hero GIF with VHS (charmbracelet/vhs).
 #
-# Run from the repo root:
-# vhs assets/hero.tape # writes assets/hero.gif
+# Regenerate (from the repo root):
+# mise run demo:hero # builds the dev CLI, then runs this tape
 #
-# Prereqs (vhs 0.11.0 + jq are pinned in mise's [tools]):
-# - `agent-tty` on PATH (npm i -g agent-tty, or `npm run build && npm link`)
-# - `jq`
-# It uses the fast, browser-free `libghostty-vt` renderer so the GIF stays short
-# (no Chromium boot). For a crisper, tighter, more "terminal" look, tune
-# FontFamily (must be installed!), FontSize, LineHeight, LetterSpacing, and
-# Padding below, plus the per-step Sleeps. The session is sized to 72x18 so a
-# 26pt font fills the frame without the 80-col default wrapping. Regenerate,
-# then replace the HERO DEMO comment block in README.md with:
-# ![agent-tty: drive a terminal session and inspect it as reviewable text](./assets/hero.gif)
+# That task `depends` on `build` and puts vhs/ffmpeg/tmux/jq on PATH. To run the
+# tape directly instead, build first and provide those tools yourself:
+# npm run build && vhs assets/hero.tape
+#
+# Why build first: this hero shows `agent-tty dashboard`, which is unreleased —
+# a globally-installed `agent-tty` (e.g. `npm i -g agent-tty`) won't have it. So
+# the hidden setup below points PATH at THIS checkout's freshly-built CLI
+# (dist/cli/main.js) instead of `Require agent-tty`. It also needs the
+# dashboard's optional renderer `@coder/libghostty-vt-node` (a normal `npm i`
+# fetches it; check with `agent-tty doctor`).
+#
+# Prereqs (jq + tmux are pinned in mise's [tools]; tmux >= 3.1 for `-l 50%`):
+# - `npm run build` has been run in this checkout
+# - `jq`, `tmux`
+# - `ttyd` (VHS needs it): mise installs it on Linux, but it has no macOS
+# binary — on macOS run `brew install ttyd` yourself (mise finds it on PATH).
+#
+# Structure: the hidden setup builds a clean tmux split (left = operator shell,
+# right = idle shell). The panes and the agent-tty session all run `bash --norc`
+# (with a minimal `$ ` prompt on the panes via PS1) so they — and the dashboard's
+# live mirror — stay free of personal prompt clutter; `--norc` is what keeps the
+# user's interactive shell config (e.g. zsh `%{…%}` prompt escapes) out of the
+# frame. The visible recording then *types* `agent-tty dashboard` into the right
+# pane (so viewers see how it's launched), hops back to the left with the tmux
+# prefix, and drives a session whose changes the dashboard mirrors live.
+# AGENT_TTY_HOME is exported before tmux so the server and both panes share it.
+# (For a minimal `$ ` prompt in the mirror too, add `--env 'PS1=$ '` to create.)
+#
+# TUNING (do a visual pass after regenerating): Width/Height/FontSize and the
+# split percentage (`-l 50%`) trade off readability of the two panes. The split
+# is 50/50; FontSize is 18 so the longest CLI lines fit one un-split half — bump
+# the font only if you also widen the frame, or the operator-pane lines wrap.
+# The session defaults to 80x24, a touch wider than the half-width dashboard
+# pane, so its mirror clips to the top-left (where the `echo` output lands); add
+# `--cols/--rows` to `create` for a tighter, fully-visible mirror. FontFamily fallbacks if FiraCode
+# isn't installed: "Menlo", "SF Mono", "JetBrains Mono". Keep the README hero
+# pointing at ./assets/hero.gif:
+# ![agent-tty: drive a terminal session and watch it live in the dashboard](./assets/hero.gif)

 Output assets/hero.gif

-Require agent-tty
 Require jq
+Require tmux

 Set Shell bash
 # Use a font that's actually installed (VHS silently falls back to an ugly
 # default otherwise). FiraCode Nerd Font Mono is on this machine and reads clean.
-# Bulletproof alternatives: "Menlo", "SF Mono", "Monaco", "JetBrains Mono".
 Set FontFamily "FiraCode Nerd Font Mono"
-Set FontSize 26
-Set Width 1280
-Set Height 640
-Set Padding 16 # tighter frame (was 28)
-Set LineHeight 1.0 # tight lines; nudge to ~1.15 if they touch
-Set LetterSpacing 0 # no extra tracking
-Set Theme "Catppuccin Mocha" # any VHS theme works; try "Dracula", "Nord"
+# 18pt (not 20) so each half of the 50/50 split is wide enough for the longest
+# CLI lines (the `create … | jq` and `run --no-wait …` lines) without wrapping.
+Set FontSize 18
+Set Width 1920
+Set Height 720
+Set Padding 16
+Set LineHeight 1.0
+Set LetterSpacing 0
+Set Theme "Catppuccin Mocha"
 Set TypingSpeed 40ms
 Set PlaybackSpeed 1.0

-# --- hidden setup: isolated home + fast native renderer, then a clean screen ---
+# --- hidden setup: dev CLI on PATH + a clean tmux split (operator | idle) ---
+# AGENT_TTY_HOME, PATH, and PS1 are exported BEFORE tmux so the server and both
+# `bash --norc` panes inherit them. The split runs idle shells; the dashboard is
+# launched visibly below. kill-server first makes regeneration idempotent.
 Hide
 Type "export AGENT_TTY_HOME=$(mktemp -d) AGENT_TTY_RENDERER=libghostty-vt" Enter
-Type "clear" Enter
+Type "export HERO_BIN=$(mktemp -d)" Enter
+Type 'chmod +x dist/cli/main.js && ln -sf "$PWD/dist/cli/main.js" "$HERO_BIN/agent-tty"' Enter
+Type 'export PATH="$HERO_BIN:$PATH"' Enter
+Type "export PS1='$ '" Enter
+Type "tmux -L hero kill-server 2>/dev/null; true" Enter
+Type "tmux -f /dev/null -L hero new-session -d -s hero 'bash --norc' \; set -g status off \; split-window -h -l 50% -t hero 'bash --norc' \; select-pane -t hero.0 \; attach -t hero" Enter
 Show

-Sleep 800ms
-Type "# open a long-lived terminal session" Enter
+Sleep 1000ms
+Type "# drive a real terminal session with plain CLI calls" Enter
 Sleep 500ms
-Type 'SID=$(agent-tty create --json --cols 72 --rows 18 -- bash | jq -r .result.sessionId)' Enter
+Type 'SID=$(agent-tty create --json -- bash --norc | jq -r .result.sessionId)' Enter
 Sleep 1200ms

-Type "# run a command inside it" Enter
+Type "# open the dashboard on the right to watch it live →" Enter
+Sleep 600ms
+# hop to the right pane with the tmux prefix (Ctrl+B then o = next pane)
+Ctrl+B
+Type "o"
+Sleep 500ms
+Type "agent-tty dashboard --all" Enter
+Sleep 2200ms
+# hop back to the left pane to keep driving the session
+Ctrl+B
+Type "o"
+Sleep 600ms
+
+Type "# run a command inside it — watch the dashboard mirror it live →" Enter
 Sleep 500ms
 Type 'agent-tty run "$SID" "echo hello from agent-tty"' Enter
-Sleep 1500ms
+Sleep 2200ms

-Type "# wait for the screen — no sleeps, no grep" Enter
+Type "# fire a slow command, then wait for its OUTPUT — no sleeps, no polling →" Enter
 Sleep 500ms
-Type 'agent-tty wait "$SID" --text "hello from agent-tty"' Enter
-Sleep 1500ms
+# Two things make the wait meaningful here:
+# 1. $RANDOM is single-quoted so the SESSION expands it — the echoed command
+# line shows the literal "$RANDOM", not a number.
+# 2. We therefore wait on a DIGIT after the colon (--regex), NOT the phrase
+# "your random number is:". That phrase is already on screen from the echoed
+# command, so a --text wait would match instantly and prove nothing; the
+# digits only appear once the command actually prints, after the 6s sleep.
+Type "agent-tty run --no-wait $SID 'sleep 6; echo your random number is: $RANDOM'" Enter
+# `wait` is typed right after the fire (no comment between) so the bulk of the 6s
+# sleep elapses WHILE wait blocks — the deterministic wait is visible on camera.
+Sleep 400ms
+Type 'agent-tty wait "$SID" --regex "random number is: [0-9]+"' Enter
+Sleep 5000ms

-Type "# inspect the rendered screen as text you can diff" Enter
+Type "# and snapshot the result as text you can diff" Enter
 Sleep 500ms
 Type 'agent-tty snapshot "$SID" --format text' Enter
 Sleep 2800ms

-Type "# screenshots, asciicasts and WebM export come from the same session" Enter
-Sleep 1200ms
-
-# --- hidden teardown ---
+# --- hidden teardown: stays hidden to the end, so the GIF's last frame is the
+# split — NOT the bare outer shell that `kill-server` drops back to (a trailing
+# `Show` here flashes that shell + a "[server exited]" line, which looks ugly) ---
 Hide
 Type 'agent-tty destroy "$SID" >/dev/null 2>&1' Enter
-Show
-Sleep 500ms
-
-# -----------------------------------------------------------------------------
-# ALTERNATIVE — dogfood it: record the loop with agent-tty itself, then convert.
-# Drive the same create/run/wait/snapshot sequence, then:
-# agent-tty record export "$SID" --format webm --out demo.webm
-# ffmpeg -i demo.webm -vf "fps=12,scale=1200:-1:flags=lanczos" assets/hero.gif
-# This makes the hero GIF literally the tool's own output ("recorded by the tool
-# it documents"), at the cost of the Chromium-backed ghostty-web render path.
-# ffmpeg isn't in mise's [tools] (it can't be cross-locked on Linux via conda);
-# install it yourself for this path (brew install ffmpeg / apt-get install ffmpeg).
+Type "tmux -L hero kill-server" Enter
+# kill-server returns to the outer shell, which still holds these vars; clean up
+# the temp home + bin so repeated local runs don't litter $TMPDIR.
+Type 'rm -rf "$AGENT_TTY_HOME" "$HERO_BIN"' Enter
+# Hidden settle time so the cleanup completes before the tape ends (no Show).
+Sleep 1s
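
The tape's comments on `--text` versus `--regex` can be checked with plain string matching. The sketch below is only an illustration: the two screen strings are hypothetical stand-ins for what the session might show before and after the `sleep 6` finishes, and it uses ordinary JavaScript matching, not `agent-tty wait` itself.

```typescript
// Hypothetical screen contents; the real session output may differ.
const screenBeforeOutput = "$ sleep 6; echo your random number is: $RANDOM";
const screenAfterOutput = screenBeforeOutput + "\nyour random number is: 12345";

const phrase = "your random number is:";
const digitsAfterColon = /random number is: [0-9]+/;

// The phrase is already present in the echoed command line, so a plain
// substring check succeeds before the command has printed anything.
const textMatchesEarly = screenBeforeOutput.includes(phrase);

// The regex requires digits after the colon. The echoed line has the literal
// "$RANDOM" there, so the pattern only matches once real output appears.
const regexMatchesEarly = digitsAfterColon.test(screenBeforeOutput);
const regexMatchesLate = digitsAfterColon.test(screenAfterOutput);

console.log({ textMatchesEarly, regexMatchesEarly, regexMatchesLate });
```

Under these assumptions the substring check passes immediately while the digit pattern passes only after the output line exists, which is why the tape waits on the regex.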

dogfood/agent-uses-agent-tty/README.md

Lines changed: 2 additions & 2 deletions
@@ -14,13 +14,13 @@ The Outer Hero Demo column embeds the uploaded H.264 MP4 recordings as inline Gi
 </tr>
 <tr>
 <td>Codex</td>
-<td><video src="https://github.com/user-attachments/assets/27cc3b9b-9b91-4cd9-a3a5-1bbb61c33e19" controls width="320"></video></td>
+<td><video src="https://github.com/user-attachments/assets/f1823164-330c-4962-8adf-2b825080e06f" controls width="320"></video></td>
 <td><a href="./artifacts/codex-inner-nvim.cast">cast</a>, <a href="./artifacts/codex-inner-nvim.webm">WebM</a></td>
 <td><a href="./artifacts/codex-final-file-proof.txt">proof</a></td>
 </tr>
 <tr>
 <td>Claude</td>
-<td><video src="https://github.com/user-attachments/assets/36221ef7-97c4-4b06-b673-21ac623a5f0a" controls width="320"></video></td>
+<td><video src="https://github.com/user-attachments/assets/966bed35-9383-444e-b06a-1d103ccba49a" controls width="320"></video></td>
 <td><a href="./artifacts/claude-inner-nvim.cast">cast</a>, <a href="./artifacts/claude-inner-nvim.webm">WebM</a></td>
 <td><a href="./artifacts/claude-final-file-proof.txt">proof</a></td>
 </tr>

dogfood/agent-uses-agent-tty/VIDEO_PLAYBACK.md

Lines changed: 6 additions & 3 deletions
@@ -35,13 +35,16 @@ mise run demo:agent-uses-agent-tty:upload-assets
 The task uses the pinned `ffmpeg`/`ffprobe` from `mise.toml`. For each agent it
 prepends ~0.3s of `artifacts/<agent>-thumbnail.png` as the opening frames, encodes
 H.264 MP4, writes ffprobe metadata, and writes checksums under `.debug/video-upload/`.
+The upload MP4 is encoded at the recording's own probed resolution, so it always
+preserves the source aspect ratio (no squish if the recording dimensions change).

-Expected constraints for the promoted 2026-05-21 recordings:
+Expected constraints for the current promoted recordings (dimensions track the
+recording resolution, currently 1920x900):

 | Agent | Upload file | Expected codec | Expected dimensions | Expected size |
 | ------ | ------------------------------------------- | ----------------- | ------------------- | ------------- |
-| Codex | `.debug/video-upload/codex-outer-h264.mp4` | H.264 / `yuv420p` | 1600x900 | ~3.5 MB |
-| Claude | `.debug/video-upload/claude-outer-h264.mp4` | H.264 / `yuv420p` | 1600x900 | ~3.4 MB |
+| Codex | `.debug/video-upload/codex-outer-h264.mp4` | H.264 / `yuv420p` | 1920x900 | ~3.4 MB |
+| Claude | `.debug/video-upload/claude-outer-h264.mp4` | H.264 / `yuv420p` | 1920x900 | ~4.0 MB |

 Both expected sizes are below GitHub's 10 MB video attachment limit for free plans.

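The constraints in the table above could be checked mechanically. This is a minimal sketch, assuming the ffprobe metadata has already been parsed into a plain object. `UploadProbe`, `checkUpload`, and the decimal reading of the 10 MB limit are assumptions for illustration and are not part of the repository's upload task.

```typescript
// Hypothetical shape for already-parsed probe metadata.
interface UploadProbe {
  codec: string; // ffprobe codec_name, e.g. "h264"
  pixelFormat: string; // ffprobe pix_fmt, e.g. "yuv420p"
  width: number;
  height: number;
  sizeBytes: number;
}

// Assumption: "10 MB" is read as decimal megabytes.
const MAX_UPLOAD_BYTES = 10 * 1000 * 1000;

// Returns a list of human-readable problems; an empty list means the upload
// matches the recording's resolution and stays under the size limit.
function checkUpload(
  probe: UploadProbe,
  recording: { width: number; height: number },
): string[] {
  const problems: string[] = [];
  if (probe.codec !== "h264") problems.push(`codec is ${probe.codec}, expected h264`);
  if (probe.pixelFormat !== "yuv420p") {
    problems.push(`pixel format is ${probe.pixelFormat}, expected yuv420p`);
  }
  if (probe.width !== recording.width || probe.height !== recording.height) {
    problems.push(
      `dimensions ${probe.width}x${probe.height} differ from recording ` +
        `${recording.width}x${recording.height}`,
    );
  }
  if (probe.sizeBytes >= MAX_UPLOAD_BYTES) problems.push("file is at or over the 10 MB limit");
  return problems;
}
```

Comparing against the recording's own dimensions, not a fixed 1600x900 or 1920x900, mirrors the point that the expected dimensions track the recording resolution.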