diff --git a/.claude/skills/devcontainer-dev/SKILL.md b/.claude/skills/devcontainer-dev/SKILL.md
index 67e93072c..1a7db53be 100644
--- a/.claude/skills/devcontainer-dev/SKILL.md
+++ b/.claude/skills/devcontainer-dev/SKILL.md
@@ -72,6 +72,22 @@ The readiness banner is what you care about. It gates on **three** signals simul
 
 Only once all three are true does the banner fire and the host's browser auto-open.
 
+> **Process running ≠ UI rendered.** The three gates above tell you the
+> backing processes are alive, not that Electron has actually painted.
+> A devcontainer-in-CI proof captured a blank screen because all three
+> gates passed while the window was still mid-first-paint. For
+> agent-grade readiness (e.g. before driving the app via `xdotool` or
+> taking a screenshot to feed to a vision model), add this gate:
+>
+> ```bash
+> docker exec -u node "$CONTAINER" bash -c \
+>   'DISPLAY=:99 xdotool search --class ToolHive >/dev/null 2>&1'
+> ```
+>
+> `xdotool search --class` only succeeds once the main window is mapped
+> on Xvfb. A short settling sleep (~2s) after that first match is cheap
+> insurance against catching a partially rendered frame.
+
 ---
 
 ## Per-worktree isolation
@@ -146,13 +162,33 @@ All commands run via `docker exec` against the container with `DISPLAY=:99` set
 
 ### See the screen (screenshots)
 
+Use the project's helper script — it auto-finds the container, captures the
+root window, and streams the PNG out to the host. It prints the absolute host
+path on stdout so it composes:
+
 ```bash
-# Take a PNG of the whole virtual framebuffer
+SHOT=$(scripts/devcontainer-screenshot.sh)
+# or with an explicit path:
+scripts/devcontainer-screenshot.sh /tmp/shot.png
+```
+
+**Why a helper script and not just `docker cp`?** `/tmp` (and possibly
+other paths) inside the devcontainer is mounted as `tmpfs`. Docker's
+`docker cp` cannot read from tmpfs mounts — it only traverses the
+container's overlay layers — so the obvious one-liner
+
+```bash
+# DON'T — silently broken: import succeeds, ls confirms the file,
+# but docker cp says "Could not find the file in container".
 docker exec "$CONTAINER" bash -c 'DISPLAY=:99 import -window root /tmp/shot.png'
-# Copy it to the host for viewing / feeding to a vision model
 docker cp "$CONTAINER:/tmp/shot.png" /tmp/shot.png
 ```
 
+**fails everywhere** (CI and local). The helper bypasses `docker cp` by
+streaming the file via `docker exec cat` (see `scripts/devcontainer-steal.sh`
+for the generic version — use that one to extract any tmpfs file, e.g. logs
+or generated artifacts, not just screenshots).
+
 `import` is from ImageMagick. For a specific window only, use `xwininfo` to get the WID, then `import -window <WID>`.
 
 ### See the window tree (what's there, where, which is focused)
diff --git a/.codex/skills/devcontainer-dev/SKILL.md b/.codex/skills/devcontainer-dev/SKILL.md
index 67e93072c..1a7db53be 100644
--- a/.codex/skills/devcontainer-dev/SKILL.md
+++ b/.codex/skills/devcontainer-dev/SKILL.md
@@ -72,6 +72,22 @@ The readiness banner is what you care about. It gates on **three** signals simul
 
 Only once all three are true does the banner fire and the host's browser auto-open.
 
+> **Process running ≠ UI rendered.** The three gates above tell you the
+> backing processes are alive, not that Electron has actually painted.
+> A devcontainer-in-CI proof captured a blank screen because all three
+> gates passed while the window was still mid-first-paint. For
+> agent-grade readiness (e.g. before driving the app via `xdotool` or
+> taking a screenshot to feed to a vision model), add this gate:
+>
+> ```bash
+> docker exec -u node "$CONTAINER" bash -c \
+>   'DISPLAY=:99 xdotool search --class ToolHive >/dev/null 2>&1'
+> ```
+>
+> `xdotool search --class` only succeeds once the main window is mapped
+> on Xvfb. A short settling sleep (~2s) after that first match is cheap
+> insurance against catching a partially rendered frame.
+
 ---
 
 ## Per-worktree isolation
@@ -146,13 +162,33 @@ All commands run via `docker exec` against the container with `DISPLAY=:99` set
 
 ### See the screen (screenshots)
 
+Use the project's helper script — it auto-finds the container, captures the
+root window, and streams the PNG out to the host. It prints the absolute host
+path on stdout so it composes:
+
 ```bash
-# Take a PNG of the whole virtual framebuffer
+SHOT=$(scripts/devcontainer-screenshot.sh)
+# or with an explicit path:
+scripts/devcontainer-screenshot.sh /tmp/shot.png
+```
+
+**Why a helper script and not just `docker cp`?** `/tmp` (and possibly
+other paths) inside the devcontainer is mounted as `tmpfs`. Docker's
+`docker cp` cannot read from tmpfs mounts — it only traverses the
+container's overlay layers — so the obvious one-liner
+
+```bash
+# DON'T — silently broken: import succeeds, ls confirms the file,
+# but docker cp says "Could not find the file in container".
 docker exec "$CONTAINER" bash -c 'DISPLAY=:99 import -window root /tmp/shot.png'
-# Copy it to the host for viewing / feeding to a vision model
 docker cp "$CONTAINER:/tmp/shot.png" /tmp/shot.png
 ```
 
+**fails everywhere** (CI and local). The helper bypasses `docker cp` by
+streaming the file via `docker exec cat` (see `scripts/devcontainer-steal.sh`
+for the generic version — use that one to extract any tmpfs file, e.g. logs
+or generated artifacts, not just screenshots).
+
 `import` is from ImageMagick. For a specific window only, use `xwininfo` to get the WID, then `import -window <WID>`.
 
 ### See the window tree (what's there, where, which is focused)
diff --git a/.cursor/skills/devcontainer-dev/SKILL.md b/.cursor/skills/devcontainer-dev/SKILL.md
index 67e93072c..1a7db53be 100644
--- a/.cursor/skills/devcontainer-dev/SKILL.md
+++ b/.cursor/skills/devcontainer-dev/SKILL.md
@@ -72,6 +72,22 @@ The readiness banner is what you care about. It gates on **three** signals simul
 
 Only once all three are true does the banner fire and the host's browser auto-open.
 
+> **Process running ≠ UI rendered.** The three gates above tell you the
+> backing processes are alive, not that Electron has actually painted.
+> A devcontainer-in-CI proof captured a blank screen because all three
+> gates passed while the window was still mid-first-paint. For
+> agent-grade readiness (e.g. before driving the app via `xdotool` or
+> taking a screenshot to feed to a vision model), add this gate:
+>
+> ```bash
+> docker exec -u node "$CONTAINER" bash -c \
+>   'DISPLAY=:99 xdotool search --class ToolHive >/dev/null 2>&1'
+> ```
+>
+> `xdotool search --class` only succeeds once the main window is mapped
+> on Xvfb. A short settling sleep (~2s) after that first match is cheap
+> insurance against catching a partially rendered frame.
+
 ---
 
 ## Per-worktree isolation
@@ -146,13 +162,33 @@ All commands run via `docker exec` against the container with `DISPLAY=:99` set
 
 ### See the screen (screenshots)
 
+Use the project's helper script — it auto-finds the container, captures the
+root window, and streams the PNG out to the host. It prints the absolute host
+path on stdout so it composes:
+
 ```bash
-# Take a PNG of the whole virtual framebuffer
+SHOT=$(scripts/devcontainer-screenshot.sh)
+# or with an explicit path:
+scripts/devcontainer-screenshot.sh /tmp/shot.png
+```
+
+**Why a helper script and not just `docker cp`?** `/tmp` (and possibly
+other paths) inside the devcontainer is mounted as `tmpfs`. Docker's
+`docker cp` cannot read from tmpfs mounts — it only traverses the
+container's overlay layers — so the obvious one-liner
+
+```bash
+# DON'T — silently broken: import succeeds, ls confirms the file,
+# but docker cp says "Could not find the file in container".
 docker exec "$CONTAINER" bash -c 'DISPLAY=:99 import -window root /tmp/shot.png'
-# Copy it to the host for viewing / feeding to a vision model
 docker cp "$CONTAINER:/tmp/shot.png" /tmp/shot.png
 ```
 
+**fails everywhere** (CI and local). The helper bypasses `docker cp` by
+streaming the file via `docker exec cat` (see `scripts/devcontainer-steal.sh`
+for the generic version — use that one to extract any tmpfs file, e.g. logs
+or generated artifacts, not just screenshots).
+
 `import` is from ImageMagick. For a specific window only, use `xwininfo` to get the WID, then `import -window <WID>`.
 
 ### See the window tree (what's there, where, which is focused)
diff --git a/.github/workflows/experiment-devcontainer-proof.yml b/.github/workflows/experiment-devcontainer-proof.yml
new file mode 100644
index 000000000..ddc482f7d
--- /dev/null
+++ b/.github/workflows/experiment-devcontainer-proof.yml
@@ -0,0 +1,259 @@
+name: Devcontainer-in-CI Proof
+
+# Proof-of-concept: boot the project's devcontainer (Xvfb + fluxbox + Electron
+# + thv) inside a GitHub Actions runner, drive the live app via docker exec
+# (xdotool, screenshots), and run a workspace smoke check (type-check) inside
+# the container. No agent involved — this isolates the genuinely novel piece
+# (devcontainer-in-CI) from the AI-agent piece, which has a separate
+# constraint (`claude-code-action` enforces workflow_file == default branch).
+#
+# Trigger: push to this branch (no claude-code-action means no validation).
+#   git commit --allow-empty -m "trigger" && git push
+# To skip a run, use [skip ci] in the commit message.
+#
+# Once this proof is reliable, integrating it into the production
+# `_bug-fix-agent.yml` is small: add the devcontainer steps before the
+# claude-code-action step, expand `--allowedTools`, and append the live-app
+# paragraph to the prompt.
+
+on:
+  push:
+    branches:
+      - experiment/bug-fix-visual
+
+permissions:
+  contents: read
+
+concurrency:
+  group: devcontainer-proof-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  proof:
+    name: Devcontainer Proof
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    env:
+      # Required by .devcontainer/devcontainer.json runArgs port mapping.
+      # Anything valid; we don't expose the port off-runner.
+      NOVNC_HOST_PORT: 6080
+
+    steps:
+      - name: Mark t0 (workflow start)
+        id: t0
+        run: echo "epoch=$(date +%s)" >> $GITHUB_OUTPUT
+
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+
+      - name: Set up Docker Buildx
+        # Required so BuildKit's `gha` cache backend is available below.
+        # Without this, `cacheFrom: type=gha` fails with
+        # "unknown cache importer: gha" and the build goes fully cold.
+        uses: docker/setup-buildx-action@v3
+
+      - name: Cache devcontainer node_modules
+        # Image-layer caching alone doesn't help us, because the
+        # devcontainer's `node_modules` lives in a named Docker volume that
+        # is created fresh per run (volumes don't survive across CI jobs).
+        # postCreateCommand then runs `pnpm install` from scratch on every
+        # run, which dominates the build step's wall time. We cache the
+        # contents of that volume here and restore them into the volume
+        # before devcontainers/ci starts the container, so postCreateCommand
+        # sees an already-populated node_modules and pnpm install is a no-op
+        # (lockfile-equal).
+        id: nm-cache
+        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
+        with:
+          path: nm-cache
+          key: dc-nm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
+
+      - name: Pre-populate devcontainer node_modules volume
+        if: steps.nm-cache.outputs.cache-hit == 'true'
+        run: |
+          # The volume name follows devcontainer.json's
+          # source=toolhive-node-modules-${localWorkspaceFolderBasename}.
+          # In CI the workspace basename is always `toolhive-studio`.
+          VOLUME=toolhive-node-modules-toolhive-studio
+          docker volume create "$VOLUME" >/dev/null
+          docker run --rm \
+            -v "$PWD/nm-cache:/src:ro" \
+            -v "$VOLUME:/dst" \
+            alpine:3 sh -c 'cp -a /src/. /dst/'
+
+      - name: Build & start devcontainer (cached via gha)
+        uses: devcontainers/ci@v0.3
+        with:
+          imageName: toolhive-studio-devcontainer
+          cacheFrom: type=gha
+          push: never
+          runCmd: |
+            echo "Devcontainer up. Workspace: $(pwd)"
+            ls -la
+            node --version
+            pnpm --version
+
+      - name: Mark t1 (devcontainer up); record cold/warm
+        id: t1
+        run: |
+          NOW=$(date +%s)
+          ELAPSED=$(( NOW - ${{ steps.t0.outputs.epoch }} ))
+          echo "epoch=$NOW" >> $GITHUB_OUTPUT
+          echo "elapsed=$ELAPSED" >> $GITHUB_OUTPUT
+          echo "::notice::Devcontainer up after ${ELAPSED}s"
+
+      - name: Find container ID
+        id: container
+        run: |
+          C=$(docker ps \
+            --filter "label=devcontainer.local_folder=$GITHUB_WORKSPACE" \
+            --format '{{.ID}}' | head -1)
+          if [ -z "$C" ]; then
+            echo "::error::No devcontainer found for $GITHUB_WORKSPACE"
+            docker ps
+            exit 1
+          fi
+          echo "id=$C" >> $GITHUB_OUTPUT
+          echo "::notice::Devcontainer ID: $C"
+
+      - name: Launch app inside devcontainer (background)
+        run: |
+          docker exec -d -u node ${{ steps.container.outputs.id }} \
+            bash -c 'cd /workspaces/toolhive-studio && \
+              nohup bash scripts/devcontainer-entrypoint.sh \
+                > /tmp/entrypoint.log 2>&1 &'
+
+      - name: Wait for app readiness
+        id: ready
+        timeout-minutes: 10
+        run: |
+          C=${{ steps.container.outputs.id }}
+          T_START=$(date +%s)
+          for i in $(seq 1 120); do
+            # Process gates AND a window-mapped check via xdotool. The previous
+            # version only checked process presence, which is not the same as
+            # "UI has rendered" — the first proof run captured a blank screen
+            # because Electron was alive but had not painted yet. The
+            # `xdotool search --class ToolHive` call only succeeds once the
+            # app's main window is mapped on Xvfb. The bracket trick on pgrep
+            # avoids self-match (see the devcontainer-dev skill).
+            if docker exec -u node "$C" bash -c '
+              curl -fsS http://localhost:6080/ >/dev/null 2>&1 \
+                && pgrep -f "[e]lectron/dist/electron" >/dev/null \
+                && pgrep -f "[t]hv serve" >/dev/null \
+                && DISPLAY=:99 xdotool search --class ToolHive >/dev/null 2>&1
+            '; then
+              # Settling pause — the window may be mid-first-paint when
+              # xdotool first finds it. Cheap insurance against a partially
+              # rendered screenshot.
+              sleep 2
+              NOW=$(date +%s)
+              ELAPSED=$(( NOW - T_START ))
+              SINCE_T1=$(( NOW - ${{ steps.t1.outputs.epoch }} ))
+              echo "::notice::App ready in ${ELAPSED}s after launch (${SINCE_T1}s after devcontainer up)"
+              echo "elapsed=$ELAPSED" >> $GITHUB_OUTPUT
+              echo "since_t1=$SINCE_T1" >> $GITHUB_OUTPUT
+              exit 0
+            fi
+            echo "[$i/120] not ready yet..."
+            sleep 5
+          done
+          echo "::error::App did not become ready within 10 minutes"
+          docker exec "$C" tail -200 /tmp/entrypoint.log || true
+          exit 1
+
+      - name: Screenshot 1 — initial paint
+        run: |
+          scripts/devcontainer-screenshot.sh shot1.png
+          ls -la shot1.png
+
+      - name: Drive xdotool — focus app and send Tab
+        run: |
+          C=${{ steps.container.outputs.id }}
+          docker exec -u node "$C" bash -c 'DISPLAY=:99 xdotool search --name ToolHive windowactivate' || true
+          sleep 2
+          docker exec -u node "$C" bash -c 'DISPLAY=:99 xdotool key Tab' || true
+          sleep 1
+
+      - name: Screenshot 2 — after input
+        run: |
+          scripts/devcontainer-screenshot.sh shot2.png
+          ls -la shot2.png
+
+      - name: Workspace smoke check — type-check inside container
+        id: type_check
+        run: |
+          C=${{ steps.container.outputs.id }}
+          T_START=$(date +%s)
+          if docker exec -u node "$C" bash -c \
+            'cd /workspaces/toolhive-studio && pnpm run type-check'; then
+            NOW=$(date +%s)
+            echo "elapsed=$(( NOW - T_START ))" >> $GITHUB_OUTPUT
+            echo "result=pass" >> $GITHUB_OUTPUT
+          else
+            NOW=$(date +%s)
+            echo "elapsed=$(( NOW - T_START ))" >> $GITHUB_OUTPUT
+            echo "result=fail" >> $GITHUB_OUTPUT
+            echo "::warning::type-check failed"
+          fi
+
+      - name: Dump devcontainer node_modules for cache
+        # Runs only on cache miss. Populates nm-cache/ with the volume
+        # contents so actions/cache's post-step picks it up for next time.
+        # Multiple containers can mount the same named volume simultaneously,
+        # so this doesn't disturb the running devcontainer.
+        if: always() && steps.nm-cache.outputs.cache-hit != 'true'
+        run: |
+          VOLUME=toolhive-node-modules-toolhive-studio
+          rm -rf nm-cache && mkdir -p nm-cache
+          docker run --rm \
+            -v "$VOLUME:/src:ro" \
+            -v "$PWD/nm-cache:/dst" \
+            alpine:3 sh -c 'cp -a /src/. /dst/'
+          echo "::notice::node_modules dumped: $(du -sh nm-cache | cut -f1)"
+
+      - name: Upload screenshots
+        if: always()
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
+        with:
+          name: screenshots
+          path: |
+            shot1.png
+            shot2.png
+          if-no-files-found: warn
+          retention-days: 7
+
+      - name: Job summary
+        if: always()
+        run: |
+          {
+            echo "## Devcontainer-in-CI Proof"
+            echo
+            echo "| Stage | Elapsed |"
+            echo "|---|---|"
+            echo "| Checkout + devcontainer build/start | ${{ steps.t1.outputs.elapsed || 'n/a' }}s |"
+            echo "| App readiness (after launch) | ${{ steps.ready.outputs.elapsed || 'n/a' }}s |"
+            echo "| App readiness (since devcontainer) | ${{ steps.ready.outputs.since_t1 || 'n/a' }}s |"
+            echo "| Type-check inside container | ${{ steps.type_check.outputs.elapsed || 'n/a' }}s (${{ steps.type_check.outputs.result || 'n/a' }}) |"
+            echo
+            echo "Cache backend: \`type=gha\`. Compare cold (first run) and warm (subsequent) timings to evaluate cache effectiveness."
+            echo
+            echo "Screenshots uploaded as the **screenshots** artifact (\`shot1.png\` initial, \`shot2.png\` after Tab keystroke)."
+          } >> $GITHUB_STEP_SUMMARY
+
+      - name: Diagnostics on failure
+        if: failure()
+        run: |
+          C=${{ steps.container.outputs.id }}
+          if [ -n "$C" ]; then
+            echo '=== Container processes ==='
+            docker exec "$C" ps auxf || true
+            echo '=== entrypoint.log ==='
+            docker exec "$C" tail -300 /tmp/entrypoint.log || true
+            echo '=== xvfb.log ==='
+            docker exec "$C" tail -100 /tmp/xvfb.log || true
+            echo '=== fluxbox.log ==='
+            docker exec "$C" tail -100 /tmp/fluxbox.log || true
+            echo '=== keyring.log ==='
+            docker exec "$C" tail -50 /tmp/keyring.log || true
+          fi
diff --git a/scripts/devcontainer-screenshot.sh b/scripts/devcontainer-screenshot.sh
new file mode 100755
index 000000000..50e478586
--- /dev/null
+++ b/scripts/devcontainer-screenshot.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+# Capture the running ToolHive Studio screen via X11 (Xvfb on :99) inside
+# the project's devcontainer and place the PNG at the given host path.
+#
+# Default host path: ./toolhive-shot.png
+#
+# Internally:
+#   1. Locate the devcontainer attached to the current workspace (or
+#      $GITHUB_WORKSPACE in CI).
+#   2. Run `import -window root` inside the container to capture the root
+#      window of Xvfb on display :99.
+#   3. Stream the file back to the host via devcontainer-steal.sh
+#      (necessary because /tmp inside the container is tmpfs — see that
+#      script for full context).
+#   4. Clean up the intermediate file inside the container.
+#
+# Usage:
+#   devcontainer-screenshot.sh [host-path]
+#
+# Prints the absolute host path on stdout for ergonomic chaining:
+#   SHOT=$(scripts/devcontainer-screenshot.sh)
+#
+set -euo pipefail
+
+HOST_PATH="${1:-./toolhive-shot.png}"
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+WORKSPACE="${GITHUB_WORKSPACE:-$(pwd)}"
+
+CONTAINER=$(docker ps \
+  --filter "label=devcontainer.local_folder=$WORKSPACE" \
+  --format '{{.ID}}' | head -1)
+
+if [ -z "$CONTAINER" ]; then
+  echo "devcontainer-screenshot.sh: no devcontainer found for $WORKSPACE" >&2
+  echo "  (is the devcontainer running? try: pnpm devContainer:dev)" >&2
+  exit 1
+fi
+
+CONTAINER_TMP="/tmp/.devcontainer-shot-$$.png"
+
+docker exec -u node "$CONTAINER" bash -c \
+  "DISPLAY=:99 import -window root '$CONTAINER_TMP'"
+
+"$SCRIPT_DIR/devcontainer-steal.sh" "$CONTAINER" "$CONTAINER_TMP" "$HOST_PATH"
+
+docker exec -u node "$CONTAINER" rm -f "$CONTAINER_TMP" || true
+
+# Resolve to an absolute path for downstream consumers.
+case "$HOST_PATH" in
+  /*) echo "$HOST_PATH" ;;
+  *)  echo "$(cd "$(dirname "$HOST_PATH")" && pwd)/$(basename "$HOST_PATH")" ;;
+esac
diff --git a/scripts/devcontainer-steal.sh b/scripts/devcontainer-steal.sh
new file mode 100755
index 000000000..ab3af7b3e
--- /dev/null
+++ b/scripts/devcontainer-steal.sh
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+# Stream a file out of a running devcontainer to the host filesystem.
+#
+# Why this exists: `/tmp` (and possibly other paths) inside the
+# ToolHive Studio devcontainer are mounted as tmpfs. Docker's `docker cp`
+# cannot read from tmpfs mounts — it only traverses the container's overlay
+# layers — so `docker cp $C:/tmp/foo .` returns "Could not find the file"
+# even when `ls /tmp/foo` inside the container confirms it exists. This
+# script bypasses `docker cp` by streaming the file via `docker exec cat`,
+# which works regardless of mount type.
+#
+# Usage:
+#   devcontainer-steal.sh <container> <container-path> <host-path>
+#
+# Example:
+#   devcontainer-steal.sh "$C" /tmp/foo.bin ./foo.bin
+#
+# Tip: the screenshot helper (devcontainer-screenshot.sh) calls this
+# internally, so most callers don't need to invoke it directly. Reach for
+# this script when extracting non-screenshot artifacts (logs, generated
+# files, etc.) from /tmp or any other tmpfs-backed path.
+set -euo pipefail
+
+CONTAINER="${1:?usage: devcontainer-steal.sh <container> <container-path> <host-path>}"
+SRC="${2:?missing container path}"
+DEST="${3:?missing host path}"
+
+docker exec -u node "$CONTAINER" cat "$SRC" > "$DEST"
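
Reviewer note: the trailing `case` in `devcontainer-screenshot.sh` is a small "resolve to absolute path" idiom that is easy to reuse. A standalone sketch follows; the function name `abspath` and the sample paths are illustrative, not something the PR defines:

```shell
#!/usr/bin/env bash
# Sketch of the path-resolution idiom from devcontainer-screenshot.sh,
# pulled out as a reusable function. `abspath` is an illustrative name.
set -euo pipefail

abspath() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;  # already absolute: pass through unchanged
    *)  printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")" ;;
  esac
}

# Relative paths resolve against the current directory (which must exist,
# since the relative branch cd's into dirname); absolute paths pass through.
cd /tmp
abspath ./shot.png   # -> /tmp/shot.png
abspath /var/log/x   # -> /var/log/x
```

Like the script's `case`, this only requires the parent directory to exist, not the file itself, which is why it works for output paths that have not been written yet.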