143 changes: 143 additions & 0 deletions .claude/skills/screenshot/SKILL.md
@@ -0,0 +1,143 @@
---
name: capturing-screenshots
description: Capture or update documentation screenshots for the Quarto website using Playwright. Use when screenshots need refreshing, new screenshots are needed for docs pages, or the user mentions screenshots, screen captures, or visual documentation.
allowed-tools: Bash(node _tools/screenshots/*), Bash(npm.cmd *), Bash(cat *), Bash(playwright-cli *), Bash(oxipng *), Agent
---

## Setup

When this skill loads, run these commands to gather context:

1. **List registered screenshots:** `bash "${CLAUDE_SKILL_DIR}/scripts/list-screenshots.sh"`
2. **Read visual rules:** `cat _tools/screenshots/CLAUDE.md`
3. **Read capture agent reference:** `cat "${CLAUDE_SKILL_DIR}/capture-agent.md"`
4. **Read manifest schema:** `cat "${CLAUDE_SKILL_DIR}/manifest-schema.md"`

**Working directory:** `npm run` commands (`render`, `capture`, `compress`) work from
any directory — they resolve paths from `_tools/screenshots/package.json`. Direct
`node scripts/...` calls and `playwright-cli` must run from `_tools/screenshots/` or
use absolute paths. Be careful not to double up path segments if you've already `cd`'d
into `_tools/screenshots/`.
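The path-doubling pitfall can be illustrated with a small helper. This is a minimal sketch, not part of the repository's tooling: `scriptPathFrom` and `TOOL_DIR` are hypothetical names, and the helper only covers the one case described above (a repo-relative path used after `cd`-ing into `_tools/screenshots/`).

```javascript
// Hypothetical helper (not part of the repo): avoid doubling the
// `_tools/screenshots/` segment when the shell is already inside that directory.
const TOOL_DIR = "_tools/screenshots";

function scriptPathFrom(cwd, repoRelativePath) {
  // Normalize Windows separators and drop trailing slashes before comparing.
  const normCwd = cwd.replace(/\\/g, "/").replace(/\/+$/, "");
  const normPath = repoRelativePath.replace(/\\/g, "/");
  const insideToolDir = normCwd === TOOL_DIR || normCwd.endsWith("/" + TOOL_DIR);
  if (insideToolDir && normPath.startsWith(TOOL_DIR + "/")) {
    // Already inside the tool directory: strip the duplicated prefix.
    return normPath.slice(TOOL_DIR.length + 1);
  }
  return normPath;
}

console.log(scriptPathFrom("/repo", "_tools/screenshots/scripts/render.js"));
console.log(scriptPathFrom("/repo/_tools/screenshots", "_tools/screenshots/scripts/render.js"));
```

From the repo root the path is returned unchanged; from inside `_tools/screenshots/` the duplicated prefix is dropped.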

## Instructions

You are the screenshot orchestrator. The list output shows all registered screenshots, the visual rules define quality standards, and the capture agent reference describes how browser operations work.

### If the user wants to UPDATE existing screenshots:

1. Ask which screenshots to update (or "all")
2. Process screenshots **one at a time** — never batch-capture without confirmation:
   a. Render: `node _tools/screenshots/scripts/render.js <project-path>` (can batch-render all profiles upfront)
   b. Capture: `npm run capture -- --name <name>` (handles serve, capture, dark variant, compress)
   c. Show the user the output image(s) using the Read tool
   d. **STOP and wait for explicit confirmation** before proceeding to the next screenshot
   e. If the user requests adjustments, update the manifest and re-capture
   f. Only after confirmation, move to the next screenshot
3. Show results summary

**Critical:** Each screenshot requires user visual review and explicit approval. Do not proceed to the next screenshot until the user confirms the current one is acceptable. This applies to both new captures and re-captures of existing screenshots.

### If the user wants to CREATE a new screenshot:

Gather these parameters (ask about unknowns, infer from context when obvious):

| Parameter | Values / Notes |
|-----------|---------------|
| Source type | `url` (live site), `example` (Quarto project — render then serve) |
| Source detail | URL or example project path (create minimal project if needed) |
| Viewport | navbar=1440x400, sidebar=992x600, about=1200x900, full page=1440x900 |
| Zoom | Default 1.0; use 1.15 for about pages or excess internal padding |
| Element | CSS selector if capturing a specific element; omit for full viewport |
| Interactions | Clicks, hovers, etc. needed before capture |
| Trim / Crop | `trim: true` for uniform background edges; `cropBottom`/`maxHeight` when vertical rules prevent trim |
| Output path | Suggest based on doc location |
| Doc file | Which .qmd references this image (for manifest `doc.file`) |
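The viewport and zoom defaults in the table can be expressed as a lookup. This is an illustrative sketch only: `VIEWPORT_PRESETS`, `initialCaptureSettings`, and `resizeCommand` are hypothetical names, and the values are copied from the table rather than read from the real manifest.

```javascript
// Viewport presets copied from the parameter table above. The object and
// function names are illustrative, not part of the screenshot tooling.
const VIEWPORT_PRESETS = {
  navbar: { width: 1440, height: 400 },
  sidebar: { width: 992, height: 600 },
  about: { width: 1200, height: 900 },
  "full page": { width: 1440, height: 900 },
};

function initialCaptureSettings(category) {
  const viewport = VIEWPORT_PRESETS[category];
  if (!viewport) {
    throw new Error(`Unknown screenshot category: ${category}`);
  }
  // The table suggests 1.15 for about pages; everything else starts at 1.0.
  const zoom = category === "about" ? 1.15 : 1.0;
  return { ...viewport, zoom };
}

function resizeCommand(category) {
  const { width, height } = initialCaptureSettings(category);
  return `playwright-cli -s=screenshot resize ${width} ${height}`;
}

console.log(resizeCommand("sidebar"));
```

These are starting points only; the interactive exploration in Phase A decides the final viewport and zoom.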

Then work through two phases:

#### Phase A: Visual design (what to capture)

Use playwright-cli to explore the page interactively and nail down the visual.
Phase A ends when the user approves the screenshot visual.

1. Create example project if needed
2. Render: `node _tools/screenshots/scripts/render.js <project-path>` (add `--profile <name>` if needed)
3. Serve the **rendered output directory**: `node _tools/screenshots/scripts/serve.js <output-dir>`
   The serve script takes a directory path; it does not understand `--profile`.
   For default renders, the output is `_site/` inside the project. For profiled renders,
   it's `docs-<profile>/` (e.g., `examples/navbar-basic/docs-reader-mode`). Check the
   render output to confirm the actual path.
4. Open in headed mode: `playwright-cli -s=screenshot open --headed <url>`
   (headed mode shows the browser window so you can see the page)
5. Discover what to capture:
   a. Take a snapshot (`playwright-cli -s=screenshot snapshot`) to see page structure
   b. If replacing an existing screenshot, download and read the current image to
      understand what it looks like (e.g., `curl -sL -o "$TMPDIR/existing.png" <url>`
      then Read tool). Note what's included, cropped, and framed; the new
      screenshot should match unless the doc content has changed.
   c. Read the .qmd doc file to understand what the image should illustrate: check
      the YAML example above the image, the fig-alt text, and surrounding prose
   d. Determine initial viewport from the category table (navbar=1440x400,
      sidebar=992x600, about=1200x900, full page=1440x900)
6. Test and iterate in headed mode:
   a. Resize: `playwright-cli -s=screenshot resize <w> <h>`
   b. Test cleanup evals if needed (hiding elements, removing banners)
   c. Test interactions (click/hover): take snapshot, find ref, click, verify state
   d. Take a test screenshot:
      `playwright-cli -s=screenshot screenshot --filename="$TMPDIR/test.png"`
   e. Show the screenshot to the user: `npm run open -- "$TMPDIR/test.png"`
      (cross-platform; do NOT use `open` or `start` directly)
   f. Provide review context so the user can judge the screenshot:
      - Which .qmd file and section (line number, heading)
      - The fig-alt text (what the image is supposed to show)
      - The code example shown alongside it in the doc (if any)
      - A link to the live doc page if available (e.g., quarto.org URL)
      - What to specifically check (does navbar match the YAML? Are the
        right items visible? etc.)
   g. Ask: "Does this capture what the doc needs? Anything to adjust?"
   h. Repeat until the user approves the visual
7. Encode findings into manifest:
   a. Read manifest-schema.md for the complete field reference
   b. Create the manifest entry based on what was validated interactively
   c. Every field value should come from tested exploration, not guesswork

Use `playwright-cli --help` to discover available commands.
See capture-agent.md for `eval` vs `run-code` guidance — use `run-code` for complex JS.
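The output-directory convention from step 3 (default renders go to `_site/`, profiled renders to `docs-<profile>/`) can be sketched as follows. `renderedOutputDir` and `serveCommand` are hypothetical helper names, and the convention is taken from the description above; the render output remains the authority on the actual path.

```javascript
// Hypothetical helpers: derive the directory to pass to serve.js from the
// project path and an optional profile name, per the convention in step 3.
function renderedOutputDir(projectPath, profile) {
  const base = projectPath.replace(/\/+$/, ""); // drop trailing slashes
  return profile ? `${base}/docs-${profile}` : `${base}/_site`;
}

function serveCommand(projectPath, profile) {
  // serve.js takes a directory, so the profile is folded into the path.
  return `node _tools/screenshots/scripts/serve.js ${renderedOutputDir(projectPath, profile)}`;
}

console.log(serveCommand("examples/navbar-basic"));
console.log(serveCommand("examples/navbar-basic", "reader-mode"));
```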

#### When stuck: Chrome DevTools MCP (only if available)

If playwright-cli's shell escaping fights you on complex JS (template literals,
nested quotes, `getComputedStyle`), Chrome DevTools MCP can help — but ONLY if
it's available in the current session, and ALWAYS ask the user before switching.

- `evaluate_script` — proper JS function, no shell escaping layer
- `take_screenshot` — inline visual feedback in conversation
- Best for: iterative CSS/DOM debugging (e.g., spotlight stacking contexts)
- Trade-off: more verbose output per call = higher token usage

Never switch to Chrome DevTools MCP proactively. Suggest it as an option and
let the user decide.

#### Phase B: Image processing (how to post-process)

Phase B starts after the user approves the visual in Phase A and a manifest entry
exists. Now run the automated capture pipeline and tune post-processing.

1. Add the manifest entry to `_tools/screenshots/manifest.json`
2. Run `npm run validate` to check the manifest entry
3. Run `npm run capture -- --name <name>` to produce the screenshot
4. Show the user the output and ask them to verify it visually
5. If blank space remains, decide with the user:
   - **Uniform background edges?** → add `"trim": true`
   - **Vertical rules or multi-color edges?** → add `"cropBottom": N` or `"maxHeight": N`
   - **Both?** → trim runs first, then crop
6. Re-capture and verify until the user is satisfied

### Launching the capture agent:

Use the Agent tool with `subagent_type="general-purpose"` and `model="sonnet"`. Pass:
- The base URL where the site is being served
- The capture agent reference (from `${CLAUDE_SKILL_DIR}/capture-agent.md`)
- Specific screenshot details: viewport, cleanup, interactions, element, output path
- Note: zoom and post-processing (trim, crop) are handled by capture.js, not the agent. If the agent captures manually, it should apply zoom via `page.evaluate(z => document.body.style.zoom = z, String(zoom))`
- Instruct it to follow the capture workflow and use `-s=screenshot` session flag
178 changes: 178 additions & 0 deletions .claude/skills/screenshot/capture-agent.md
@@ -0,0 +1,178 @@
# Screenshot Capture Agent

## Contents
- Capture Workflow: navigate, resize, zoom, wait, cleanup, interactions, screenshot
- Dark mode variants
- Visual validation
- Rules

You capture documentation screenshots using playwright-cli. You receive:
- A base URL where the page is already being served
- One or more screenshot specifications from the manifest

Always use the session flag: `-s=screenshot` for all playwright-cli commands.

## eval vs run-code

playwright-cli has two ways to execute JavaScript:

**`eval`** — evaluates a string expression. Good for simple one-liners:
```bash
playwright-cli -s=screenshot eval "document.body.style.zoom = '1.25'"
playwright-cli -s=screenshot eval "document.getElementById('header').style.display = 'none'"
```

**`run-code`** — executes an async function with access to the Playwright `page` object.
Use for anything complex: multi-line logic, Playwright API calls, template literals,
or any JS that would need shell escaping in `eval`:
```bash
playwright-cli -s=screenshot run-code "async page => {
  await page.waitForTimeout(200);
}"
playwright-cli -s=screenshot run-code "async page => {
  const bg = await page.evaluate(() => getComputedStyle(document.body).backgroundColor);
  console.log(bg);
}"
```

**Rule of thumb:** If your JS has quotes, template literals, `getComputedStyle`, or
is more than one statement — use `run-code`. Shell escaping in `eval` breaks easily
with complex expressions.
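The rule of thumb can be written down as a rough heuristic. This is a sketch under assumptions, not playwright-cli behavior: `chooseExecMode` is a hypothetical name, and single quotes are treated as safe because the `eval` examples above use them inside a double-quoted shell string.

```javascript
// Heuristic version of the rule of thumb above. The function name and the
// exact checks are assumptions; treat the result as a hint, not a guarantee.
function chooseExecMode(js) {
  const source = js.trim();
  // Double quotes and backticks collide with the shell's own quoting.
  const hasRiskyQuotes = /["`]/.test(source);
  const usesComputedStyle = source.includes("getComputedStyle");
  // Naive statement count: split on semicolons and newlines.
  const statements = source.split(/[;\n]/).filter((part) => part.trim() !== "");
  const multiStatement = statements.length > 1;
  return hasRiskyQuotes || usesComputedStyle || multiStatement ? "run-code" : "eval";
}

console.log(chooseExecMode("document.body.style.zoom = '1.25'"));
console.log(chooseExecMode("getComputedStyle(document.body).backgroundColor"));
```

The statement split is deliberately naive (a `for (;;)` header would count as several statements), which errs toward `run-code`, the safer choice.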

## Debugging with Chrome DevTools MCP (only if available)

If stuck on CSS/DOM issues and playwright-cli's shell escaping is making complex
JS evaluation difficult, Chrome DevTools MCP can provide a faster feedback loop.
**Only suggest this if it's available in the current session, and always ask the
user before switching.**

- `evaluate_script` — proper JS function, no shell escaping
- `take_screenshot` — inline visual feedback in conversation
- Best for: iterative CSS debugging (e.g., spotlight stacking contexts)
- Trade-off: more verbose output per call (higher token usage)

## Capture Workflow

For each screenshot:

### 1. Navigate and resize

```bash
playwright-cli -s=screenshot open <url>/<page>
playwright-cli -s=screenshot resize <width> <height>
```

### 2. Apply zoom (if specified)

If the manifest entry has `capture.zoom`, apply it before any other operations:
```bash
playwright-cli -s=screenshot eval "document.body.style.zoom = '1.15'"
playwright-cli -s=screenshot eval "new Promise(r => setTimeout(r, 200))"
```

Note: `capture.js` handles zoom automatically. This step is only needed for manual/interactive captures.

### 3. Wait for full load

```bash
playwright-cli -s=screenshot snapshot
```

Check the snapshot for:
- Bootstrap Icons rendered (not blank boxes or missing glyphs)
- Fonts loaded (text not in fallback font)
- Content fully rendered (not loading spinners)

If not ready, wait and re-snapshot:
```bash
playwright-cli -s=screenshot eval "new Promise(r => setTimeout(r, 2000))"
playwright-cli -s=screenshot snapshot
```

### 4. Run cleanup steps

Read cleanup steps from `manifest.json` `defaults.cleanup` array. For each step with `"action": "eval"`, run:
```bash
playwright-cli -s=screenshot eval "<script from manifest>"
```

Then run any additional cleanup from the per-screenshot `capture.cleanup` array.
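The ordering (defaults first, then per-screenshot steps) can be sketched as a small function. This is an assumption-laden illustration: the `script` key, the sample selectors, and the `cleanupScripts` name are hypothetical, since the manifest schema is defined in `manifest-schema.md` and not reproduced here.

```javascript
// Collect the eval scripts to run, defaults first, then per-screenshot steps.
// ASSUMPTION: each step stores its JavaScript under a `script` key.
function cleanupScripts(manifest, entry) {
  const defaults = manifest?.defaults?.cleanup ?? [];
  const perScreenshot = entry?.capture?.cleanup ?? [];
  return [...defaults, ...perScreenshot]
    .filter((step) => step.action === "eval")
    .map((step) => step.script);
}

// Hypothetical manifest data, for illustration only.
const sampleManifest = {
  defaults: {
    cleanup: [{ action: "eval", script: "document.querySelector('.example-banner')?.remove()" }],
  },
};
const sampleEntry = {
  capture: {
    cleanup: [{ action: "eval", script: "document.querySelector('.example-footer')?.remove()" }],
  },
};

for (const script of cleanupScripts(sampleManifest, sampleEntry)) {
  console.log(`playwright-cli -s=screenshot eval "${script}"`);
}
```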

### 5. Run interaction steps

For clicks (e.g., opening a dropdown):
1. Take a snapshot to get refs
2. Find the ref matching the target (e.g., the GitHub icon)
3. Click it: `playwright-cli -s=screenshot click <ref>`
4. **Verify the interaction worked**: snapshot again and check for the expected state (e.g., `[expanded]` attribute, `.dropdown-menu.show`)
5. If a dropdown toggled closed instead of opening, click again (click toggles)

**Stateful toggle caution:** Some toggles (reader mode, sidebar collapse) persist state
in localStorage. When `dark: true`, interactions run twice (light + dark). On the dark
pass the page reloads but localStorage persists, so the toggle may already be active —
clicking it again deactivates it. In the manifest, use an `eval` with a guard condition
instead of a plain `click` for these. See `manifest-schema.md` for the pattern.
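One possible shape for such a guard is sketched below. It is not the pattern from `manifest-schema.md`, which is not reproduced here: `guardedToggleScript`, the selector, and the body class are hypothetical, and the sketch assumes the active state is visible as a class on `<body>`.

```javascript
// Build an eval script that clicks a toggle only when it is not already
// active. ASSUMPTION: the active state shows up as a class on <body>.
function guardedToggleScript(toggleSelector, activeBodyClass) {
  const selector = JSON.stringify(toggleSelector);
  const cls = JSON.stringify(activeBodyClass);
  return `(() => {
  const toggle = document.querySelector(${selector});
  const alreadyActive = document.body.classList.contains(${cls});
  if (!toggle || alreadyActive) return false;
  toggle.click();
  return true;
})()`;
}

// Hypothetical selector and class name, for illustration only.
console.log(guardedToggleScript(".example-reader-toggle", "example-reader-mode"));
```

Because the click is skipped when the state is already active, running the same step on both the light and dark passes leaves the toggle on instead of flipping it off.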

For element screenshots where content overflows (like dropdowns):
1. Use `eval` to get bounding boxes of the element + overflow content
2. Calculate a clip region that contains everything
3. Use `run-code` with Playwright's clip option:
```bash
playwright-cli -s=screenshot run-code "async page => {
  // Placeholders: substitute the numbers calculated in step 2
  await page.screenshot({ path: 'out.png', clip: { x: <x>, y: <y>, width: <width>, height: <height> } });
}"
```
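The clip calculation in step 2 amounts to taking the union of the bounding boxes. A minimal sketch, assuming each box has the `{ x, y, width, height }` shape returned by `getBoundingClientRect()`; `unionClip` and the optional `padding` parameter are hypothetical names.

```javascript
// Smallest rectangle containing every box, optionally padded, snapped to
// whole pixels and clamped so it never starts at a negative coordinate.
function unionClip(boxes, padding = 0) {
  if (boxes.length === 0) {
    throw new Error("unionClip needs at least one bounding box");
  }
  const left = Math.min(...boxes.map((b) => b.x));
  const top = Math.min(...boxes.map((b) => b.y));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const bottom = Math.max(...boxes.map((b) => b.y + b.height));
  const x = Math.max(0, Math.floor(left - padding));
  const y = Math.max(0, Math.floor(top - padding));
  return {
    x,
    y,
    width: Math.ceil(right + padding) - x,
    height: Math.ceil(bottom + padding) - y,
  };
}

// Example: a navbar item plus the dropdown that overflows below it.
const clip = unionClip([
  { x: 100, y: 10, width: 200, height: 40 },
  { x: 150, y: 50, width: 220, height: 180 },
]);
console.log(clip);
```

The resulting object can be passed as the `clip` option in the `run-code` call above.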

### 6. Take screenshot

**Element screenshot** (when capture.element is specified):
1. Snapshot to find the ref for the element
2. `playwright-cli -s=screenshot screenshot <ref> --filename=<output-path>`

**Full page/viewport screenshot:**
```bash
playwright-cli -s=screenshot screenshot --filename=<output-path>
```

### 6b. Post-capture processing (automatic)

`capture.js` handles these automatically — the agent does not need to replicate them:
- **trim** (`capture.trim`) — content-aware whitespace removal via sharp
- **cropBottom** (`capture.cropBottom`) — removes N pixels from bottom edge
- **maxHeight** (`capture.maxHeight`) — caps image height, crops from bottom
- **compress** — oxipng compression

These run in order: trim → crop → compress.
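The effect of the crop options on image height can be worked through with a short sketch. It is not the `capture.js` implementation: `heightAfterCrop` is a hypothetical name, trimming is modeled only through its resulting height, and applying `cropBottom` before `maxHeight` is an assumption because the order between the two is not stated here.

```javascript
// Height arithmetic for the crop stage, starting from the height left after
// trim. ASSUMPTION: cropBottom is applied before the maxHeight cap.
function heightAfterCrop(heightAfterTrim, { cropBottom = 0, maxHeight = Infinity } = {}) {
  const afterCropBottom = Math.max(0, heightAfterTrim - cropBottom);
  return Math.min(afterCropBottom, maxHeight);
}

console.log(heightAfterCrop(900, { cropBottom: 100 }));
console.log(heightAfterCrop(900, { maxHeight: 600 }));
```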

### 7. Dark mode variant

If the screenshot has `"dark": true` in manifest:
1. Toggle via JS: `playwright-cli -s=screenshot run-code "async page => { await page.evaluate(() => window.quartoToggleColorScheme()); await page.locator('body.quarto-dark').waitFor(); }"`
2. Re-run any interactions (e.g., dropdown may have closed during toggle)
3. Take the screenshot again with `-dark` suffix on the filename
4. Toggle back: `playwright-cli -s=screenshot run-code "async page => { await page.evaluate(() => window.quartoToggleColorScheme()); await page.locator('body.quarto-light').waitFor(); }"`

Note: `capture.js` handles this automatically. This step is only needed for manual/interactive captures.
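For manual captures, the dark filename from step 3 can be derived from the light one. A minimal sketch, assuming the `-dark` suffix goes immediately before the file extension; `darkVariantPath` is a hypothetical helper, and `capture.js` may name files differently.

```javascript
// Insert "-dark" before the extension of the last path segment.
// ASSUMPTION: "navbar.png" becomes "navbar-dark.png".
function darkVariantPath(outputPath) {
  const slash = Math.max(outputPath.lastIndexOf("/"), outputPath.lastIndexOf("\\"));
  const dot = outputPath.lastIndexOf(".");
  if (dot <= slash + 1) {
    // No extension in the file name (or a dotfile): append the suffix.
    return `${outputPath}-dark`;
  }
  return `${outputPath.slice(0, dot)}-dark${outputPath.slice(dot)}`;
}

console.log(darkVariantPath("docs/images/navbar.png"));
```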

### 8. Visual validation

After capturing:
- Verify that the screenshot file was created and is non-empty
- Report what you captured (element, viewport size, interactions performed)
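A slightly stronger check than "non-empty" is to confirm the file starts with the PNG signature. This is an optional sketch, assuming the outputs are PNG files (as the oxipng step suggests); `looksLikePng` is a hypothetical helper, not part of the tooling.

```javascript
// The fixed 8-byte signature that begins every PNG file.
const PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// `bytes` can be any array-like of byte values, for example the Buffer
// returned by Node's fs.readFileSync(outputPath).
function looksLikePng(bytes) {
  if (!bytes || bytes.length < PNG_SIGNATURE.length) {
    return false; // missing, empty, or truncated file
  }
  return PNG_SIGNATURE.every((value, index) => bytes[index] === value);
}

console.log(looksLikePng(Uint8Array.from([137, 80, 78, 71, 13, 10, 26, 10, 0, 0])));
console.log(looksLikePng(new Uint8Array(0)));
```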

### 9. Close when done

```bash
playwright-cli -s=screenshot close
```

## Rules

- ALWAYS use `-s=screenshot` session flag (avoids collisions with other sessions)
- ALWAYS capture light mode first (default), then dark if needed
- ALWAYS wait for fonts and icons to load before capturing
- ALWAYS remove prerelease/preview banners
- Use consistent viewport sizes from the manifest
- If something looks wrong (missing icons, broken layout), report it — don't save a bad screenshot
- If an interaction fails (ref not found, dropdown didn't open), report the error with the snapshot content
- Use relative paths from repo root for --filename output (L12: path resolution)