refactor(skill): step-0 owns capture, not analysis

ukimsanov · ukimsanov · commit 65d836f90ca0 · 2026-05-20T15:19:29.000-07:00
Step 0 had bloated to 91 lines that did the work of Steps 1–3:
viewing contact sheets cell-by-cell, reading 8 data files, listing
promising assets, inferring product purpose / audience / value prop
/ brand voice. That meant the agent did all the heavy lifting
upfront and produced summaries that went stale before they were
used, while the actual "run the capture" instruction was buried.

Step 0 now owns only the capture itself: run the capture command,
sanity-check that it succeeded, hand off. 91 → 55 lines.

Moved (composed into destination files, verified each was the right
home before adding):

- Read tokens.json + design-styles.json → step-1-design.md replaces
  the passive "you read these in Step 0" line with an active
  "Read these now — primary data source for Sections 3–6."
- Contact-sheet "every cell, name 5 assets per page" anti-glance
  prose → step-3-storyboard.md asset-discovery bullet (which already
  covered contact-sheet viewing generally, now strengthened with
  the anti-glance rule).
- Strategic site summary (product / audience / voice / value prop)
  → step-2-brief.md, since the brief itself IS the summary. The
  "After presenting the site summary (from Step 0)" lead-in is
  replaced: Step 2 now grounds itself by reading DESIGN.md,
  asset-descriptions.md, and visible-text.txt directly.

Step 0's new structure:
- Run the capture (CLI command + project-dir convention): unchanged
  apart from minor wording trims
- Confirm it succeeded (1-line summary, error-out on bad capture)
- Reference table mapping each capture/ file to the step that
  first reads it (explicit "DO NOT read these here")
- Gate: capture exits 0 + counts non-zero
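
For illustration only, the gate could be checked mechanically along the
lines of the Python sketch below. The helper name `capture_problems` is
hypothetical, and reading "counts non-zero" as "each required directory
contains at least one file" is an assumption made here; the capture CLI
may report its counts differently.

```python
from pathlib import Path

# Directories the Step 0 text lists as required after a capture.
REQUIRED_DIRS = ("extracted", "assets", "screenshots")


def capture_problems(capture_dir, exit_code):
    """Return the reasons a capture should not advance to Step 1.

    An empty list means the gate passes. Counting files per directory is
    an assumed stand-in for the screenshot / asset / font counts.
    """
    problems = []
    if exit_code != 0:
        problems.append(f"capture exited with code {exit_code}")

    root = Path(capture_dir)
    for name in REQUIRED_DIRS:
        directory = root / name
        if not directory.is_dir():
            problems.append(f"missing directory: {name}/")
            continue
        # Count regular files at any depth, e.g. assets/svgs/*.
        file_count = sum(1 for path in directory.rglob("*") if path.is_file())
        if file_count == 0:
            problems.append(f"no files under {name}/")
    return problems
```

A caller would run the capture command, pass its exit code and the
`capture/` path to this helper, and stop with the returned messages if
the list is non-empty.
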
diff --git a/skills/website-to-hyperframes/references/step-0-capture.md b/skills/website-to-hyperframes/references/step-0-capture.md
@@ -1,14 +1,14 @@
-# Step 0: Capture & Understand the Brand
+# Step 0: Capture
 
-You're capturing the site **to understand a brand and a product** — not to inventory building blocks. Reading the assets tells you what the product is, who it's for, what voice the brand speaks in, and what mood it lives in. That understanding is the strategic foundation for the video; the assets themselves are decoration the storyboard will reach for late, only where they serve the concept. **The video is not a recombination of the captured assets.**
+The capture pipeline downloads the site and extracts structured data for the rest of the workflow to read. Step 0 is a single command plus a sanity check. **All analysis (reading files, viewing contact sheets, deriving brand voice, picking assets) happens in Steps 1–3, not here.**
 
 ## Run the capture
 
 No API keys required for the base capture. However, before running, ask the user:
 
 > "For the best results, it is recommended to set a Gemini API key — it gives me AI-powered descriptions of every captured image, which helps me choose the right assets for each scene. It costs about $0.001 per image. You can skip this if you want, but the video quality will be better with it. To set it up: add `GEMINI_API_KEY=your-key` to a `.env` file in the project root. You can get a free key at ai.google.dev."
 
-If the user provides the key or already has one set, proceed. If they skip it, proceed anyway — the capture works without it, but `asset-descriptions.md` will have DOM-context descriptions only (position, size, alt text) instead of AI vision descriptions (what the image actually shows).
+If the user provides the key or already has one set, proceed. If they skip it, proceed anyway — the capture works without it, but `asset-descriptions.md` will have DOM-context descriptions only (position, size, alt text) instead of AI vision descriptions.
 
 Create a project directory for your video if it doesn't exist yet, then capture the website into a `capture/` subfolder within it:
 
@@ -18,74 +18,38 @@ npx hyperframes capture <URL> -o <project-dir>/capture
 
 Example: `npx hyperframes capture https://stripe.com -o videos/stripe-launch/capture`
 
-Keeping the capture artifacts (`screenshots/`, `assets/`, `extracted/`, `AGENTS.md`, `CLAUDE.md`) in a dedicated `capture/` subfolder keeps them isolated from the later build files (`SCRIPT.md`, `STORYBOARD.md`, `DESIGN.md`, `compositions/`, `index.html`, `narration.wav`, `transcript.json`, `renders/`, `snapshots/`), which all live at `<project-dir>/` root.
+Keeping capture artifacts (`screenshots/`, `assets/`, `extracted/`, `AGENTS.md`, `CLAUDE.md`) in a dedicated `capture/` subfolder keeps them isolated from later build files (`SCRIPT.md`, `STORYBOARD.md`, `DESIGN.md`, `compositions/`, `index.html`, `narration.wav`, `transcript.json`, `renders/`, `snapshots/`), which all live at `<project-dir>/` root.
 
 For exploratory captures that aren't becoming a video yet, `-o captures/<name>` at the repo root is fine — the isolation convention only matters when you're building a video on top of the capture.
 
-Wait for the capture to complete. Print how many screenshots, assets, sections, and fonts were extracted.
+## Confirm it succeeded
 
-## Read and summarize
+Wait for the capture to complete. Print one line summarizing what was captured:
 
-Read every file below. After each one, **write a 3-4 sentence summary** of what you learned. These summaries carry forward — the raw file content may be cleared from context later, and your summaries are what keep the capture data usable through the rest of the pipeline.
+> "Captured N screenshots, M assets, K SVGs, F fonts. Ready for Step 1."
 
-### Read these
+If the command exited non-zero, the counts are all zero, or required directories (`extracted/`, `assets/`, `screenshots/`) are missing, surface the error and stop — don't advance to Step 1 with a broken capture.
 
-1. **View the contact sheets — carefully, every cell, not a glance.** Contact sheets are labeled grids that let you see many images at once. They are paginated: look for `contact-sheet-1.jpg`, `contact-sheet-2.jpg`, etc. (or `contact-sheet.jpg` for older captures). View ALL pages for each category. **For each page, name at least 5 specific assets you can see in it before moving on** — this forces you to actually look at every cell instead of scrolling past. Past agents have reported "viewed the contact sheet" after literally one glance, then later in Step 3 they wrote beats using assets that didn't exist or missed the brand logo entirely. Don't be that agent. The contact sheet is your single best opportunity to learn what's actually available in the capture.
-   - `capture/screenshots/contact-sheet-*.jpg` — scroll screenshots grid. View FIRST. Each cell numbered with scroll percentage. List the directory if unsure how many pages exist.
-   - `capture/assets/contact-sheet-*.jpg` — all downloaded raster images grid. Each cell labeled with filename.
-   - `capture/assets/svgs/contact-sheet-*.jpg` — all SVGs rendered as thumbnails. Each cell labeled with filename. Check `capture/assets/` root too — some captures store SVGs there instead of `svgs/`.
+## What lives in `capture/` (reference table — DO NOT read these here)
 
-   After viewing the screenshot contact sheets, write 3-4 sentences describing the site's visual mood, layout patterns, color strategy, and overall feel. Then list, by filename, the 5-10 captured assets that look most promising for video use (logo, hero illustration, brand mark, gradient backgrounds, product art). **Open and view those promising assets individually** — the contact sheet thumbnails are too small to judge fine detail, but once you've narrowed to the 5-10 candidates, read each one carefully. Don't just trust the thumbnail.
+Each downstream step reads only what it needs. Don't pre-fetch everything in Step 0; that bloats context and produces summaries that get stale by the time they're used.
 
-2. **`capture/extracted/tokens.json`** — Note the top 5-7 colors (HEX), all font families with their weights (e.g. `Inter (400,700)` or `Sohne (100-900 variable)`), number of sections, and number of headings/CTAs.
-
-3. **`capture/extracted/design-styles.json`** — Computed styles extracted from live DOM elements. Contains: typography hierarchy (every text role with exact font-size, weight, line-height, letter-spacing), button variants (background, padding, radius, shadow), card/container styles, navigation styles, spacing scale with base unit, border-radius scale, and box-shadow values with usage counts. This is your primary data source for writing DESIGN.md Sections 3-6.
-
-4. **`capture/extracted/visible-text.txt`** — Each line is prefixed with the HTML tag: `[h1] Heading`, `[p] Body text`, `[a] Link text`. Use these tags to understand hierarchy — headings are key messages, paragraphs are supporting copy. Strip the `[tag]` prefix if quoting text in the script.
-
-5. **`capture/extracted/asset-descriptions.md`** — One-line-per-file summary of all downloaded assets. Note which assets are most visually striking or useful for video.
-
-6. **`capture/extracted/fonts-manifest.json`** — Each downloaded font identified by its real family name (read from the binary's OpenType `name` table, so hashed Next.js/Webpack filenames are resolved automatically). Lists per-family aggregates with weights, variable-font axes, and file counts. Read this in Step 1 instead of guessing fonts from filenames. If the manifest's `unidentified[]` is empty, every captured font has a known identity. Skip the file if it doesn't exist (older captures).
-
-### Required to check and read IF they exist
-
-7. **`capture/extracted/animations.json`** — See for yourself if the site uses scroll-triggered animations, marquees, canvas/WebGL, or named CSS animations. Just good to know.
-
-8. **`capture/extracted/lottie-manifest.json`** — View each preview image at `capture/assets/lottie/previews/` to see what the animations look like. It will help you think of what you can do in the video.
-
-9. **`capture/extracted/video-manifest.json`** — View each preview at `capture/assets/videos/previews/` to see what each video shows.
-
-10. **`capture/extracted/shaders.json`** — If present, this contains the actual GLSL shader code that powers the site's WebGL visual effects (gradient waves, particle systems, noise fields). Read the fragment shaders to extract: color values used in gradients, noise algorithms, blend functions. You are able to recreate similar effects in your compositions using Canvas 2D, Three.js, HTML-in-canvas or by embedding the shader patterns with a `<canvas>` + WebGL context. Absolutely read the patterns in `techniques.md`!!
-
-### Required On-demand (only when actually needed in Step 5)
-
-11. **Individual images in `capture/assets/`** — The contact sheet pages cover all assets. Only open an individual file when:
-    - You are placing text over a screenshot and need to check the safe zone / exact content at full resolution
-    - A storyboard-assigned asset's contact sheet thumbnail is too small to judge its content
-
-    Do NOT batch-view individual assets at this stage. That is what the contact sheets are for.
-
-### For rich captures (30+ images)
-
-If asset-descriptions.md has mostly bare descriptions (no AI vision — check if entries say things like 'icon: icon 0' instead of actual descriptions), launch a sub-agent to view and describe all of those.
-
-## Carry-forward to Step 1
-
-After reading `tokens.json` and `design-styles.json` here, **summarize the key values** (top colors, font families, key component styles) in your step-0 site summary. Step 1 reads your summary — it does NOT re-read these files. If your summary is thorough, Step 1 can write DESIGN.md without opening them again.
+| Path                                      | First read in                                 |
+| ----------------------------------------- | --------------------------------------------- |
+| `capture/extracted/tokens.json`           | Step 1 (DESIGN.md — colors / fonts)           |
+| `capture/extracted/design-styles.json`    | Step 1 (DESIGN.md — typography / components)  |
+| `capture/extracted/fonts-manifest.json`   | Step 1 (font identification)                  |
+| `capture/extracted/asset-descriptions.md` | Step 2 (brief grounding) and Step 3 (assets)  |
+| `capture/extracted/visible-text.txt`      | Step 2 (brief) and Step 3 (script)            |
+| `capture/assets/contact-sheet-*.jpg`      | Step 3 (asset picking)                        |
+| `capture/assets/svgs/contact-sheet-*.jpg` | Step 3 (SVG / logo picking)                   |
+| `capture/screenshots/contact-sheet-*.jpg` | Step 3 (visual mood reference)                |
+| `capture/extracted/animations.json`       | Step 3 / Step 5 (only if site has animations) |
+| `capture/extracted/lottie-manifest.json`  | Step 3 (only if site uses Lottie)             |
+| `capture/extracted/video-manifest.json`   | Step 3 (only if site embeds video)            |
+| `capture/extracted/shaders.json`          | Step 3 / Step 5 (only if site has WebGL)      |
+| `capture/assets/<individual files>`       | Step 5 (only when placing a specific asset)   |
 
 ## Gate
 
-Print your site summary before proceeding to Step 1. The summary is **strategy-first, not asset-first**:
-
-- **Site:** [name]
-- **What the product does:** [one sentence — the product's actual job, what problem it solves]
-- **Who it's for:** [audience — developers, designers, ops teams, consumers, enterprise, etc.]
-- **Core value prop:** [the one promise the homepage makes — what the brand is selling, in their own words if visible-text supports it]
-- **Brand voice:** [one phrase — confident/playful/clinical/premium/urgent/etc., grounded in the copy you read]
-- **Visual identity:** [one sentence — dominant mood, e.g. "dark cinematic with single saturated accent" or "white-and-color clean consumer"]
-- **Colors:** [top 3-5 HEX values with roles]
-- **Fonts:** [font families]
-- **Sections:** [count] sections, [count] headings, [count] CTAs
-- **Notable captured assets:** [3-5 assets worth remembering as potential brand accents — typically logo, hero illustration, gradient, brand mark. Note these are candidates, not assignments. Most won't make it into the final video.]
-
-The first 5 bullets are the strategic frame — they tell you what video to make. The last 4 are the brand toolkit you'll inflect that video with.
+Capture exits 0. Asset / screenshot / font counts non-zero. Proceed to Step 1.
diff --git a/skills/website-to-hyperframes/references/step-1-design.md b/skills/website-to-hyperframes/references/step-1-design.md
@@ -10,7 +10,10 @@ DESIGN.md is the brand inflection sub-agents layer on top of every composed beat
 
 **User preferences always override brand rules.** If the user says "make it bright even though the site is dark" or "use serif fonts even though the brand is sans" — follow the user. DESIGN.md describes the captured website. The video might deliberately break that.
 
-You read `tokens.json` and `design-styles.json` in Step 0. If you remember the values, use them; if not, re-read. Don't guess.
+**Read these now** — they're the inputs DESIGN.md is built from. Don't guess colors or sizes from screenshots:
+
+- `capture/extracted/tokens.json` — top brand colors (HEX) and font families with weight ranges.
+- `capture/extracted/design-styles.json` — computed CSS values from the live DOM: typography hierarchy (font-size, weight, line-height, letter-spacing per text role), button variants (background, padding, radius, shadow), card/container/nav styles, spacing scale, border-radius scale, box-shadow values with usage counts. **Primary data source for Sections 3–6 below.**
 
 **Font availability check — do this before writing anything else.** Read `capture/extracted/fonts-manifest.json`. The capture pipeline reads the OpenType `name` table embedded in every downloaded font file, so even hash-renamed Next.js/Webpack fonts are identified by their real family name (Inter, JetBrains Mono, Geist Mono, etc.). No guessing required.
 
diff --git a/skills/website-to-hyperframes/references/step-2-brief.md b/skills/website-to-hyperframes/references/step-2-brief.md
@@ -12,7 +12,9 @@ Skip questions the user already answered. Ask only what's missing. If the prompt
 
 ## What to Ask
 
-After presenting the site summary (from Step 0), engage the user with these questions. Use your agent's question/answer UI if available (multi-choice with custom option). If not, ask conversationally.
+Before asking the user anything, ground yourself in the brand: skim `DESIGN.md` (just written in Step 1) for colors / fonts / voice; read `capture/extracted/asset-descriptions.md` for what visual assets exist; skim `capture/extracted/visible-text.txt` for what the site says about itself. Don't summarize all of it back to the user — that's noise; they captured the site, they know what's there. Use it to draft a tight one-paragraph framing of the brand and proceed to the questions.
+
+Engage the user with the questions below. Use your agent's question/answer UI if available (multi-choice with custom option). If not, ask conversationally.
 
 ### Question 1: What's this video for?
 
diff --git a/skills/website-to-hyperframes/references/step-3-storyboard.md b/skills/website-to-hyperframes/references/step-3-storyboard.md
@@ -61,7 +61,7 @@ Beat 3: composed kanban (4 cards-as-divs per column) + counter chip on In-Progre
 **Re-read these files before writing:**
 
 - **DESIGN.md** — your color palette, font rules, components, Do's/Don'ts. Every visual must be grounded in this brand identity. If it says "white backgrounds with purple accent" — plan light scenes, not dark moody ones.
-- **Asset discovery — use the contact sheets.** View `capture/assets/contact-sheet-*.jpg` and `capture/assets/svgs/contact-sheet-*.jpg`. Each grid cell is labeled with the filename. This is how you browse what's available without opening 50 individual files. When you find an asset that earns its place in a beat, note the filename from the label and reference it as `capture/assets/<filename>`. If an asset looks promising but you need to check resolution or detail, THEN open the individual file. Also read `capture/extracted/asset-descriptions.md` for one-line summaries. **Never use contact sheets or scroll screenshots in the video** — contact sheets have grid labels and headers baked in, scroll screenshots are raw browser captures. Both are for AI to BROWSE and understand the site, not to place in compositions.
+- **Asset discovery — view the contact sheets carefully, every cell.** Open `capture/assets/contact-sheet-*.jpg` and `capture/assets/svgs/contact-sheet-*.jpg`. Both are paginated — view every page (`contact-sheet-1.jpg`, `contact-sheet-2.jpg`, etc.). **For each page, name 5 specific assets you can see before moving on.** Past agents have reported "viewed the contact sheet" after one glance and then wrote beats referencing assets that didn't exist or missed the brand logo entirely. Don't be that agent. When you find an asset that earns its place in a beat, note the filename from the label and reference it as `capture/assets/<filename>`. If a thumbnail is too small to judge resolution / fine detail, open the individual file. Also read `capture/extracted/asset-descriptions.md` for one-line summaries. **Never use contact sheets or scroll screenshots in the video itself** — contact sheets have grid labels baked in; scroll screenshots are raw browser captures. Both are for AI to BROWSE and understand the site, not to place in compositions.
 - **[techniques.md](../../hyperframes/references/techniques.md)** — 13 primitive animation techniques with code patterns. Pick for beats, these are starting points to adapt, not templates to copy.
 - **[text-effects.md](../../hyperframes/references/text-effects.md)** — 24 named text animation effects bundled in the repo. Read the catalog now and assign a specific effect ID to every headline, label, and copy element in every beat — not generic "fades in" descriptions.