
Commit e4e8b44

feat(cli): capture-video on-demand fetcher + capture pipeline robustness
For the hyperframes.dev website-to-video flow. Real-AI-test runs against heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's logo and asset-captioning signals missed modern React/Tailwind builds; and (2) there was no CLI surface to pull the videos that the manifest references.

New command:

• `hyperframes capture-video <project>`: on-demand downloader for entries in capture/extracted/video-manifest.json. Capture writes the manifest and preview PNGs but skips the mp4s. This command pulls one entry by `--index N`, matched against the entry's `index` field and NOT its array offset (gaps are possible when a preview screenshot fails). SSRF-safe via safeFetch, 250 MB size cap, content-type whitelist, race-free exclusive-create write. Layout-aware: handles both standalone capture and W2H project layouts.

Capture pipeline fixes:

• Structural logo signals (assetCataloger + tokenExtractor): inBanner / inHomeLink / matchesTitleBrand. Class-substring matching alone caught 0 of 32 SVGs on heygen.com, because modern builds don't put 'logo' or 'brand' in any className.
• Content-hash SVG slugs (assetDownloader): `svg-<8char-sha1>.svg`, or `logo-<hash>.svg` when structural signals fire. Label-derived slugs mis-attributed partner-logo carousels (heygen-logo.svg actually contained Google, hubspot-logo.svg contained Trivago, etc.). Content-hash names cannot drift from the file contents, by construction.
• SVG → PNG rasterization before Gemini Vision (contentExtractor): the raw-SVG-as-text path hallucinated wordmarks (VIVIENNE for HubSpot, 'wrestling' for Workday). Adds fill-polarity detection so a white-glyph SVG is flattened onto a dark background, instead of rendering as a blank PNG, before captioning. No LOGO tag is written to asset-descriptions.md. The `logo-<hash>` filename prefix carries the structural signal instead, independent of Gemini key presence.
• Double-escape `\/` inside the page.evaluate template literal in assetCataloger + tokenExtractor: each `\/` in the original `/^https?:\/\/.../` collapsed to a bare `/` inside the template string, and the injected script threw `Unexpected token ^`. Capture was 100% blocked on this until the escape was fixed.
• `asset-descriptions.md` header branches on Gemini-key presence. When no key is set, it shows an explicit '⚠️ GEMINI_API_KEY not set' warning stating that the descriptions are catalog-derived rather than Vision-generated.

New lint rule:

• `lintMissingLocalAsset` (cli/utils/lintProject): scans `<video>` / `<img>` / `<source>` src attributes for local files that don't exist in the project. Empirically this was the most common sub-agent mistake across multi-URL runs, at roughly five or more per run. Uses `resolveExistingLocalAsset` so the existence check matches the bundler's notion of 'resolves'. Masks comment / style / script ranges before scanning, so a literal `<img src=missing.png>` inside a tutorial comment isn't reported.

Tests: 17 new for capture-video (safeFilename decoding/sanitization, VIDEO_CONTENT_TYPE_RE accept/reject, pickManifestEntry index-field lookup with gaps, URL-mismatch and bad-index rejection, --index over --url priority); 70 cases in lintProject.test.ts covering the new rule and the existing rules.

Sibling PRs in this stack:

• #PR_A1: fix(producer): __dirname ESM banner shim
• #PR_A2: fix(core/lint): findRootTag masks comment/style/script
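The commit message says `--index N` is matched against each entry's `index` field rather than its array offset, and that `--index` takes priority over `--url`. Below is a minimal sketch of that lookup. The `ManifestEntry` shape and the `pickEntry` name are hypothetical stand-ins. The real `pickManifestEntry` may differ in signature and error handling, and reading "URL-mismatch rejection" as "no entry has that URL" is an assumption.

```typescript
// Hypothetical manifest entry: only the fields this sketch needs.
interface ManifestEntry {
  index: number; // id assigned at capture time, NOT the array offset
  url: string;
}

interface PickOptions {
  index?: number; // from --index
  url?: string; // from --url
}

// Sketch of an index-field lookup; --index wins when both flags are set.
function pickEntry(entries: ManifestEntry[], opts: PickOptions): ManifestEntry {
  if (opts.index !== undefined) {
    if (!Number.isInteger(opts.index) || opts.index < 0) {
      throw new Error(`invalid --index: ${opts.index}`);
    }
    // Search by the entry's own field: entries whose preview screenshot
    // failed are absent, so offsets and indices drift apart.
    const byIndex = entries.find((e) => e.index === opts.index);
    if (!byIndex) throw new Error(`no manifest entry with index ${opts.index}`);
    return byIndex;
  }
  if (opts.url !== undefined) {
    const byUrl = entries.find((e) => e.url === opts.url);
    if (!byUrl) throw new Error(`no manifest entry with url ${opts.url}`);
    return byUrl;
  }
  throw new Error("expected --index or --url");
}

// Entry 1 was dropped, so array offset 1 holds index 2.
const entries: ManifestEntry[] = [
  { index: 0, url: "https://example.com/a.mp4" },
  { index: 2, url: "https://example.com/c.mp4" },
];
console.log(pickEntry(entries, { index: 2 }).url); // https://example.com/c.mp4
```

Indexing with `entries[opts.index]` instead would return the `c.mp4` entry for `--index 1`, which silently fetches the wrong video.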
1 parent 211e0ad commit e4e8b44

11 files changed

Lines changed: 946 additions & 29 deletions
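The `lintMissingLocalAsset` rule described in the commit message masks comment, style, and script ranges before it scans `src` attributes. The diff for lintProject is not shown on this page, so what follows is only a minimal sketch of that technique under stated assumptions. `maskNonMarkup`, `missingLocalAssets`, and the `exists` callback are hypothetical. The real rule delegates the existence check to `resolveExistingLocalAsset`, whose signature does not appear here.

```typescript
// Blank out HTML comments, <style>, and <script> bodies, keeping every
// newline so offsets and line numbers in the masked text still line up.
function maskNonMarkup(html: string): string {
  const re = /<!--[\s\S]*?-->|<style\b[\s\S]*?<\/style>|<script\b[\s\S]*?<\/script>/gi;
  return html.replace(re, (m) => m.replace(/[^\n]/g, " "));
}

// src on <img>, <video>, <source>; the leading \s keeps data-src from matching.
const SRC_RE = /<(?:img|video|source)\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;

// Skip anything with a URL scheme (https:, data:, blob:), protocol-relative
// URLs, and fragment-only refs; only local paths are checked.
const NON_LOCAL_RE = /^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i;

// Hypothetical helper; `exists` stands in for resolveExistingLocalAsset.
function missingLocalAssets(html: string, exists: (src: string) => boolean): string[] {
  const masked = maskNonMarkup(html);
  const missing: string[] = [];
  SRC_RE.lastIndex = 0; // shared /g regex: reset before each scan
  let m: RegExpExecArray | null;
  while ((m = SRC_RE.exec(masked)) !== null) {
    const src = m[1] ?? m[2] ?? m[3] ?? "";
    if (src && !NON_LOCAL_RE.test(src) && !exists(src)) missing.push(src);
  }
  return missing;
}

const page = [
  "<!-- tutorial: <img src=missing.png> -->",
  '<img src="logo.png">',
  '<video src="clip.mp4"></video>',
  '<img src="https://cdn.example.com/a.png">',
].join("\n");
console.log(missingLocalAssets(page, (src) => src === "logo.png")); // [ 'clip.mp4' ]
```

Masking with same-length whitespace, rather than deleting the ranges, means any position found in the masked string is also valid in the original, so reported line numbers stay correct. A regex scan like this ignores `srcset` and `>` inside attribute values. A tokenizer-based scan would handle both.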


packages/cli/src/capture/assetCataloger.ts

Lines changed: 40 additions & 1 deletion
```diff
@@ -25,6 +25,12 @@ export interface CatalogedAsset {
   sectionClasses?: string;
   /** Whether the image is above the fold (visible without scrolling) */
   aboveFold?: boolean;
+  /** Element sits inside <header>, <nav>, or [role="banner"] — logo signal */
+  inBanner?: boolean;
+  /** Element sits inside <a> with site-root href ("/", "#", origin-only) — brand-home link */
+  inHomeLink?: boolean;
+  /** alt/aria-label/title contains the brand segment of document.title */
+  matchesTitleBrand?: boolean;
 }

 /**
```
```diff
@@ -62,6 +68,29 @@ export async function catalogAssets(page: Page): Promise<CatalogedAsset[]> {
         var rect = el.getBoundingClientRect();
         ctx.aboveFold = rect.top < window.innerHeight;
       } catch(e) {}
+      // Logo signals — surfaced explicitly so the downloader can prefix
+      // logo-<hash> reliably. Real-AI-test on heygen.com + huly.io showed
+      // the prior class-substring detector caught 0 logos; these explicit
+      // structural signals catch the header logo across modern React/
+      // Tailwind builds where "logo" isn't in any className.
+      // 1. inBanner: element sits inside <header>, <nav>, or [role=banner].
+      ctx.inBanner = el.closest('header, nav, [role="banner"]') !== null;
+      // 2. inHomeLink: element sits inside an <a> whose href is the site
+      //    root ("/", "#", "./" or origin-only URL) — the brand-home link.
+      var homeAnchor = el.closest('a[href]');
+      if (homeAnchor) {
+        var aHref = homeAnchor.getAttribute('href') || '';
+        ctx.inHomeLink = aHref === '/' || aHref === '#' || aHref === './' ||
+          /^https?:\\/\\/[^/]+\\/?$/.test(aHref);
+      }
+      // 3. matchesTitleBrand: alt/aria-label/title contains the brand
+      //    segment of the page title (everything before " - " / " | " /
+      //    " — ") — the "alt=HeyGen" / "aria-label=Huly" pattern.
+      var titleBrand = (document.title || '').split(/[-|—]/)[0].trim();
+      if (desc && titleBrand.length > 1 && titleBrand.length < 30 &&
+          desc.toLowerCase().indexOf(titleBrand.toLowerCase()) !== -1) {
+        ctx.matchesTitleBrand = true;
+      }
       return ctx;
     }

```
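The two string-based signals in this hunk can be exercised outside a browser. The sketch below lifts them into plain functions. `isHomeHref`, `titleBrandOf`, and `matchesTitleBrand` are hypothetical names, not exports of assetCataloger. One deliberate difference is an assumption on my part. The committed code splits `document.title` on any hyphen, pipe, or em dash character, which truncates a hyphenated brand such as "Coca-Cola" to "Coca". The sketch splits only on separators with whitespace on both sides, which matches the spaced separators the code comment lists.

```typescript
// Site-root hrefs: "/", "#", "./", or an origin-only absolute URL.
// In a normal .ts file the regex needs only single escapes; inside the
// page.evaluate template string each "\/" must be written "\\/".
function isHomeHref(href: string): boolean {
  return (
    href === "/" ||
    href === "#" ||
    href === "./" ||
    /^https?:\/\/[^/]+\/?$/.test(href)
  );
}

// Variant split: only " - ", " | ", or a spaced em dash (\u2014) ends the
// brand segment, so hyphenated brands survive intact.
function titleBrandOf(title: string): string {
  return (title.split(/\s[-|\u2014]\s/)[0] ?? "").trim();
}

// Same length bounds and case-insensitive containment as the diff.
function matchesTitleBrand(desc: string, title: string): boolean {
  const brand = titleBrandOf(title);
  return (
    brand.length > 1 &&
    brand.length < 30 &&
    desc.toLowerCase().includes(brand.toLowerCase())
  );
}

console.log(isHomeHref("https://heygen.com/")); // true
console.log(isHomeHref("https://heygen.com/pricing")); // false
console.log(titleBrandOf("Coca-Cola | Home")); // Coca-Cola
console.log(matchesTitleBrand("HeyGen logo", "HeyGen - AI Video Generator")); // true
```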
```diff
@@ -92,12 +121,18 @@ export async function catalogAssets(page: Page): Promise<CatalogedAsset[]> {
         if (notes && !entry.notes) {
           entry.notes = notes;
         }
-        // Merge rich context (first one wins)
+        // Merge rich context. Text fields: first-occurrence wins. Boolean
+        // signals (inBanner / inHomeLink / matchesTitleBrand): any positive
+        // sample wins — if ANY DOM occurrence of this URL is in a header,
+        // the URL is a header-context asset.
         if (richCtx) {
           if (richCtx.description && !entry.description) entry.description = richCtx.description;
           if (richCtx.nearestHeading && !entry.nearestHeading) entry.nearestHeading = richCtx.nearestHeading;
           if (richCtx.sectionClasses && !entry.sectionClasses) entry.sectionClasses = richCtx.sectionClasses;
           if (richCtx.aboveFold !== undefined && entry.aboveFold === undefined) entry.aboveFold = richCtx.aboveFold;
+          if (richCtx.inBanner) entry.inBanner = true;
+          if (richCtx.inHomeLink) entry.inHomeLink = true;
+          if (richCtx.matchesTitleBrand) entry.matchesTitleBrand = true;
         }
       }

```
```diff
@@ -324,6 +359,10 @@ function deduplicateSrcsetVariants(assets: CatalogedAsset[]): CatalogedAsset[] {
       if (a.notes && !existing.notes) {
         existing.notes = a.notes;
       }
+      // Boolean logo signals: any positive sample wins through the merge.
+      if (a.inBanner) existing.inBanner = true;
+      if (a.inHomeLink) existing.inHomeLink = true;
+      if (a.matchesTitleBrand) existing.matchesTitleBrand = true;
       // Keep the URL with highest w= value (largest image)
       const existingW = getWidthParam(existing.url);
       const newW = getWidthParam(a.url);
```
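The same asset URL can appear in several places on a page, and srcset variants collapse into one entry. Because of this, each merged field needs its own policy. The sketch below isolates the two policies that the cataloger merge and `deduplicateSrcsetVariants` now apply. `AssetSignals` and `mergeSample` are hypothetical names covering a subset of `CatalogedAsset`.

```typescript
// Hypothetical subset of CatalogedAsset: one text field, three boolean signals.
interface AssetSignals {
  description?: string;
  inBanner?: boolean;
  inHomeLink?: boolean;
  matchesTitleBrand?: boolean;
}

// Fold one DOM sample into the accumulated entry: text fields keep the
// first value seen, boolean logo signals are OR-ed so any positive wins.
function mergeSample(entry: AssetSignals, sample: AssetSignals): void {
  if (sample.description && !entry.description) entry.description = sample.description;
  if (sample.inBanner) entry.inBanner = true;
  if (sample.inHomeLink) entry.inHomeLink = true;
  if (sample.matchesTitleBrand) entry.matchesTitleBrand = true;
}

// Same URL seen twice: first in a hero section, then inside the <header>.
const entry: AssetSignals = {};
mergeSample(entry, { description: "hero image", inBanner: false });
mergeSample(entry, { description: "site logo", inBanner: true });
console.log(entry); // { description: 'hero image', inBanner: true }
```

If the booleans were first-wins as well, the result would depend on DOM order. A footer occurrence of the header logo could then mask the header signal.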

packages/cli/src/capture/assetDownloader.ts

Lines changed: 54 additions & 6 deletions
```diff
@@ -7,9 +7,22 @@

 import { writeFileSync, mkdirSync } from "node:fs";
 import { join, extname } from "node:path";
+import { createHash } from "node:crypto";
 import type { DesignTokens, DownloadedAsset } from "./types.js";
 import type { CatalogedAsset } from "./assetCataloger.js";

+/**
+ * Content-hash slug for SVGs — `svg-<8-char-sha1>` for icons / `logo-<hash>`
+ * when DOM evidence says it's a logo. Replaces label-derived slugging which
+ * mis-assigned brand names to the wrong SVG bodies (e.g. `heygen-logo.svg`
+ * landing on the Google partner-logo SVG). The hash is a function of the
+ * bytes, so the filename can never mismatch the content.
+ */
+function svgContentHashSlug(svgSource: string | Buffer, isLogo: boolean): string {
+  const hash = createHash("sha1").update(svgSource).digest("hex").slice(0, 8);
+  return isLogo ? `logo-${hash}` : `svg-${hash}`;
+}
+
 export async function downloadAssets(
   tokens: DesignTokens,
   outputDir: string,
```
```diff
@@ -22,15 +35,20 @@ export async function downloadAssets(
   const assets: DownloadedAsset[] = [];
   const downloadedUrls = new Set<string>();

-  // 1. ALL inline SVGs — save as files (logos get priority naming)
+  // 1. ALL inline SVGs — save as files. Names are content-hash based
+  // (`svg-<hash>.svg` or `logo-<hash>.svg`) so the filename can never
+  // drift from the SVG body. The DOM-derived `label` is unreliable —
+  // it has misassigned `heygen-logo.svg` to the Google partner SVG in
+  // past captures because aria-label / nearest-heading inference can
+  // pick up text from the wrong ancestor. Content-hash is invariant.
   mkdirSync(join(outputDir, "assets", "svgs"), { recursive: true });
   const usedSvgNames = new Set<string>();
   for (let i = 0; i < tokens.svgs.length && i < 30; i++) {
     const svg = tokens.svgs[i]!;
     if (!svg.outerHTML || svg.outerHTML.length < 50) continue;
-    const label = svg.label?.replace(/[^a-zA-Z0-9-_ ]/g, "").trim();
-    let slug = label ? slugify(label) : svg.isLogo ? `logo-${i}` : `icon-${i}`;
-    // Deduplicate — two SVGs with same aria-label get suffixed
+    const slug = svgContentHashSlug(svg.outerHTML, !!svg.isLogo);
+    // Hash collisions are negligible for 8-char sha1 prefix over <30 SVGs,
+    // but suffix-dedupe anyway for safety + idempotent re-runs.
     let finalSlug = slug;
     let suffix = 2;
     while (usedSvgNames.has(finalSlug)) {
```
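What matters here is that the name is a pure function of the bytes. Two captures of the same SVG get the same base name no matter which label the DOM attached to it, so a label can no longer point at the wrong body. The sketch reproduces `svgContentHashSlug` from the previous hunk and pairs it with a dedupe helper. `claimSlug` and its `-2`, `-3` suffix format are hypothetical, because the body of the `while` loop is cut off in this hunk.

```typescript
import { createHash } from "node:crypto";

// Same shape as svgContentHashSlug in the diff: first 8 hex chars of the
// SHA-1 over the SVG bytes, prefixed with logo- or svg-.
function svgContentHashSlug(svgSource: string | Buffer, isLogo: boolean): string {
  const hash = createHash("sha1").update(svgSource).digest("hex").slice(0, 8);
  return isLogo ? `logo-${hash}` : `svg-${hash}`;
}

// Hypothetical dedupe: append -2, -3, ... until the name is unused, then claim it.
function claimSlug(base: string, used: Set<string>): string {
  let finalSlug = base;
  let suffix = 2;
  while (used.has(finalSlug)) {
    finalSlug = `${base}-${suffix}`;
    suffix++;
  }
  used.add(finalSlug);
  return finalSlug;
}

const used = new Set<string>();
const svg = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0h10v10H0z"/></svg>';
const first = claimSlug(svgContentHashSlug(svg, false), used);
const second = claimSlug(svgContentHashSlug(svg, false), used); // same bytes, same hash
console.log(first, second); // second is first with "-2" appended
```

Identical SVGs produce the same hash by design. In practice, the suffix path is therefore triggered by the same icon appearing twice on a page, not by a true SHA-1 prefix collision.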
```diff
@@ -135,8 +153,38 @@ export async function downloadAssets(
       if (result.status !== "fulfilled" || !result.value) continue;
       const { url, isPoster, parsedUrl, ext, buffer, catalog } = result.value;
       try {
-        // Generate human-readable name from catalog context
-        const slug = deriveAssetName(parsedUrl, catalog, isPoster, imgIdx, usedNames);
+        // SVGs use content-hash names because catalog-derived slugs
+        // mis-assigned brand names to the wrong SVG bodies (the same
+        // alignment failure that produced `heygen-logo.svg` containing
+        // the Google wordmark). Rasters keep the catalog-derived
+        // human-readable slug — they were not affected by the bug.
+        let slug: string;
+        if (ext === ".svg") {
+          // isLogo signals — broadened. The original `contexts` substring
+          // check never fired in practice because contexts hold HTML
+          // positions like 'img[src]' / 'video[poster]', not semantic
+          // labels. Real signals come from DOM structure + alt/aria text:
+          //   1. The cataloger now flags inBanner (inside <header>/<nav>/
+          //      [role=banner]), inHomeLink (inside <a href="/">), and
+          //      matchesTitleBrand (alt/aria matches document.title's
+          //      brand segment) — see assetCataloger.ts getElementContext.
+          //   2. As a backstop, also check description / nearestHeading /
+          //      sectionClasses for "logo" / "brand" / "wordmark" text.
+          const c = catalog;
+          const brandRe = /logo|brand|wordmark/i;
+          const isLogo = !!(
+            c?.inBanner ||
+            c?.inHomeLink ||
+            c?.matchesTitleBrand ||
+            c?.contexts?.some((s) => brandRe.test(s)) ||
+            (c?.description && brandRe.test(c.description)) ||
+            (c?.nearestHeading && brandRe.test(c.nearestHeading)) ||
+            (c?.sectionClasses && brandRe.test(c.sectionClasses))
+          );
+          slug = svgContentHashSlug(buffer, isLogo);
+        } else {
+          slug = deriveAssetName(parsedUrl, catalog, isPoster, imgIdx, usedNames);
+        }
         const name = `${slug}${ext}`;
         usedNames.add(slug);
         const localPath = `assets/${name}`;
```
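The SVG branch above reduces to a predicate over catalog fields. Pulling it into a function makes the precedence explicit (structural signals first, text backstop second) and lets it be tested without a download. `CatalogHints` and `isLogoCandidate` are hypothetical names. The result only selects the `logo-` or `svg-` filename prefix; it does not claim anything about what the SVG depicts.

```typescript
// Hypothetical subset of the catalog fields the SVG branch inspects.
interface CatalogHints {
  inBanner?: boolean;
  inHomeLink?: boolean;
  matchesTitleBrand?: boolean;
  contexts?: string[];
  description?: string;
  nearestHeading?: string;
  sectionClasses?: string;
}

// No /g flag on purpose: with /g, RegExp.prototype.test advances lastIndex,
// so reusing one regex across strings makes results depend on the last call.
const BRAND_RE = /logo|brand|wordmark/i;

function isLogoCandidate(c: CatalogHints | undefined): boolean {
  if (!c) return false;
  // 1. Structural DOM signals from the cataloger.
  if (c.inBanner || c.inHomeLink || c.matchesTitleBrand) return true;
  // 2. Text backstop over contexts, description, heading, and section classes.
  const texts = [...(c.contexts ?? []), c.description, c.nearestHeading, c.sectionClasses];
  return texts.some((t) => t !== undefined && BRAND_RE.test(t));
}

console.log(isLogoCandidate({ inHomeLink: true })); // true
console.log(isLogoCandidate({ contexts: ["img[src]"], description: "partner carousel" })); // false
```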

packages/cli/src/capture/contentExtractor.ts

Lines changed: 64 additions & 17 deletions
```diff
@@ -11,6 +11,7 @@
 import type { Page } from "puppeteer-core";
 import { existsSync, readdirSync, statSync, readFileSync } from "node:fs";
 import { join } from "node:path";
+import sharp from "sharp";
 import type { CatalogedAsset } from "./assetCataloger.js";
 import type { DesignTokens } from "./types.js";

```
```diff
@@ -232,7 +233,12 @@ export async function captionImagesWithGemini(
   }
   progress("design", `${Object.keys(geminiCaptions).length} images captioned with Gemini`);

-  // Caption SVGs by sending source code as text (vision API rejects image/svg+xml).
+  // Caption SVGs by RENDERING each to PNG via sharp first, then sending the
+  // PNG bytes to the Vision API — same call shape as raster images.
+  // Previous implementation sent SVG path markup as TEXT, which produced
+  // pure hallucinations on wordmarks (`hubspot-logo.svg` → "VIVIENNE",
+  // `huly-logo.svg` → "Kube", `workday.svg` → "wrestling"). Vision models
+  // can't reliably mental-render path commands; they need actual pixels.
   const svgFiles: Array<{ file: string; relPath: string }> = [];
   const assetsDir = join(outputDir, "assets");
   for (const f of readdirSync(assetsDir)) {
```
```diff
@@ -246,30 +252,63 @@ export async function captionImagesWithGemini(
   }

   if (svgFiles.length > 0) {
-    progress("design", `Captioning ${svgFiles.length} SVGs via code analysis...`);
+    progress("design", `Rasterizing + captioning ${svgFiles.length} SVGs via vision API...`);
     const SVG_BATCH = 20;
-    const MAX_SVG_CHARS = 10_000;
+    const SVG_RENDER_SIZE = 256; // px — enough resolution for Gemini to read wordmarks, small enough to keep payload sub-MB
     for (let i = 0; i < svgFiles.length; i += SVG_BATCH) {
       const batch = svgFiles.slice(i, i + SVG_BATCH);
       const results = await Promise.allSettled(
         batch.map(async ({ relPath }) => {
           const filePath = join(assetsDir, relPath);
-          let svgText = readFileSync(filePath, "utf-8");
-          if (svgText.length > MAX_SVG_CHARS) {
-            svgText = svgText.slice(0, MAX_SVG_CHARS) + "\n<!-- truncated -->";
+          let pngBase64: string;
+          try {
+            // Detect SVG fill polarity so we can pick a contrasting flatten
+            // background. White-glyph SVGs (huly's "✕ huly" wordmark uses
+            // fill="#fff") render invisible against white; dark-glyph SVGs
+            // render invisible against black. Choosing the background by
+            // dominant fill keeps both polarities readable for the vision API.
+            const svgSource = readFileSync(filePath, "utf-8");
+            const lightFillHits = (
+              svgSource.match(/fill\s*=\s*["'](#fff(fff)?|white|#f[ef][ef])["']/gi) || []
+            ).length;
+            const darkFillHits = (
+              svgSource.match(/fill\s*=\s*["'](#000(000)?|black|#[0-3]{6}|#[0-3]{3})["']/gi) || []
+            ).length;
+            const bg =
+              lightFillHits > darkFillHits
+                ? { r: 32, g: 32, b: 32 } // dark slate behind light glyphs
+                : { r: 255, g: 255, b: 255 }; // white behind dark glyphs (default)
+            // sharp rasterizes SVG → PNG natively.
+            const pngBuffer = await sharp(filePath)
+              .resize({
+                width: SVG_RENDER_SIZE,
+                height: SVG_RENDER_SIZE,
+                fit: "inside",
+                withoutEnlargement: false,
+              })
+              .flatten({ background: bg })
+              .png()
+              .toBuffer();
+            pngBase64 = pngBuffer.toString("base64");
+          } catch {
+            // SVG rasterization can fail on exotic features (external fonts,
+            // foreignObject, filters with missing primitives). Skip caption
+            // rather than block — agent will fall back to contact-sheet view.
+            return { file: relPath, caption: "" };
           }
           const response = await ai.models.generateContent({
             model,
             contents: [
               {
                 role: "user",
                 parts: [
+                  { inlineData: { mimeType: "image/png", data: pngBase64 } },
                   {
                     text:
-                      "This SVG code is from a website. Describe what it renders in ONE short sentence " +
-                      "for a video storyboard. Focus on: what shape/icon/illustration it is, its colors. " +
-                      "Be factual.\n\n" +
-                      svgText,
+                      "Describe this SVG asset rendered from a website in ONE short sentence for a video storyboard. " +
+                      "Focus on: what shape/icon/illustration/wordmark it is, its colors, any text it contains. " +
+                      "If you see a wordmark, READ THE LETTERS LITERALLY — do not guess a brand from context. " +
+                      "Be factual.",
                   },
                 ],
               },
```
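The polarity heuristic does not depend on sharp, so it can be checked on its own. In the sketch below, `pickFlattenBackground` is a hypothetical name, and the regexes and colors are copied from the hunk above. Its return value is what the diff passes to sharp's `.flatten({ background })`.

```typescript
type RGB = { r: number; g: number; b: number };

// Same patterns as the diff: count attribute-level light and dark fills.
// String.prototype.match with a /g regex resets lastIndex itself, so
// sharing these module-level regexes across calls is safe.
const LIGHT_FILL = /fill\s*=\s*["'](#fff(fff)?|white|#f[ef][ef])["']/gi;
const DARK_FILL = /fill\s*=\s*["'](#000(000)?|black|#[0-3]{6}|#[0-3]{3})["']/gi;

function pickFlattenBackground(svgSource: string): RGB {
  const light = (svgSource.match(LIGHT_FILL) || []).length;
  const dark = (svgSource.match(DARK_FILL) || []).length;
  // Light glyphs get a dark slate backdrop; ties and unknowns fall back to white.
  return light > dark ? { r: 32, g: 32, b: 32 } : { r: 255, g: 255, b: 255 };
}

console.log(pickFlattenBackground('<svg><path fill="#fff" d="M0 0h4v4H0z"/></svg>')); // { r: 32, g: 32, b: 32 }
```

The heuristic counts only `fill="..."` attributes. Fills set through `style="fill:#fff"`, CSS classes, or `currentColor` are not seen and fall through to the white default. A white glyph styled that way would still rasterize as blank.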
```diff
@@ -358,11 +397,6 @@ export function generateAssetDescriptions(
     const svgsPath = join(assetsPath, "svgs");
     for (const file of readdirSync(svgsPath)) {
       if (!file.endsWith(".svg")) continue;
-      const geminiCaption = geminiCaptions[`svgs/${file}`];
-      if (geminiCaption) {
-        svgLines.push(`svgs/${file}${geminiCaption}`);
-        continue;
-      }
       const svgMatch = tokens.svgs.find(
         (s) =>
           s.label &&
```
```diff
@@ -373,9 +407,22 @@ export function generateAssetDescriptions(
             .slice(0, 15),
         ),
       );
+      // The `logo-<hash>` filename prefix is preserved by the capture
+      // pipeline as a cheap structural-context signal (DOM said this SVG
+      // was inside a header/home-link/etc.). We DO NOT re-tag it in the
+      // description text — captions are the agent's selector, and "LOGO"
+      // tag noise misled agents into thinking every tagged file was a
+      // distinct brand. Captions identify content directly.
+      const geminiCaption = geminiCaptions[`svgs/${file}`];
+      if (geminiCaption) {
+        svgLines.push(`svgs/${file}${geminiCaption}`);
+        continue;
+      }
       const label = svgMatch?.label || file.replace(".svg", "").replace(/-/g, " ");
-      const isLogo = svgMatch?.isLogo || file.includes("logo");
-      svgLines.push(`svgs/${file}${isLogo ? "logo: " : "icon: "}${label}`);
+      // No-Gemini fallback: keep a short uncaptioned line. The filename
+      // prefix (logo-<hash> vs svg-<hash>) already carries the structural
+      // hint without needing a text tag.
+      svgLines.push(`svgs/${file}${label}`);
     }
   } catch {
     /* no svgs dir */
```

packages/cli/src/capture/index.ts

Lines changed: 9 additions & 4 deletions
```diff
@@ -579,14 +579,19 @@ export async function captureWebsite(
     const lines = generateAssetDescriptions(outputDir, tokens, catalogedAssets, geminiCaptions);

     if (lines.length > 0) {
+      const hasGeminiKey = !!(process.env.GEMINI_API_KEY || process.env.GOOGLE_API_KEY);
+      const header = hasGeminiKey
+        ? "# Asset Descriptions\n\nOne line per file. Read this instead of opening every image individually.\n\nTo find a specific brand or icon, **grep this file for the brand name in the description text** (e.g. `grep -i 'autodesk' asset-descriptions.md`). The Gemini Vision captions identify what's actually in each file — that's the agent's selector.\n\nThe `logo-<hash>.svg` filename prefix is a cheap structural hint (DOM said this SVG was inside a `<header>`, home-link `<a>`, or had an aria-label matching the page brand). It is NOT a content claim — many `logo-*` files are nav icons or decorative shapes. Trust the captions, not the filename prefix.\n\n"
+        : "# Asset Descriptions\n\n⚠️ GEMINI_API_KEY not set — descriptions below are catalog-derived (alt text, headings, section context, filename) instead of Vision-generated. To get richer Vision descriptions on the next capture, set GEMINI_API_KEY (or GOOGLE_API_KEY) and re-run.\n\nThe `logo-<hash>.svg` filename prefix is a structural hint (DOM said this SVG was inside a `<header>`, home-link `<a>`, or had an aria-label matching the page brand). To pick the actual brand logo without Vision, open the `logo-*` candidates in a previewer or rasterize them with `sharp` before referencing — composing a fake logo ships off-brand in the final video.\n\n";
       writeFileSync(
         join(outputDir, "extracted", "asset-descriptions.md"),
-        "# Asset Descriptions\n\nOne line per file. Read this instead of opening every image individually.\n\n" +
-        lines.map((l) => "- " + l).join("\n") +
-        "\n",
+        header + lines.map((l) => "- " + l).join("\n") + "\n",
         "utf-8",
       );
-      progress("design", `${lines.length} asset descriptions written`);
+      progress(
+        "design",
+        `${lines.length} asset descriptions written${hasGeminiKey ? "" : " (no Gemini key — catalog-fallback mode)"}`,
+      );
     }
   } catch {
     /* non-critical */
```
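The header choice depends only on the environment. Passing the environment in as a parameter lets it be tested without mutating `process.env`. This is a minimal sketch: `renderAssetDescriptions` and its shortened header strings are hypothetical, and the full strings are the ones in the hunk above. In the CLI, the caller would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Same check as the diff: either variable, if non-empty, enables Vision.
function hasGeminiKey(env: Env): boolean {
  return !!(env.GEMINI_API_KEY || env.GOOGLE_API_KEY);
}

// Hypothetical, shortened headers; the body format matches the diff:
// header, then "- " + line per entry, joined by newlines, trailing "\n".
function renderAssetDescriptions(lines: string[], env: Env): string {
  const header = hasGeminiKey(env)
    ? "# Asset Descriptions\n\nVision captions.\n\n"
    : "# Asset Descriptions\n\n⚠️ GEMINI_API_KEY not set: catalog-derived descriptions.\n\n";
  return header + lines.map((l) => "- " + l).join("\n") + "\n";
}

console.log(renderAssetDescriptions(["svgs/svg-0a1b2c3d.svg close icon"], {}));
```

Because the check uses truthiness, an exported but empty `GEMINI_API_KEY=""` counts as unset and selects the catalog-fallback header.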
