Duration target: ~61 seconds (Act 3 expanded from 11.2s to 23.5s to fit full-breath clips and the logo-reveal section)
Voice: ElevenLabs — Rachel (21m00Tcm4TlvDq8ikWAM)
Voice settings: stability 0.35 · similarity 0.75 · style 0.20 · speaker boost: on
Model: eleven_multilingual_v2
Voice direction: Curious, playful on Act 1 questions. Confident and slightly smiling on the Act 1 announcement. Rhythmic, punchy, trailer-cadence on the Act 3 capability listing. Calm + warm on the Act 4 CTA. Always feels like a person, never like a reader.
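As a rough sketch, the voice, model, and settings listed above could be bundled into one request per VO line. The endpoint path and the JSON field names (`similarity_boost`, `use_speaker_boost`) are assumptions based on the public ElevenLabs REST API and should be checked against the current docs; `build_tts_request` is a hypothetical helper, and nothing is sent over the network here.

```python
import json

VOICE_ID = "21m00Tcm4TlvDq8ikWAM"    # Rachel, from the header above
MODEL_ID = "eleven_multilingual_v2"  # model named in the header


def build_tts_request(text: str) -> dict:
    """Return the URL and JSON body for one VO line (hypothetical helper).

    Assumption: the endpoint path and field names follow the public
    ElevenLabs text-to-speech API. Authentication (an API key header)
    is deliberately left out of this sketch.
    """
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        "body": {
            "text": text,
            "model_id": MODEL_ID,
            "voice_settings": {
                "stability": 0.35,
                "similarity_boost": 0.75,   # "similarity" in the header
                "style": 0.20,
                "use_speaker_boost": True,  # "speaker boost: on"
            },
        },
    }


req = build_tts_request("Promo for your app?")
print(json.dumps(req["body"], indent=2))
```

Keeping the settings in one place means every line is rendered with identical parameters, so differences between takes come from the text and punctuation alone.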
Four questions under the 4 typing panels, then the announcement, then silence through YEAHH → click → drop.
Time: 1.66 – 3.9s Delivery: Upward lilt at the end — genuine curiosity.
Promo for your app?
Time: 4.2 – 6.4s Delivery: Slightly brighter than line 1, same curious register.
Brand video?
Time: 6.7 – 8.9s Delivery: Quicker, punchier.
Social ad?
Time: 9.2 – 11.4s Delivery: Shortest, slight anticipation — "one more..."
Launch clip?
Time: 11.7 – 13.5s Delivery: Drop into confidence. Period after "Now" creates a beat. "any site" gets gentle emphasis. Ends with a slight smile the listener can hear.
Now. Any site becomes video.
Pronunciation cues: "website-to-hyperframes" is never spoken here; we save the product name for Act 4 to give it weight.
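The Act 1 timing windows above can be sanity-checked with a few lines of code. This is a minimal sketch: the start and end times are copied from the "Time:" fields, while the list layout and the `gaps_between` helper are illustrative and not part of any existing tooling.

```python
# (label, start_s, end_s) copied from the Act 1 "Time:" fields.
ACT1_CUES = [
    ("Promo for your app?", 1.66, 3.9),
    ("Brand video?", 4.2, 6.4),
    ("Social ad?", 6.7, 8.9),
    ("Launch clip?", 9.2, 11.4),
    ("announcement", 11.7, 13.5),
]


def gaps_between(cues):
    """Return the silence (in seconds) between consecutive cue windows.

    Raises ValueError if a cue starts before the previous one ends.
    """
    gaps = []
    for (_, _, prev_end), (label, start, _) in zip(cues, cues[1:]):
        gap = round(start - prev_end, 2)  # round away float noise
        if gap < 0:
            raise ValueError(f"{label!r} starts before the previous cue ends")
        gaps.append(gap)
    return gaps


print(gaps_between(ACT1_CUES))
```

With the times as written, every window is followed by the same 0.3s gap, which is what gives the four questions their even rhythm.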
No VO. Music carries the full energy. Only on-screen monospace labels:
extracting fonts · capturing colors · parsing assets · detecting libraries · composing timeline · rendering frames
These fade in and out in sync with the extraction beats; the visual is the story. Narration here would be over-explaining.
Eleven clips at 0.8s–3.5s each, variable pacing. Rachel reads each capability label in a rhythmic, period-stopped trailer cadence. New in v2: a logo-reveal trio (linear / dribbble / tailwind-logo) at the end, the vercel world map swapped in, and shader/figma removed.
Product launches.
Feature showcases.
Brand reels.
Music apps.
Three D scenes.
Pronunciation cue: 3D → three D
NEW line in v2 (was "Dev metrics" for the old 10.5× FASTER clip; changed to match world map / global infra visual).
Infrastructure.
Product demos.
NEW line in v2.
Brand openers.
NEW line in v2.
Logo reveals.
NEW line in v2.
Drawn in S V G.
Pronunciation cue: SVG → S V G (three letters, spaces force spell-out)
Typography.
Custom shaders. (clip-A shader was removed from Act 3: visually too abstract for the montage)
Portfolios. (dribbble now plays its logo-reveal, not the gallery grid)
Unused VO files (kept on disk for reference / future use): 09-custom-shaders.mp3, 12-portfolios.mp3, 18-team-design.mp3.
Three beats, sparse pacing. Room for breath between each — matches Thursday's "HyperFrames. Go make something." Calm, confident. (Shifted later due to Act 3 expansion; same internal structure.)
Time: 53.5 – 55.0s Delivery: "HyperFrames" as ONE compound word — not Hyper then Frames. Slight stress on the first syllable.
HyperFrames.
Pronunciation cue: hyper-frames (single word, camelCase is typographic only)
Time: 55.5 – 58.5s Delivery: Two-beat structure. Pause after "URL."
Give your agent a URL. It does the rest.
Pronunciation cue: URL → U R L (three letters)
Time: 59.5 – 61.0s Delivery: Slight smile in voice. Warm, confident, not sales-y.
Go make something.
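The pronunciation cues in this script (3D → three D, SVG → S V G, URL → U R L) could be applied as text substitutions before a line is sent for synthesis. The sketch below is one way to do that; the `PRONUNCIATIONS` table and the `apply_pronunciation_cues` helper are hypothetical names, and only the three mappings come from the script.

```python
import re

# Written form -> spoken form, taken from the pronunciation cues in this script.
PRONUNCIATIONS = {
    "3D": "three D",
    "SVG": "S V G",
    "URL": "U R L",
}

# Match whole tokens only, so substrings of longer words are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in PRONUNCIATIONS) + r")\b"
)


def apply_pronunciation_cues(text: str) -> str:
    """Replace each written abbreviation with its spelled-out spoken form."""
    return _PATTERN.sub(lambda m: PRONUNCIATIONS[m.group(1)], text)


print(apply_pronunciation_cues("Give your agent a URL. It does the rest."))
```

The match is case-sensitive on purpose: "HyperFrames" passes through untouched, since its cue is about delivery rather than spelling.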
| Line | Duration (approx) |
|---|---|
| 4 questions (Act 1) | ~4.8s combined |
| 1 announcement (Act 1) | ~1.8s |
| 8 capability labels (Act 3) | ~9s combined |
| 3 CTA lines (Act 4) | ~5.5s combined |
| Total voice time | ~21s of 50s |
The remaining ~29s is music + SFX + visual-only moments (Act 2 entirely, YEAHH through drop, final beat of Act 4). Voice sits above music via ducking (music -20dB during VO, full during gaps).
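The arithmetic behind the budget and the ducking level can be written out as a short sketch. The durations and the 50s total are copied from the table exactly as written, the -20dB figure comes from the ducking note, and `db_to_gain` is an illustrative helper using the standard amplitude conversion (gain = 10^(dB/20)).

```python
# Approximate voice durations in seconds, copied from the table as written.
VOICE_SECONDS = {
    "4 questions (Act 1)": 4.8,
    "1 announcement (Act 1)": 1.8,
    "capability labels (Act 3)": 9.0,
    "3 CTA lines (Act 4)": 5.5,
}
TOTAL_SECONDS = 50   # total used in the table's last row
DUCK_DB = -20.0      # music level while VO is playing


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


voice_total = sum(VOICE_SECONDS.values())
remaining = TOTAL_SECONDS - voice_total

print(f"voice: ~{round(voice_total)}s, music/SFX only: ~{round(remaining)}s")
print(f"music gain during VO: {db_to_gain(DUCK_DB):.2f}")
```

A -20dB duck scales the music to one tenth of its full amplitude, which leaves it audible as a bed without competing with the voice.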
Each line becomes its own .mp3 file at launch-video-2/audio/vo/:
01-promo-for-your-app.mp3
02-brand-video.mp3
03-social-ad.mp3
04-launch-clip.mp3
05-announcement.mp3
06-product-launches.mp3
07-feature-showcases.mp3
08-brand-reels.mp3
09-custom-shaders.mp3
10-product-demos.mp3
11-three-d-scenes.mp3
12-portfolios.mp3
13-kinetic-typography.mp3
14-hyperframes.mp3
15-give-your-agent-a-url.mp3
16-go-make-something.mp3
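The naming pattern in this list (two-digit index, hyphenated lowercase label, `.mp3`) can be sketched as a small helper. The `slugify` and `vo_path` functions are hypothetical; the label is passed in explicitly because some files use a short label instead of the spoken text (for example `05-announcement.mp3`).

```python
import re

VO_DIR = "launch-video-2/audio/vo"  # directory named above


def slugify(label: str) -> str:
    """Lowercase the label and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def vo_path(index: int, label: str) -> str:
    """Build the path for one VO file: zero-padded index plus slug."""
    return f"{VO_DIR}/{index:02d}-{slugify(label)}.mp3"


print(vo_path(1, "Promo for your app?"))
print(vo_path(11, "Three D scenes."))
```

Zero-padding the index keeps the files in script order when a directory listing sorts them alphabetically.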