Commit cfcd185

jeremymanning and claude committed
Update discussion/recap slide titles, adjust scale, remove vocoder title link
- Rename "Questions to consider" → "Think about it!"
- Rename "Think about it..." → "Recap"
- Change discussion slide scale from 55 to 85
- Remove hyperlink from vocoder slide title (keep in body text)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 6dbeacb commit cfcd185

3 files changed

Lines changed: 28 additions & 28 deletions

slides/week7/lecture23.html

Lines changed: 24 additions & 24 deletions
Large diffs are not rendered by default.

slides/week7/lecture23.md

Lines changed: 4 additions & 4 deletions
@@ -246,7 +246,7 @@ Diffusion and autoregressive approaches are **converging** — many systems use
 ---
 <!-- _class: scale-70 -->

-# The [vocoder](https://en.wikipedia.org/wiki/Vocoder): from spectrograms to sound
+# The vocoder: from spectrograms to sound

 <div class="definition-box" data-title="Why this step is non-trivial">

@@ -347,11 +347,11 @@ Regulation works when enforced. But technical solutions (watermarking, detection
 </div>

 ---
-<!-- _class: scale-55 -->
+<!-- _class: scale-85 -->

 # Discussion

-<div class="tip-box" data-title="Questions to consider">
+<div class="tip-box" data-title="Think about it!">

 1. **The world simulator question**: Sora generates videos with plausible physics. Does this mean it has learned a model of the physical world, or is it pattern-matching at a scale we find convincing? How would we tell the difference?

@@ -367,7 +367,7 @@ Regulation works when enforced. But technical solutions (watermarking, detection

 # Take-home messages

-<div class="note-box" data-title="Think about it...">
+<div class="note-box" data-title="Recap">

 - The same diffusion framework scales across modalities — images (Lecture 22), video (Sora), and audio (AudioLDM) — suggesting **iterative refinement from noise** is a general-purpose generation principle.
 - The **spectrogram trick** illustrates a powerful pattern: convert your data into a format where existing tools work, then convert back. Turning audio into "images" unlocks the entire latent diffusion pipeline.

slides/week7/lecture23.pdf

14.5 KB
Binary file not shown.
