
Commit 1bd7ae2

apartsinclaude committed
Rename algorithm captions from Code Fragment to Pseudocode across 18 files
Algorithm callout boxes should use "Pseudocode X.Y.Z:" prefix instead of "Code Fragment X.Y.Z:" since they contain pseudocode, not runnable code. Also fix section 23.1 algorithm caption manually.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f56469d commit 1bd7ae2

18 files changed

Lines changed: 24 additions & 24 deletions


part-1-foundations/module-02-tokenization-subword-models/section-2.2.html

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ <h3>The BPE Merge Algorithm</h3>
 trainer = BpeTrainer(vocab_size=1000, special_tokens=["[UNK]"])
 tokenizer.train(files=["corpus.txt"], trainer=trainer)
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 2.2.12:</strong> Pseudocode for the BPE training algorithm. Starting from individual characters, the algorithm repeatedly merges the most frequent adjacent pair until the target vocabulary size is reached, recording each merge in an ordered table used at inference time.</div>
+<div class="code-caption"><strong>Pseudocode 2.2.12:</strong> Pseudocode for the BPE training algorithm. Starting from individual characters, the algorithm repeatedly merges the most frequent adjacent pair until the target vocabulary size is reached, recording each merge in an ordered table used at inference time.</div>
 </div>

 <!-- DIAGRAM 1: BPE merge tree -->
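
For readers who want to see the merge loop this caption describes end to end, here is a minimal, self-contained sketch of BPE training in plain Python. The helper structure and the toy word list are illustrative assumptions, not the book's tokenizers-library example shown above.

from collections import Counter

def train_bpe(corpus_words, vocab_size):
    # Start with each word as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus_words)
    vocab = {c for w in words for c in w}
    merges = []  # ordered merge table, replayed in the same order at inference time
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs across the corpus, weighted by word frequency.
        pairs = Counter()
        for w, freq in words.items():
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Apply the winning merge everywhere it occurs.
        new_words = Counter()
        for w, freq in words.items():
            merged, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    merged.append(w[i] + w[i + 1])
                    i += 2
                else:
                    merged.append(w[i])
                    i += 1
            new_words[tuple(merged)] += freq
        words = new_words
    return vocab, merges

# Toy corpus: after a few merges, frequent substrings like "er" become single tokens.
vocab, merges = train_bpe(["lower", "lowest", "newer", "wider"], vocab_size=20)
print(merges[:5])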

part-1-foundations/module-04-transformer-architecture/section-4.4.html

Lines changed: 1 addition & 1 deletion
@@ -383,7 +383,7 @@ <h3>Online Softmax</h3>
 )
 return output
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 4.4.6:</strong> The FlashAttention tiling algorithm in pseudocode. By processing Q, K, and V in SRAM-sized blocks and rescaling partial softmax accumulators on the fly, it computes exact attention while reducing HBM reads from quadratic to linear in sequence length.</div>
+<div class="code-caption"><strong>Pseudocode 4.4.6:</strong> The FlashAttention tiling algorithm in pseudocode. By processing Q, K, and V in SRAM-sized blocks and rescaling partial softmax accumulators on the fly, it computes exact attention while reducing HBM reads from quadratic to linear in sequence length.</div>
 </div>

 <p>
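
The caption's key mechanism, rescaling partial softmax accumulators as new key/value blocks arrive, can be checked numerically in a few lines. This is a NumPy sketch for a single query vector, with an assumed block size and random test data; it illustrates the online-softmax bookkeeping only, not the full tiled kernel.

import numpy as np

def online_softmax_attention_row(q, K, V, block_kv=64):
    # Process K/V in blocks, keeping a running max (m), denominator (l), and value accumulator.
    d = q.shape[-1]
    m, l = -np.inf, 0.0
    acc = np.zeros(V.shape[-1])
    for start in range(0, K.shape[0], block_kv):
        Kb, Vb = K[start:start + block_kv], V[start:start + block_kv]
        s = Kb @ q / np.sqrt(d)              # attention scores for this block
        m_new = max(m, float(s.max()))
        scale = np.exp(m - m_new)            # rescale what was accumulated under the old max
        p = np.exp(s - m_new)                # unnormalized probabilities for this block
        l = l * scale + p.sum()
        acc = acc * scale + p @ Vb
        m = m_new
    return acc / l                           # exact softmax(q K^T / sqrt(d)) @ V for this row

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=16), rng.normal(size=(256, 16)), rng.normal(size=(256, 8))
s = K @ q / np.sqrt(16)
exact = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
assert np.allclose(online_softmax_attention_row(q, K, V), exact)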

part-1-foundations/module-05-decoding-text-generation/section-5.1.html

Lines changed: 1 addition & 1 deletion
@@ -387,7 +387,7 @@ <h3>Beam Search Step by Step</h3>
 print("Beam-5:", tokenizer.decode(beam_out[0]))</code></pre>
 <div class="code-output">Greedy: The future of artificial intelligence is not just about the ability to create machines that can think and act like humans.
 Beam-5: The future of artificial intelligence is a topic that has been discussed for decades, and it is one that has been</div>
-<div class="code-caption"><strong>Code Fragment 5.1.2:</strong> Each beam: (sequence_tensor, cumulative_log_prob).</div>
+<div class="code-caption"><strong>Pseudocode 5.1.2:</strong> Each beam: (sequence_tensor, cumulative_log_prob).</div>
 </div>

 <div class="callout tip">

part-10-frontiers/module-34-emerging-architectures/section-34.3.html

Lines changed: 1 addition & 1 deletion
@@ -305,7 +305,7 @@ <h3>2.2 Mamba: Selective State Spaces</h3>
 g. y<sub>t</sub> = C<sub>t</sub> &middot; h <span class="algo-line-comment">// output from state</span>
 <span class="algo-line-keyword">return</span> y
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 34.3.5:</strong> The Mamba selective scan algorithm, showing how input-dependent parameters (B, C, and the step size delta) are computed at each timestep and used to update a compressed hidden state. This input-dependent gating is what distinguishes selective SSMs from their linear, time-invariant predecessors.</div>
+<div class="code-caption"><strong>Pseudocode 34.3.5:</strong> The Mamba selective scan algorithm, showing how input-dependent parameters (B, C, and the step size delta) are computed at each timestep and used to update a compressed hidden state. This input-dependent gating is what distinguishes selective SSMs from their linear, time-invariant predecessors.</div>
 </div>

 <p>
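
A single-channel toy version of the scan described in the caption can be written directly from the recurrence: delta, B, and C are recomputed from the input at every step and gate how much of x_t enters the hidden state. The diagonal A, the softplus for delta, and the scalar input channel are simplifying assumptions for illustration, not Mamba's actual parameterization.

import numpy as np

def selective_scan(x, A, w_B, w_C, w_delta):
    # x: (seq_len,) scalar input channel; A: (d_state,) diagonal, negative for stability.
    h = np.zeros_like(A)                          # compressed hidden state
    ys = []
    for x_t in x:
        delta = np.log1p(np.exp(w_delta * x_t))   # input-dependent step size (softplus)
        B_t = w_B * x_t                           # input-dependent input projection
        C_t = w_C * x_t                           # input-dependent output projection
        A_bar = np.exp(delta * A)                 # discretized state transition
        h = A_bar * h + delta * B_t * x_t         # state update, gated by the current input
        ys.append(float(C_t @ h))                 # read the output from the state
    return np.array(ys)

y = selective_scan(np.sin(np.linspace(0, 3, 32)),
                   A=-np.ones(4), w_B=np.ones(4), w_C=np.ones(4), w_delta=1.0)
print(y.shape)  # (32,)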

part-10-frontiers/module-35-ai-society/section-35.1.html

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ <h3>AI Safety via Debate</h3>
 <span class="algo-line-comment">// because the opponent can expose any false claim</span>
 7. <span class="algo-line-keyword">return</span> (verdict, confidence)
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 35.1.1:</strong> The AI Safety via Debate algorithm, where two adversarial models argue opposing sides and a human judge evaluates the transcript. The Nash equilibrium property ensures that truthful argumentation is the dominant strategy, because any false claim can be exposed by the opponent.</div>
+<div class="code-caption"><strong>Pseudocode 35.1.1:</strong> The AI Safety via Debate algorithm, where two adversarial models argue opposing sides and a human judge evaluates the transcript. The Nash equilibrium property ensures that truthful argumentation is the dominant strategy, because any false claim can be exposed by the opponent.</div>
 </div>

 <h3>Recursive Reward Modeling</h3>
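
The debate protocol in the caption reduces to a short loop once the models and the judge are treated as opaque callables. debater_a, debater_b, and judge below are placeholders for model or human calls, not a real API.

def run_debate(question, debater_a, debater_b, judge, n_rounds=4):
    transcript = [f"Question: {question}"]
    for r in range(n_rounds):
        # Debaters alternate; each sees the full transcript, so any false claim can be attacked.
        speaker = debater_a if r % 2 == 0 else debater_b
        transcript.append(speaker("\n".join(transcript)))
    verdict, confidence = judge("\n".join(transcript))  # judge evaluates the whole exchange
    return verdict, confidence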

part-2-understanding-llms/module-07-modern-llm-landscape/section-7.3.html

Lines changed: 1 addition & 1 deletion
@@ -423,7 +423,7 @@ <h3>4.1 The Mechanics</h3>
 # Very hard: tree search with large model
 return mcts_search(hard_model, reward_model, problem,
 n_iterations=200)</code></pre>
-<div class="code-caption"><strong>Code Fragment 7.3.3:</strong> Best-of-N sampling with reward model scoring.</div>
+<div class="code-caption"><strong>Pseudocode 7.3.3:</strong> Best-of-N sampling with reward model scoring.</div>
 </div>

 <p>Code Fragment 7.3.3 below puts this into practice.</p>
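
The caption summarizes Pseudocode 7.3.3 as Best-of-N sampling with reward-model scoring; a minimal form of that idea, with generate and reward_model as placeholder callables rather than a specific library interface, looks like this.

def best_of_n(prompt, generate, reward_model, n=8):
    candidates = [generate(prompt) for _ in range(n)]        # sample N candidate answers
    scores = [reward_model(prompt, c) for c in candidates]   # score each with the reward model
    best = max(range(n), key=lambda i: scores[i])            # keep the highest-scoring candidate
    return candidates[best], scores[best]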

part-2-understanding-llms/module-08-reasoning-test-time-compute/section-8.3.html

Lines changed: 2 additions & 2 deletions
@@ -102,7 +102,7 @@ <h3>1.1 The RLVR Training Loop</h3>
 5. Add KL penalty: L_total = mean(L_i) - beta * KL(pi || pi_ref)
 6. Update pi by gradient ascent on L_total
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 8.3.4:</strong> The RLVR training loop generates solutions, scores them with an automatic verifier, and updates the policy using the reward signal. Because verification is fully automatic, this loop scales to millions of training examples without human annotators.</div>
+<div class="code-caption"><strong>Pseudocode 8.3.4:</strong> The RLVR training loop generates solutions, scores them with an automatic verifier, and updates the policy using the reward signal. Because verification is fully automatic, this loop scales to millions of training examples without human annotators.</div>
 </div>

 <p>The power of RLVR lies in the verifier: because correctness checking is automatic, the system can generate millions of training signals without human annotators. This enables RL training at a scale that would be impractical with human feedback.</p>
@@ -121,7 +121,7 @@ <h3>2.1 How GRPO Works</h3>

 <div class="callout algorithm">
 <div class="callout-title">Algorithm: Group Relative Policy Optimization (GRPO)</div>
-<div class="code-caption"><strong>Code Fragment 8.3.3:</strong> GRPO computes advantages by normalizing rewards within a group of sampled solutions, eliminating the need for a separate critic model. This halves GPU memory compared to PPO while preserving stable policy updates through ratio clipping and a KL penalty.</div>
+<div class="code-caption"><strong>Pseudocode 8.3.3:</strong> GRPO computes advantages by normalizing rewards within a group of sampled solutions, eliminating the need for a separate critic model. This halves GPU memory compared to PPO while preserving stable policy updates through ratio clipping and a KL penalty.</div>
 </div>

 <p>The key insight is in step 3: by normalizing rewards within each group, GRPO converts absolute rewards into relative comparisons. A solution that scores 1.0 in a group where all others also score 1.0 receives zero advantage (nothing to learn from), while the same score in a group of mostly failures receives high positive advantage. This eliminates the need for a separate critic model to estimate expected reward, halving the GPU memory requirement.</p>
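
The group normalization described in that paragraph is a one-liner worth seeing in code: within each group of sampled solutions, rewards are shifted and scaled so that a lone success stands out and a uniform group yields zero advantage everywhere. This is an illustrative sketch of that step only, not a full GRPO implementation.

import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    # Normalize rewards within one group of G sampled solutions to the same prompt.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))  # lone success -> large positive advantage
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))  # uniform group -> all-zero advantages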

part-4-training-adapting/module-17-alignment-rlhf-dpo/section-17.1.html

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@ <h3>2.2 Stage 2: Reward Model Training</h3>
 Update V to minimize value prediction error
 3. <b>return</b> pi* (the aligned policy)
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 17.1.3:</strong> This pseudocode outlines the PPO-based RLHF training loop, which iterates over prompts to sample responses from the current policy pi_theta, scores them with reward model R, and updates the policy using clipped surrogate loss with a KL penalty weighted by beta against the reference policy pi_ref.</div>
+<div class="code-caption"><strong>Pseudocode 17.1.3:</strong> This pseudocode outlines the PPO-based RLHF training loop, which iterates over prompts to sample responses from the current policy pi_theta, scores them with reward model R, and updates the policy using clipped surrogate loss with a KL penalty weighted by beta against the reference policy pi_ref.</div>
 </div>

 <p>The KL penalty in step 2b is critical: without it, the policy can "game" the reward model by producing outputs that score highly but are incoherent or repetitive (a phenomenon called reward hacking). The KL term anchors the policy near the SFT distribution, preserving the model's language capabilities while steering its behavior. The following code demonstrates the PPO training loop with TRL.</p>
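
The per-token shape of that KL anchor is easy to sketch: each token of the sampled response is penalized in proportion to how far the policy's log-probability drifts from the frozen reference model, and the scalar reward-model score lands on the final token. The function below is an illustrative sketch under those common conventions, not the TRL code that the section goes on to show.

import numpy as np

def kl_penalized_rewards(rm_score, policy_logprobs, ref_logprobs, beta=0.05):
    # policy_logprobs / ref_logprobs: per-token log-probs of the sampled response tokens.
    kl_per_token = np.asarray(policy_logprobs) - np.asarray(ref_logprobs)
    rewards = -beta * kl_per_token          # penalize drift away from the reference at every token
    rewards[-1] += rm_score                 # reward-model score is credited at the final token
    return rewards

print(kl_penalized_rewards(0.8, [-1.2, -0.5, -2.0], [-1.0, -0.6, -1.5]))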

part-5-retrieval-conversation/module-20-rag/section-20.1.html

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ <h3>1.1 The Core RAG Loop</h3>
 4. <b>Generate:</b> response = G(prompt) // LLM generates grounded answer
 5. <b>return</b> response (with source citations from docs)
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 20.1.1:</strong> This pseudocode describes the core RAG pipeline: encode query q with <a class="cross-ref" href="../module-19-embeddings-vector-db/section-19.1.html">embedding model</a> E, retrieve <a class="cross-ref" href="../../part-1-foundations/module-05-decoding-text-generation/section-05.2.html">top-k</a> passages from knowledge base KB by <a class="cross-ref" href="../module-19-embeddings-vector-db/section-19.1.html">cosine similarity</a>, concatenate them into a context string, and pass the augmented prompt to generator G. The output is a grounded response conditioned on retrieved evidence.</div>
+<div class="code-caption"><strong>Pseudocode 20.1.1:</strong> This pseudocode describes the core RAG pipeline: encode query q with <a class="cross-ref" href="../module-19-embeddings-vector-db/section-19.1.html">embedding model</a> E, retrieve <a class="cross-ref" href="../../part-1-foundations/module-05-decoding-text-generation/section-05.2.html">top-k</a> passages from knowledge base KB by <a class="cross-ref" href="../module-19-embeddings-vector-db/section-19.1.html">cosine similarity</a>, concatenate them into a context string, and pass the augmented prompt to generator G. The output is a grounded response conditioned on retrieved evidence.</div>
 </div>

 <p>
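
Collapsing the caption's five steps into code gives a compact retrieve-then-generate loop. embed, generate, and the in-memory arrays below are placeholders for a real embedding model, LLM, and vector store; the sketch shows the data flow, not a production pipeline.

import numpy as np

def rag_answer(query, kb_texts, kb_vectors, embed, generate, k=3):
    q = embed(query)                                          # 1. encode the query
    # 2. retrieve: cosine similarity between the query and every passage vector
    sims = kb_vectors @ q / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    context = "\n\n".join(kb_texts[i] for i in top)           # 3. augment the prompt with evidence
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt), [kb_texts[i] for i in top]       # 4-5. generate and return with sources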

part-6-agentic-ai/module-22-ai-agents/section-22.1.html

Lines changed: 1 addition & 1 deletion
@@ -346,7 +346,7 @@ <h2>3. The ReAct Framework</h2>
 e. Append (Thought, Action, Observation) to context
 3. <b>return</b> "Max steps reached without resolution"
 </code></pre>
-<div class="code-caption"><strong>Code Fragment 22.1.2:</strong> This pseudocode formalizes the ReAct agent loop: given a user task T, tool set, and LLM M, the agent iterates through Thought, Action, and Observation steps up to max_steps S. The loop terminates when the LLM emits a final_answer action or the step budget is exhausted, returning the accumulated trajectory.</div>
+<div class="code-caption"><strong>Pseudocode 22.1.2:</strong> This pseudocode formalizes the ReAct agent loop: given a user task T, tool set, and LLM M, the agent iterates through Thought, Action, and Observation steps up to max_steps S. The loop terminates when the LLM emits a final_answer action or the step budget is exhausted, returning the accumulated trajectory.</div>
 </div>

 <p>The key insight is that the explicit reasoning in step 2a (the "Thought") dramatically improves decision quality compared to acting without thinking or thinking without acting. Each thought provides a chain-of-reasoning that is also valuable for debugging when the agent makes mistakes.</p>
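
The trajectory that the caption describes can be driven by a loop of a dozen lines once parsing is simplified. Here llm is assumed to return a small dict with thought, action, and args keys, and tools maps action names to callables; both are illustrative stand-ins rather than a real agent framework.

def react_agent(task, llm, tools, max_steps=8):
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm("\n".join(context))               # assumed to return {"thought", "action", "args"}
        if step["action"] == "final_answer":
            return step["args"]                      # terminate when the model emits a final answer
        observation = tools[step["action"]](step["args"])   # execute the chosen tool
        context.append(f"Thought: {step['thought']}\n"
                       f"Action: {step['action']}({step['args']})\n"
                       f"Observation: {observation}")
    return "Max steps reached without resolution"    # step budget exhausted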
