Phase 4c: Engineering and foundational frontier topics for Part 10
- 34.5: A Theory of Reasoning in LLMs
- 34.6: Memory as a Computational Primitive
- 34.7: Mechanistic Interpretability at Scale
- 34.8: The Nature of Agency
- 35.5: Reliability Engineering for Agents
- 35.6: Observability, Testing, CI/CD for Agents
- 35.7: Memory Architectures That Improve Execution
- 35.8: Self-Improving and Adaptive Agents
- 35.9: The Future of Human-AI Collaboration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
part-10-frontiers/index.html (0 additions & 2 deletions)
@@ -69,8 +69,6 @@ <h3>What's Next?</h3>
 <p>You have reached the final part of the book. Return to the <a href="../index.html">Book Home</a> to revisit any topic, or explore the <a href="../toc.html">Appendices</a> for additional reference material.</p>
 </div>
 
-
-
 <footer>
 <p class="footer-title">Building Conversational AI with LLMs and Agents, Fifth Edition</p>
 <p>In the next section, <a href="section-34.2.html" class="cross-ref">Section 34.2: Scaling Frontiers</a>, we examine the data wall, compute economics, and architectural alternatives that will shape the next generation of frontier models.</p>
 </div>
 
-
 <div class="callout tip">
 <div class="callout-title">Tip: Prototype with Established Models, Ship with New Ones</div>
 <p>Build your prototype and evaluation pipeline using a well-understood model. Once you have a working system with clear metrics, swap in newer architectures and compare. This way, the novelty of the architecture does not mask pipeline bugs.</p>
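The swap-and-compare workflow in this tip can be sketched as a fixed evaluation harness that scores any model against the same prompt set. Everything here is hypothetical for illustration: the model names, the `call_model()` stub, and the tiny eval set stand in for a real inference client and a real benchmark.

```python
# Hypothetical sketch: call_model() stands in for a real inference API;
# the model names and eval set are illustrative only.
def call_model(model_name, prompt):
    # Canned answers simulate two models with different behavior.
    canned = {
        "baseline-model":  {"2+2?": "4", "capital of France?": "Paris"},
        "candidate-model": {"2+2?": "4", "capital of France?": "Lyon"},
    }
    return canned[model_name].get(prompt, "")

def accuracy(model_name, eval_set):
    """Score a model against (prompt, expected) pairs with exact match."""
    hits = sum(call_model(model_name, p) == exp for p, exp in eval_set)
    return hits / len(eval_set)

eval_set = [("2+2?", "4"), ("capital of France?", "Paris")]
baseline = accuracy("baseline-model", eval_set)    # 1.0
candidate = accuracy("candidate-model", eval_set)  # 0.5
```

Because the eval set and scoring function are held constant, any accuracy gap between the two runs is attributable to the model swap rather than to pipeline changes, which is precisely the point of the tip.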
part-10-frontiers/module-34-emerging-architectures/section-34.2.html (0 additions & 1 deletion)
@@ -325,7 +325,6 @@ <h2>What Comes Next</h2>
 <p>In the next section, <a href="section-34.3.html" class="cross-ref">Section 34.3: Alignment Research Frontiers</a>, we explore the open problems in aligning AI systems with human values, including scalable oversight, weak-to-strong generalization, and reward hacking.</p>
 </div>
 
-
 <div class="callout tip">
 <div class="callout-title">Tip: Watch for State Space Model Deployments</div>
 <p>Models like Mamba offer O(n) inference instead of O(<span class="math">$n^{2}$</span>) for transformers, making them promising for very long sequences. If your application processes documents over 100K tokens, benchmark an SSM variant alongside your transformer baseline.</p>
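The asymptotic gap in this tip is easy to make concrete with a back-of-the-envelope cost model, assuming attention scales roughly with n² and an SSM with n (constants and hardware effects deliberately ignored):

```python
# Toy cost model: relative per-sequence work, ignoring constants.
def attention_cost(n):
    return n * n  # quadratic in sequence length

def ssm_cost(n):
    return n      # linear in sequence length

for n in (1_000, 100_000):
    ratio = attention_cost(n) / ssm_cost(n)
    print(f"n={n}: attention/SSM cost ratio = {ratio:.0f}x")
# Ratios grow linearly with n: 1000x at n=1,000 and 100000x at n=100,000.
```

The ratio itself equals n, which is why the advantage is negligible for short prompts but dominant at the 100K-token scale the tip describes.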
part-10-frontiers/module-34-emerging-architectures/section-34.3.html (0 additions & 9 deletions)
@@ -78,7 +78,6 @@ <h2>1. The Scaling Problem with Self-Attention <span class="level-badge basic" t
 for choosing the right architecture for a given deployment scenario.
 </p>
 
-
 <h2>2. State Space Models: S4, Mamba, and Mamba-2 <span class="level-badge advanced" title="Advanced">Advanced</span></h2>
 
 <p>
@@ -430,7 +429,6 @@ <h3>3.3 Griffin and RecurrentGemma</h3>
 outperforms pure SSM or pure attention models of the same size.
 </p>
 
-
 <div class="callout key-insight">
 <div class="callout-title">Key Insight</div>
 <p><strong>The attention versus efficiency tradeoff is not all-or-nothing.</strong> The research trajectory is moving away from "replace attention entirely" toward "use attention surgically." Pure SSM models sacrifice recall precision on tasks that require exact matching or retrieval from earlier in the context. Pure attention models pay quadratic cost for every token, even when most tokens do not need to attend to most other tokens. The emerging consensus is that hybrid architectures (attention for precision-critical layers, linear recurrence for everything else) may dominate both pure approaches. For practitioners, this means that the <a class="cross-ref" href="../../part-2-understanding-llms/module-09-inference-optimization/section-9.1.html">inference optimization techniques from Chapter 09</a> (KV-cache management, continuous batching) will remain relevant even as architectures evolve, because attention layers will likely persist in some form.</p>
@@ -553,7 +551,6 @@ <h3>4.2 Design Principles for Hybrid Architectures</h3>
 keeping memory costs bounded. The SSM layers handle global context propagation.
 </p>
 
-
 <h2>5. Efficiency Comparisons and Benchmarks <span class="level-badge intermediate" title="Intermediate">Intermediate</span></h2>
 <p><strong>The "Needle in a Haystack" test reveals the retrieval gap.</strong> In this test, a specific fact is inserted at a random position in a long document, and the model must retrieve it from a distant point. Transformers with full attention achieve near-perfect accuracy at all positions and context lengths. Pure SSM models show degradation for facts placed in the middle of very long sequences (the "lost in the middle" effect is amplified by compressed state). Hybrid models split the difference, achieving strong retrieval when the fact falls within an attention window and moderate retrieval otherwise. This test remains the clearest diagnostic for evaluating alternative architectures.</p>
 </div>
 
-
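The needle-in-a-haystack protocol described above is simple enough to sketch end to end. This is a minimal illustration, not a production benchmark: the needle text, filler sentence, and `echo_model` stub are all hypothetical stand-ins for a real document corpus and a real model call.

```python
import random

# Hypothetical needle and filler; a real harness would vary both.
NEEDLE = "The magic number is 7481."

def build_haystack(n_filler, depth):
    """Place the needle after `depth` filler sentences out of n_filler."""
    filler = ["The sky was grey that morning."] * n_filler
    return " ".join(filler[:depth] + [NEEDLE] + filler[depth:])

def run_trial(model_fn, n_filler=200, seed=0):
    """One trial: random insertion depth, then check the model's answer."""
    depth = random.Random(seed).randrange(n_filler)
    doc = build_haystack(n_filler, depth)
    answer = model_fn(doc + " What is the magic number?")
    return "7481" in answer

# Trivial stub that "retrieves" by echoing its whole input.
echo_model = lambda prompt: prompt
print(run_trial(echo_model))  # True for this trivial stub
```

Sweeping `depth` across the full range and plotting accuracy per position is what produces the characteristic "lost in the middle" curves the text describes.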
 <h2>6. When to Consider Non-Transformer Architectures <span class="level-badge intermediate" title="Intermediate">Intermediate</span></h2>
 
 <p>
@@ -711,7 +707,6 @@ <h2>6. When to Consider Non-Transformer Architectures <span class="level-badge i
 longer sequence lengths, and the model availability gap is significant.
 </p>
 
-
 <h2>7. Neuromorphic and Event-Driven Approaches <span class="level-badge advanced" title="Advanced">Advanced</span></h2>
 remain in the research stage and are not yet practical for production deployment.
 </p>
 
-
 <div class="callout research-frontier">
 <div class="callout-title">Research Frontier</div>
 <p><strong>The convergence of architectures.</strong> Mamba-2's state space duality theorem suggests that SSMs and attention may be endpoints on a spectrum rather than fundamentally different approaches. Recent work on "linear attention" (Katharopoulos et al., Yang et al.) and "gated linear attention" further blurs the boundary. The research community is moving toward a unified framework where the architectural choice is a hyperparameter (how much to compress the state) rather than a philosophical commitment. Watch for architectures that can dynamically adjust their compression ratio per layer and per input, spending full attention on tokens that need it and using compressed state for the rest.</p>
part-10-frontiers/module-34-emerging-architectures/section-34.4.html (0 additions & 18 deletions)
@@ -47,7 +47,6 @@ <h3>Prerequisites</h3>
 <p><strong>Language models predict the next token. World models predict the next state of reality.</strong> A new class of models is learning to simulate physical environments: generating photorealistic video, modeling driving scenarios, and building interactive 3D worlds from single images. These world models bridge the gap between language understanding and physical reasoning, enabling agents to plan by imagining future outcomes rather than relying solely on textual inference. This section surveys the architectures, applications, and open problems in this rapidly evolving frontier.</p>
 </div>
 
-
 <h2>1. What Are World Models? <span class="level-badge basic" title="Basic">Basic</span></h2>
 
 <p>
@@ -96,7 +95,6 @@ <h2>1. What Are World Models? <span class="level-badge basic" title="Basic">Basi
 <p>The model is trained to minimize the prediction error between $\hat{z}_{t+1}$ and $z_{t+1}$ (the actual next latent state), enabling the agent to "imagine" trajectories without environment interaction.</p>
 </div>
 
-
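The one-step objective above can be illustrated with a toy latent dynamics model. This is a deliberately minimal sketch: a scalar latent state, a hypothetical linear predictor, and hand-picked dynamics coefficients, standing in for the learned neural dynamics model the text describes.

```python
# Toy sketch of the world-model objective: predict z_hat_{t+1} from
# (z_t, a_t) and penalize squared error against the true z_{t+1}.
# The linear form and the coefficients are illustrative assumptions.
def predict_next(z_t, a_t, w_z, w_a):
    """z_hat_{t+1} = w_z * z_t + w_a * a_t (scalar latent for clarity)."""
    return w_z * z_t + w_a * a_t

def mse(z_hat, z_true):
    """Per-step prediction loss the model is trained to minimize."""
    return (z_hat - z_true) ** 2

# Ground-truth dynamics for this toy: z_{t+1} = 0.9 * z_t + 0.1 * a_t.
z_t, a_t = 2.0, 1.0
z_true = 0.9 * z_t + 0.1 * a_t

perfect = mse(predict_next(z_t, a_t, 0.9, 0.1), z_true)  # loss = 0
wrong   = mse(predict_next(z_t, a_t, 1.0, 0.0), z_true)  # positive loss
```

Once such a predictor is accurate, "imagining" a trajectory is just iterating `predict_next` on its own outputs, with no environment interaction, exactly as the paragraph states.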
 <h2>2. Video Generation as World Simulation <span class="level-badge intermediate" title="Intermediate">Intermediate</span></h2>
 <p class="caption"><strong>Table 34.4.1:</strong> Comparison of major autonomous driving world models. Parameter counts are approximate and reflect the world model component only.</p>
 
-
 <h2>4. Interactive World Models <span class="level-badge advanced" title="Advanced">Advanced</span></h2>
 
 <p>
@@ -358,7 +354,6 @@ <h3>4.2 Game Engine Replacement</h3>
 <p><strong>From image to playable world.</strong> Consider a game designer who has a concept painting of a medieval village. With a traditional engine, turning this into a playable environment requires weeks of 3D modeling, texturing, physics setup, and scripting. With an interactive world model like Genie 2, the designer uploads the painting and immediately gets a navigable environment: they can walk through the village streets, enter buildings, and interact with objects. The environment is not geometrically precise (it is hallucinated from the image), but it serves as a rapid prototype that can be refined. This workflow compresses the concepting phase from weeks to minutes.</p>
 </div>
 
-
 <h2>5. Agent Planning with World Models <span class="level-badge advanced" title="Advanced">Advanced</span></h2>
 
 <p>
@@ -442,7 +437,6 @@ <h3>5.3 LLM-Scale World Models for Agent Planning</h3>
 <p><strong>World models as the missing link for embodied AI.</strong> Current LLM-based agents (covered in <a class="cross-ref" href="../../part-6-agentic-ai/module-22-ai-agents/index.html">Chapter 22</a>) operate primarily in digital environments: browsing the web, writing code, managing files. Extending these agents to physical environments (robots, autonomous vehicles, manufacturing) requires grounding their reasoning in physical reality. World models provide this grounding by giving the agent an internal simulator it can query. The convergence of LLM reasoning and world model simulation is widely considered one of the most important research directions in AI as of 2026.</p>
 </div>
 
-
 <h2>6. Limitations and Open Problems <span class="level-badge intermediate" title="Intermediate">Intermediate</span></h2>
 
 <p>
@@ -546,7 +540,6 @@ <h3>6.5 When World Models Fail</h3>
 <p>A world model that is 99% accurate may still be catastrophically wrong in the 1% of cases that matter most. In safety-critical applications, world models should augment (not replace) traditional physics simulators and real-world testing. The convenience of unlimited synthetic data does not eliminate the need for rigorous validation against ground truth.</p>
 </div>
 
-
 <h2>7. Practical Lab: Video Prediction with a Diffusion Transformer <span class="level-badge advanced" title="Advanced">Advanced</span></h2>
 
 <p>
@@ -571,7 +564,6 @@ <h2>7. Practical Lab: Video Prediction with a Diffusion Transformer <span class=
 from einops import rearrange, repeat
 import math
 
-
 class SinusoidalPosEmb(nn.Module):
     """Sinusoidal positional embedding for diffusion timesteps."""
     def __init__(self, dim):
@@ -586,7 +578,6 @@ <h2>7. Practical Lab: Video Prediction with a Diffusion Transformer <span class=
@@ -859,7 +845,6 @@ <h2>7. Practical Lab: Video Prediction with a Diffusion Transformer <span class=
 <p><strong>The bouncing ball is a microcosm of world modeling.</strong> Even this simple scenario exercises the core challenges: the model must learn that the ball persists between frames (object permanence), that it moves in a consistent direction (momentum), that it reverses direction at boundaries (collision physics), and that the applied action influences its trajectory (action conditioning). Scaling these same principles to photorealistic video of complex environments is an engineering challenge, not a conceptual one.</p>
@@ -895,7 +880,6 @@ <h2>7. Practical Lab: Video Prediction with a Diffusion Transformer <span class=
 <p>Explain the "drift" problem in autoregressive world models: why do prediction errors compound over time? Describe three approaches to mitigating drift (re-anchoring, hierarchical generation, explicit memory) and analyze the tradeoffs of each. Calculate: if a model introduces an average positional error of 0.5 pixels per frame for a moving object, how far off will the predicted position be after 100 frames? After 1,000 frames? What does this imply about the maximum useful planning horizon?</p>
 </div>
 
-
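The arithmetic in the exercise above is worth making explicit, under the simplest assumption that per-frame errors share a direction and therefore accumulate linearly (the worst case; independent errors would grow more slowly, roughly with the square root of the frame count):

```python
# Worst-case drift estimate: same-direction per-frame errors add linearly.
def accumulated_drift(error_per_frame, n_frames):
    return error_per_frame * n_frames

print(accumulated_drift(0.5, 100))    # 50.0 pixels after 100 frames
print(accumulated_drift(0.5, 1_000))  # 500.0 pixels after 1,000 frames
```

At 50 pixels of drift a predicted object is already in visibly the wrong place, which is why useful planning horizons for pure autoregressive rollouts are measured in tens, not thousands, of frames unless one of the mitigation strategies in the exercise is applied.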
 <div class="whats-next">
 <h2>What Comes Next</h2>
 <p>
@@ -906,7 +890,6 @@ <h2>What Comes Next</h2>
 </p>
 </div>
 
-
 <section class="bibliography">
 <h2>Bibliography and Further Reading</h2>
 <ul>
@@ -930,7 +913,6 @@ <h2>Bibliography and Further Reading</h2>
 <a class="next" href="../module-35-ai-society/index.html">Chapter 35: AI, Society & Open Problems</a>
 </nav>
 
-
 <footer>
 <p class="footer-title">Building Conversational AI with LLMs and Agents, Fifth Edition</p>