Task Registry: LLMBook Content Expansion (2026-04-04)

Status Legend

  • Completed
  • [~] In progress (background agent)
  • Pending

Phase 1: Appendix Stub Completion

  • Appendix K (HuggingFace): 5 sections written
  • Appendix L (LangChain): 5 sections written
  • Appendix M (LangGraph): 5 sections written
  • Appendix N (CrewAI): 5 sections written
  • Appendix O (LlamaIndex): 5 sections written
  • Appendix P (Semantic Kernel): 5 sections written
  • Appendix Q (DSPy): 5 sections written
  • Appendix R (Experiment Tracking): R.1-R.3 existed, R.4 (Model Registry), R.5 (LLM Eval Dashboards) written
  • Appendix S (Inference Serving): 5 sections written (vLLM, TGI, SGLang, Quantization, Scaling)
  • Appendix T (Distributed ML): 5 sections written (Databricks, Delta Lake, Ray, Feature Stores, Pipelines)
  • Appendix U (Docker Containers): 4 sections written
  • Appendix V (Tooling Ecosystem): 3 sections written

Phase 2: Missing Content from missing_topics2.md

  • OpenTelemetry for LLM Applications (section-30.5)
  • AI Gateways and Model Routing (section-31.5)
  • Voice Agents and Speech Interfaces (section-21.6)
  • RAG Ingestion Pipelines (Airbyte, Unstructured, Tika, Dagster) - section-20.8
  • Supply-chain Security for Agent Sandboxes (Trivy, Syft, Cosign, SLSA) - section-26.7
  • SWE-bench Agentic Software Engineering Evaluation - section-25.6
  • Edge/On-device LLM Deployment (MLX, ExecuTorch, Ollama) - section-31.7
  • Human Feedback Tooling (Label Studio, Argilla, LangSmith) - section-29.12
  • GraphRAG Implementation (section-20.7)
  • Reproducible Agent Benchmarks: covered in section-22.8

Phase 3: Missing Content from research_topics_gaps.md

  • Evaluation Harnesses (Inspect, lighteval, lm-eval-harness) - section-29.9
  • LLM-as-Judge Reliability (G-Eval, Prometheus, JudgeLM) - section-29.10
  • Automated Red Teaming (HarmBench, JailbreakBench, garak) - section-32.11
  • Agentic Security Benchmarks (b3, InjecAgent, tau-bench) - section-26.6
  • Research Replication Benchmarks (PaperBench, CORE-Bench, MLE-bench) - section-22.8
  • Long-context Benchmarks (LongBench v2, RULER, NIAH) - section-29.11
  • GraphRAG Research Module - section-20.7
  • Formal Reasoning with Proof Assistants (LeanDojo, miniF2F) - section-8.6
  • Expand Unlearning with WMDP Benchmark - expanded section-27.7
  • Expand Interpretability with SAE Scaling + sparsify - expanded section-18.2
  • Cross-cultural NLP and Pluralistic Alignment - section-32.10
  • GPU Kernel Programming (Triton tutorial) - section-9.7
  • Embodied Multimodal Agents (OpenVLA, Octo, Habitat) - section-27.5
  • Green AI / Environmental Impact - section-32.11
  • Privacy Attacks and Differential Privacy - section-32.12
  • Human-AI Interaction Patterns and UX Evaluation - section-21.7

Phase 4: Missing Content from llmbook_missing_content_table.md

  • Workflow Orchestration / Durable Execution (Temporal, Inngest) - section-31.6
  • Dataset Engineering for Applications (logs, conversations, data contracts) - section-12.8
  • Reliability Engineering for LLM Apps (failure taxonomy, SLOs, chaos testing) - section-31.8
  • Research Methodology for LLM Papers (experiment design, ablations, artifacts) - section-29.13
  • Enterprise Integration Patterns (auth, RBAC, tenant isolation, governance) - section-33.7
  • Economic Design of LLM Systems (token budgeting, cascade, caching costs) - section-33.8
  • Code/Work Workflows and Agentic Systems (Claude Code, Claude Works, Codex, Devin, Cursor) - section-25.7

Phase 4b: Additional Content (gap_scale.md, Two Minute Papers, awesome-* repos)

  • AI for Scientific Discovery & Research Automation - section-24.8
  • LLM Applications Across Industries - section-24.9
  • Automatic Prompt & Context Engineering (DSPy, OPRO, TextGrad) - section-11.6
  • Analysis and Quality of AI-Generated Code - section-25.8
  • Production LLM Training Systems (Megatron, Elastic Training, Fault Tolerance) - section-6.8
  • Kubernetes-Native LLM Operations (Scheduling, Serving, GPU Management) - section-31.9
  • LLM Performance Benchmarking and Cross-Hardware Portability - section-29.14
  • LLM-Powered Robotics - section-27.6
  • 3D Gaussian Splatting for LLM-Guided Scene Editing - section-27.7
  • World Models: Video Generation, Simulation, Embodied Reasoning - section-34.4
  • 71 library shortcut callouts added across Parts 1-9

Phase 4c: Frontier Topics (10 candidates under evaluation)

Group A: Engineering Frontier

  • Reliability engineering for agents under production stress (section-35.5)
  • Observability, testing, and CI/CD for agent workflows (section-35.6)
  • Memory architectures that improve execution (section-35.7)
  • Efficient multi-tool orchestration and tool economy (section-34.9)
  • Self-improving and adaptive agents in deployment loops (section-35.8)

Group B: Foundational/Theoretical

  • A theory of reasoning in LLMs (section-34.5)
  • World models and internal representations of reality (covered in existing section-34.4)
  • Memory as a computational primitive (section-34.6)
  • Mechanistic understanding and interpretability of learned computation (section-34.7)
  • The nature of agency: when does a model become an agent? (section-34.8)

Additional

  • The future of human-AI collaboration (section-35.9)

Phase 5: Low Priority / Optional

  • Grammar-constrained Decoding Expansion (LMQL, SGLang, Outlines FSM, jsonformer, Guidance, comparison table, 5 bib entries)

Phase 6: Pre-commit Tasks

  • Launch scout agents for all new chapters/sections (search content, libs, 2025-2026 trends)
  • Update Table of Contents (toc.html) with all new sections and appendices K-T
  • Fix math blocks across entire book ($$\textbf mixing prose with LaTeX) - script + audit check completed
  • SVG text clipping audit check + fix (128 issues fixed, 5 intentional remaining)
  • Math blocks fix script (fix_math_blocks.py ran, 0 remaining issues)
  • LaTeX formula fixes: Bradley-Terry, DPO, PPO, KTO, IPO, SimPO, ORPO across 4 alignment files
  • LaTeX syntax audit check (p1_latex_syntax.py) + fix script (fix_latex_funcs.py): 16 blocks in 15 files
  • Prose-in-formula fixes: 11 files cleaned by background agent
  • SVG text right-clip audit check (p1_svg_text_right_clip.py) + fix script: 37 SVGs in 35 files
  • KV memory formula fix (section-9.2.html)
  • Extend audit script with new checks, run full audit, fix all issues
    • 7 new audit checks created: BROKEN_FIGURE_REF, FIGURE_SEQUENCE, MIXED_CAPTION_STYLE, TOC_LINK_TARGET, ORPHAN_TAG_BEFORE_MAIN, UNESCAPED_AMPERSAND_TITLE, TRIPLE_DOLLAR_MATH
    • Fixed: 74 unescaped ampersands in titles, 6 triple-dollar math, 7 TOC links, 2 orphan divs
    • Agent-created checks: TH_SCOPE_MISMATCH (937 issues), CHAPTER_LABEL_ON_ANCHOR
    • 5 analysis agents completed (Parts 1-2, 3-4, 5-6, 7-8, 9-10+appendices)
    • Fix 937 th scope mismatch issues (fix_th_scope.py, 161 files)
    • Fix 91 unclosed p-in-div issues (fix_unclosed_p.py, 32 files)
    • Fix 1439 manual highlight spans in code blocks (fix_manual_highlights.py, 31 files)
    • Fix remaining broken figure refs (0 remaining after current audit)
    • Consolidate callout system: 13 types, quiz->self-check, key-takeaway->key-insight, practical-tip/best-practice->tip
    • Fix Knowledge Check h3 -> callout-title Self-Check (62 files)
    • Fix double bibliography icons (51 files)
    • Fix unclosed callout divs (8 files)
    • New audit checks: FM4_PROMISE, SECTION_STRUCTURE, SVG_OVERLAP
    • Fix 360 double-encoded ampersands across 190 files
    • Remove 46 fun-notes from 25 module index files
    • Fix element ordering in 8 chapter indexes (prereqs before objectives)
    • Fix Part 2 missing chapter cards (Modules 08, 18)
    • Add audit checks: CHAPTER_STARTER, INDEX_ORDER, PART_INDEX, SECTION_CALLOUT
    • Fix PART_LABEL_FORMAT check (accept Roman numerals, 148 false positives eliminated)
    • Fix P0 duplicate Code Fragment 5.3.3 (renumbered to 5.3.3-8)
    • Fix remaining P1 issues (PLACEHOLDER_CONTENT, UNCLOSED_P, BARE_CODE, misc.; agents completed)
    • Fix 227 unclosed callout divs via SECTION_STRUCTURE nesting checker (depth-tracking stack)
    • Fix SECTION_STRUCTURE nesting checker false positives (1015 reduced to 227 real issues, then 0)
    • Fix 6 UNUSED_VENDOR issues (unused KaTeX/Prism removed from 6 files)
    • Refine PLACEHOLDER_CONTENT check (72 false positives eliminated)
    • Final P0+P1 audit: 0 P0, 0 P1 remaining (207 P2 cosmetic issues)
  • Create CONTENT_GUIDELINES.md (prevention guide for content-generating agents)
  • Audit appendices vs book content for duplication (5 critical, 390 missing cross-refs)
    • Add cross-reference callouts (Appendix K/S/G already cross-referenced in relevant chapters)
    • Deduplicate Appendix B vs Module 0 (reframed as companion, cross-refs added)
    • Deduplicate Appendix S vs Module 9.2/9.4 (residual dupes removed, bidirectional links)
  • Split FM.1 into FM.1a (What This Book Covers) + FM.1b (Who Should Read This Book)
  • Renumber front matter index (FM.1 through FM.9)
  • Fix appendix header font color in CSS (#5a6672 to #ffffff)
  • Author bio updates (removed workshop line from first author, added defense/grants to second author)
  • Update front matter to reflect current book content and features
  • Resolve duplicate content in sections 29.6 and 30.2 (30.2 kept as canonical, 29.6 redirects)
  • Reorganize scripts/ into subfolders: fix/ (24), detect/ (18), generate/ (8), data/ (4), _archive/ (113)
  • Archive 23 root-level one-shot _*.py scripts into scripts/_archive/
  • Merge old _scripts_archive/ (64 files) into scripts/_archive/
  • Move author photos to images/, remove empty dirs, clean root
  • Add scripts/README.md to book-skills documenting generalizable scripts
  • Merge duplicate part-6 directories (88 files rewritten, old archived to _archive/old-part-dirs/)
  • Merge duplicate part-7 directories (same pass, 114 total path rewrites)
  • Clean each subfolder of old/orphan files (132 build artifacts + 99 old part files archived)
  • Cross-reference hyperlinks pass (346 links across 166 files)
  • Fix 22 broken figure refs across 12 files
  • Fix P0 audit issues (broken xrefs, dup figures, SVG title)
  • Update front matter to reflect current 10-part structure and appendices A-V
  • Standardize .part-overview CSS to match .overview (accent border-left, max-width 750px)
  • Remove non-canonical content from indexes (bare fun-notes, time-estimate, stray h2)
  • Standardize bibliography format across all section files (41 files: card-based "References & Further Reading")
  • Fix SECTION_ORDER: callouts/labs between whats-next and bibliography (44 files, fix_post_whatsnext_content.py)
  • Update agent skills with "Right Tool" principle (SKILL.md, 00, 02, 08, 33, 36, 40)
  • Commit and push all changes
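
The SECTION_STRUCTURE nesting checker listed above is described only as a "depth-tracking stack"; its actual code is not part of this registry. As a rough illustration of that idea, the sketch below pushes every opening tag onto a stack and pops on the matching close, reporting whatever is left open. The class name `NestingChecker`, the function `check_nesting`, and the message format are all hypothetical, and the real checker may work differently.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical sketch: track open elements on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for each element still open
        self.issues = []  # human-readable problem reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Search downward from the top of the stack for the matching opener.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                # Anything opened after it was never closed.
                for open_tag, line in self.stack[i + 1:]:
                    self.issues.append(
                        f"unclosed <{open_tag}> opened on line {line}")
                del self.stack[i:]
                return
        self.issues.append(f"stray </{tag}> on line {self.getpos()[0]}")


def check_nesting(html):
    """Return a list of nesting problems found in an HTML string."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Elements still on the stack at end of input were never closed.
    for open_tag, line in checker.stack:
        checker.issues.append(f"unclosed <{open_tag}> opened on line {line}")
    return checker.issues


if __name__ == "__main__":
    sample = '<div class="callout">\n<p>text</p>\n<div class="inner">\n</div>\n'
    for issue in check_nesting(sample):
        print(issue)
```

This sketch treats every non-void element as requiring an explicit close, so it also flags `<p>` tags that HTML would close implicitly. That strictness is a design assumption here, not something the registry states.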

Phase 6b: Depth and Cross-referencing

  • Model internals depth pass, Parts 1-4 (reasoning models, architectures, DPO variants, PEFT)
    • 14 numeric examples + 11 library shortcuts added across Parts 1-5
  • Model internals depth pass, Parts 5-10 (RAG, agents, multimodal, evaluation, safety)
    • 11 numeric examples + 8 library shortcuts added across Parts 6-10
  • [~] Cross-reference hyperlinks pass across all HTML (agent completed, needs review)
  • Update MetaAgent skills for depth/inner working requirements

Phase 6c: Missing Features (from FM4_PROMISE + SECTION_STRUCTURE audit)

Agent: exercise-designer (07) + writing agents

  • Add level badges to Modules 0-5 (Part 1) and Module 7 (199 badges across 27 files)
  • Add Warning callout to Module 35 (AI and Society)

Agent: research-scientist (18) + writing agents

  • Add Research Frontier section to Module 23 (Tool Use and Protocols)
  • Add Research Frontier section to Module 24 (Multi-Agent Systems)
  • Add Research Frontier section to Module 26 (Agent Safety and Production)

Agent: chapter-lead (00) + writing agents

  • Add annotated bibliography to Module 23 (11 refs)
  • Add annotated bibliography to Module 24 (12 refs)

Agent: exercise-designer (07) - lab format standardization

  • Create LAB_COVERAGE audit check (p2_lab_coverage.py)
  • Lab gap analysis: 16/36 modules have labs, 20 need labs created
    • Missing labs: Modules 0-12, 27-31, 34-35
    • Existing labs: Modules 13-26, 32-33 (Parts 4-6, 9)
  • Create 20 hands-on labs for modules without labs
    • Modules 0-6: 6 labs (timeline, embeddings, text processing, attention algebra, transformer block, decoding)
    • Modules 7-12: 6 labs (multi-head attention, GPT-2 generation, quantization, data curation, pretraining, LoRA)
    • Modules 27-31: 5 labs (vision-language, speech-to-text, evaluation suite, MLOps, deployment)
    • Modules 34-35: 2 labs (interpretability, responsible AI dashboard)
    • All labs follow "Right Tool" pattern: from-scratch then library shortcut
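
The gap analysis above amounts to a set difference over module numbers. The sketch below reproduces that arithmetic from the ranges listed in this registry. It is not the `p2_lab_coverage.py` check itself, whose code is not shown here; the hardcoded sets and the `to_ranges` helper are assumptions for illustration.

```python
# Modules 0-35, as implied by the "16/36" count in the registry.
ALL_MODULES = set(range(36))

# Modules listed as already having labs: 13-26 and 32-33.
HAS_LAB = set(range(13, 27)) | {32, 33}


def to_ranges(modules):
    """Collapse module numbers into compact 'a-b' range strings."""
    ranges = []
    for m in sorted(modules):
        if ranges and m == ranges[-1][1] + 1:
            ranges[-1][1] = m        # extend the current run
        else:
            ranges.append([m, m])    # start a new run
    return [f"{a}-{b}" if a != b else str(a) for a, b in ranges]


missing = ALL_MODULES - HAS_LAB

if __name__ == "__main__":
    print(f"{len(HAS_LAB)}/{len(ALL_MODULES)} modules have labs")
    print(f"{len(missing)} need labs: {', '.join(to_ranges(missing))}")
```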

Phase 7: Full Agent Passes

  • Agent pass 1: Full book-writing agent pass over ALL HTML files
  • Agent pass 2: Illustrations, diagrams, mental models, and analogies for key concepts
  • Agent pass 3: Optimize and streamline book structure (redundancy, flow, chapter ordering)
  • Agent pass 4: For each code example, add library shortcut (search popular libs implementing same in fewer lines)

Previously Completed (from earlier sessions)

  • Create 5 new audit check plugins from deep review findings
  • Reorder front matter sections for readability
  • Update author bios
  • Fix section-22.1 audit issues
  • Compact pathway chapter guides (291 pw-reason descriptors across 20 pathways)
  • Add prerequisites to 9 syllabus pages
  • Promote repeating styles to book.css (syllabus-table, agent-card)
  • Fix 27 broken cross-reference links
  • Fix 9 orphan content files (epigraph/prereqs outside main)
  • Fix 9 vague headings
  • Fix 45 section ordering violations
  • Remove 351 redundant SVG titles from 192 files
  • Delete 5 orphaned agent avatars
  • Add 7 new audit check plugins
  • Generate 24 pathway/course icons via Gemini