Commit b05460d
Add Reinforcement Learning section (#822)
* Add Reinforcement Learning section with inventory Q-learning lecture
Add a new 'Reinforcement Learning' section to the book containing:
- inventory_q.md: a new lecture on inventory management via DP and Q-learning
- mccall_q.md: moved from the Search section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix errors and improve clarity in inventory Q-learning lecture
Mathematical fixes:
- Fix argument order in transition function: h(X_t, D_{t+1}, A_t) → h(X_t, A_t, D_{t+1})
  to match the definition h(x, a, d) := (x - d) ∨ 0 + a
- Rename reward function from r(x, a, d) to π(x, a, d) to resolve notation
  clash with interest rate r and align with profit notation π_t
- Fix action space typography: A := X → 𝖠 := 𝖷 (mathsf consistency)
- Fix inconsistent notation in modified update rule: π_{t+1} → R_{t+1}
Prose improvements:
- Clarify timing language: "after the firm caters to current demand D_{t+1}"
  → "after demand D_{t+1} is realized and served"
- Rewrite Q-table and behavior policy section to carefully distinguish
  the max in the update target (a scalar value computation) from the
  behavior policy (the action actually taken). The previous text claimed
  that random actions still yield convergence, which holds only because
  the max stays in the update target, a distinction the text did not
  make explicit.
- Introduce on-policy vs off-policy terminology with explanation
- Contrast the optimality operator (max → q*) with the evaluation
  operator (fixed σ → q^σ) to make the role of the max rigorous
- Improve code comments to separate the max value (update target) from
  the argmax action (behavior policy)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
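The distinction drawn in this commit, between the max in the update target and the action chosen by the behavior policy, can be illustrated with a minimal sketch. It is not the lecture's code: the capacity `K`, the discount factor `BETA`, the `profit` function standing in for π(x, a, d), the feasibility rule, the demand distribution, and the epsilon-greedy rule are all hypothetical choices made for illustration. Only the transition h(x, a, d) := (x - d) ∨ 0 + a is taken from the commit message.

```python
import numpy as np

K = 5          # hypothetical storage capacity; states are 0..K
BETA = 0.98    # hypothetical discount factor

def h(x, a, d):
    """Next inventory: (x - d) ∨ 0 + a, with argument order (x, a, d)."""
    return max(x - d, 0) + a

def profit(x, a, d, c=0.2, kappa=0.8):
    """Hypothetical stand-in for π(x, a, d): sales minus ordering costs."""
    return min(x, d) - c * a - kappa * (a > 0)

def feasible(x):
    """Assumed constraint: orders may not push stock above K."""
    return range(K - x + 1)

def q_step(q, x, a, d, alpha=0.1):
    """One Q-learning update after observing demand d; returns next state."""
    x_next = h(x, a, d)
    # Update target: the max VALUE over next actions (a scalar).
    max_value = max(q[x_next, a2] for a2 in feasible(x_next))
    q[x, a] += alpha * (profit(x, a, d) + BETA * max_value - q[x, a])
    return x_next

def behavior(q, x, rng, epsilon=0.1):
    """Behavior policy: the ACTION actually taken (epsilon-greedy)."""
    acts = list(feasible(x))
    if rng.random() < epsilon:
        return int(rng.choice(acts))           # explore: random action
    return max(acts, key=lambda a2: q[x, a2])  # exploit: argmax action

rng = np.random.default_rng(0)
q = np.zeros((K + 1, K + 1))   # Q-table indexed by (state, action)
x = 0
for _ in range(1_000):
    a = behavior(q, x, rng)
    d = int(rng.geometric(0.4)) - 1   # hypothetical demand shock, d >= 0
    x = q_step(q, x, a, d)
```

Setting `epsilon=1.0` makes the behavior policy fully random while leaving the max inside `q_step` untouched, which is the off-policy case the commit message describes.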
* misc
* Fix argmax rendering: update MathJax macro to use operatorname*
Updated the global MathJax macros for \argmax and \argmin in _config.yml
to use \operatorname*{} so subscripts render directly below in display
mode, matching the style of \max. Reverted inline workarounds in
inventory_q.md back to \argmax.
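The rendering difference behind this fix can be reproduced in plain LaTeX with `amsmath`. This is a sketch under assumptions: the macro names `\argmaxold` and `\argmaxnew` are hypothetical, and the exact macro bodies in `_config.yml` are not shown in this commit view.

```latex
% Assumed macro bodies for illustration; the actual _config.yml entries may differ.
\newcommand{\argmaxold}{\operatorname{argmax}}   % subscript set to the side
\newcommand{\argmaxnew}{\operatorname*{argmax}}  % subscript set underneath in display mode

\[
  \sigma(x) \in \argmaxold_{a} q(x, a)
  \qquad \text{vs.} \qquad
  \sigma(x) \in \argmaxnew_{a} q(x, a)
\]
```

The starred form treats the operator like `\max`, so in display mode the subscript sits directly below the operator name.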
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Matt McKay <mmcky@users.noreply.github.com>
1 parent 24ad7be commit b05460d
File tree
4 files changed, +740 -3 lines changed
- lectures
- _static
Diff hunks (changed line content was not captured; only line positions survive):
- Hunk 1: original lines 110-111 replaced by new lines 110-111
- Hunk 2: original line 7 replaced by new line 7
- Hunk 3: four lines added as new lines 76-79