
Commit b05460d

jstac, claude, and mmcky
authored
Add Reinforcement Learning section (#822)
* Add Reinforcement Learning section with inventory Q-learning lecture

  Add a new 'Reinforcement Learning' section to the book containing:
  - inventory_q.md: a new lecture on inventory management via DP and Q-learning
  - mccall_q.md: moved from the Search section

* Fix errors and improve clarity in inventory Q-learning lecture

  Mathematical fixes:
  - Fix the argument order in the transition function: h(X_t, D_{t+1}, A_t) → h(X_t, A_t, D_{t+1}), matching the definition h(x, a, d) := (x - d) ∨ 0 + a
  - Rename the reward function from r(x, a, d) to π(x, a, d) to resolve the notation clash with the interest rate r and align with the profit notation π_t
  - Fix action space typography: A := X → 𝖠 := 𝖷 (mathsf consistency)
  - Fix inconsistent notation in the modified update rule: π_{t+1} → R_{t+1}

  Prose improvements:
  - Clarify the timing language: "after the firm caters to current demand D_{t+1}" → "after demand D_{t+1} is realized and served"
  - Rewrite the Q-table and behavior policy section to carefully distinguish the max in the update target (a scalar value computation) from the behavior policy (the action actually taken). The previous text claimed that random actions still yield convergence, which is true only if the max stays in the update, a distinction the text did not make explicit.
  - Introduce on-policy vs. off-policy terminology with an explanation
  - Contrast the optimality operator (max → q*) with the evaluation operator (fixed σ → q^σ) to make the role of the max rigorous
  - Improve the code comments to separate the max value (the update target) from the argmax action (the behavior policy)

* misc

* Fix argmax rendering: update the MathJax macro to use \operatorname*

  Updated the global MathJax macros for \argmax and \argmin in _config.yml to use \operatorname*{} so that subscripts render directly below the operator in display mode, matching the style of \max. Reverted the inline workarounds in inventory_q.md back to \argmax.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Matt McKay <mmcky@users.noreply.github.com>
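The distinction the commit message draws, between the max in the update target (a scalar value computation) and the behavior policy (the action actually taken), can be sketched in a few lines. This is a hypothetical minimal example, not the lecture's actual code: the capacity K, the cost parameters, and the Poisson demand distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10                        # inventory capacity; states and actions are 0..K (assumed)
c, kappa, p = 0.5, 1.0, 2.0   # unit cost, fixed order cost, sale price (assumed)

def h(x, a, d):
    # transition h(x, a, d) = (x - d) ∨ 0 + a, clipped to capacity for illustration
    return min(max(x - d, 0) + a, K)

def profit(x, a, d):
    # reward π(x, a, d): revenue on served demand minus ordering costs
    return p * min(x, d) - c * a - kappa * (a > 0)

Q = np.zeros((K + 1, K + 1))
alpha, beta, eps = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate

x = 0
for t in range(10_000):
    # behavior policy: the action actually taken (here epsilon-greedy)
    if rng.random() < eps:
        a = int(rng.integers(0, K + 1))
    else:
        a = int(np.argmax(Q[x]))          # argmax selects an action
    d = int(rng.poisson(3))               # demand draw (assumed Poisson)
    x_next = h(x, a, d)
    # update target: the max is a scalar value computation; it stays in the
    # update even if the behavior policy were purely random (off-policy)
    target = profit(x, a, d) + beta * np.max(Q[x_next])
    Q[x, a] += alpha * (target - Q[x, a])
    x = x_next
```

Because the max over next-state actions remains in the target regardless of how actions are chosen, this update is off-policy: exploration affects only which entries get visited, not what they converge toward.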
1 parent 24ad7be commit b05460d

4 files changed: +740 -3 lines changed

lectures/_config.yml (2 additions, 2 deletions)

@@ -107,8 +107,8 @@ sphinx:
     mathjax3_config:
       tex:
         macros:
-          "argmax" : "arg\\,max"
-          "argmin" : "arg\\,min"
+          "argmax" : ["\\operatorname*{argmax}", 0]
+          "argmin" : ["\\operatorname*{argmin}", 0]
     mathjax_path: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
   # Local Redirects
   rediraffe_redirects:
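The effect of the new macro definition can be illustrated in plain LaTeX, where `\DeclareMathOperator*` plays the same role as MathJax's `\operatorname*`: the starred form places subscripts directly below the operator in display mode, as with `\max`. The displayed equation below is illustrative, not taken from the lecture.

```latex
\documentclass{article}
\usepackage{amsmath}

% LaTeX analogue of the updated MathJax macro:
% "argmax" : ["\\operatorname*{argmax}", 0]
\DeclareMathOperator*{\argmax}{argmax}

\begin{document}
\[
  \sigma^*(x) = \argmax_{a \in \mathsf{A}} \, q^*(x, a)
\]
\end{document}
```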

lectures/_static/quant-econ.bib (1 addition, 1 deletion)

@@ -4,7 +4,7 @@
 ###

 @article{evans2005interview,
-  title={An interview with thomas j. sargent},
+  title={An interview with Thomas J. Sargent},
   author={Evans, George W and Honkapohja, Seppo},
   journal={Macroeconomic Dynamics},
   volume={9},

lectures/_toc.yml (4 additions, 0 deletions)

@@ -73,6 +73,10 @@ parts:
   - file: career
   - file: jv
   - file: odu
+- caption: Reinforcement Learning
+  numbered: true
+  chapters:
+  - file: inventory_q
   - file: mccall_q
 - caption: Introduction to Optimal Savings
   numbered: true
