|
67 | 67 | "id": "c409c9b2", |
68 | 68 | "metadata": {}, |
69 | 69 | "source": [ |
70 | | - "<div align=\"center\"><img src=\"T-Maze.png\" width=\"200\"></div>" |
| 70 | + "" |
71 | 71 | ] |
72 | 72 | }, |
73 | 73 | { |
|
165 | 165 | "\n", |
166 | 166 | "For a candidate policy $u$, the **Expected Free Energy** $G(u)$ can be decomposed to a form with preferences over states, defined as:\n", |
167 | 167 | "\n", |
168 | | - "$$\n", |
169 | | - "G(u) \\;=\\;\n", |
| 168 | + "$$G(u) \\;=\\;\n", |
170 | 169 | "\\underbrace{D_{KL}\\!\\bigl[q(x\\!\\mid\\!u)\\;\\|\\;\\hat{p}(x)\\bigr]}_{\\text{risk}}\n", |
171 | | - "+\\underbrace{\\mathbb{E}_{q(x|u)}\\!\\bigl[H[q(y\\!\\mid\\!x)]\\bigr]}_{\\text{ambiguity}}\n", |
172 | | - "$$\n", |
| 170 | + "+\\underbrace{\\mathbb{E}_{q(x|u)}\\!\\bigl[H[q(y\\!\\mid\\!x)]\\bigr]}_{\\text{ambiguity}}$$\n", |
173 | 171 | "\n", |
174 | 172 | "This decomposition shows that the cost function is composed of two primary drivers: risk, which measures the divergence between predicted outcomes and preferred states $\\hat{p}(x)$ to keep the agent goal-oriented, and ambiguity, which calculates the expected uncertainty of future observations to encourage states with clear, informative data. By minimizing $G(u)$, the agent naturally balances these terms to produce behavior that is simultaneously goal-directed and information-seeking—offering a principled solution to the classic exploration–exploitation trade-off.\n", |
175 | 173 | "\n", |
|
197 | 195 | "\n", |
198 | 196 | "The central insight of [Paper 1 (Theorem 1)](https://arxiv.org/pdf/2504.14898#Theorem.1) is that EFE minimisation *arises naturally* from minimising a standard **Variational Free Energy (VFE)** functional if we *augment* the generative model with a few prior terms:\n", |
199 | 197 | "\n", |
200 | | - "$$\n", |
201 | | - "\\mathcal{F}[q] \\;\\triangleq\\;\n", |
| 198 | + "$$\\mathcal{F}[q] \\;\\triangleq\\;\n", |
202 | 199 | "\\mathbb{E}_{q(y,x,\\theta,u)}\\!\\left[\n", |
203 | 200 | " \\log\\frac{q(y,x,\\theta,u)}{p(y,x,\\theta,u)\\;\\hat{p}(x)\\;\\tilde{p}(u)\\;\\tilde{p}(x)}\n", |
204 | | - "\\right]\n", |
205 | | - "$$\n", |
| 201 | + "\\right]$$\n", |
206 | 202 | "\n", |
207 | 203 | "The denominator is the ordinary generative model $p$ *augmented* by:\n", |
208 | 204 | "- a **preference prior** $\\hat{p}(x)$ over desired future states, and\n", |
209 | 205 | "- two **epistemic priors** $\\tilde{p}(u),\\tilde{p}(x)$ that encode ambiguity-seeking and novelty-seeking drives.\n", |
210 | 206 | "\n", |
211 | 207 | "**Theorem 1** states that with the specific choices\n", |
212 | 208 | "\n", |
213 | | - "$$\n", |
214 | | - "\\tilde{p}(u) \\;\\propto\\; \\exp\\!\\bigl(H[q(x\\!\\mid\\!u)]\\bigr) \\\\\n", |
215 | | - "\\tilde{p}(x) \\;\\propto\\; \\exp\\!\\bigl(-H[q(y\\!\\mid\\!x)]\\bigr)\n", |
216 | | - "$$\n", |
| 209 | + "$$\\tilde{p}(u) \\;\\propto\\; \\exp\\!\\bigl(H[q(x\\!\\mid\\!u)]\\bigr) \\\\\n", |
| 210 | + "\\tilde{p}(x) \\;\\propto\\; \\exp\\!\\bigl(-H[q(y\\!\\mid\\!x)]\\bigr)$$\n", |
217 | 211 | "\n", |
218 | 212 | "the VFE decomposes exactly as\n", |
219 | 213 | "\n", |
220 | | - "$$\n", |
221 | | - "\\boxed{\\mathcal{F}[q] \\;=\\; \\mathbb{E}_{q(u)}[G(u)] \\;+\\; \\underbrace{\\mathbb{E}_{q(y,x,\\theta,u)}\\!\\left[\\log\\tfrac{q(y,x,\\theta|u)}{p(y,x,\\theta|u)}\\right]}_{\\text{complexity } C(u)}\\;+\\; const.}\n", |
222 | | - "$$\n", |
| 214 | + "$$\\boxed{\\mathcal{F}[q] \\;=\\; \\mathbb{E}_{q(u)}[G(u)] \\;+\\; \\underbrace{\\mathbb{E}_{q(y,x,\\theta,u)}\\!\\left[\\log\\tfrac{q(y,x,\\theta|u)}{p(y,x,\\theta|u)}\\right]}_{\\text{complexity } C(u)}\\;+\\; const.}$$\n", |
223 | 215 | "\n", |
224 | 216 | "**What this buys us**: minimising $\\mathcal{F}[q]$ over the variational posterior $q$ simultaneously\n", |
225 | 217 | "\n", |
|
248 | 240 | "source": [ |
249 | 241 | "Theorem 1 gives the priors in terms of global quantities $H[q(x|u)]$ and $H[q(y|x)]$. To take advantage of local computations, we **factorize** the state-space model into\n", |
250 | 242 | "\n", |
251 | | - "$$\n", |
252 | | - "p(y,x,u) \\;=\\; p(x_0)\\prod_{t=1}^{T} p(y_t|x_t)\\,p(x_t|x_{t-1},u_t)\\,p(u_t)\n", |
253 | | - "$$\n", |
| 243 | + "$$p(y,x,u) \\;=\\; p(x_0)\\prod_{t=1}^{T} p(y_t|x_t)\\,p(x_t|x_{t-1},u_t)\\,p(u_t)$$\n", |
254 | 244 | "\n", |
255 | 245 | "With this factorized SSM, [**Corollary 1** (Paper 2)](https://arxiv.org/pdf/2508.02197#corollary.1.1) reduces the priors to *per-timestep, local* expressions:\n", |
256 | 246 | "\n", |
257 | | - "$$\n", |
258 | | - "\\tilde{p}(u_t) \\;\\propto\\; \\exp\\!\\bigl(H[q(x_t, x_{t-1}\\!\\mid\\!u_t)] - H[q(x_{t-1}\\!\\mid\\!u_t)]\\bigr)\n", |
259 | | - "$$\n", |
| 247 | + "$$\\tilde{p}(u_t) \\;\\propto\\; \\exp\\!\\bigl(H[q(x_t, x_{t-1}\\!\\mid\\!u_t)] - H[q(x_{t-1}\\!\\mid\\!u_t)]\\bigr)$$\n", |
260 | 248 | "\n", |
261 | | - "$$\n", |
262 | | - "\\tilde{p}(x_t) \\;\\propto\\; \\exp\\!\\bigl(-H[q(y_t\\!\\mid\\!x_t)]\\bigr)\n", |
263 | | - "$$\n", |
| 249 | + "$$\\tilde{p}(x_t) \\;\\propto\\; \\exp\\!\\bigl(-H[q(y_t\\!\\mid\\!x_t)]\\bigr)$$\n", |
264 | 250 | "\n", |
265 | 251 | "These are exactly the two prior nodes we add to the factor graph:\n", |
266 | 252 | "\n", |
|
628 | 614 | "\n", |
629 | 615 | "Since the epistemic priors depend on the current posterior, inference is run iteratively as explained in [Algorithm 1 (Paper 2)](https://arxiv.org/pdf/2508.02197#algorithm.1):\n", |
630 | 616 | "\n", |
631 | | - "> **Input**: generative model $p(y,x,u)$, preference prior $\\hat{p}(x)$, $\\tau_{max}$ iterations <br>\n", |
632 | | - "> **Output**: policy posterior $q(u)$ <br>\n", |
633 | | - "> $q_0(y,x,u) ←$ uninformative <br>\n", |
634 | | - "> **for** $\\tau = 1$ … $\\tau_{max}$:<br>\n", |
635 | | - "> **for** each timestep t:<br>\n", |
636 | | - "> $p̃_τ(u_t) ← σ( H[q_{τ-1}(x_t, x_{t-1} | u_t)] − H[q_{τ-1}(x_{t-1} | u_t)] )$<br>\n", |
637 | | - "> $p̃_τ(x_t) ← σ( −H[q_{τ-1}(y_t | x_t)] )$<br>\n", |
638 | | - "> **end** <br>\n", |
639 | | - "> $q_\\tau(y,x,u) ←$ infer( $p(y,x,u)$ with updated priors )<br>\n", |
640 | | - "> **end** <br>\n", |
| 617 | + "> **Input**: generative model $p(y,x,u)$, preference prior $\\hat{p}(x)$, $\\tau_{max}$ iterations \n", |
| 618 | + "> **Output**: policy posterior $q(u)$ \n", |
| 619 | + "> $q_0(y,x,u) ←$ uninformative \n", |
| 620 | + "> **for** $\\tau = 1$ … $\\tau_{max}$: \n", |
| 621 | + "> **for** each timestep t: \n", |
| 622 | + "> $p̃_τ(u_t) ← σ( H[q_{τ-1}(x_t, x_{t-1} | u_t)] − H[q_{τ-1}(x_{t-1} | u_t)] )$ \n", |
| 623 | + "> $p̃_τ(x_t) ← σ( −H[q_{τ-1}(y_t | x_t)] )$ \n", |
| 624 | + "> **end** \n", |
| 625 | + "> $q_\\tau(y,x,u) ←$ infer( $p(y,x,u)$ with updated priors ) \n", |
| 626 | + "> **end** \n", |
641 | 627 | "> **return** $q_{τ_{max}}(u)$\n", |
642 | 628 | "\n", |
643 | 629 | "In the RxInfer implementation:\n", |
|
755 | 741 | }, |
756 | 742 | { |
757 | 743 | "cell_type": "code", |
758 | | - "execution_count": 14, |
| 744 | + "execution_count": null, |
759 | 745 | "id": "7221d3b5", |
760 | 746 | "metadata": {}, |
761 | 747 | "outputs": [], |
|
770 | 756 | "function plot_tmaze(env::TMaze)\n", |
771 | 757 | " p = Plots.plot(\n", |
772 | 758 | " aspect_ratio=:equal, legend=false, axis=false, grid=false, ticks=false,\n", |
773 | | - " background_color=MAZE_THEME.background, size=(600, 600), frame=:none, margin=0Plots.mm\n", |
| 759 | + " background_color=MAZE_THEME.background, size=(300, 300), frame=:none, margin=0Plots.mm\n", |
774 | 760 | " )\n", |
775 | 761 | " scale = 20\n", |
776 | 762 | " Plots.plot!(p, [1, 2, 2, 1], [1, 1, 4, 4], seriestype=:shape, c=MAZE_THEME.corridor, lw=0)\n", |
|