docs: clarify IV example labels (#499) (#907)

anevolbap · drbenvincent · web-flow · commit d56b7819cad7 · 2026-05-16T08:31:19.000+01:00
The third panel showed the posterior of chol_cov_corr[0, 1], a model-level
residual correlation, but was labelled 'Correlation between Outcome and
Treatment', which a reader will read as raw Y vs raw T. Rename the title
and axis to 'Modelled correlation between outcome Y and instrumented
treatment X-hat' / 'Posterior correlation', add a footnote naming the
parameter, and rewrite the section intro so the distinction with the raw
Y ~ X panel is explicit. Also fix the OlS -> OLS typo in the legend.
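
To illustrate why the two panels can disagree in sign, here is a minimal
NumPy sketch. It does not use the notebook's model or data: the simulated
variables (`z`, `t`, `y`), the coefficients, and the error correlation `rho`
are all hypothetical, and the residual correlation is recovered with simple
moment-based IV estimates rather than the posterior of `chol_cov_corr[0, 1]`.

```python
import numpy as np

# Hypothetical simulation, not the notebook's data or model.
rng = np.random.default_rng(0)
n = 20_000
rho = -0.7   # assumed correlation between the two equations' errors
beta = 2.0   # assumed effect of treatment on outcome
pi = 1.0     # assumed effect of instrument on treatment

z = rng.normal(size=n)  # instrument
error_cov = np.array([[1.0, rho], [rho, 1.0]])
e_y, e_t = rng.multivariate_normal([0.0, 0.0], error_cov, size=n).T

t = pi * z + e_t    # treatment equation
y = beta * t + e_y  # outcome equation

# Raw correlation between outcome and treatment (what the old title implied).
raw_corr = np.corrcoef(y, t)[0, 1]

# Moment-based first-stage and IV estimates, then the correlation between
# the residuals of the two equations (the object the panel actually shows).
pi_hat = np.cov(z, t)[0, 1] / np.var(z, ddof=1)
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]
resid_corr = np.corrcoef(y - beta_iv * t, t - pi_hat * z)[0, 1]

print(f"raw corr(Y, T):       {raw_corr:.2f}")
print(f"residual correlation: {resid_corr:.2f}")
```

Under these assumed values the raw correlation is strongly positive while
the residual correlation is negative, which is the distinction the new
labels are meant to make visible.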

Co-authored-by: Benjamin T. Vincent <inferencelab@gmail.com>
diff --git a/docs/source/notebooks/iv_pymc.ipynb b/docs/source/notebooks/iv_pymc.ipynb
@@ -757,9 +757,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Multivariate Outcomes and Measures of Correlation\n",
+    "### Multivariate outcomes and modelled correlation\n",
     "\n",
-    "As we stated above, one of the benefits of the Bayesian approach is that we directly measure the bivariate relationship between the instrument and the treatment. We can see (in two dimensions) a representation of how the difference in the estimated treatment coefficients skews the expected outcomes. "
+    "One benefit of the Bayesian formulation is that we directly estimate the correlation between the outcome and the modelled (instrumented) treatment. A non-zero posterior for this correlation is the formal signal of endogeneity that motivates IV in the first place: unobserved factors driving the treatment also drive the outcome, so naive OLS is biased. The figure below puts that modelled correlation next to the OLS vs IV fits taken on the raw treatment scale, which is why the signs in the two panels need not agree: they describe different objects (raw `X` vs modelled `X̂`).\n"
    ]
   },
   {
@@ -921,8 +921,8 @@
     "n_samples = min(500, len(uncertainty))\n",
     "uncertainty.sample(n_samples).T.plot(legend=False, color=\"orange\", alpha=0.4, ax=axs[1])\n",
     "axs[1].plot(x, ols, color=\"black\", label=\"OLS fit\")\n",
-    "axs[1].set_title(\"OLS versus Instrumental Regression Fits\", fontsize=20)\n",
-    "axs[1].legend(custom_lines, [\"IV fits\", \"OlS fit\"])\n",
+    "axs[1].set_title(\"OLS vs IV regression fits (Y on raw X)\", fontsize=20)\n",
+    "axs[1].legend(custom_lines, [\"IV fits\", \"OLS fit\"])\n",
     "axs[1].set_xlabel(\"Treatment Scale/ Risk\")\n",
     "axs[1].set_ylabel(\"Outcome Scale/ Log GDP\")\n",
     "\n",
@@ -931,9 +931,19 @@
     ")\n",
     "\n",
     "corr = az.extract(data=iv.model.idata, var_names=[\"chol_cov_corr\"])[0, 1, :]\n",
-    "axs[2].hist(corr, bins=30, ec=\"black\", color=\"C2\", label=\"correlation\")\n",
-    "axs[2].set_xlabel(\"Correlation Measure\")\n",
-    "axs[2].set_title(\"Correlation between \\n Outcome and Treatment\", fontsize=20);"
+    "axs[2].hist(corr, bins=30, ec=\"black\", color=\"C2\", label=\"posterior\")\n",
+    "axs[2].set_xlabel(\"Posterior correlation\")\n",
+    "axs[2].set_title(\n",
+    "    \"Modelled correlation between \\n outcome Y and instrumented treatment X̂\",\n",
+    "    fontsize=20,\n",
+    ");"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The right panel is the posterior of `chol_cov_corr[0, 1]`, the residual correlation in the bivariate normal likelihood that links the outcome and treatment equations.\n"
    ]
   },
   {