
Commit 368dd7a

Commit message: updates

1 parent a862bc6 commit 368dd7a

1 file changed: lectures/likelihood_ratio_process.md

26 additions & 18 deletions

@@ -121,12 +121,12 @@ inferences using a classic frequentist approach due to Neyman and
 Pearson {cite}`Neyman_Pearson`.
 
 To help us appreciate how things work, the following Python code evaluates $f$ and $g$ as two different
-beta distributions, then computes and simulates an associated likelihood
+Beta distributions, then computes and simulates an associated likelihood
 ratio process by generating a sequence $w^t$ from one of the two
 probability distributions, for example, a sequence of IID draws from $g$.
 
 ```{code-cell} ipython3
-# Parameters in the two beta distributions.
+# Parameters in the two Beta distributions.
 F_a, F_b = 1, 1
 G_a, G_b = 3, 1.2
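
As a minimal sketch of the likelihood ratio process described in this hunk (assuming $f$ and $g$ are the Beta(1, 1) and Beta(3, 1.2) densities parameterized above; the helper names below are illustrative, not the lecture's):

```python
import numpy as np
from scipy.stats import beta

# Beta parameters as in the hunk above
F_a, F_b = 1, 1
G_a, G_b = 3, 1.2

def f(w):
    return beta.pdf(w, F_a, F_b)

def g(w):
    return beta.pdf(w, G_a, G_b)

# Draw an IID sequence w^t from g and build the likelihood ratio process
# L_t = prod_{i <= t} f(w_i) / g(w_i)
T = 100
w = np.random.beta(G_a, G_b, size=T)
L = np.cumprod(f(w) / g(w))
print(L[-1])  # tends toward 0 as T grows when the data really come from g
```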
@@ -508,7 +508,7 @@ If for a fixed $t$ we now free up and move $c$, we will sweep out the probabilit
 of detection as a function of the probability of false alarm.
 
 This produces a [receiver operating characteristic
-curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
+curve (ROC curve)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
 
 Below, we plot receiver operating characteristic curves for different
 sample sizes $t$.
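
To make the ROC construction in this hunk concrete, here is a rough sketch that sweeps a threshold $c$ over simulated log likelihood ratios and records the resulting detection and false alarm frequencies; the decision rule ("decide $f$ when $\log L_t \geq c$"), the seed, and the parameter values are assumptions of this sketch rather than the lecture's code:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
F_a, F_b, G_a, G_b = 1, 1, 3, 1.2
t, n_sims = 10, 10_000

def log_L(w):
    # log likelihood ratio: log L_t = sum_i log f(w_i)/g(w_i)
    return np.log(beta.pdf(w, F_a, F_b) / beta.pdf(w, G_a, G_b)).sum(axis=1)

# Samples of log L_t when the data really come from f and when they come from g
logL_f = log_L(rng.beta(F_a, F_b, (n_sims, t)))
logL_g = log_L(rng.beta(G_a, G_b, (n_sims, t)))

# Sweep the threshold c: decide "f" whenever log L_t >= c
for c in np.linspace(-5, 5, 5):
    p_detect = (logL_f >= c).mean()       # correctly picking f
    p_false_alarm = (logL_g >= c).mean()  # picking f when g generated the data
    print(f"c = {c:5.2f}: detection {p_detect:.2f}, false alarm {p_false_alarm:.2f}")
```

Plotting detection against false alarm over a fine grid of $c$ traces out one ROC curve for each sample size $t$.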
@@ -679,6 +679,8 @@ In the simulation, we generate multiple paths using Beta distributions $f$, $g$,
 We consider three cases: (1) $h$ is closer to $f$, (2) $f$ and $g$ are approximately equidistant from $h$, and (3) $h$ is closer to $g$.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 # Define test scenarios
 scenarios = [
     {
@@ -701,8 +703,9 @@ scenarios = [
 fig, axes = plt.subplots(2, 3, figsize=(15, 12))
 
 for i, scenario in enumerate(scenarios):
-    # Define distributions
-    h = lambda x: p(x, scenario["h_params"][0], scenario["h_params"][1])
+    # Define h
+    h = lambda x: p(x, scenario["h_params"][0],
+                    scenario["h_params"][1])
 
     # Compute KL divergences
     Kf, Kg = compute_KL(h, f, g)
@@ -711,17 +714,23 @@ for i, scenario in enumerate(scenarios):
     # Simulate paths
    N_paths = 100
    T = 150
-    h_data = np.random.beta(scenario["h_params"][0], scenario["h_params"][1], (N_paths, T))
+
+    # Generate data from h
+    h_data = np.random.beta(scenario["h_params"][0],
+                            scenario["h_params"][1], (N_paths, T))
     l_ratios = f(h_data) / g(h_data)
     l_cumulative = np.cumprod(l_ratios, axis=1)
     log_l_cumulative = np.log(l_cumulative)
 
     # Plot distributions
     ax = axes[0, i]
     x_range = np.linspace(0.001, 0.999, 200)
-    ax.plot(x_range, [f(x) for x in x_range], 'b-', label='f', linewidth=2)
-    ax.plot(x_range, [g(x) for x in x_range], 'r-', label='g', linewidth=2)
-    ax.plot(x_range, [h(x) for x in x_range], 'g--', label='h (data)', linewidth=2)
+    ax.plot(x_range, [f(x) for x in x_range],
+            'b-', label='f', linewidth=2)
+    ax.plot(x_range, [g(x) for x in x_range],
+            'r-', label='g', linewidth=2)
+    ax.plot(x_range, [h(x) for x in x_range],
+            'g--', label='h (data)', linewidth=2)
     ax.set_xlabel('w')
     ax.set_ylabel('density')
     ax.set_title(scenario["name"], fontsize=16)
@@ -734,7 +743,7 @@ for i, scenario in enumerate(scenarios):
 
     # Plot theoretical expectation
     theory_line = kl_diff * np.arange(1, T+1)
-    ax.plot(theory_line, 'k--', linewidth=2, label=f'Theory: {kl_diff:.3f}×t')
+    ax.plot(theory_line, 'k--', linewidth=2, label=r'$t \times (K_g - K_f)$')
 
     ax.set_xlabel('t')
     ax.set_ylabel('$log L_t$')
@@ -751,7 +760,7 @@ Note that
 - In the first figure, $\log L(w^t)$ diverges to $\infty$ because $K_g > K_f$.
 - In the second figure, we still have $K_g > K_f$, but the difference is smaller, so $L(w^t)$ diverges to infinity at a slower pace.
 - In the last figure, $\log L(w^t)$ diverges to $-\infty$ because $K_g < K_f$.
-- The black dotted line, $t \cdot \left(KL(h,g) - KL(h, f)\right)$, closely fits the paths verifying {eq}`eq:kl_likelihood_link`.
+- The black dotted line, $t \left(KL(h,g) - KL(h, f)\right)$, closely fits the paths verifying {eq}`eq:kl_likelihood_link`.
 
 These observations align with the theory.
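
The "black dotted line" bullet rests on the fact that, under $h$, each increment of $\log L_t$ has mean $KL(h,g) - KL(h,f)$. A small numerical check of that identity, with an illustrative choice of $h$ and a quadrature-based KL helper that stands in for the lecture's `compute_KL`, might look like:

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

f_pdf = lambda x: beta.pdf(x, 1, 1)
g_pdf = lambda x: beta.pdf(x, 3, 1.2)
h_pdf = lambda x: beta.pdf(x, 1.2, 1.1)   # illustrative choice of h

def KL(p, q):
    # KL(p, q) = integral of p(x) log(p(x)/q(x)) over (0, 1)
    return quad(lambda x: p(x) * np.log(p(x) / q(x)), 1e-6, 1 - 1e-6)[0]

# Theoretical per-period slope of log L_t under h
slope_theory = KL(h_pdf, g_pdf) - KL(h_pdf, f_pdf)

# Monte Carlo estimate of E_h[log f(w) - log g(w)]
w = np.random.beta(1.2, 1.1, 200_000)
slope_mc = np.mean(np.log(f_pdf(w) / g_pdf(w)))

print(f"theory: {slope_theory:.4f}, simulation: {slope_mc:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is exactly why the line $t \left(KL(h,g) - KL(h,f)\right)$ tracks the simulated paths of $\log L_t$.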
@@ -781,8 +790,7 @@ We assume that $f$ and $g$ both put positive probabilities on the same intervals
 
 
 
-In the simulations below, we specify that $f$ is a $\text{Beta}(1, 1)$ distribution and that $g$ is $\text{Beta}(3, 1.2)$ distribution,
-just as we did often earlier in this lecture.
+In the simulations below, we specify that $f$ is a $\text{Beta}(1, 1)$ distribution and that $g$ is a $\text{Beta}(3, 1.2)$ distribution.
 
 We consider two alternative timing protocols.
@@ -1028,7 +1036,7 @@ plt.tight_layout()
 plt.show()
 ```
 
-To the left of the green vertical line $f < g $, so $l_t < 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $g$ individual.
+To the left of the green vertical line $g < f$, so $l_t < 1$; therefore a $w_t$ that falls to the left of the green line is classified as a type $g$ individual.
 
 * The shaded orange area equals $\beta$ -- the probability of classifying someone as a type $g$ individual when it is really a type $f$ individual.
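
As a quick illustration of the error probability $\beta$ described in this hunk, here is a rough Monte Carlo sketch under the one-draw rule "classify $w$ as type $g$ whenever $l(w) = f(w)/g(w) < 1$"; the Beta parameters are the ones used earlier in this diff and are an assumption here, since this section of the lecture may use different ones:

```python
import numpy as np
from scipy.stats import beta

f_pdf = lambda x: beta.pdf(x, 1, 1)
g_pdf = lambda x: beta.pdf(x, 3, 1.2)

# beta = Prob{ classified as type g | the individual is really type f }
#      = Prob_f{ f(w)/g(w) < 1 }
w = np.random.beta(1, 1, 1_000_000)            # draws from f
beta_error = np.mean(f_pdf(w) / g_pdf(w) < 1)  # frequency of misclassification
print(f"estimated beta = {beta_error:.3f}")
```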
@@ -1226,7 +1234,7 @@ Because in general $KL(f, g) \neq KL(g, f)$, KL divergence is not symmetric, but
 As {eq}`eq:js_divergence` shows, the Jensen-Shannon divergence computes the average of the KL divergences of $f$ and $g$ with respect to a particular reference distribution $m$ defined below the equation.
 ```
 
-Now let's create a comparison table showing KL divergence, Jensen-Shannon divergence, and Chernoff entropy
+Now let's create a comparison table showing KL divergence, Jensen-Shannon divergence, and Chernoff entropy for a set of pairs of Beta distributions.
 
 ```{code-cell} ipython3
 def js_divergence(f, g):
@@ -1973,7 +1981,7 @@ print(f"KL divergences: \nKL(f,g)={Kf_g:.3f}, KL(g,f)={Kg_f:.3f}")
 print(f"KL(h,f)={Kf_h:.3f}, KL(h,g)={Kg_h:.3f}")
 ```
 
-We find that $KL(f,g) > KL(g,f)$ and $KL(g,h) > KL(f,h)$.
+We find that $KL(f,g) > KL(g,f)$ and $KL(h,g) > KL(h,f)$.
 
 The first inequality tells us that the average "surprise" or "inefficiency" of using belief $g$ when nature chooses $f$ is greater than the "surprise" of using belief $f$ when nature chooses $g$.
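
To see the asymmetry $KL(f,g) \neq KL(g,f)$ numerically, and how the Jensen-Shannon divergence symmetrizes it by averaging KL divergences against a mixture reference, here is a small sketch; the quadrature-based helpers are stand-ins for the lecture's functions, and taking $m = (f+g)/2$ as the reference distribution is an assumption about the $m$ defined in the lecture:

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

f_pdf = lambda x: beta.pdf(x, 1, 1)
g_pdf = lambda x: beta.pdf(x, 3, 1.2)

def KL(p, q):
    # KL(p, q) = integral of p(x) log(p(x)/q(x)) over (0, 1)
    return quad(lambda x: p(x) * np.log(p(x) / q(x)), 1e-6, 1 - 1e-6)[0]

def JS(p, q):
    # Jensen-Shannon divergence with mixture reference m = (p + q) / 2
    m = lambda x: 0.5 * (p(x) + q(x))
    return 0.5 * KL(p, m) + 0.5 * KL(q, m)

print(f"KL(f,g) = {KL(f_pdf, g_pdf):.3f}")
print(f"KL(g,f) = {KL(g_pdf, f_pdf):.3f}")
print(f"JS(f,g) = JS(g,f) = {JS(f_pdf, g_pdf):.3f}")
```

The first two numbers differ, which is the asymmetry discussed above, while the Jensen-Shannon value is the same whichever distribution is listed first.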

@@ -2110,9 +2118,9 @@ print(f"KL(h,f)={Kf_h:.3f}, KL(h,g)={Kg_h:.3f}")
 
 We find that in the first case, $KL(f,g) \approx KL(g,f)$ and both are relatively small, so although either agent 1 or agent 2 will eventually consume everything, the convergence displayed in the first two panels on the top is pretty slow.
 
-In the first two panels at the bottom, we see convergence occurring faster because the divergence gap $KL(f, g) > KL(g, f)$ is larger (as indicated by the black dashed line).
+In the first two panels at the bottom, we see convergence occurring faster (as indicated by the black dashed line) because the divergence gaps $KL(f, g)$ and $KL(g, f)$ are larger.
 
-We see faster convergence in the first panel at the bottom when nature chooses $f$ than in the second panel where nature chooses $g$.
+Since $KL(f,g) > KL(g,f)$, we see faster convergence in the first panel at the bottom when nature chooses $f$ than in the second panel where nature chooses $g$.
 
 This ties in nicely with {eq}`eq:kl_likelihood_link`.