
Commit 8c8b4fd

Tom's June 13 edits of likelihood ratio process lecture

1 parent 0448c02 commit 8c8b4fd

1 file changed
Lines changed: 139 additions & 7 deletions

File: lectures/likelihood_ratio_process.md
@@ -36,9 +36,11 @@ Among things that we'll learn are

* A peculiar property of likelihood ratio processes
* How a likelihood ratio process is a key ingredient in frequentist hypothesis testing
* How a **receiver operating characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing
* How a Bayesian statistician combines frequentist probabilities of Type I and Type II errors to form posterior probabilities of erroneous model selection or misclassification of individuals
* How during World War II the United States Navy devised a decision rule that Captain Garret L. Schuyler challenged, a topic to be studied in {doc}`this lecture <wald_friedman>`

Let's start by importing some Python tools.
@@ -115,7 +117,7 @@ Pearson {cite}`Neyman_Pearson`.

To help us appreciate how things work, the following Python code evaluates $f$ and $g$ as two different beta distributions, then computes and simulates an associated likelihood ratio process by generating a sequence $w^t$ from one of the two probability distributions, for example, a sequence of IID draws from $g$.

```{code-cell} python3
# Parameters in the two beta distributions.
@@ -330,10 +332,15 @@ We specify

Neyman and Pearson proved that the best way to test this hypothesis is to use a **likelihood ratio test** that takes the form:

- accept $H_0$ if $L(W^t) > c$,
- reject $H_0$ if $L(W^t) < c$,

where $c$ is a given discrimination threshold.

Setting $c = 1$ is a common choice.

We'll discuss consequences of other choices of $c$ below.

This test is *best* in the sense that it is **uniformly most powerful**.
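Written as a tiny Python helper, the rule looks as follows; this is an illustrative sketch, not code from the lecture, and it breaks ties at $L = c$ toward rejection:

```{code-cell} python3
def likelihood_ratio_test(L, c=1.0):
    """Apply the likelihood ratio test with discrimination threshold c."""
    # accept H0 (q = f) for large likelihood ratios, reject for small ones
    return "accept H0" if L > c else "reject H0"

likelihood_ratio_test(2.5), likelihood_ratio_test(0.3)
```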

@@ -343,20 +350,37 @@ threshold $c$.

The two probabilities are:

- Probability of a Type I error in which we reject $H_0$ when it is true:

$$
\alpha \equiv \Pr\left\{ L\left(w^{t}\right)<c\mid q=f\right\}
$$

- Probability of a Type II error in which we accept $H_0$ when it is false:

$$
\beta \equiv \Pr\left\{ L\left(w^{t}\right)>c\mid q=g\right\}
$$

These two probabilities underlie the following two concepts:

- Probability of false alarm (= significance level = probability of Type I error):

$$
\alpha \equiv \Pr\left\{ L\left(w^{t}\right)<c\mid q=f\right\}
$$

- Probability of detection (= power = 1 minus probability of Type II error):

$$
1-\beta \equiv \Pr\left\{ L\left(w^{t}\right)<c\mid q=g\right\}
$$
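To fix ideas, here is a minimal Monte Carlo sketch that estimates $\alpha$ and $1-\beta$ for a given threshold $c$; the beta parameters below are illustrative stand-ins for the lecture's `F_a, F_b, G_a, G_b`, and the helper name is hypothetical:

```{code-cell} python3
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)

# illustrative stand-ins for the lecture's beta parameters for f and g
F_a, F_b = 1, 1
G_a, G_b = 3, 1.2

def L_paths(a, b, N=10_000, T=20):
    """Simulate N sample paths of L(w^t) when each w is drawn from Beta(a, b)."""
    w = rng.beta(a, b, size=(N, T))
    ell = beta_dist.pdf(w, F_a, F_b) / beta_dist.pdf(w, G_a, G_b)
    return np.cumprod(ell, axis=1)

c, t = 1.0, 10
L_f = L_paths(F_a, F_b)[:, t-1]   # L_t when q = f
L_g = L_paths(G_a, G_b)[:, t-1]   # L_t when q = g

alpha_hat = np.mean(L_f < c)      # false alarm probability
power_hat = np.mean(L_g < c)      # detection probability = 1 - beta
alpha_hat, power_hat
```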

The [Neyman-Pearson Lemma](https://en.wikipedia.org/wiki/Neyman–Pearson_lemma)
@@ -415,7 +439,7 @@ $q=f$ from $q=g$.

```{code-cell} python3
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle(r'distribution of $\log L(w^t)$ under $f$ or under $g$', fontsize=15)

for i, t in enumerate([1, 7, 14, 21]):
    nr = i // 2
@@ -439,6 +463,12 @@ for i, t in enumerate([1, 7, 14, 21]):

plt.show()
```

In the above graphs,

* the blue areas are related to but not equal to the probabilities $\alpha$ of a Type I error because they are integrals of $\log L_t$, not integrals of $L_t$, over the rejection region $L_t < 1$
* the orange areas are related to but not equal to the probabilities $\beta$ of a Type II error because they are integrals of $\log L_t$, not integrals of $L_t$, over the acceptance region $L_t > 1$

When we hold $c$ fixed at $c=1$, the following graph shows that

* the probability of detection monotonically increases with increases in
@@ -686,10 +716,112 @@ N, T = l_arr_h.shape

plt.plot(range(T), np.sum(l_seq_h > 10000, axis=0) / N)
```

## Bayesian Classification and Hypothesis Testing

We now describe how a Bayesian statistician can combine frequentist probabilities of Type I and Type II errors in order to

* compute a posterior probability of selecting a wrong model
* compute an anticipated error rate in a classification problem

We consider a situation in which nature generates data by mixing known densities $f$ and $g$ with known mixing parameter $\pi_{-1} \in (0,1)$ so that the random variable $w$ is drawn from the density

$$
h(w) = \pi_{-1} f(w) + (1-\pi_{-1}) g(w)
$$

We'll often set $\pi_{-1} = .5$.

We assume that $f$ and $g$ both put positive probabilities on the same intervals of possible realizations of the random variable $W$.

We consider two alternative timing protocols.

**Protocol 1:** Nature flips a coin once at time $t=-1$ and with probability $\pi_{-1}$ generates a sequence $\{w_t\}_{t=1}^T$ of IID draws from $f$ and with probability $1-\pi_{-1}$ generates a sequence $\{w_t\}_{t=1}^T$ of IID draws from $g$.

**Protocol 2:** At each time $t \geq 1$, nature flips a coin and with probability $\pi_{-1}$ draws $w_t$ from $f$ and with probability $1-\pi_{-1}$ draws $w_t$ from $g$.

**Remark:** Under protocol 2, $\{w_t\}_{t=1}^T$ is a sequence of IID draws from $h(w)$. Under protocol 1, $\{w_t\}_{t=1}^T$ is not IID; it is **conditionally IID**, meaning that with probability $\pi_{-1}$ it is a sequence of IID draws from $f(w)$ and with probability $1-\pi_{-1}$ it is a sequence of IID draws from $g(w)$. For more about this, see {doc}`this lecture about exchangeability <exchangeable>`.
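The following sketch simulates the two protocols side by side. It is an illustrative sketch, not code from the lecture: the beta parameters `F_a, F_b, G_a, G_b` stand in for the lecture's specification of $f$ and $g$, and all names here are hypothetical.

```{code-cell} python3
import numpy as np

rng = np.random.default_rng(1234)

pi_minus1, T, N = 0.5, 50, 10_000

# illustrative stand-ins for the lecture's beta parameters for f and g
F_a, F_b, G_a, G_b = 1, 1, 3, 1.2

# Protocol 1: one coin flip at t = -1 selects the model for an entire path,
# so each row is IID from f or IID from g (conditionally IID)
from_f_path = rng.random(N) < pi_minus1
w_protocol1 = np.where(from_f_path[:, None],
                       rng.beta(F_a, F_b, (N, T)),
                       rng.beta(G_a, G_b, (N, T)))

# Protocol 2: a fresh coin flip at every date t,
# so each w_t is an IID draw from the mixture density h
from_f_t = rng.random((N, T)) < pi_minus1
w_protocol2 = np.where(from_f_t,
                       rng.beta(F_a, F_b, (N, T)),
                       rng.beta(G_a, G_b, (N, T)))

w_protocol1.shape, w_protocol2.shape
```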
We again deploy a **likelihood ratio process** with time $t$ component being the likelihood ratio

$$
\ell (w_t)=\frac{f\left(w_t\right)}{g\left(w_t\right)},\quad t\geq1.
$$

The **likelihood ratio process** for sequence $\left\{ w_{t}\right\} _{t=1}^{\infty}$ is

$$
L\left(w^{t}\right)=\prod_{i=1}^{t} \ell (w_i).
$$

For shorthand we'll write $L_t = L(w^t)$.
## Bayesian Model Selection

We first study a problem that assumes timing protocol 1.

Consider a decision maker who wants to know whether model $f$ or model $g$ governs the data.

The decision maker has observed a sequence $\{w_t\}_{t=1}^T$.

On the basis of that observed sequence, a likelihood ratio test selects model $f$ when $L_T > 1$ and model $g$ when $L_T < 1$.

When model $f$ generates the data, the probability that the likelihood ratio test selects the wrong model is

$$
p_f = {\rm Prob}\left(L_T < 1\Big| f\right) = \alpha_T .
$$

When model $g$ generates the data, the probability that the likelihood ratio test selects the wrong model is

$$
p_g = {\rm Prob}\left(L_T >1 \Big|g \right) = \beta_T.
$$

We can form a Bayesian probability that the likelihood ratio test selects the wrong model by attaching prior probability $\pi_{-1} = .5$ to nature having drawn model $f$ and then averaging $p_f$ and $p_g$ to form the posterior probability of a detection error

$$
p(\textrm{wrong decision}) = {1 \over 2} (\alpha_T + \beta_T) .
$$ (eq:detectionerrorprob)
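A minimal Monte Carlo check of {eq}`eq:detectionerrorprob` under the same illustrative specification, reusing `rng` and `F_a, F_b, G_a, G_b` from the sketch above:

```{code-cell} python3
from scipy.stats import beta as beta_dist

def terminal_L(a, b, N=10_000, T=20):
    """Simulate N values of L_T when the whole sample is IID Beta(a, b)."""
    w = rng.beta(a, b, (N, T))
    return np.prod(beta_dist.pdf(w, F_a, F_b) / beta_dist.pdf(w, G_a, G_b),
                   axis=1)

alpha_T = np.mean(terminal_L(F_a, F_b) < 1)   # wrong model chosen given f
beta_T = np.mean(terminal_L(G_a, G_b) > 1)    # wrong model chosen given g

0.5 * (alpha_T + beta_T)   # estimated detection error probability
```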
## Classification

We now consider a problem that assumes timing protocol 2.

A decision maker wants to classify each component of an observed sequence $\{w_t\}_{t=1}^T$ as having been drawn from either $f$ or $g$.

The decision maker uses the following classification rule:

$$
\begin{align*}
w_t & \ \text{is classified as coming from} \ f \ \text{if} \ \ell(w_t) > 1 \\
w_t & \ \text{is classified as coming from} \ g \ \text{if} \ \ell(w_t) < 1 .
\end{align*}
$$

Under this rule, the expected misclassification rate is

$$
p(\textrm{misclassification}) = {1 \over 2} (\alpha_1 + \beta_1)
$$ (eq:classerrorprob)

where $\alpha_1 = \Pr\left\{ \ell(w_t) < 1 \mid q=f \right\}$ and $\beta_1 = \Pr\left\{ \ell(w_t) > 1 \mid q=g \right\}$ are the $t=1$ versions of the Type I and Type II error probabilities defined above.
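A corresponding check of {eq}`eq:classerrorprob` under protocol 2, again reusing the definitions from the sketches above:

```{code-cell} python3
# generate one long protocol-2 sample and classify each observation
n = 100_000
truth_is_f = rng.random(n) < 0.5
w = np.where(truth_is_f,
             rng.beta(F_a, F_b, n),
             rng.beta(G_a, G_b, n))

# classify w_t as from f when the one-observation likelihood ratio exceeds 1
ell = beta_dist.pdf(w, F_a, F_b) / beta_dist.pdf(w, G_a, G_b)
classified_as_f = ell > 1

np.mean(classified_as_f != truth_is_f)   # ≈ (alpha_1 + beta_1) / 2
```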
## Sequels

Likelihood processes play an important role in Bayesian learning, as described in {doc}`this lecture <likelihood_bayes>` and as applied in {doc}`this lecture <odu>`.

Likelihood ratio processes appear again in [this lecture](https://python-advanced.quantecon.org/additive_functionals.html), which contains another illustration of the **peculiar property** of likelihood ratio processes described above.
