Commit ce7ea9e

Add standard deviation formula and enhance PMF/CDF visualizations
- Add SD formula to Discrete Uniform distribution
- Add mean line (red dashed) to visualizations
- Add mean ± 1 SD shaded region to visualizations
- Completes systematic update of all discrete distributions
1 parent 0139eda commit ce7ea9e

1 file changed

Lines changed: 44 additions & 8 deletions

chapter_07.md

@@ -1737,6 +1737,8 @@ The formula is essentially: **(favorable outcomes) / (total possible outcomes)**
17371737

17381738
**Variance:** $Var(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \left(\frac{N-n}{N-1}\right)$
17391739

1740+
**Standard Deviation:** $SD(X) = \sqrt{n \frac{K}{N} \left(1 - \frac{K}{N}\right) \left(\frac{N-n}{N-1}\right)}$
1741+
17401742
The term $\frac{N-n}{N-1}$ is the *finite population correction factor*. As $N \to \infty$, this approaches 1, and Hypergeometric → Binomial with $p = K/N$.
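
The effect of the finite population correction can be checked numerically. The sketch below is illustrative only: the helper names `hypergeom_sd` and `binomial_sd` are hypothetical, and the population sizes are assumed values chosen so that $K/N$ stays fixed at $4/52$ while $N$ grows. It uses only the standard library rather than the chapter's `scipy.stats` objects.

```python
import math

def hypergeom_sd(N, K, n):
    """SD of a Hypergeometric count, from the closed-form variance above."""
    p = K / N
    fpc = (N - n) / (N - 1)  # finite population correction factor
    return math.sqrt(n * p * (1 - p) * fpc)

def binomial_sd(n, p):
    """SD of a Binomial(n, p) count, for comparison."""
    return math.sqrt(n * p * (1 - p))

# Assumed illustrative values: keep K/N = 4/52 fixed and scale the population up.
n = 5
for scale in (1, 10, 100, 1000):
    N, K = 52 * scale, 4 * scale
    print(f"N={N:>6}  hypergeometric SD={hypergeom_sd(N, K, n):.4f}  "
          f"binomial SD={binomial_sd(n, K / N):.4f}")
```

With a small population the Hypergeometric SD sits noticeably below the Binomial SD, and the gap shrinks as $N$ grows, which is the convergence the correction factor describes.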
17411743

17421744
**Visualizing the Distribution**
@@ -1752,45 +1754,59 @@ K_viz = 4
17521754
n_viz = 5
17531755
hypergeom_viz = stats.hypergeom(M=N_viz, n=K_viz, N=n_viz)
17541756
1757+
# Calculate mean and std
1758+
mean_viz = hypergeom_viz.mean()
1759+
std_viz = hypergeom_viz.std()
1760+
17551761
# Plotting the PMF
17561762
k_values_viz = np.arange(0, min(n_viz, K_viz) + 1)
17571763
pmf_values_viz = hypergeom_viz.pmf(k_values_viz)
17581764
1759-
plt.figure(figsize=(8, 4))
1765+
plt.figure(figsize=(10, 5))
17601766
plt.bar(k_values_viz, pmf_values_viz, color='skyblue', edgecolor='black', alpha=0.7)
1767+
1768+
# Add mean line
1769+
plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
1770+
17611771
plt.title(f"Hypergeometric PMF (N={N_viz}, K={K_viz}, n={n_viz})")
17621772
plt.xlabel("Number of Successes in Sample (k)")
17631773
plt.ylabel("Probability P(X=k)")
17641774
plt.xticks(k_values_viz)
1775+
plt.legend(loc='upper right', fontsize=10)
17651776
plt.grid(axis='y', linestyle='--', alpha=0.6)
17661777
plt.savefig('ch07_hypergeometric_pmf_generic.svg', format='svg', bbox_inches='tight')
17671778
plt.show()
17681779
```
17691780

17701781
![Hypergeometric PMF](ch07_hypergeometric_pmf_generic.svg)
17711782

1772-
The PMF shows most likely to get 0 Aces (about 0.66 probability), less likely to get 1 or 2.
1783+
The PMF shows that drawing 0 Aces is the most likely outcome (probability about 0.66), with 1 or 2 Aces progressively less likely. The red dashed line marks the mean.
17731784

17741785
```{code-cell} ipython3
17751786
:tags: [remove-input, remove-output]
17761787
17771788
# Plotting the CDF
17781789
cdf_values_viz = hypergeom_viz.cdf(k_values_viz)
17791790
1780-
plt.figure(figsize=(8, 4))
1791+
plt.figure(figsize=(10, 5))
17811792
plt.step(k_values_viz, cdf_values_viz, where='post', color='darkgreen', linewidth=2)
1793+
1794+
# Add mean line
1795+
plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
1796+
17821797
plt.title(f"Hypergeometric CDF (N={N_viz}, K={K_viz}, n={n_viz})")
17831798
plt.xlabel("Number of Successes in Sample (k)")
17841799
plt.ylabel("Cumulative Probability P(X <= k)")
17851800
plt.xticks(k_values_viz)
1801+
plt.legend(loc='lower right', fontsize=10)
17861802
plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
17871803
plt.savefig('ch07_hypergeometric_cdf_generic.svg', format='svg', bbox_inches='tight')
17881804
plt.show()
17891805
```
17901806

17911807
![Hypergeometric CDF](ch07_hypergeometric_cdf_generic.svg)
17921808

1793-
The CDF shows P(X ≤ k), useful for questions like "What's the probability of getting at most 1 Ace?"
1809+
The CDF shows P(X ≤ k), useful for questions like "What's the probability of getting at most 1 Ace?" The red dashed line marks the mean.
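
As a minimal sketch of the "at most 1 Ace" question, the CDF value can be built by summing PMF terms. This assumes a standard 52-card deck with 4 Aces and a 5-card hand (the value of `N_viz` is not shown in this excerpt, so `N_deck = 52` is an assumption); the helper names `hypergeom_pmf` and `hypergeom_cdf` are hypothetical and use only `math.comb`.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k, N, K, n):
    """P(X <= k), accumulated from the PMF."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k + 1))

# Assumed parameters: 52 cards, 4 Aces, 5-card hand.
N_deck, K_aces, n_hand = 52, 4, 5
p_at_most_one = hypergeom_cdf(1, N_deck, K_aces, n_hand)
print(f"P(X <= 1) = {p_at_most_one:.4f}")  # prints P(X <= 1) = 0.9583
```

Under these assumptions the result agrees with the height of the CDF step at $k = 1$.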
17941810

17951811
:::{admonition} Example: Lottery Tickets with N=100, K=20, n=10
17961812
:class: tip
@@ -1996,6 +2012,8 @@ This directly implements the classical definition of probability: **(favorable o
19962012

19972013
**Variance:** $Var(X) = \frac{(b-a+1)^2 - 1}{12}$
19982014

2015+
**Standard Deviation:** $SD(X) = \sqrt{\frac{(b-a+1)^2 - 1}{12}}$
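
One way to sanity-check the closed form is to compare it with the standard deviation computed directly from the definition. The sketch below is illustrative: the function names are hypothetical, and the ranges 1 to 6 and 1 to 20 are assumed examples.

```python
import math

def discrete_uniform_sd(a, b):
    """SD of a Discrete Uniform on {a, ..., b}, from the closed form."""
    n = b - a + 1
    return math.sqrt((n ** 2 - 1) / 12)

def sd_from_definition(a, b):
    """SD computed directly as sqrt(E[(X - mean)^2]) with equal weights."""
    values = range(a, b + 1)
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Assumed example ranges: a fair die and the integers 1 to 20.
for a, b in [(1, 6), (1, 20)]:
    print(f"a={a}, b={b}: formula={discrete_uniform_sd(a, b):.4f}, "
          f"definition={sd_from_definition(a, b):.4f}")
```

The two columns should match to floating-point precision for any integer range with $a \le b$.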
2016+
19992017
**Relationship to Other Distributions:** The Discrete Uniform distribution is a special case of the **Categorical distribution** where all $k$ categories have equal probability $p_i = 1/k$. If outcomes aren't equally likely, use Categorical instead.
20002018

20012019
**Visualizing the Distribution**
@@ -2012,47 +2030,65 @@ from scipy.stats import randint
20122030
# scipy.stats.randint uses [low, high) so we add 1 to b
20132031
uniform_viz = randint(low=a_viz, high=b_viz+1)
20142032
2033+
# Calculate mean and std
2034+
mean_viz = uniform_viz.mean()
2035+
std_viz = uniform_viz.std()
2036+
20152037
# Plotting the PMF
20162038
k_values_viz = np.arange(a_viz, b_viz+1)
20172039
pmf_values_viz = uniform_viz.pmf(k_values_viz)
20182040
2019-
plt.figure(figsize=(8, 4))
2041+
plt.figure(figsize=(10, 5))
20202042
plt.bar(k_values_viz, pmf_values_viz, color='skyblue', edgecolor='black', alpha=0.7)
2043+
2044+
# Add mean line
2045+
plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
2046+
2047+
# Add mean ± 1 std region
2048+
plt.axvspan(mean_viz - std_viz, mean_viz + std_viz, alpha=0.2, color='orange',
2049+
label=f'Mean ± 1 SD = [{mean_viz - std_viz:.2f}, {mean_viz + std_viz:.2f}]')
2050+
20212051
plt.title(f"Discrete Uniform PMF (a={a_viz}, b={b_viz})")
20222052
plt.xlabel("Outcome")
20232053
plt.ylabel("Probability")
20242054
plt.ylim(0, 0.25)
20252055
plt.xticks(k_values_viz)
2056+
plt.legend(loc='upper right', fontsize=10)
20262057
plt.grid(axis='y', linestyle='--', alpha=0.6)
20272058
plt.savefig('ch07_discrete_uniform_pmf.svg', format='svg', bbox_inches='tight')
20282059
plt.show()
20292060
```
20302061

20312062
![Discrete Uniform PMF](ch07_discrete_uniform_pmf.svg)
20322063

2033-
The PMF shows six equal bars, each with probability 1/6, representing the fair die.
2064+
The PMF shows six equal bars, each with probability 1/6, representing the fair die. The shaded region shows mean ± 1 standard deviation.
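
To make the shaded band concrete, the following sketch computes its endpoints and the probability mass that falls inside it. It assumes the fair six-sided die described above ($a = 1$, $b = 6$); the variable names are illustrative and are not taken from the chapter's plotting code.

```python
import math
from fractions import Fraction

a, b = 1, 6  # assumed: fair six-sided die
outcomes = range(a, b + 1)

mean = (a + b) / 2
sd = math.sqrt(((b - a + 1) ** 2 - 1) / 12)
low, high = mean - sd, mean + sd

# Outcomes whose bars lie within mean ± 1 SD, and their total probability.
inside = [k for k in outcomes if low <= k <= high]
mass_inside = Fraction(len(inside), len(outcomes))

print(f"band = [{low:.2f}, {high:.2f}]")
print(f"outcomes inside = {inside}, probability = {mass_inside}")
```

For the die, the band covers the outcomes 2 through 5, so two thirds of the probability lies within one standard deviation of the mean.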
20342065

20352066
```{code-cell} ipython3
20362067
:tags: [remove-input, remove-output]
20372068
20382069
# Plotting the CDF
20392070
cdf_values_viz = uniform_viz.cdf(k_values_viz)
20402071
2041-
plt.figure(figsize=(8, 4))
2072+
plt.figure(figsize=(10, 5))
20422073
plt.step(k_values_viz, cdf_values_viz, where='post', color='darkgreen', linewidth=2)
2074+
2075+
# Add mean line
2076+
plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
2077+
20432078
plt.title(f"Discrete Uniform CDF (a={a_viz}, b={b_viz})")
20442079
plt.xlabel("Outcome")
20452080
plt.ylabel("Cumulative Probability P(X <= k)")
20462081
plt.ylim(0, 1.1)
20472082
plt.xticks(k_values_viz)
2083+
plt.legend(loc='lower right', fontsize=10)
20482084
plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
20492085
plt.savefig('ch07_discrete_uniform_cdf.svg', format='svg', bbox_inches='tight')
20502086
plt.show()
20512087
```
20522088

20532089
![Discrete Uniform CDF](ch07_discrete_uniform_cdf.svg)
20542090

2055-
The CDF increases in equal steps of 1/6 at each value, reaching 1.0 at the maximum value.
2091+
The CDF increases in equal steps of 1/6 at each value, reaching 1.0 at the maximum value. The red dashed line marks the mean.
20562092

20572093
:::{admonition} Example: Random Selection from 1 to 20
20582094
:class: tip
