Commit 0139eda

Add SD formulas and enhanced visualizations to multiple distributions
Completed for:
- Bernoulli: Add SD formula and mean line to PMF/CDF
- Geometric: Add SD formula, mean line, and mean±1SD region
- Negative Binomial: Add SD formula, mean line, and mean±1SD region
- Poisson: Add SD formula, mean line, and mean±1SD region

All visualizations now show:
- Red dashed line marking the mean
- Orange shaded region showing mean ± 1 standard deviation (where applicable)
- Legends with calculated values
- Increased figure sizes (10x5) for better readability

Still to do: Hypergeometric, Discrete Uniform, Categorical
1 parent a166823 commit 0139eda

1 file changed

Lines changed: 93 additions & 17 deletions

chapter_07.md

@@ -120,6 +120,8 @@ Let's verify this works for our example where $p = 0.3$:
 
 **Variance:** $Var(X) = p(1-p)$
 
+**Standard Deviation:** $SD(X) = \sqrt{p(1-p)}$
+
 **Visualizing the Distribution**
 
 Let's visualize a Bernoulli distribution with $p = 0.3$ (our medical test example from above):
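As a quick sanity check on the added formula, the sketch below evaluates $\sqrt{p(1-p)}$ for the chapter's $p = 0.3$ and compares it with a brute-force computation from the two-point PMF. The helper name `bernoulli_sd` is an illustrative assumption, not something defined in the chapter.

```python
import math


def bernoulli_sd(p: float) -> float:
    """Closed-form SD of a Bernoulli(p) variable: sqrt(p * (1 - p))."""
    return math.sqrt(p * (1 - p))


p = 0.3

# Brute force from the PMF: outcomes 0 and 1 with probabilities 1 - p and p.
outcomes = {0: 1 - p, 1: p}
mean = sum(k * prob for k, prob in outcomes.items())
variance = sum((k - mean) ** 2 * prob for k, prob in outcomes.items())

sd_formula = bernoulli_sd(p)
sd_brute_force = math.sqrt(variance)
print(f"formula: {sd_formula:.4f}, brute force: {sd_brute_force:.4f}")
```

With $p = 0.3$ the SD is about 0.46, so mean − 1 SD is negative and falls outside the support {0, 1}.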
@@ -131,24 +133,33 @@ Let's visualize a Bernoulli distribution with $p = 0.3$ (our medical test exampl
 p_viz = 0.3
 bernoulli_viz = stats.bernoulli(p=p_viz)
 
+# Calculate mean and std
+mean_viz = bernoulli_viz.mean()
+std_viz = bernoulli_viz.std()
+
 # Plotting the PMF
 k_values_viz = [0, 1]
 pmf_values_viz = bernoulli_viz.pmf(k_values_viz)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.bar(k_values_viz, pmf_values_viz, tick_label=["Failure (0)", "Success (1)"], color='skyblue', edgecolor='black', alpha=0.7)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
+
 plt.title(f"Bernoulli PMF (p={p_viz})")
 plt.xlabel("Outcome")
 plt.ylabel("Probability")
 plt.ylim(0, 1)
+plt.legend(loc='upper right', fontsize=10)
 plt.grid(axis='y', linestyle='--', alpha=0.6)
 plt.savefig('ch07_bernoulli_pmf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Bernoulli PMF](ch07_bernoulli_pmf_generic.svg)
 
-The PMF shows two bars: P(X=0) = 0.7 for a negative test and P(X=1) = 0.3 for a positive test.
+The PMF shows two bars: P(X=0) = 0.7 for a negative test and P(X=1) = 0.3 for a positive test. The red dashed line marks the mean ($p = 0.3$).
 
 ```{code-cell} ipython3
 :tags: [remove-input, remove-output]
@@ -157,23 +168,28 @@ The PMF shows two bars: P(X=0) = 0.7 for a negative test and P(X=1) = 0.3 for a
 k_values_viz = [0, 1]
 cdf_values_viz = bernoulli_viz.cdf(k_values_viz)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 # Add points to show the full step function including the start at 0
 plt.step([-0.5] + k_values_viz, [0] + list(cdf_values_viz), where='post', color='darkgreen', linewidth=2)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
+
 plt.title(f"Bernoulli CDF (p={p_viz})")
 plt.xlabel("Outcome")
 plt.ylabel("Cumulative Probability P(X <= k)")
 plt.ylim(0, 1.1)
 plt.xlim(-0.5, 1.5)
 plt.xticks([0, 1])
+plt.legend(loc='lower right', fontsize=10)
 plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
 plt.savefig('ch07_bernoulli_cdf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Bernoulli CDF](ch07_bernoulli_cdf_generic.svg)
 
-The CDF shows the step function: starts at 0 for x < 0, jumps to 0.7 at x=0 (the value when outcome is 0), stays flat at 0.7 until x=1, then jumps to 1.0 at x=1 (the value when including both outcomes 0 and 1).
+The CDF shows the step function: starts at 0 for x < 0, jumps to 0.7 at x=0 (the value when outcome is 0), stays flat at 0.7 until x=1, then jumps to 1.0 at x=1 (the value when including both outcomes 0 and 1). The red dashed line marks the mean.
 
 Note: Here, P(X ≤ 0) = P(X = 0) = 0.7 because X can't take negative values; in general, "X ≤ 0" means "at or below 0", not "exactly 0".
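The step behaviour described in this hunk can be written out directly. The following is a minimal sketch in plain Python; the function name `bernoulli_cdf` is hypothetical and is not part of the chapter or of SciPy.

```python
def bernoulli_cdf(x: float, p: float) -> float:
    """P(X <= x) for X ~ Bernoulli(p), defined for any real x."""
    if x < 0:
        return 0.0      # no probability mass below 0
    if x < 1:
        return 1 - p    # only the outcome 0 has been accumulated
    return 1.0          # both outcomes 0 and 1 are included


p = 0.3
for x in (-0.5, 0, 0.5, 1, 1.5):
    print(f"P(X <= {x}) = {bernoulli_cdf(x, p):.1f}")
```

Evaluating at `x = 0.5` returns the same value as at `x = 0`, which is the flat segment between the two jumps.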

@@ -865,6 +881,8 @@ This is why the formula captures "trials until first success" - it requires all
 
 **Variance:** $Var(X) = \frac{1-p}{p^2}$
 
+**Standard Deviation:** $SD(X) = \frac{\sqrt{1-p}}{p}$
+
 **Relationship to Other Distributions:** The Geometric distribution is built from independent **Bernoulli trials** and is a special case of the **Negative Binomial distribution** with $r=1$ (waiting for just one success instead of $r$ successes).
 
 :::{admonition} Note
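The added SD line follows from the variance in one step. Written out, with the free throw value $p = 0.4$ used later in the chapter as a worked number:

```latex
SD(X) = \sqrt{Var(X)} = \sqrt{\frac{1-p}{p^2}} = \frac{\sqrt{1-p}}{p},
\qquad
p = 0.4:\quad SD(X) = \frac{\sqrt{0.6}}{0.4} \approx 1.94
```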
@@ -884,45 +902,63 @@ Let's visualize a Geometric distribution with $p = 0.4$ (our free throw example)
 p_viz = 0.4
 geom_viz = stats.geom(p=p_viz)
 
+# Calculate mean and std (adjusted for trial number definition)
+mean_viz = 1 / p_viz
+std_viz = np.sqrt((1 - p_viz) / p_viz**2)
+
 # Plotting the PMF
 k_values_viz = np.arange(1, 11)
 pmf_values_viz = geom_viz.pmf(k_values_viz - 1)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.bar(k_values_viz, pmf_values_viz, color='skyblue', edgecolor='black', alpha=0.7)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
+
+# Add mean ± 1 std region
+plt.axvspan(mean_viz - std_viz, mean_viz + std_viz, alpha=0.2, color='orange',
+            label=f'Mean ± 1 SD = [{mean_viz - std_viz:.1f}, {mean_viz + std_viz:.1f}]')
+
 plt.title(f"Geometric PMF (p={p_viz})")
 plt.xlabel("Trial Number (k)")
 plt.ylabel("Probability P(X=k)")
 plt.xticks(k_values_viz)
+plt.legend(loc='upper right', fontsize=10)
 plt.grid(axis='y', linestyle='--', alpha=0.6)
 plt.savefig('ch07_geometric_pmf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Geometric PMF](ch07_geometric_pmf_generic.svg)
 
-The PMF shows exponentially decreasing probabilities - you're most likely to succeed on the first few trials.
+The PMF shows exponentially decreasing probabilities - you're most likely to succeed on the first few trials. The shaded region shows mean ± 1 standard deviation.
 
 ```{code-cell} ipython3
 :tags: [remove-input, remove-output]
 
 # Plotting the CDF
 cdf_values_viz = geom_viz.cdf(k_values_viz - 1)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.step(k_values_viz, cdf_values_viz, where='post', color='darkgreen', linewidth=2)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.2f}')
+
 plt.title(f"Geometric CDF (p={p_viz})")
 plt.xlabel("Trial Number (k)")
 plt.ylabel("Cumulative Probability P(X <= k)")
 plt.xticks(k_values_viz)
+plt.legend(loc='lower right', fontsize=10)
 plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
 plt.savefig('ch07_geometric_cdf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Geometric CDF](ch07_geometric_cdf_generic.svg)
 
-The CDF shows P(X ≤ k), approaching 1 as k increases (eventually you'll succeed).
+The CDF shows P(X ≤ k), approaching 1 as k increases (eventually you'll succeed). The red dashed line marks the mean.
 
 :::{admonition} Example: Certification Exam with p = 0.6
 :class: tip
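One point in this hunk is worth checking against the installed SciPy. With its default `loc=0`, `stats.geom` already counts the trial of the first success (support k = 1, 2, ...), so its mean is 1/p and agrees with `mean_viz`. Under that assumption, the context lines that call `pmf(k_values_viz - 1)` and `cdf(k_values_viz - 1)` would draw the bars and steps one trial to the right of where the mean line expects them. A minimal check, with variable names chosen here for illustration:

```python
import numpy as np
from scipy import stats

p = 0.4
geom = stats.geom(p=p)
k = np.arange(1, 11)  # trial numbers 1..10, as in the plot

# Closed form for "trial of the first success": P(X = k) = (1 - p)^(k - 1) * p
expected = (1 - p) ** (k - 1) * p

pmf_direct = geom.pmf(k)       # evaluate at the trial number itself
pmf_shifted = geom.pmf(k - 1)  # the call pattern used in the hunk's context lines

print("pmf(k) matches closed form:    ", np.allclose(pmf_direct, expected))
print("pmf(k - 1) matches closed form:", np.allclose(pmf_shifted, expected))
print("pmf(k - 1) at k = 1:", pmf_shifted[0])
print("scipy mean:", geom.mean(), "vs 1/p:", 1 / p)
```

If the first comparison holds and the second does not, passing `k_values_viz` unshifted would align the plotted PMF and CDF with the trial-number definition used for the mean and SD.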
@@ -1134,6 +1170,8 @@ The binomial coefficient ensures we count all possible arrangements where the $r
 
 **Variance:** $Var(X) = \frac{r(1-p)}{p^2}$
 
+**Standard Deviation:** $SD(X) = \frac{\sqrt{r(1-p)}}{p}$
+
 :::{admonition} Note
 :class: note
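One way to see where this SD comes from is to treat the total trial count as a sum of $r$ independent Geometric waiting times, so the variances add. A short derivation under that standard decomposition, with $r = 3$, $p = 0.2$ as the worked number:

```latex
X = X_1 + \dots + X_r, \quad X_i \sim \text{Geometric}(p) \text{ independent}
\;\Rightarrow\;
Var(X) = r \cdot \frac{1-p}{p^2},
\qquad
SD(X) = \frac{\sqrt{r(1-p)}}{p} = \sqrt{r}\,\cdot\frac{\sqrt{1-p}}{p}

r = 3,\ p = 0.2:\quad SD(X) = \frac{\sqrt{2.4}}{0.2} \approx 7.75
```

The SD therefore grows like $\sqrt{r}$ while the mean $r/p$ grows like $r$.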

@@ -1152,43 +1190,61 @@ r_viz = 3
 p_viz = 0.2
 nbinom_viz = stats.nbinom(n=r_viz, p=p_viz)
 
+# Calculate mean and std
+mean_viz = r_viz / p_viz
+std_viz = np.sqrt(r_viz * (1 - p_viz)) / p_viz
+
 # Plotting the PMF
 k_values_viz = np.arange(r_viz, 30) # Total trials from r to 30
 pmf_values_viz = nbinom_viz.pmf(k_values_viz - r_viz) # Adjust for scipy
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.bar(k_values_viz, pmf_values_viz, color='skyblue', edgecolor='black', alpha=0.7)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.1f}')
+
+# Add mean ± 1 std region
+plt.axvspan(mean_viz - std_viz, mean_viz + std_viz, alpha=0.2, color='orange',
+            label=f'Mean ± 1 SD = [{mean_viz - std_viz:.1f}, {mean_viz + std_viz:.1f}]')
+
 plt.title(f"Negative Binomial PMF (r={r_viz}, p={p_viz})")
 plt.xlabel("Total Number of Trials (k)")
 plt.ylabel("Probability P(X=k)")
+plt.legend(loc='upper right', fontsize=10)
 plt.grid(axis='y', linestyle='--', alpha=0.6)
 plt.savefig('ch07_negative_binomial_pmf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Negative Binomial PMF](ch07_negative_binomial_pmf_generic.svg)
 
-The PMF shows the distribution is centered around the expected value r/p = 3/0.2 = 15 trials.
+The PMF shows the distribution is centered around the expected value r/p = 3/0.2 = 15 trials. The shaded region shows mean ± 1 standard deviation.
 
 ```{code-cell} ipython3
 :tags: [remove-input, remove-output]
 
 # Plotting the CDF
 cdf_values_viz = nbinom_viz.cdf(k_values_viz - r_viz)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.step(k_values_viz, cdf_values_viz, where='post', color='darkgreen', linewidth=2)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.1f}')
+
 plt.title(f"Negative Binomial CDF (r={r_viz}, p={p_viz})")
 plt.xlabel("Total Number of Trials (k)")
 plt.ylabel("Cumulative Probability P(X <= k)")
+plt.legend(loc='lower right', fontsize=10)
 plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
 plt.savefig('ch07_negative_binomial_cdf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Negative Binomial CDF](ch07_negative_binomial_cdf_generic.svg)
 
-The CDF shows P(X ≤ k), the cumulative probability of achieving r successes within k trials.
+The CDF shows P(X ≤ k), the cumulative probability of achieving r successes within k trials. The red dashed line marks the mean.
 
 :::{admonition} Example: Quality Control with r = 3, p = 0.05
 :class: tip
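The hand-written `mean_viz` and `std_viz` in this hunk can be cross-checked without SciPy by summing the total-trials PMF directly. The sketch below is an independent check, not the chapter's code: the function name `nbinom_trials_pmf` and the truncation at 1000 trials are assumptions made here. It also reports the mode, since a right-skewed distribution peaks below its mean.

```python
import math


def nbinom_trials_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success lands exactly on trial k (k >= r)."""
    return math.comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


r, p = 3, 0.2
ks = range(r, 1000)  # truncation point is an assumption; the tail beyond it is negligible
pmf = [nbinom_trials_pmf(k, r, p) for k in ks]

total = sum(pmf)
mean = sum(k * q for k, q in zip(ks, pmf))
sd = math.sqrt(sum((k - mean) ** 2 * q for k, q in zip(ks, pmf)))
mode = max(ks, key=lambda k: nbinom_trials_pmf(k, r, p))
inside = sum(q for k, q in zip(ks, pmf) if mean - sd <= k <= mean + sd)

print(f"total probability: {total:.6f}")
print(f"mean: {mean:.3f} (formula r/p = {r / p:.3f})")
print(f"sd:   {sd:.3f} (formula = {math.sqrt(r * (1 - p)) / p:.3f})")
print(f"mode: {mode}")
print(f"P(mean - sd <= X <= mean + sd) = {inside:.3f}")
```

For these parameters the PMF values at k = 10 and k = 11 are equal in exact arithmetic, so the reported mode may be either one; both sit well below the mean of 15.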
@@ -1406,7 +1462,9 @@ For example, "4 calls per hour" could be modeled as 3600 one-second intervals wh
 
 **Variance:** $Var(X) = \lambda$
 
-Note: Mean and variance are equal in a Poisson distribution.
+**Standard Deviation:** $SD(X) = \sqrt{\lambda}$
+
+Note: Mean and variance are equal in a Poisson distribution, so the standard deviation is simply the square root of λ.
 
 **Relationship to Other Distributions:** The Poisson distribution is an approximation to the **Binomial distribution** when $n$ is large, $p$ is small, and $\lambda = np$ is moderate. Rule of thumb: use Poisson approximation when $n \ge 20$ and $p \le 0.05$.
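The rule of thumb in the context line above can be illustrated numerically. This sketch compares the two PMFs for hypothetical values $n = 100$, $p = 0.04$ (chosen here so that $\lambda = np = 4$ matches the chapter's Poisson example); the helper names are illustrative.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial PMF: probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson PMF: probability of exactly k events at rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 100, 0.04  # hypothetical values satisfying n >= 20 and p <= 0.05
lam = n * p

# Largest pointwise difference between the two PMFs over k = 0..n
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))

print(f"lambda = {lam}")
print(f"largest PMF difference: {max_gap:.4f}")
print(f"binomial variance: {n * p * (1 - p):.2f}, Poisson variance: {lam:.2f}")
```

The two variances differ by the factor $(1 - p)$, which is why the approximation tightens as $p$ shrinks.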

@@ -1421,45 +1479,63 @@ Let's visualize a Poisson distribution with $\lambda = 4$ (our call center examp
 lambda_viz = 4
 poisson_viz = stats.poisson(mu=lambda_viz)
 
+# Calculate mean and std
+mean_viz = poisson_viz.mean()
+std_viz = poisson_viz.std()
+
 # Plotting the PMF
 k_values_viz = np.arange(0, 15)
 pmf_values_viz = poisson_viz.pmf(k_values_viz)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.bar(k_values_viz, pmf_values_viz, color='skyblue', edgecolor='black', alpha=0.7)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.1f}')
+
+# Add mean ± 1 std region
+plt.axvspan(mean_viz - std_viz, mean_viz + std_viz, alpha=0.2, color='orange',
+            label=f'Mean ± 1 SD = [{mean_viz - std_viz:.1f}, {mean_viz + std_viz:.1f}]')
+
 plt.title(f"Poisson PMF (λ={lambda_viz})")
 plt.xlabel("Number of Events (k)")
 plt.ylabel("Probability P(X=k)")
 plt.xticks(k_values_viz)
+plt.legend(loc='upper right', fontsize=10)
 plt.grid(axis='y', linestyle='--', alpha=0.6)
 plt.savefig('ch07_poisson_pmf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Poisson PMF](ch07_poisson_pmf_generic.svg)
 
-The PMF shows the distribution centered around λ = 4 with reasonable probability for nearby values.
+The PMF shows the distribution centered around λ = 4 with reasonable probability for nearby values. The shaded region shows mean ± 1 standard deviation ($\sqrt{4} = 2$).
 
 ```{code-cell} ipython3
 :tags: [remove-input, remove-output]
 
 # Plotting the CDF
 cdf_values_viz = poisson_viz.cdf(k_values_viz)
 
-plt.figure(figsize=(8, 4))
+plt.figure(figsize=(10, 5))
 plt.step(k_values_viz, cdf_values_viz, where='post', color='darkgreen', linewidth=2)
+
+# Add mean line
+plt.axvline(mean_viz, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_viz:.1f}')
+
 plt.title(f"Poisson CDF (λ={lambda_viz})")
 plt.xlabel("Number of Events (k)")
 plt.ylabel("Cumulative Probability P(X <= k)")
 plt.xticks(k_values_viz)
+plt.legend(loc='lower right', fontsize=10)
 plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)
 plt.savefig('ch07_poisson_cdf_generic.svg', format='svg', bbox_inches='tight')
 plt.show()
 ```
 
 ![Poisson CDF](ch07_poisson_cdf_generic.svg)
 
-The CDF shows P(X ≤ k), useful for questions like "What's the probability of 6 or fewer calls?"
+The CDF shows P(X ≤ k), useful for questions like "What's the probability of 6 or fewer calls?" The red dashed line marks the mean.
 
 :::{admonition} Example: Email Arrivals with λ = 5
 :class: tip
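Both captions in this hunk pose questions that can be answered with a few lines of arithmetic. The sketch below uses only the standard library to compute P(X ≤ 6) for λ = 4 and the probability mass inside the shaded mean ± 1 SD band; the helper names are assumptions made for this example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson PMF: probability of exactly k events at rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), accumulated term by term from the PMF."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 4
sd = math.sqrt(lam)

# "What's the probability of 6 or fewer calls?"
p_at_most_6 = poisson_cdf(6, lam)

# Mass inside the shaded band [mean - sd, mean + sd] = [2, 6]
low, high = lam - sd, lam + sd
p_within_band = sum(
    poisson_pmf(k, lam) for k in range(math.ceil(low), math.floor(high) + 1)
)

print(f"P(X <= 6) = {p_at_most_6:.4f}")
print(f"P({low:.0f} <= X <= {high:.0f}) = {p_within_band:.4f}")
```

Roughly 89% of the mass lies at or below 6 calls, and about 80% falls inside the one-SD band.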
