Commit 53c6eab

fix explanation of ml adjustment and fix stratum comparison
1 parent 8fdc26d commit 53c6eab

2 files changed

Lines changed: 84 additions & 32 deletions

docs/source/tutorials/oregon.rst

Lines changed: 84 additions & 32 deletions
@@ -203,9 +203,11 @@ The analysis produces the following local distribution treatment effects visuali
203203
- **ML-Adjusted Local Estimator**: Shows a smaller effect of LDTE ≈ -0.15 at zero costs, with similar convergence patterns.
204204
- **Key Finding**: Both estimators reveal insurance primarily affects the lower tail (zero to ~$10,000), shifting the distribution rightward. This indicates insurance increases ED access among those who would otherwise not seek care, while having minimal impact on high-cost users.
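To make the quantity in these bullets concrete, here is a minimal sketch of a textbook Wald-type local distributional effect at a single cost threshold: the difference in empirical CDFs between lottery arms, divided by the difference in insurance take-up. This is an illustration under stated assumptions, not the tutorial's estimator; the function names (`empirical_cdf`, `wald_ldte`) and the toy data are hypothetical.

```python
def empirical_cdf(values, location):
    """Share of observations with outcome <= location."""
    return sum(v <= location for v in values) / len(values)


def wald_ldte(y_assigned, y_control, d_assigned, d_control, location):
    """Wald-type local distributional effect at one location.

    y_*: outcomes by assignment arm (lottery winners vs. non-winners).
    d_*: 0/1 treatment take-up indicators by assignment arm.
    Hypothetical helper: a textbook ratio, not the library's API.
    """
    itt = empirical_cdf(y_assigned, location) - empirical_cdf(y_control, location)
    compliance = sum(d_assigned) / len(d_assigned) - sum(d_control) / len(d_control)
    if compliance == 0:
        raise ValueError("No difference in take-up between arms; ratio is undefined.")
    return itt / compliance


# Toy data (made up for illustration only).
y_assigned = [0, 0, 100, 200]
y_control = [0, 0, 0, 50]
d_assigned = [1, 1, 0, 0]
d_control = [0, 0, 0, 0]

effect_at_zero = wald_ldte(y_assigned, y_control, d_assigned, d_control, location=0)
print(effect_at_zero)  # -0.5
```

A negative value at zero costs means fewer compliers sit at zero when insured, which matches the rightward shift described in the key finding.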
205205

206-
**2. Covariate Adjustment Effects and Confidence Intervals**
207206

208-
The confidence intervals are not substantially narrower with ML adjustment. Both methods show comparably wide confidence bands, indicating limited efficiency gains. This suggests: (1) covariates have limited predictive power for ED costs, (2) the linear regression model may be too simple, or (3) the simple estimator is already reasonably efficient.
207+
The confidence intervals are not substantially narrower with ML adjustment. Both methods show comparably wide confidence bands, indicating limited efficiency gains. This result reflects the **limited predictive power of available covariates** (R² ≈ 0.21 when predicting ED costs from pre-treatment ED history and demographics).
208+
209+
ML adjustment provides efficiency gains proportional to covariate predictive power. When covariates weakly predict outcomes (R² < 0.3), as in this case, ML adjustment yields minimal improvements over simple estimation. This is a characteristic of the data—pre-treatment healthcare utilization and basic demographics cannot strongly predict future emergency department costs—not a failure of the ML methodology.
210+
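A rough way to see why R² ≈ 0.21 buys so little: for regression-adjusted mean differences in a randomized experiment, a common rule of thumb is that the adjusted variance is about (1 − R²) times the unadjusted variance, so confidence interval width scales by about sqrt(1 − R²). The sketch below applies that rule of thumb. It is an assumption borrowed from the mean-difference setting, so for distributional estimators it should be read as an order-of-magnitude guide only; `ci_width_ratio` is a hypothetical helper name.

```python
import math


def ci_width_ratio(r_squared):
    """Approximate ratio of adjusted to unadjusted CI width.

    Rule of thumb (an assumption, not a guarantee):
    Var_adjusted ~= (1 - R^2) * Var_unadjusted.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1).")
    return math.sqrt(1.0 - r_squared)


for r2 in (0.21, 0.30, 0.50, 0.80):
    ratio = ci_width_ratio(r2)
    print(f"R^2 = {r2:.2f}: adjusted CI width is roughly {ratio:.2f}x the unadjusted width")
```

Under this approximation, R² = 0.21 implies intervals only about 11% narrower in the best case, a difference that sampling noise can easily mask.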
209211

210212
Cost Analysis with Local PTE
211213
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -391,9 +393,8 @@ Visits Analysis with Local PTE
391393
- **ML-Adjusted Local Estimator**: Shows a larger negative effect at zero visits (LPTE ≈ -0.14) and positive effects in the 1-5 visit range (LPTE ≈ 0.03-0.04). Effects converge to zero at higher visit frequencies.
392394
- **Key Finding**: Insurance reduces the probability mass at zero visits while increasing it in the low-to-moderate visit range (1-5 visits). This represents a redistribution of probability mass from non-users to low-frequency ED users, with minimal effect on frequent visitors.
393395

394-
**2. Covariate Adjustment Effects and Confidence Intervals**
395396

396-
The confidence intervals remain wide for both estimators, particularly at zero and low visit counts. The limited precision suggests: (1) substantial heterogeneity in treatment effects within visit frequency bins, (2) limited predictive power of covariates for specific visit levels, or (3) relatively small sample sizes within individual bins.
397+
The confidence intervals remain wide for both estimators, with minimal differences between simple and ML-adjusted approaches. This limited precision reflects the same fundamental constraint as in the cost analysis: covariates have limited predictive power for ED visit frequency (R² ≈ 0.21). The substantial heterogeneity in treatment effects, combined with weak covariate prediction, means ML adjustment provides minimal efficiency gains over the simpler approach.
397398

398399

399400
Stratified Analysis by Household Registration
@@ -487,53 +488,101 @@ Visualization: Comparing Overall Population vs Stratified Results
487488
.. code-block:: python
488489
489490
# Comparison: Overall vs Individual Strata (Local Estimators)
490-
fig, axes = plt.subplots(2, 3, figsize=(24, 12))
491+
fig, axes = plt.subplots(2, 2, figsize=(24, 12))
492+
493+
# Calculate global y-axis limits across all plots (to align y-axis)
494+
all_ydatas = []
495+
all_yerr_lowers = []
496+
all_yerr_uppers = []
497+
498+
# Collect all y values (means and error bounds) for ALL subplots
499+
# Overall population: Simple and ML-adjusted
500+
all_ydatas.append(ldte_simple)
501+
all_yerr_lowers.append(lower_simple)
502+
all_yerr_uppers.append(upper_simple)
503+
all_ydatas.append(ldte_ml)
504+
all_yerr_lowers.append(lower_ml)
505+
all_yerr_uppers.append(upper_ml)
506+
507+
# Each stratum: Simple and ML-adjusted
508+
for stratum, results in individual_results.items():
509+
if stratum == 'signed self up + others':
510+
continue
511+
if results is None:
512+
continue
513+
all_ydatas.append(results['simple']['ldte'])
514+
all_yerr_lowers.append(results['simple']['lower'])
515+
all_yerr_uppers.append(results['simple']['upper'])
516+
all_ydatas.append(results['ml']['ldte'])
517+
all_yerr_lowers.append(results['ml']['lower'])
518+
all_yerr_uppers.append(results['ml']['upper'])
519+
520+
# Determine min/max y for unified y-axis
521+
y_min = np.min([np.min(dat) for dat in all_yerr_lowers if dat is not None])
522+
y_max = np.max([np.max(dat) for dat in all_yerr_uppers if dat is not None])
491523
492524
# Row 1: Simple local estimators
493525
# Overall (all data)
494-
plot(outcome_ed_costs_locations, ldte_simple, lower_simple, upper_simple,
495-
title="ED Costs: Overall Population\n(Simple Local Estimator)",
496-
xlabel="Emergency Department Costs",
497-
ylabel="Local Distribution Treatment Effect",
498-
color="black", ax=axes[0, 0])
526+
plot(
527+
outcome_ed_costs_locations, ldte_simple, lower_simple, upper_simple,
528+
title="ED Costs: Overall Population\n(Simple Local Estimator)",
529+
xlabel="Emergency Department Costs",
530+
ylabel="Local Distribution Treatment Effect",
531+
color="black", ax=axes[0, 0]
532+
)
533+
axes[0, 0].set_ylim(y_min, y_max)
499534
500535
# Individual strata
501536
col_idx = 1
502537
for stratum, results in individual_results.items():
538+
if stratum == 'signed self up + others':
539+
continue
503540
if results is None or col_idx >= axes.shape[1]:
504541
continue
505-
506-
plot(results['locations'], results['simple']['ldte'],
507-
results['simple']['lower'], results['simple']['upper'],
508-
title=f"ED Costs: {stratum}\n(Simple Local Estimator, n={results['sample_size']:,})",
509-
xlabel="Emergency Department Costs",
510-
ylabel="Local Distribution Treatment Effect",
511-
color="blue" if col_idx == 1 else "green", ax=axes[0, col_idx])
542+
plot(
543+
results['locations'], results['simple']['ldte'],
544+
results['simple']['lower'], results['simple']['upper'],
545+
title=f"ED Costs: {stratum}\n(Simple Local Estimator, n={results['sample_size']:,})",
546+
xlabel="Emergency Department Costs",
547+
ylabel="Local Distribution Treatment Effect",
548+
color="blue" if col_idx == 1 else "green", ax=axes[0, col_idx]
549+
)
550+
axes[0, col_idx].set_ylim(y_min, y_max)
512551
col_idx += 1
513552
514553
# Row 2: ML-Adjusted local estimators
515554
# Overall (all data)
516-
plot(outcome_ed_costs_locations, ldte_ml, lower_ml, upper_ml,
517-
title="ED Costs: Overall Population\n(ML-Adjusted Local Estimator)",
518-
xlabel="Emergency Department Costs",
519-
ylabel="Local Distribution Treatment Effect",
520-
color="black", ax=axes[1, 0])
555+
plot(
556+
outcome_ed_costs_locations, ldte_ml, lower_ml, upper_ml,
557+
title="ED Costs: Overall Population\n(ML-Adjusted Local Estimator)",
558+
xlabel="Emergency Department Costs",
559+
ylabel="Local Distribution Treatment Effect",
560+
color="black", ax=axes[1, 0]
561+
)
562+
axes[1, 0].set_ylim(y_min, y_max)
521563
522564
# Individual strata
523565
col_idx = 1
524566
for stratum, results in individual_results.items():
567+
if stratum == 'signed self up + others':
568+
continue
525569
if results is None or col_idx >= axes.shape[1]:
526570
continue
527-
528-
plot(results['locations'], results['ml']['ldte'],
529-
results['ml']['lower'], results['ml']['upper'],
530-
title=f"ED Costs: {stratum}\n(ML-Adjusted Local Estimator, n={results['sample_size']:,})",
531-
xlabel="Emergency Department Costs",
532-
ylabel="Local Distribution Treatment Effect",
533-
color="red" if col_idx == 1 else "purple", ax=axes[1, col_idx])
571+
plot(
572+
results['locations'], results['ml']['ldte'],
573+
results['ml']['lower'], results['ml']['upper'],
574+
title=f"ED Costs: {stratum}\n(ML-Adjusted Local Estimator, n={results['sample_size']:,})",
575+
xlabel="Emergency Department Costs",
576+
ylabel="Local Distribution Treatment Effect",
577+
color="blue" if col_idx == 1 else "green", ax=axes[1, col_idx]
578+
)
579+
axes[1, col_idx].set_ylim(y_min, y_max)
534580
col_idx += 1
535581
536-
plt.suptitle("Comparison: Overall Population vs Individual Household Registration Strata (Local Estimators)", fontsize=16)
582+
plt.suptitle(
583+
"Comparison: Overall Population vs Individual Household Registration Strata (Local Estimators)",
584+
fontsize=16
585+
)
537586
plt.tight_layout()
538587
plt.show()
539588
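The y-axis alignment in the block above can be factored into a small helper so the same limits are reused across figures. This is a minimal sketch, not part of the tutorial's code: `shared_ylim` and its `pad_fraction` parameter are hypothetical names, and it assumes each series is a non-empty 1-D sequence (a list or a NumPy array) or `None`.

```python
def shared_ylim(lowers, uppers, pad_fraction=0.05):
    """Return (y_min, y_max) covering every confidence band.

    lowers / uppers: iterables of 1-D sequences (or None for missing strata).
    pad_fraction: extra margin added on each side, as a share of the range.
    """
    lo_vals = [min(s) for s in lowers if s is not None and len(s) > 0]
    hi_vals = [max(s) for s in uppers if s is not None and len(s) > 0]
    if not lo_vals or not hi_vals:
        raise ValueError("No non-empty series to compute limits from.")
    y_min, y_max = min(lo_vals), max(hi_vals)
    pad = (y_max - y_min) * pad_fraction
    return y_min - pad, y_max + pad


# Toy bounds (made up for illustration only); None mimics a skipped stratum.
limits = shared_ylim(
    lowers=[[-0.2, -0.1], None, [-0.5, 0.0]],
    uppers=[[0.1, 0.2], None, [0.05, 0.3]],
    pad_fraction=0.0,
)
print(limits)  # (-0.5, 0.3)
```

Each panel would then call `ax.set_ylim(*limits)`. Passing `sharey=True` to `plt.subplots` is a possible alternative, assuming the plotting helper does not reset the limits on its own.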
@@ -608,9 +657,10 @@ The LPTE analysis reveals insurance does not uniformly increase ED utilization.
608657

609658
Stratified analysis uncovers dramatic treatment effect heterogeneity: single-person households ("signed self up") show moderate effects (LDTE ≈ -0.18 to -0.20), while multi-person households ("signed self up + others") exhibit roughly 3x larger effects (LDTE ≈ -0.55). This suggests household structure is a critical moderator: insurance enables care-seeking for multiple family members when households include dependents.
610659

611-
**4. Limited Efficiency Gains from ML Adjustment**
660+
**4. ML Adjustment Effectiveness Depends on Covariate Predictive Power**
661+
662+
With baseline covariates (pre-randomization ED utilization + demographics, R² ≈ 0.21), ML-adjusted estimators show minimal efficiency gains: confidence intervals remain comparably wide, or even slightly wider, than those of the simple estimators. However, enhanced feature engineering could improve predictive power, enabling ML adjustment to narrow confidence intervals.
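One way to judge whether new features are worth adding is to compare R² on held-out data before and after the change. The sketch below computes R² from outcomes and predictions using the standard definition, 1 − SS_res / SS_tot. The helper name `r_squared` and the toy numbers are hypothetical, and the predictions are assumed to come from a model fitted on separate data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("Outcome has zero variance; R^2 is undefined.")
    return 1.0 - ss_res / ss_tot


# Toy held-out outcomes and two sets of predictions (made up for illustration).
y_holdout = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.5, 2.5, 2.5, 2.5]   # predicts the mean everywhere
richer_pred = [1.0, 2.0, 3.0, 4.0]     # perfect predictions

print(r_squared(y_holdout, baseline_pred))  # 0.0
print(r_squared(y_holdout, richer_pred))    # 1.0
```

If held-out R² stays near 0.21 after adding features, ML adjustment is unlikely to narrow the intervals much.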
612663

613-
Despite using pre-randomization ED utilization history and demographic covariates, ML-adjusted estimators show minimal efficiency gains over simple estimators. Confidence intervals remain comparably wide for both methods, suggesting: (1) the covariates have limited predictive power for ED outcomes, (2) the linear regression model may be too simple, or (3) substantial residual heterogeneity exists even after covariate adjustment. Notably, ML adjustment becomes unstable in small strata (n=4,068), producing implausible estimates (LDTE reaching +20), highlighting that model complexity must match sample informativeness.
614664

615665
**5. Policy Implications for Targeted Interventions**
616666

@@ -619,6 +669,8 @@ The distributional analysis reveals that Medicaid's primary benefit is enabling
619669
Next Steps
620670
~~~~~~~~~~
621671

672+
**For Your Own Data**:
673+
622674
- Try with your own randomized experiment data
623675
- Experiment with different ML models (XGBoost, Neural Networks) for adjustment
624676
- Explore stratified estimators for covariate-adaptive randomization designs
