210 changes: 121 additions & 89 deletions docs/source/tutorials/hillstrom.rst
Data Setup and Loading
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python
print(f"Men's Email: {df[df['segment']=='Mens E-Mail']['conversion'].mean():.3f}")
print(f"Women's Email: {df[df['segment']=='Women E-Mail']['conversion'].mean():.3f}")

Email Campaign Effectiveness Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

# Initialize estimators
simple_estimator = dte_adj.SimpleDistributionEstimator()
ml_estimator = dte_adj.AdjustedDistributionEstimator(
LinearRegression(),
folds=5
)

# Fit estimators on the full dataset
simple_estimator.fit(X, D, revenue)
ml_estimator.fit(X, D, revenue)

# Define revenue evaluation points
revenue_locations = np.linspace(0, 500, 51)
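Conceptually, the simple estimator's DTE at each location is the difference between the two arms' empirical CDFs evaluated at that point. A minimal numpy sketch of that idea on synthetic data (illustrative only — the variable names and distributions below are made up and are not the Hillstrom data or the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic spending draws for two arms (illustration only)
y_control = rng.exponential(scale=20, size=1000)
y_treated = rng.exponential(scale=25, size=1000)

locations = np.linspace(0, 100, 11)

def empirical_cdf(y, locs):
    """Empirical P(Y <= loc) for each evaluation point."""
    return (y[:, None] <= locs[None, :]).mean(axis=0)

# DTE(loc) = F_treated(loc) - F_control(loc); a negative value means
# the treated arm shifts probability mass toward higher spending
dte = empirical_cdf(y_treated, locations) - empirical_cdf(y_control, locations)
```

At each spending level, `dte[k]` is the change in the probability of spending at most `locations[k]`, which is exactly the quantity the plots below display pointwise.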

Control vs Women's Email Campaign
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, let's examine how the Women's email campaign performs compared to no email (control):

.. code-block:: python

# Compute DTE: Women's email vs Control
dte_women_ctrl, lower_women_ctrl, upper_women_ctrl = simple_estimator.predict_dte(
target_treatment_arm=2, # Women's email
control_treatment_arm=0, # No email control
locations=revenue_locations,
variance_type="moment"
)

# Visualize Women's vs Control using dte_adj's plot function
plot(revenue_locations, dte_women_ctrl, lower_women_ctrl, upper_women_ctrl,
title="Women's Email Campaign vs Control",
xlabel="Spending ($)", ylabel="Distribution Treatment Effect")

# Statistical summary
positive_dte_women = (dte_women_ctrl > 0).mean()
significant_dte_women = ((lower_women_ctrl > 0) | (upper_women_ctrl < 0)).mean()

print(f"Women's Email vs Control Results:")
print(f"Locations where Women's > Control: {positive_dte_women:.1%}")
print(f"Statistically significant differences: {significant_dte_women:.1%}")
print(f"Average DTE: {dte_women_ctrl.mean():.3f}")

Control vs Men's Email Campaign
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, let's examine how the Men's email campaign performs compared to no email (control):

.. code-block:: python

# Compute DTE: Men's email vs Control
dte_men_ctrl, lower_men_ctrl, upper_men_ctrl = simple_estimator.predict_dte(
target_treatment_arm=1, # Men's email
control_treatment_arm=0, # No email control
locations=revenue_locations,
variance_type="moment"
)

# Visualize Men's vs Control using dte_adj's plot function
plot(revenue_locations, dte_men_ctrl, lower_men_ctrl, upper_men_ctrl,
title="Men's Email Campaign vs Control",
xlabel="Spending ($)", ylabel="Distribution Treatment Effect", color="purple")

# Statistical summary
positive_dte_men = (dte_men_ctrl > 0).mean()
significant_dte_men = ((lower_men_ctrl > 0) | (upper_men_ctrl < 0)).mean()

print(f"Men's Email vs Control Results:")
print(f"Locations where Men's > Control: {positive_dte_men:.1%}")
print(f"Statistically significant differences: {significant_dte_men:.1%}")
print(f"Average DTE: {dte_men_ctrl.mean():.3f}")

Both Campaigns vs Control Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The control vs email campaigns analysis produces the following comparison:

.. image:: ../_static/hillstorm_dte_control.png
:alt: Hillstrom Email Campaigns vs Control Analysis
:width: 800px
:align: center

**Interpreting the Control Comparison Results**: These plots show how each email campaign performs against the no-email control group across different spending levels:

**Women's Email vs Control**:

- **Positive DTE values** indicate that the Women's email campaign increases the probability of spending at those levels compared to no email
- **Distribution pattern** shows where the Women's email is most effective in driving customer spending
- **Confidence intervals** reveal the statistical significance of the treatment effects

**Men's Email vs Control**:

- **Comparative effectiveness** can be assessed by comparing the magnitude and pattern of effects across the two campaigns
- **Different spending ranges** may show varying campaign effectiveness
- **Statistical significance** is indicated by confidence intervals that do not cross zero

**Key Control Analysis Findings**:

1. **Campaign Effectiveness**: Both campaigns show positive effects compared to no email, confirming that email marketing drives incremental spending

2. **Differential Patterns**: The shape and magnitude of effects differ between campaigns, revealing:

   - Which campaign has stronger overall effects
   - The spending ranges where each campaign excels
   - How confidence in the treatment effects varies across spending levels

3. **Business Implications**:

   - **ROI Assessment**: Compare effect sizes to determine which campaign provides the better return on investment
   - **Customer Segmentation**: Identify the spending ranges where each campaign is most and least effective
   - **Resource Allocation**: Make data-driven decisions about campaign budget allocation

4. **Statistical Rigor**: Confidence intervals provide guidance on where observed differences are statistically reliable vs. potentially due to sampling variation

This analysis answers the fundamental question: "Do email campaigns work?" and establishes the baseline effectiveness of each campaign against no email.
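The "statistically reliable" regions referred to above can be flagged mechanically: a pointwise confidence interval that lies entirely on one side of zero marks a significant effect at that spending level. A small sketch with placeholder arrays (the numbers below are illustrative, not estimator output):

```python
import numpy as np

# Placeholder DTE estimates and 95% CI bounds (illustrative values only)
dte   = np.array([0.02, 0.015, 0.004, -0.001])
lower = np.array([0.01, 0.002, -0.003, -0.010])
upper = np.array([0.03, 0.028, 0.011, 0.008])

# Significant where the interval lies entirely above or below zero
significant = (lower > 0) | (upper < 0)
share_significant = significant.mean()
print(f"Significant locations: {share_significant:.1%}")  # → 50.0%
```

This is the same `(lower > 0) | (upper < 0)` criterion used in the summary statistics throughout this tutorial.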

Direct Campaign Comparison: Men's vs Women's Email
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, let's directly compare the two email campaigns to answer the key research question:

.. code-block:: python

# Compute DTE: Women's vs Men's email campaigns
dte_women_men, lower_women_men, upper_women_men = simple_estimator.predict_dte(
target_treatment_arm=2, # Women's email
control_treatment_arm=1, # Men's email (as "control")
locations=revenue_locations,
variance_type="moment"
)

dte_ml, lower_ml, upper_ml = ml_estimator.predict_dte(
target_treatment_arm=2, # Women's email
control_treatment_arm=1, # Men's email
locations=revenue_locations,
variance_type="moment"
)
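The regression-adjusted estimator gains precision by modeling, for each location, the indicator `1{Y <= loc}` from covariates with cross-fitting, then correcting each arm's empirical CDF using the out-of-fold predictions. The following is a simplified single-location sketch of that general idea on synthetic data — an assumption-laden illustration, not the library's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                 # synthetic covariates
D = rng.integers(0, 2, size=n)              # binary arm for simplicity
y = 20 + 5 * X[:, 0] + 3 * D + rng.normal(scale=5, size=n)

loc = 22.0
z = (y <= loc).astype(float)                # indicator whose mean is the CDF

# Out-of-fold predictions of the indicator from covariates
z_hat = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train], z[train])
    z_hat[test] = model.predict(X[test])

def adjusted_cdf(arm):
    # Empirical mean in the arm, corrected by how the arm's covariates
    # deviate from the overall sample
    mask = D == arm
    return z[mask].mean() - (z_hat[mask].mean() - z_hat.mean())

dte_at_loc = adjusted_cdf(1) - adjusted_cdf(0)
```

Because arms are randomized, the correction term has mean zero, so the adjustment leaves the estimate unbiased while shrinking its variance when the covariates predict spending well.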

Distribution Treatment Effects Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

# Visualize the distribution treatment effects using dte_adj's built-in plot function

# Simple estimator
plot(revenue_locations, dte_women_men, lower_women_men, upper_women_men,
title="Email Campaign Comparison: Women's vs Men's (Simple Estimator)",
xlabel="Spending ($)", ylabel="Distribution Treatment Effect")

positive_dte = (dte_ml > 0).mean()
significant_dte = ((lower_ml > 0) | (upper_ml < 0)).mean()

print(f"\nDirect Campaign Comparison Results:")
print(f"Locations where Women's > Men's: {positive_dte:.1%}")
print(f"Statistically significant differences: {significant_dte:.1%}")
print(f"Average DTE: {dte_ml.mean():.3f}")
The analysis produces a distribution treatment effects visualization comparing the two campaigns across spending levels.

**Interpreting the Campaign Comparison Results**: The plot shows the distribution treatment effects (DTE) comparing Women's vs Men's email campaigns across different spending levels. Key observations:

- **Positive DTE values** (above the zero line) indicate that the Women's email campaign increases the probability of spending at that level compared to the Men's campaign
- **Confidence intervals** (shaded areas) show statistical uncertainty; where an interval does not cross zero, the effect is statistically significant

Revenue Category Analysis with PTE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

# Compute Probability Treatment Effects for Women's vs Men's comparison
pte_simple, pte_lower_simple, pte_upper_simple = simple_estimator.predict_pte(
target_treatment_arm=2, # Women's email
control_treatment_arm=1, # Men's email
locations=revenue_locations,
variance_type="moment"
)

pte_ml, pte_lower_ml, pte_upper_ml = ml_estimator.predict_pte(
target_treatment_arm=2, # Women's email
control_treatment_arm=1, # Men's email
locations=revenue_locations,
    variance_type="moment"
)

This granular analysis helps marketers understand not just which campaign generates more revenue overall, but specifically which spending behaviors each campaign drives.
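One common way to define probability treatment effects is as effects on the bin probabilities between consecutive evaluation points, obtained by first-differencing the CDFs. Assuming that convention (the library's exact definition may differ), a numpy sketch with illustrative CDF values:

```python
import numpy as np

locations = np.array([0.0, 100.0, 200.0, 300.0])

# Illustrative CDF values per arm at the locations above (made-up numbers)
cdf_treated = np.array([0.55, 0.80, 0.92, 0.97])
cdf_control = np.array([0.60, 0.78, 0.90, 0.96])

# Bin probabilities P(loc_k < Y <= loc_{k+1}) via first differences
p_treated = np.diff(cdf_treated)
p_control = np.diff(cdf_control)

# PTE per bin: how much more likely a treated customer is to land
# in each spending bracket
pte = p_treated - p_control
```

Where a DTE answers "does the campaign change the probability of spending at most $X?", the corresponding PTE answers "does it change the probability of spending *between* $X and $Y?" — which is why it maps naturally onto revenue categories.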


**Key Findings**: Using the real Hillstrom dataset with 64,000 customers, the distributional analysis reveals nuanced patterns in how email campaigns affect customer spending. The analysis goes beyond simple average comparisons to show how treatment effects vary across the entire spending distribution, providing insights into which customer segments respond best to different campaign types. This demonstrates the power of distribution treatment effect analysis for understanding heterogeneous responses in digital marketing experiments.

Next Steps
~~~~~~~~~~