add walkthrough and result interpretation#806
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
👋 Welcome to CausalPy, @arthurmello! Thank you for opening your first pull request! We're excited to have you contribute to the project. 🎉 Here are a few tips to help your PR get merged smoothly:
A maintainer will review your changes soon. Thanks for helping make CausalPy better! 🚀 💼 LinkedIn Shoutout: Once your PR is merged, we'd love to give you a shoutout on LinkedIn to thank you for your contribution! If you're interested, just drop your LinkedIn profile URL in a comment below. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #806 +/- ##
=======================================
Coverage 94.26% 94.26%
=======================================
Files 78 78
Lines 12005 12005
Branches 699 699
=======================================
Hits 11317 11317
Misses 496 496
Partials 192 192 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
drbenvincent
left a comment
There was a problem hiding this comment.
Thanks for this! This definitely gets towards closing #791, but I've got a few requests that would take it to the next level. Let me know if you think you can give this a go, fully or partially?
1. Acceptance criterion gap: failure patterns and recommended follow-ups are still too thin
Issue #791 explicitly asks to "Document typical failure patterns and recommended follow-ups." The new section currently stops at:
"If it fails, treat that as a red flag for misspecification or pre-existing divergence ... as a reason to investigate further."
That is directionally correct, but it is still quite generic. It does not yet give readers concrete troubleshooting guidance such as:
- what a failure pattern looks like in practice
- how to distinguish noisy pre-period estimates from systematic pre-trends
- what follow-up actions to take next
Suggested fix:
- add a short "If this check fails" subsection with 3-4 concrete bullets
- examples could include: inspect the event-study plot for directional pre-trends, check whether pre-treatment periods are sparse, revisit the outcome model / covariates / fixed effects, and optionally compare robustness with
PriorSensitivity
My take: until this is added, I would say PR #806 only partially satisfies #791.
2. Minor clarity risk: PlaceboInTime is name-dropped even though it is not applicable here
The "Related checks" paragraph says:
"Other pipeline checks (for example placebo-in-time or design-based tests from other experiment classes) are not wired by default for this estimator; use
SensitivityAnalysis(checks=[...])explicitly when applicable."
This is not strictly false because of the "when applicable" qualifier, but it may still mislead readers into thinking PlaceboInTime is a nearby option for Staggered DiD. In the current implementation, PlaceboInTime is only applicable to ITS and Synthetic Control, not StaggeredDifferenceInDifferences.
Suggested fix:
- either remove the
PlaceboInTimeexample from this notebook - or rephrase to say that for Staggered DiD the main additional pipeline check to consider is
PriorSensitivity, while other checks are method-specific elsewhere in the docs
9643ef3 to
29718e2
Compare
|
Rebased onto Outstanding review feedback from the previous change request still applies. No rush, but are you still able to continue with that? |
@drbenvincent sorry, haven't really had the time recently, for personal reasons. I think it's best if someone else takes this on, so I won't block its development (not sure when I'll have the time). |
|
@arthurmello no problem, I understand. Happy if you want to pick something up in the future :) |
|
Continued in #846 — rebased onto current
Thanks @arthurmello for the initial implementation! |
Updated docs/source/notebooks/staggered_did_pymc.ipynb with a focused Staggered DiD sensitivity walkthrough for PreTreatmentPlaceboCheck.
Adresses #791