feat(hackathon-be): A/B experiment analytics endpoint #6702
gagantrivedi merged 2 commits into main from
Conversation
Add trait-based experiment tracking with statistical analysis:
- G-test (log-likelihood ratio) for significance testing
- Bayesian "chance to win" via Monte Carlo simulation
- Support for 2+ variants with lift calculation
- Sample size warnings for reliability guidance
Endpoint: GET /api/v1/environments/{key}/experiments/results/?feature=name
Uses an optimised single aggregated query for performance at scale.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```python
try:
    _, p_value, _, _ = scipy_stats.chi2_contingency(
        table_safe, correction=True, lambda_="log-likelihood"
    )
```
G-test uses invalid continuity correction
Medium Severity
chi2_contingency is called with both lambda_="log-likelihood" and correction=True. This applies continuity correction while claiming a G-test, so the reported p_value is not the intended log-likelihood test result. It can make significance decisions overly conservative and hide true experiment winners.
Additional Locations (1)
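The fix is to drop the continuity correction (in scipy terms, pass `correction=False` alongside `lambda_="log-likelihood"`). What that call should compute can be checked against a pure-Python G-test; the sketch below (helper name and table layout are mine, not the PR's) runs the example-response numbers, with the chi-squared (df=1) p-value obtained via `math.erfc`:

```python
import math

def g_test_2x2(table: list[list[int]]) -> tuple[float, float]:
    """G-test (log-likelihood ratio) for a 2x2 contingency table,
    without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    g *= 2
    # chi-squared survival function with df=1: P(X >= g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

# Rows are variants (blue: 15/60, green: 30/60); columns are converted / not converted.
g, p = g_test_2x2([[15, 45], [30, 30]])  # g ≈ 8.12, p ≈ 0.0044
```

Note the uncorrected p-value (≈ 0.0044) is smaller than the 0.0079 the endpoint currently reports, consistent with the comment that the correction makes the test overly conservative.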
```python
    required=True,
    max_length=200,
    help_text="The feature name to analyse (without the 'exp_' prefix)",
)
```
Feature length allows impossible trait keys
Low Severity
ExperimentResultsQuerySerializer permits feature up to 200 chars, but the endpoint builds exp_{feature}_variant and exp_{feature}_converted. Those keys can exceed Trait.trait_key max length, so valid-looking requests can never match stored traits and return misleading empty experiment data.
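One way to close this gap is to derive the serializer cap from the trait-key limit instead of hard-coding 200. A minimal sketch, assuming an illustrative `TRAIT_KEY_MAX_LENGTH` constant (the real limit lives on `Trait.trait_key`) and accounting for the longer of the two generated suffixes:

```python
# Illustrative constant: in practice, read the limit off Trait's trait_key field.
TRAIT_KEY_MAX_LENGTH = 200

# The endpoint builds exp_{feature}_variant and exp_{feature}_converted;
# "_converted" is the longer suffix, so it bounds the allowed feature length.
FEATURE_MAX_LENGTH = TRAIT_KEY_MAX_LENGTH - len("exp_") - len("_converted")

def feature_fits_trait_key(feature: str) -> bool:
    """True if both generated trait keys fit within the trait-key limit."""
    return len(f"exp_{feature}_converted") <= TRAIT_KEY_MAX_LENGTH

print(FEATURE_MAX_LENGTH)  # 186
```

Using `FEATURE_MAX_LENGTH` as the serializer's `max_length` would reject impossible feature names up front instead of returning misleading empty results.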
```python
for other in variants:
    if other != v:
        wins &= variant_samples[v] > variant_samples[other]
chance_to_win[v] = round(float(wins.mean()), 3)
```
Monte Carlo scales quadratically with variant count
Medium Severity
_calculate_multi_variant_stats always generates 50000 beta samples per variant, then performs pairwise comparisons across all variants. With no cap on distinct variant values, runtime and memory grow quickly (O(V*50000) storage and O(V^2*50000) comparisons), so noisy or malformed experiment data can make this endpoint extremely slow or unstable.
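A stdlib-only sketch of the same "chance to win" computation that sidesteps the pairwise O(V²) comparisons by taking the argmax per draw, and adds explicit guards. The helper name, the Beta(1 + conversions, 1 + failures) posterior, and both cap values are assumptions for illustration, not the PR's actual code:

```python
import random

MAX_VARIANTS = 10  # guard: noisy/malformed data can't blow up the variant count
N_DRAWS = 4096     # far fewer than 50000; enough for a rough illustration

def chance_to_win(counts: dict[str, tuple[int, int]], rng: random.Random) -> dict[str, float]:
    """counts maps variant -> (conversions, evaluations).

    Returns each variant's estimated probability of having the highest true
    conversion rate, using Beta(1 + conversions, 1 + failures) posteriors.
    """
    if len(counts) > MAX_VARIANTS:
        raise ValueError(f"too many variants ({len(counts)} > {MAX_VARIANTS})")
    wins = {v: 0 for v in counts}
    for _ in range(N_DRAWS):
        draws = {
            v: rng.betavariate(1 + conv, 1 + (evals - conv))
            for v, (conv, evals) in counts.items()
        }
        wins[max(draws, key=draws.get)] += 1  # one winner per draw, no pairwise pass
    return {v: w / N_DRAWS for v, w in wins.items()}

result = chance_to_win({"blue": (15, 60), "green": (30, 60)}, random.Random(42))
# result["green"] is close to 1, matching the example response's 0.997
```

With the per-draw argmax, cost is O(V · draws) in both time and memory, and the variant cap keeps V bounded regardless of the stored trait data.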
Codecov Report — ❌ Patch coverage is

```
@@ Coverage Diff @@
##             main    #6702      +/-   ##
==========================================
+ Coverage   98.20%   98.23%   +0.02%
==========================================
  Files        1298     1311      +13
  Lines       47172    48480    +1308
==========================================
+ Hits        46327    47623    +1296
- Misses        845      857      +12
```
Summary
Traits: `exp_{feature}_variant` and `exp_{feature}_converted`

Endpoint

GET /api/v1/environments/{key}/experiments/results/?feature=name
Example Response

```json
{
  "feature": "checkout",
  "variants": [
    {"variant": "blue", "evaluations": 60, "conversions": 15, "conversion_rate": 25.0},
    {"variant": "green", "evaluations": 60, "conversions": 30, "conversion_rate": 50.0}
  ],
  "statistics": {
    "p_value": 0.0079,
    "significant": true,
    "chance_to_win": {"blue": 0.003, "green": 0.997},
    "lift": "+100.0%",
    "winner": "green",
    "recommendation": "green wins with 99.7% confidence",
    "sample_size_warning": "Sample size (60) is modest - consider collecting more data"
  }
}
```

Test plan
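The figures in the example response are internally consistent and can be sanity-checked in a few lines (a quick illustration using a relative-lift formula, not code from the PR):

```python
# Variant counts from the example response above.
response = {
    "variants": [
        {"variant": "blue", "evaluations": 60, "conversions": 15, "conversion_rate": 25.0},
        {"variant": "green", "evaluations": 60, "conversions": 30, "conversion_rate": 50.0},
    ],
    "statistics": {"lift": "+100.0%", "winner": "green"},
}

# Each conversion_rate is conversions / evaluations as a percentage.
for v in response["variants"]:
    assert v["conversion_rate"] == round(100 * v["conversions"] / v["evaluations"], 1)

# Lift is the winner's relative improvement over the other variant.
blue, green = response["variants"]
lift_pct = (green["conversion_rate"] - blue["conversion_rate"]) / blue["conversion_rate"] * 100
assert f"{lift_pct:+.1f}%" == response["statistics"]["lift"]  # "+100.0%"
```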
🚀 Hackathon A-Team