feat(hackathon-be): A/B experiment analytics endpoint #6702
gagantrivedi merged 2 commits into main from
Conversation
Add trait-based experiment tracking with statistical analysis:
- G-test (log-likelihood ratio) for significance testing
- Bayesian "chance to win" via Monte Carlo simulation
- Support for 2+ variants with lift calculation
- Sample size warnings for reliability guidance
Endpoint: GET /api/v1/environments/{key}/experiments/results/?feature=name
Uses an optimised single aggregated query for performance at scale.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```python
try:
    _, p_value, _, _ = scipy_stats.chi2_contingency(
        table_safe, correction=True, lambda_="log-likelihood"
    )
```
G-test uses invalid continuity correction
Medium Severity
chi2_contingency is called with both lambda_="log-likelihood" and correction=True. This applies continuity correction while claiming a G-test, so the reported p_value is not the intended log-likelihood test result. It can make significance decisions overly conservative and hide true experiment winners.
Additional Locations (1)
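The fix is to drop the continuity correction (in scipy terms, pass `correction=False` alongside `lambda_="log-likelihood"`). What that call should compute can be checked against a pure-Python G-test; the sketch below (helper name and table layout are mine, not the PR's) runs the example-response numbers, with the chi-squared (df=1) p-value obtained via `math.erfc`:

```python
import math

def g_test_2x2(table: list[list[int]]) -> tuple[float, float]:
    """G-test (log-likelihood ratio) for a 2x2 contingency table,
    without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    g *= 2
    # chi-squared survival function with df=1: P(X >= g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

# Rows are variants (blue: 15/60, green: 30/60); columns are converted / not converted.
g, p = g_test_2x2([[15, 45], [30, 30]])  # g ≈ 8.12, p ≈ 0.0044
```

Note the uncorrected p-value (≈ 0.0044) is smaller than the 0.0079 the endpoint currently reports, consistent with the comment that the correction makes the test overly conservative.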
```python
    required=True,
    max_length=200,
    help_text="The feature name to analyse (without the 'exp_' prefix)",
)
```
Feature length allows impossible trait keys
Low Severity
ExperimentResultsQuerySerializer permits feature up to 200 chars, but the endpoint builds exp_{feature}_variant and exp_{feature}_converted. Those keys can exceed Trait.trait_key max length, so valid-looking requests can never match stored traits and return misleading empty experiment data.
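One way to close this gap is to derive the serializer cap from the trait-key limit instead of hard-coding 200. A minimal sketch, assuming an illustrative `TRAIT_KEY_MAX_LENGTH` constant (the real limit lives on `Trait.trait_key`) and accounting for the longer of the two generated suffixes:

```python
# Illustrative constant: in practice, read the limit off Trait's trait_key field.
TRAIT_KEY_MAX_LENGTH = 200

# The endpoint builds exp_{feature}_variant and exp_{feature}_converted;
# "_converted" is the longer suffix, so it bounds the allowed feature length.
FEATURE_MAX_LENGTH = TRAIT_KEY_MAX_LENGTH - len("exp_") - len("_converted")

def feature_fits_trait_key(feature: str) -> bool:
    """True if both generated trait keys fit within the trait-key limit."""
    return len(f"exp_{feature}_converted") <= TRAIT_KEY_MAX_LENGTH

print(FEATURE_MAX_LENGTH)  # 186
```

Using `FEATURE_MAX_LENGTH` as the serializer's `max_length` would reject impossible feature names up front instead of returning misleading empty results.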
```python
for other in variants:
    if other != v:
        wins &= variant_samples[v] > variant_samples[other]
chance_to_win[v] = round(float(wins.mean()), 3)
```
Monte Carlo scales quadratically with variant count
Medium Severity
_calculate_multi_variant_stats always generates 50000 beta samples per variant, then performs pairwise comparisons across all variants. With no cap on distinct variant values, runtime and memory grow quickly (O(V*50000) storage and O(V^2*50000) comparisons), so noisy or malformed experiment data can make this endpoint extremely slow or unstable.
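A stdlib-only sketch of the same "chance to win" computation that sidesteps the pairwise O(V²) comparisons by taking the argmax per draw, and adds explicit guards. The helper name, the Beta(1 + conversions, 1 + failures) posterior, and both cap values are assumptions for illustration, not the PR's actual code:

```python
import random

MAX_VARIANTS = 10  # guard: noisy/malformed data can't blow up the variant count
N_DRAWS = 4096     # far fewer than 50000; enough for a rough illustration

def chance_to_win(counts: dict[str, tuple[int, int]], rng: random.Random) -> dict[str, float]:
    """counts maps variant -> (conversions, evaluations).

    Returns each variant's estimated probability of having the highest true
    conversion rate, using Beta(1 + conversions, 1 + failures) posteriors.
    """
    if len(counts) > MAX_VARIANTS:
        raise ValueError(f"too many variants ({len(counts)} > {MAX_VARIANTS})")
    wins = {v: 0 for v in counts}
    for _ in range(N_DRAWS):
        draws = {
            v: rng.betavariate(1 + conv, 1 + (evals - conv))
            for v, (conv, evals) in counts.items()
        }
        wins[max(draws, key=draws.get)] += 1  # one winner per draw, no pairwise pass
    return {v: w / N_DRAWS for v, w in wins.items()}

result = chance_to_win({"blue": (15, 60), "green": (30, 60)}, random.Random(42))
# result["green"] is close to 1, matching the example response's 0.997
```

With the per-draw argmax, cost is O(V · draws) in both time and memory, and the variant cap keeps V bounded regardless of the stored trait data.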
Codecov Report — ❌ Patch coverage is

```
@@ Coverage Diff @@
##             main    #6702      +/-   ##
==========================================
+ Coverage   98.20%   98.23%   +0.02%
==========================================
  Files        1298     1311      +13
  Lines       47172    48480    +1308
==========================================
+ Hits        46327    47623    +1296
- Misses        845      857      +12
```
Summary
Traits: `exp_{feature}_variant` and `exp_{feature}_converted`

Endpoint

GET /api/v1/environments/{key}/experiments/results/?feature=name
Example Response

```json
{
  "feature": "checkout",
  "variants": [
    {"variant": "blue", "evaluations": 60, "conversions": 15, "conversion_rate": 25.0},
    {"variant": "green", "evaluations": 60, "conversions": 30, "conversion_rate": 50.0}
  ],
  "statistics": {
    "p_value": 0.0079,
    "significant": true,
    "chance_to_win": {"blue": 0.003, "green": 0.997},
    "lift": "+100.0%",
    "winner": "green",
    "recommendation": "green wins with 99.7% confidence",
    "sample_size_warning": "Sample size (60) is modest - consider collecting more data"
  }
}
```

Test plan
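The figures in the example response are internally consistent and can be sanity-checked in a few lines (a quick illustration using a relative-lift formula, not code from the PR):

```python
# Variant counts from the example response above.
response = {
    "variants": [
        {"variant": "blue", "evaluations": 60, "conversions": 15, "conversion_rate": 25.0},
        {"variant": "green", "evaluations": 60, "conversions": 30, "conversion_rate": 50.0},
    ],
    "statistics": {"lift": "+100.0%", "winner": "green"},
}

# Each conversion_rate is conversions / evaluations as a percentage.
for v in response["variants"]:
    assert v["conversion_rate"] == round(100 * v["conversions"] / v["evaluations"], 1)

# Lift is the winner's relative improvement over the other variant.
blue, green = response["variants"]
lift_pct = (green["conversion_rate"] - blue["conversion_rate"]) / blue["conversion_rate"] * 100
assert f"{lift_pct:+.1f}%" == response["statistics"]["lift"]  # "+100.0%"
```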
🚀 Hackathon A-Team