
feat(hackathon-be): A/B experiment analytics endpoint #6702

Merged
gagantrivedi merged 2 commits into main from a-team on Feb 12, 2026

Conversation

@gagantrivedi
Member

Summary

  • Add new endpoint for A/B experiment results with statistical analysis
  • Uses trait-based tracking: exp_{feature}_variant and exp_{feature}_converted
  • Implements G-test (log-likelihood ratio) for significance testing
  • Implements Bayesian "chance to win" via Monte Carlo simulation
  • Supports 2+ variants with lift calculation and sample size warnings
  • Uses optimised single aggregated query for performance at scale
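The trait-key convention behind the tracking bullet can be sketched in a couple of lines; the helper names here are illustrative, not taken from the PR:

```python
# Illustrative helpers for the exp_{feature}_variant / exp_{feature}_converted
# naming convention described above. Function names are hypothetical.

def variant_trait_key(feature: str) -> str:
    """Trait key recording which variant an identity was served."""
    return f"exp_{feature}_variant"

def conversion_trait_key(feature: str) -> str:
    """Trait key recording whether the identity converted."""
    return f"exp_{feature}_converted"

# For the "checkout" experiment used in the example response:
print(variant_trait_key("checkout"))     # exp_checkout_variant
print(conversion_trait_key("checkout"))  # exp_checkout_converted
```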

Endpoint

GET /api/v1/environments/{api_key}/experiments/results/?feature={name}

Example Response

{
  "feature": "checkout",
  "variants": [
    {"variant": "blue", "evaluations": 60, "conversions": 15, "conversion_rate": 25.0},
    {"variant": "green", "evaluations": 60, "conversions": 30, "conversion_rate": 50.0}
  ],
  "statistics": {
    "p_value": 0.0079,
    "significant": true,
    "chance_to_win": {"blue": 0.003, "green": 0.997},
    "lift": "+100.0%",
    "winner": "green",
    "recommendation": "green wins with 99.7% confidence",
    "sample_size_warning": "Sample size (60) is modest - consider collecting more data"
  }
}
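The Bayesian "chance to win" figure in the response can be reproduced with a Beta-Bernoulli Monte Carlo; a minimal sketch using the counts from the example above (the exact prior and sample count are assumptions, not confirmed by the PR diff):

```python
# Monte Carlo "chance to win" sketch: sample each variant's conversion
# rate from its Beta posterior and count how often one beats the other.
import numpy as np

rng = np.random.default_rng(42)
N = 50_000  # samples per variant, matching the figure quoted in the review

# conversions / evaluations from the example response
counts = {"blue": (15, 60), "green": (30, 60)}

# Posterior under a uniform prior: Beta(1 + conversions, 1 + non-conversions)
samples = {
    v: rng.beta(1 + conv, 1 + (evals - conv), size=N)
    for v, (conv, evals) in counts.items()
}

chance_green = float((samples["green"] > samples["blue"]).mean())
print(round(chance_green, 3))  # close to the 0.997 in the example
```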

Test plan

  • Unit tests for statistics calculations (G-test, Bayesian, edge cases)
  • Integration tests for API endpoint
  • Tested on staging with sample data
  • Manual verification after deployment

🚀 Hackathon A-Team

Add trait-based experiment tracking with statistical analysis:
- G-test (log-likelihood ratio) for significance testing
- Bayesian "chance to win" via Monte Carlo simulation
- Support for 2+ variants with lift calculation
- Sample size warnings for reliability guidance

Endpoint: GET /api/v1/environments/{key}/experiments/results/?feature=name

Uses optimised single aggregated query for performance at scale.
@gagantrivedi gagantrivedi requested a review from a team as a code owner February 12, 2026 10:09
@gagantrivedi gagantrivedi requested review from khvn26 and removed request for a team February 12, 2026 10:09
@vercel

vercel Bot commented Feb 12, 2026

The latest updates on your projects.

3 Skipped Deployments

Project                    | Deployment | Updated (UTC)
docs                       | Ignored    | Feb 12, 2026 10:09am
flagsmith-frontend-preview | Ignored    | Feb 12, 2026 10:09am
flagsmith-frontend-staging | Ignored    | Feb 12, 2026 10:09am


@github-actions github-actions Bot added the api Issue related to the REST API label Feb 12, 2026
@Zaimwa9 Zaimwa9 requested review from Zaimwa9 and removed request for khvn26 February 12, 2026 10:10
@github-actions github-actions Bot added the feature New feature or request label Feb 12, 2026
@github-actions
Contributor

github-actions Bot commented Feb 12, 2026

Docker builds report

Image                                             | Build Status | Security report
ghcr.io/flagsmith/flagsmith-e2e:pr-6702           | Finished ✅  | Skipped
ghcr.io/flagsmith/flagsmith-frontend:pr-6702      | Finished ✅  | Results
ghcr.io/flagsmith/flagsmith-api-test:pr-6702      | Finished ✅  | Skipped
ghcr.io/flagsmith/flagsmith-api:pr-6702           | Finished ✅  | Results
ghcr.io/flagsmith/flagsmith:pr-6702               | Finished ✅  | Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-6702 | Finished ✅  | Results

@gagantrivedi gagantrivedi merged commit 1e2c498 into main Feb 12, 2026
31 checks passed
@gagantrivedi gagantrivedi deleted the a-team branch February 12, 2026 10:11

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


try:
    _, p_value, _, _ = scipy_stats.chi2_contingency(
        table_safe, correction=True, lambda_="log-likelihood"
    )

G-test uses invalid continuity correction

Medium Severity

chi2_contingency is called with both lambda_="log-likelihood" and correction=True. This applies continuity correction while claiming a G-test, so the reported p_value is not the intended log-likelihood test result. It can make significance decisions overly conservative and hide true experiment winners.

Additional Locations (1)

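One way to get the intended G-test p-value is to drop the continuity correction; a sketch using the counts from the example response (blue 15/60, green 30/60), not the PR's actual `table_safe`:

```python
# G-test (log-likelihood ratio) with and without Yates continuity correction.
# Table rows are variants; columns are [conversions, non-conversions].
from scipy import stats as scipy_stats

table = [[15, 45], [30, 30]]

# correction=True (as in the PR) shrinks the statistic and inflates p:
_, p_corrected, _, _ = scipy_stats.chi2_contingency(
    table, correction=True, lambda_="log-likelihood"
)
# correction=False gives the plain G-test p-value:
_, p_gtest, _, _ = scipy_stats.chi2_contingency(
    table, correction=False, lambda_="log-likelihood"
)
print(round(p_corrected, 4), round(p_gtest, 4))  # 0.0079 0.0044
```

Note that the 0.0079 in the example response matches the corrected statistic, which is exactly the discrepancy this review flags.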

    required=True,
    max_length=200,
    help_text="The feature name to analyse (without the 'exp_' prefix)",
)

Feature length allows impossible trait keys

Low Severity

ExperimentResultsQuerySerializer permits feature up to 200 chars, but the endpoint builds exp_{feature}_variant and exp_{feature}_converted. Those keys can exceed Trait.trait_key max length, so valid-looking requests can never match stored traits and return misleading empty experiment data.

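A straightforward remedy is to derive the serializer bound from the trait-key limit rather than hard-coding 200; a sketch, where `TRAIT_KEY_MAX_LENGTH` is an assumed stand-in for `Trait.trait_key`'s actual `max_length`:

```python
# Cap the feature name so the longest derived trait key still fits.
TRAIT_KEY_MAX_LENGTH = 200  # assumption for illustration

PREFIX = "exp_"
SUFFIXES = ("_variant", "_converted")

# The longest derived key is exp_{feature}_converted, so the serializer's
# max_length for `feature` should be at most:
MAX_FEATURE_LENGTH = (
    TRAIT_KEY_MAX_LENGTH - len(PREFIX) - max(len(s) for s in SUFFIXES)
)
print(MAX_FEATURE_LENGTH)  # 186
```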

for other in variants:
    if other != v:
        wins &= variant_samples[v] > variant_samples[other]
chance_to_win[v] = round(float(wins.mean()), 3)

Monte Carlo scales quadratically with variant count

Medium Severity

_calculate_multi_variant_stats always generates 50000 beta samples per variant, then performs pairwise comparisons across all variants. With no cap on distinct variant values, runtime and memory grow quickly (O(V*50000) storage and O(V^2*50000) comparisons), so noisy or malformed experiment data can make this endpoint extremely slow or unstable.

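One way to avoid the pairwise comparisons is to stack the samples into a (V, N) matrix and take an argmax per draw, which is O(V·N) instead of O(V²·N); a sketch with illustrative variant counts (ties are negligible with continuous beta samples):

```python
# "Chance to win" via a single argmax per Monte Carlo draw.
import numpy as np

rng = np.random.default_rng(0)
N = 50_000
counts = {"a": (10, 100), "b": (20, 100), "c": (30, 100)}  # (conversions, evaluations)

names = list(counts)
samples = np.stack(
    [rng.beta(1 + c, 1 + (n - c), size=N) for c, n in counts.values()]
)  # shape (V, N)

winners = samples.argmax(axis=0)                 # index of best variant per draw
wins = np.bincount(winners, minlength=len(names))
chance_to_win = {v: round(float(w) / N, 3) for v, w in zip(names, wins)}
print(chance_to_win)  # "c" should carry most of the probability mass
```

Capping the number of distinct variant values before this step would address the unbounded-V concern as well.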

@codecov

codecov Bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 93.43434% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.23%. Comparing base (345eb99) to head (3b5eb90).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
api/app_analytics/experiments.py 90.00% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6702      +/-   ##
==========================================
+ Coverage   98.20%   98.23%   +0.02%     
==========================================
  Files        1298     1311      +13     
  Lines       47172    48480    +1308     
==========================================
+ Hits        46327    47623    +1296     
- Misses        845      857      +12     


