
vedanthirekar/NCAA-Final-Four-Analytics-Challenge


NCAA Tournament Seed Prediction - Final Four Analytics Challenge 2026

1st Place - Final Four Analytics Challenge 2026


The Challenge

The Final Four Analytics Challenge 2026 asked teams to predict the S-curve seed (1–68) for every NCAA Tournament team using only regular-season statistics. Predictions had to cover the full pool of 360+ Division I teams, not just the 68 teams in the field.

The competition ran in three rounds and started with 200+ competing teams:

| Round | Format | Focus |
|---|---|---|
| Round 1 | Kaggle submission (RMSE-scored) | Build a model, submit predictions |
| Semifinals | Live video presentation | Explain your approach, findings, and what it means for the NCAA |
| Finals | In-person presentation + Tableau dashboard | Tell the full story: model, insights, and actionable takeaways |
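Round 1 was scored by RMSE. A minimal sketch of that metric, assuming it is computed over each submitted seed and its true seed (how non-tournament rows were scored is not stated here); `rmse` is a hypothetical helper:

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length sequences of seeds."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: three teams' predicted vs. true S-curve seeds.
print(rmse([1, 5, 12], [1, 4, 14]))  # sqrt((0 + 1 + 4) / 3) ≈ 1.291
```

Because errors are squared before averaging, one badly misplaced team costs more than several teams that are each off by one seed.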

Solution Overview

Prediction Pipeline

We built a multi-stage ensemble ML pipeline to predict NCAA seeds. Its final Kaggle RMSE was ~1.37 for the generalization model and ~0.11 for the semi-supervised model. The stages are:

Raw Stats → Data Cleaning → Feature Engineering (104 features) → Ensemble ML → Constrained Seed Assignment
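The last stage turns continuous model outputs into a valid seeding. The exact constraint is not described here, so this is a minimal sketch assuming the simplest one: the tournament field is already known, and each team gets a distinct seed ordered by its predicted score. `assign_seeds` is a hypothetical helper:

```python
def assign_seeds(scores):
    """Map continuous model scores to unique integer seeds 1..n.

    Lower score = better seed. Ties are broken by team name so the
    result is deterministic.
    """
    order = sorted(scores, key=lambda team: (scores[team], team))
    return {team: seed for seed, team in enumerate(order, start=1)}

# Hypothetical raw ensemble outputs for four tournament teams.
raw = {"Team A": 2.4, "Team B": 1.1, "Team C": 2.4, "Team D": 7.9}
print(assign_seeds(raw))
# {'Team B': 1, 'Team A': 2, 'Team C': 3, 'Team D': 4}
```

Sorting guarantees every seed is used exactly once, and for a squared-error cost it gives the same result as an optimal one-to-one assignment. Costs that are not a simple function of the rank gap would need a general assignment solver instead.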

Model families used: HistGradientBoosting, GradientBoosting, and ExtraTrees.

Key insight: The NCAA selection committee doesn't assign seeds algorithmically. We reverse-engineered its logic and built features such as automatic-qualifier (AQ) conference penalties and at-large (AL) power-conference bonuses.
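Features like these can be expressed as simple flags. The sketch below is illustrative only. The conference set, the record keys (`conference`, `won_conf_tourney`, `net_rank`), and the 40 to 60 NET bubble window are hypothetical, and the record is assumed to describe a team already in the field:

```python
# Hypothetical set of power conferences; the project's actual grouping is not given.
POWER_CONFERENCES = {"ACC", "Big 12", "Big East", "Big Ten", "SEC"}

def committee_features(team):
    """Build hypothetical committee-logic flags for one team.

    `team` is a dict with assumed keys 'conference', 'won_conf_tourney', 'net_rank'.
    """
    is_aq = team["won_conf_tourney"]  # proxy for earning the automatic bid
    is_power = team["conference"] in POWER_CONFERENCES
    return {
        # AQ-conference penalty flag: automatic bid from a non-power league.
        "aq_conf_penalty": int(is_aq and not is_power),
        # At-large power-conference bonus flag.
        "al_power_bonus": int(not is_aq and is_power),
        # Bubble-zone flag; the 40-60 NET window is an illustrative guess.
        "bubble_zone": int(40 <= team["net_rank"] <= 60),
    }

print(committee_features({"conference": "Summit", "won_conf_tourney": True, "net_rank": 95}))
# {'aq_conf_penalty': 1, 'al_power_bonus': 0, 'bubble_zone': 0}
```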

Feature Engineering (104 features across 3 stages)

| Stage | Features | What It Captures |
|---|---|---|
| Base | 66 | Win%, NET rank, quadrant records, resume scores |
| Conference Strength | +12 | How strong is this team's conference? |
| Committee Logic | +26 | AQ/AL bid adjustments, bubble zone flags, conference penalties |
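One plausible conference-strength feature is the average NET rank of a team's conference. This is a minimal sketch under that assumption; the actual 12 conference features are not listed, and `conference_strength` plus its record keys are hypothetical:

```python
from collections import defaultdict
from statistics import fmean

def conference_strength(teams):
    """Map each team to its conference's mean NET rank (lower = stronger).

    `teams` is a list of dicts with assumed keys 'name', 'conference', 'net_rank'.
    """
    ranks = defaultdict(list)
    for t in teams:
        ranks[t["conference"]].append(t["net_rank"])
    conf_avg = {conf: fmean(r) for conf, r in ranks.items()}
    return {t["name"]: conf_avg[t["conference"]] for t in teams}

teams = [
    {"name": "A", "conference": "X", "net_rank": 10},
    {"name": "B", "conference": "X", "net_rank": 30},
    {"name": "C", "conference": "Y", "net_rank": 150},
]
print(conference_strength(teams))  # {'A': 20.0, 'B': 20.0, 'C': 150.0}
```

This gives every team in a league the same value. Variants such as the median or the count of top-50 teams follow the same group-then-broadcast pattern.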

Model Results

| Model | Kaggle RMSE | Correct Seeds |
|---|---|---|
| Base Ensemble (4 models, 66 features) | ~2.39 | 408/451 |
| Enhanced Ensemble (5 models, 104 features) | ~1.39 | 411/451 |
| Tournament Blend (7 models) | ~1.37 | 412/451 |
| Semi-Supervised (verified historical labels) | ~0.11 | 448/451 |
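The "Correct Seeds" column is not defined here. A minimal sketch assuming it counts exact matches between integer predicted seeds and true seeds; `correct_seeds` is a hypothetical helper:

```python
def correct_seeds(predicted, actual):
    """Return (exact matches, total) for integer seed predictions."""
    if len(predicted) != len(actual):
        raise ValueError("length mismatch")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits, len(actual)

hits, total = correct_seeds([1, 2, 3, 5], [1, 2, 4, 5])
print(f"{hits}/{total}")  # 3/4
```

Exact-match counts and RMSE can move differently: a model can improve RMSE by shrinking large misses while getting the same number of seeds exactly right.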

Tableau Dashboard

We built a full Tableau dashboard to present our findings as a narrative. The story was structured around four questions:

  1. What did we build, and how accurate is it?
  2. Which stats actually matter for seeding?
  3. What patterns did we uncover about how the committee thinks?
  4. What does this mean for the NCAA?

Visualizations included conference strength heatmaps, model progression charts, feature importance breakdowns, radar charts for team profiles, and actual vs. predicted seed scatter plots.


Demos & Presentations

  • Semifinals Solution Demo (Video): Watch here
  • Finals Presentation: Final_Presentation.pptx
  • Dashboard Screenshots: Dashboard Screenshots.pdf
