
feat(skore/checks): Add baseline comparison checks (SKD009, SKD010) #2906

Merged
jeromedockes merged 10 commits into probabl-ai:main from GaetandeCast:cursor/474e10b7
May 22, 2026

Conversation

@GaetandeCast
Collaborator

@GaetandeCast GaetandeCast commented May 13, 2026

Change description

Adds two new automated checks to EstimatorReport.checks.summarize (and via aggregation to CrossValidationReport / ComparisonReport):

  • SKD009 (Model worse than baseline): flags when test scores are not significantly better than those of a baseline HistGradientBoostingClassifier / HistGradientBoostingRegressor wrapped in skrub.tabular_pipeline (same train/test data, same metric set). Each non-timing metric casts one vote, and a strict majority triggers the issue.
  • SKD010 (Model slower than baseline): flags when fit_time_ is at least 2x the fit time of a fast linear baseline wrapped in skrub.tabular_pipeline (LogisticRegression for classification, RidgeCV for regression) and the strict-majority "not significantly better than baseline" vote also holds on test metrics. A 0.05 s absolute floor on the time gap avoids noise on very fast fits.
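
For reviewers who want the decision rules spelled out, here is a minimal sketch of the two votes described above. It is not the code from this PR: the function names are hypothetical, the per-metric "not significantly better" votes are assumed to be precomputed booleans, and whether the real thresholds are inclusive or strict is an assumption.

```python
def strict_majority(votes):
    """Return True when strictly more than half of the votes are True."""
    votes = list(votes)
    return sum(votes) * 2 > len(votes)


def worse_than_baseline(not_better_votes):
    """SKD009-style rule: one vote per non-timing metric (hypothetical helper)."""
    return strict_majority(not_better_votes)


def slower_than_baseline(fit_time, baseline_fit_time, not_better_votes,
                         ratio=2.0, floor=0.05):
    """SKD010-style rule (hypothetical helper).

    Requires all three conditions: the fit is at least `ratio` times slower,
    the absolute gap reaches `floor` seconds, and the metric vote holds.
    """
    slow_enough = fit_time >= ratio * baseline_fit_time
    gap_matters = (fit_time - baseline_fit_time) >= floor
    return slow_enough and gap_matters and strict_majority(not_better_votes)
```

Under these assumptions a tie does not trigger the issue, and a fit of 0.03 s against a 0.01 s baseline is ignored even though it is 3x slower, because the gap is below the floor.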

Both checks raise CheckNotApplicable for unsupported ML tasks (multioutput, clustering, unknown) and when train+test data is unavailable. They share a single _baseline_estimator_report(report, kind=...) factory with kinds "dummy" / "performance" / "fast", which the existing SKD002 underfitting check also uses.
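
A rough sketch of the dispatch such a factory might perform, returning estimator names only so that it runs without scikit-learn or skrub installed. Everything here is illustrative: `baseline_name`, the simplified task strings, and the local `CheckNotApplicable` class are stand-ins, and the "dummy" kind is left out because the description does not name its estimator.

```python
class CheckNotApplicable(Exception):
    """Local stand-in for the exception named in the description."""


# Assumed, simplified task names; the real ML task labels may differ.
_SUPPORTED_TASKS = {"classification", "regression"}

# (kind, task) -> baseline estimator name, as listed in the description.
_BASELINES = {
    ("performance", "classification"): "HistGradientBoostingClassifier",
    ("performance", "regression"): "HistGradientBoostingRegressor",
    ("fast", "classification"): "LogisticRegression",
    ("fast", "regression"): "RidgeCV",
}


def baseline_name(kind, task):
    """Pick the baseline estimator name for a kind and ML task."""
    if task not in _SUPPORTED_TASKS:
        raise CheckNotApplicable(f"no baseline for ML task {task!r}")
    try:
        return _BASELINES[(kind, task)]
    except KeyError:
        raise ValueError(f"unknown baseline kind {kind!r}") from None
```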

This PR also refactors SKD001 / SKD002 / SKD003 to read MetricsSummaryDisplay.rows directly (the MetricsSummaryRow TypedDict) instead of parsing the summary DataFrame. Metrics are now aligned by an identity tuple (verbose_name, label, average, output) rather than by row order between two summarize() calls, which also lets _TIMING_METRICS shrink to the two canonical Metric.verbose_name values.
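
To illustrate why keying on the identity tuple is more robust than row order, here is a small sketch using plain dicts. The four identity fields come from the description above; the `score` field, the `align_rows` helper, and the "Fit time (s)" label are assumptions made for the example, not the actual MetricsSummaryRow layout.

```python
def identity(row):
    """Identity tuple used to match a metric across two summaries."""
    return (row["verbose_name"], row["label"], row["average"], row["output"])


def align_rows(model_rows, baseline_rows, timing_names=frozenset()):
    """Pair model and baseline scores by identity, skipping timing metrics."""
    baseline_by_id = {identity(r): r for r in baseline_rows}
    pairs = []
    for row in model_rows:
        key = identity(row)
        if row["verbose_name"] in timing_names or key not in baseline_by_id:
            continue
        pairs.append((key, row["score"], baseline_by_id[key]["score"]))
    return pairs


def make_row(name, label, score):
    # Hypothetical row shape: only the four identity fields are from the PR.
    return {"verbose_name": name, "label": label, "average": None,
            "output": None, "score": score}


model = [make_row("Accuracy", None, 0.9), make_row("Recall", 1, 0.7),
         make_row("Fit time (s)", None, 1.2)]
# The baseline rows arrive in a different order on purpose.
baseline = [make_row("Fit time (s)", None, 0.1), make_row("Recall", 1, 0.75),
            make_row("Accuracy", None, 0.8)]

pairs = align_rows(model, baseline, timing_names={"Fit time (s)"})
```

Here `pairs` matches Accuracy with Accuracy and Recall with Recall even though the baseline rows are reordered, and the timing row is excluded from the comparison.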

Closes #2668.

Contribution checklist

  • Unit tests were added or updated (if necessary)
  • Documentation was added or updated (if necessary)
  • The code passes our style conventions (you can check this by running pre-commit on your code with `pre-commit run --all-files`)
  • All the tests pass (please test locally before pushing)
  • The documentation builds and renders properly (if it does, our bot will add a comment linking to a preview of the documentation to review it visually)
  • A new changelog entry was added to CHANGELOG.rst (if necessary; typically, if your change requires updating tests)
  • All the commits in the PR are signed (more information here)
  • The pull request title respects the Conventional Commits convention (more information here)

AI usage disclosure

AI tools were involved for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Made with Cursor

@github-actions
Contributor

github-actions Bot commented May 13, 2026

Documentation preview @ d051986

@github-actions
Contributor

github-actions Bot commented May 13, 2026

Coverage

Coverage Report for skore/
File Stmts Miss Branch BrPart Cover Missing
skore/src/skore
   __init__.py3922094%112–113
   _config.py58312194%71, 118–119
   exceptions.py44000%4, 15, 19, 23
skore/src/skore/_plugins
   __init__.py12000100% 
skore/src/skore/_plugins/hub
   __init__.py920077%15, 20
   exception.py2000100% 
   json.py1012190%16
skore/src/skore/_plugins/hub/artifact
   __init__.py0000100% 
   artifact.py23040100% 
   serializer.py27020100% 
   upload.py26040100% 
skore/src/skore/_plugins/hub/artifact/media
   __init__.py5000100% 
   data.py20000100% 
   inspection.py721020886%46–49, 51, 53, 60, 62, 68, 107
   media.py10000100% 
   model.py10000100% 
   performance.py10618199%43
skore/src/skore/_plugins/hub/artifact/pickle
   __init__.py2000100% 
   pickle.py24020100% 
skore/src/skore/_plugins/hub/authentication
   __init__.py0000100% 
   apikey.py7000100% 
   login.py2844285%37, 42–43, 52
   token.py80080100% 
   uri.py6000100% 
skore/src/skore/_plugins/hub/client
   __init__.py0000100% 
   client.py881018388%140, 187–189, 191–192, 194, 196, 198, 230
skore/src/skore/_plugins/hub/metric
   __init__.py10000100% 
   accuracy.py35000100% 
   brier_score.py35000100% 
   log_loss.py35000100% 
   metric.py5542192%38, 77–78, 84
   precision.py57000100% 
   r2.py35000100% 
   recall.py59000100% 
   rmse.py35000100% 
   roc_auc.py35000100% 
   timing.py7640094%45–46, 104–105
skore/src/skore/_plugins/hub/project
   __init__.py0000100% 
   project.py138626595%84, 109, 124, 323, 403, 433
skore/src/skore/_plugins/hub/report
   __init__.py3000100% 
   cross_validation_report.py121228398%224, 264
   estimator_report.py10000100% 
   report.py60060100% 
skore/src/skore/_plugins/local
   __init__.py2000100% 
   metadata.py8138196%29, 141–142
   project.py93130198%238
   storage.py4226195%45, 189
skore/src/skore/_plugins/mlflow
   __init__.py5000100% 
   project.py2091554592%202, 233–234, 380, 382, 398, 400–405, 407–409
   reports.py155634496%129, 170, 207–208, 270, 283
skore/src/skore/_project
   __init__.py0000100% 
   _summary.py80138398%121
   _widget.py1910442100% 
   dependencies.py19060100% 
   git.py25040100% 
   login.py1736282%62, 71–72
   plugin.py12020100% 
   project.py56216396%132, 141
   types.py3000100% 
skore/src/skore/_sklearn
   __init__.py8000100% 
   _base.py54110098%44
   compare.py5000100% 
   evaluate.py430240100% 
   feature_names.py260120100% 
   find_ml_task.py610462100% 
   metrics.py3300740100% 
   types.py1910094%31
skore/src/skore/_sklearn/_checks
   __init__.py3000100% 
   _utils.py58320394%75, 175, 183
   accessor.py33114096%17
   base.py76524393%128–129, 169, 257–258
   model_checks.py221556597%359, 367, 403, 407, 516
skore/src/skore/_sklearn/_comparison
   __init__.py9000100% 
   inspection_accessor.py2712096%347
   metrics_accessor.py125418496%259–260, 329, 1164
   report.py160568096%577, 583, 641–643
skore/src/skore/_sklearn/_cross_validation
   __init__.py11000100% 
   data_accessor.py36212294%48, 74
   inspection_accessor.py2712096%319
   metrics_accessor.py123318397%214–215, 1129
   report.py2021048695%73, 78, 83, 324, 598, 642, 648, 734–736
skore/src/skore/_sklearn/_estimator
   __init__.py11000100% 
   data_accessor.py48220195%61, 178
   inspection_accessor.py3718297%278
   metrics_accessor.py1270220100% 
   report.py32914921095%60, 74, 295, 368, 453, 710, 799, 818, 820, 826–827, 914–916
skore/src/skore/_sklearn/_plot
   __init__.py3000100% 
   base.py61214196%61–62
   utils.py145366397%254–255, 428
skore/src/skore/_sklearn/_plot/data
   __init__.py2000100% 
   table_report.py177160199%670
skore/src/skore/_sklearn/_plot/inspection
   __init__.py0000100% 
   coefficients.py1810881100% 
   impurity_decrease.py103234398%423, 467
   permutation_importance.py198190199%585
   utils.py320100100% 
skore/src/skore/_sklearn/_plot/metrics
   __init__.py6000100% 
   confusion_matrix.py1980662100% 
   metrics_summary_display.py152174299%290
   precision_recall_curve.py1130321100% 
   prediction_error.py1660542100% 
   roc_curve.py1190342100% 
skore/src/skore/_sklearn/train_test_split
   __init__.py2000100% 
   train_test_split.py710342100% 
skore/src/skore/_sklearn/train_test_split/warning
   __init__.py8000100% 
   high_class_imbalance_too_few_examples_warning.py1916194%83
   high_class_imbalance_warning.py20060100% 
   random_state_unset_warning.py10020100% 
   shuffle_true_warning.py9020100% 
   stratify_is_set_warning.py10020100% 
   time_based_column_warning.py21040100% 
   train_test_split_warning.py3000100% 
skore/src/skore/_utils
   __init__.py620066%8, 13
   _accessor.py10628301373%12–13, 15, 36, 61–65, 68, 70–71, 76, 81, 83, 85, 92–94, 120–121, 123–124, 126, 132, 164, 218, 238
   _cache.py37021100% 
   _cache_key.py35522585%22, 24, 51, 59, 68
   _callable_name.py9040100% 
   _dataframe.py43418490%27, 46, 48, 63
   _environment.py33210393%46, 49
   _fixes.py8020100% 
   _index.py5020100% 
   _jupyter.py820075%13–14
   _measure_time.py10000100% 
   _parallel.py17000100% 
   _patch.py21128842%30, 35–39, 42–43, 46–47, 58, 60
   _progress_bar.py4244090%53–54, 64–65
   _show_versions.py380120100% 
   _skrub.py37040100% 
   _testing.py1281412289%24, 33, 71–72, 94, 193, 202, 213–218, 220
skore/src/skore/_utils/repr
   __init__.py2000100% 
   base.py54040100% 
   data.py1280301100% 
   html_repr.py40000100% 
   rich_repr.py800223100% 
   utils.py11020100% 
TOTAL 7393 229 1852 150 96%

Tests Skipped Failures Errors Time
2585 6 💤 0 ❌ 0 🔥 8m 56s ⏱️

@GaetandeCast GaetandeCast marked this pull request as ready for review May 20, 2026 13:21
Collaborator

@auguste-probabl auguste-probabl left a comment

Some nitpicks but this is pretty much good to go!

Comment thread sphinx/user_guide/automated_checks.rst
Comment thread skore/src/skore/_sklearn/_checks/model_checks.py Outdated
Comment thread skore/src/skore/_sklearn/_checks/_utils.py Outdated
@jeromedockes jeromedockes enabled auto-merge May 22, 2026 09:11
@jeromedockes jeromedockes disabled auto-merge May 22, 2026 09:11
@jeromedockes jeromedockes enabled auto-merge May 22, 2026 09:11
@jeromedockes jeromedockes added this pull request to the merge queue May 22, 2026
Merged via the queue into probabl-ai:main with commit 4f3dddd May 22, 2026
42 of 63 checks passed
@GaetandeCast GaetandeCast deleted the cursor/474e10b7 branch May 22, 2026 09:31

Development

Successfully merging this pull request may close these issues.

Add model checks with baseline comparisons

3 participants