feat(skore/checks): Add baseline comparison checks (SKD009, SKD010) by GaetandeCast · Pull Request #2906 · probabl-ai/skore

GaetandeCast · 2026-05-13T12:55:31Z

Change description

Adds two new automated checks to EstimatorReport.checks.summarize (and via aggregation to CrossValidationReport / ComparisonReport):

SKD009 (Model worse than baseline): flags when test scores are not significantly better than those of a baseline HistGradientBoostingClassifier / HistGradientBoostingRegressor wrapped in skrub.tabular_pipeline (same train/test data, same metric set). Each non-timing metric casts one vote; a strict majority triggers the issue.
SKD010 (Model slower than baseline): flags when fit_time_ is at least 2x the fit time of a fast linear baseline wrapped in skrub.tabular_pipeline (LogisticRegression for classification, RidgeCV for regression), AND the strict-majority "not significantly better than baseline" vote holds on test metrics. A 0.05s absolute floor on the time gap avoids noise on very fast fits.
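
The two decision rules above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function and constant names are hypothetical, the per-metric "not significantly better" votes are taken as precomputed booleans (the significance test itself is not described here), and the 0.05s floor is assumed to apply to the difference between the two fit times.

```python
from collections.abc import Sequence

# Thresholds quoted in the description; the names are hypothetical.
RATIO_THRESHOLD = 2.0    # model fit time must be at least 2x the baseline's
MIN_GAP_SECONDS = 0.05   # assumed: floor on (model fit time - baseline fit time)


def strict_majority(votes: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the votes are True."""
    return sum(votes) > len(votes) / 2


def is_slower_than_baseline(
    fit_time: float,
    baseline_fit_time: float,
    not_better_votes: Sequence[bool],
) -> bool:
    """SKD010-style rule: slow enough AND not clearly better on test metrics.

    `not_better_votes` holds one boolean per non-timing metric, True when the
    model is not significantly better than the baseline on that metric.
    """
    gap = fit_time - baseline_fit_time
    too_slow = (
        fit_time >= RATIO_THRESHOLD * baseline_fit_time
        and gap >= MIN_GAP_SECONDS
    )
    return too_slow and strict_majority(not_better_votes)
```

With these assumptions, a tie (for example one vote each way) does not trigger the issue, and a model that is 3x slower than a 0.01s baseline is ignored because the gap stays under the floor.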

Both checks raise CheckNotApplicable for unsupported ML tasks (multioutput, clustering, unknown) and when train and test data are not both available. They share a single _baseline_estimator_report(report, kind=...) factory with kinds "dummy" / "performance" / "fast", which is also used by the existing SKD002 underfitting check.
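
As a rough sketch of the dispatch such a factory performs, the snippet below maps a baseline kind and a task family to the estimator named in the description. Everything here is illustrative: the helper name, the task-family strings, and the stand-in exception class are assumptions, the real factory returns a report rather than a class name, and the "dummy" kind is left out because the description does not say which estimator it wraps.

```python
# Assumed task-family names; the ML-task strings used by skore may differ.
SUPPORTED_TASKS = {"classification", "regression"}

# (kind, task family) -> estimator class name, as listed in the description.
# In the PR each of these is wrapped in skrub.tabular_pipeline.
BASELINE_ESTIMATORS = {
    ("performance", "classification"): "HistGradientBoostingClassifier",
    ("performance", "regression"): "HistGradientBoostingRegressor",
    ("fast", "classification"): "LogisticRegression",
    ("fast", "regression"): "RidgeCV",
}


class CheckNotApplicable(Exception):
    """Hypothetical stand-in for the exception named in the description."""


def baseline_estimator_name(
    task: str, kind: str, has_train_and_test_data: bool = True
) -> str:
    """Pick the baseline estimator name, or signal that the check is skipped."""
    if task not in SUPPORTED_TASKS:
        # Covers multioutput, clustering, and unknown tasks.
        raise CheckNotApplicable(f"unsupported ML task: {task!r}")
    if not has_train_and_test_data:
        raise CheckNotApplicable("train and test data are both required")
    try:
        return BASELINE_ESTIMATORS[(kind, task)]
    except KeyError:
        raise ValueError(f"unknown baseline kind: {kind!r}") from None
```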

Companion refactor of SKD001 / SKD002 / SKD003 to read MetricsSummaryDisplay.rows directly (the MetricsSummaryRow TypedDict) instead of parsing the summary DataFrame. Metrics are now aligned by an identity tuple (verbose_name, label, average, output) rather than relying on row-order between two summarize() calls, which also lets _TIMING_METRICS shrink to the two canonical Metric.verbose_name values.
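
To illustrate why keying on an identity tuple is more robust than row order, here is a small sketch that pairs rows from two summaries. It uses plain dicts rather than the actual MetricsSummaryRow TypedDict; the four identity fields come from the description, while the "score" field and the helper names are hypothetical.

```python
from typing import Any

# Identity fields named in the description.
IDENTITY_FIELDS = ("verbose_name", "label", "average", "output")

Row = dict[str, Any]


def identity(row: Row) -> tuple:
    """Build the (verbose_name, label, average, output) key for a row."""
    return tuple(row.get(field) for field in IDENTITY_FIELDS)


def align_rows(model_rows: list[Row], baseline_rows: list[Row]) -> list[tuple[Row, Row]]:
    """Pair each model row with the baseline row sharing its identity.

    Rows without a counterpart in the baseline are skipped, and the order of
    `baseline_rows` does not matter.
    """
    baseline_by_identity = {identity(row): row for row in baseline_rows}
    return [
        (row, baseline_by_identity[identity(row)])
        for row in model_rows
        if identity(row) in baseline_by_identity
    ]


# "score" is a hypothetical field used only for this example.
model = [
    {"verbose_name": "Accuracy", "label": None, "average": None, "output": None, "score": 0.91},
    {"verbose_name": "Recall", "label": 1, "average": None, "output": None, "score": 0.80},
]
baseline = [
    {"verbose_name": "Recall", "label": 1, "average": None, "output": None, "score": 0.85},
    {"verbose_name": "Accuracy", "label": None, "average": None, "output": None, "score": 0.93},
]
pairs = align_rows(model, baseline)
```

Even though the baseline rows arrive in the opposite order, each pair holds the same metric on both sides.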

Closes #2668.

Contribution checklist

Unit tests were added or updated (if necessary)
Documentation was added or updated (if necessary)
The code passes our style conventions (you can check this by running pre-commit on your code with `pre-commit run --all-files`)
All the tests pass (please test locally before pushing)
The documentation builds and renders properly (if it does, our bot will add a comment linking to a preview of the documentation to review it visually)
A new changelog entry was added to CHANGELOG.rst (if necessary; typically, if your change requires updating tests)
All the commits in the PR are signed
The pull request title respects the Conventional Commits convention

AI usage disclosure

AI tools were involved for:

Code generation (e.g., when writing an implementation or fixing a bug)
Test/benchmark generation
Documentation (including examples)
Research and understanding

Made with Cursor

github-actions · 2026-05-13T13:20:07Z

Documentation preview @ d051986

github-actions · 2026-05-13T13:29:29Z

Coverage Report for skore/

File	Stmts	Miss	Branch	BrPart	Cover	Missing
skore/src/skore
__init__.py	39	2	2	0	94%	112–113
_config.py	58	3	12	1	94%	71, 118–119
exceptions.py	4	4	0	0	0%	4, 15, 19, 23
skore/src/skore/_plugins
__init__.py	12	0	0	0	100%
skore/src/skore/_plugins/hub
__init__.py	9	2	0	0	77%	15, 20
exception.py	2	0	0	0	100%
json.py	10	1	2	1	90%	16
skore/src/skore/_plugins/hub/artifact
__init__.py	0	0	0	0	100%
artifact.py	23	0	4	0	100%
serializer.py	27	0	2	0	100%
upload.py	26	0	4	0	100%
skore/src/skore/_plugins/hub/artifact/media
__init__.py	5	0	0	0	100%
data.py	20	0	0	0	100%
inspection.py	72	10	20	8	86%	46–49, 51, 53, 60, 62, 68, 107
media.py	10	0	0	0	100%
model.py	10	0	0	0	100%
performance.py	106	1	8	1	99%	43
skore/src/skore/_plugins/hub/artifact/pickle
__init__.py	2	0	0	0	100%
pickle.py	24	0	2	0	100%
skore/src/skore/_plugins/hub/authentication
__init__.py	0	0	0	0	100%
apikey.py	7	0	0	0	100%
login.py	28	4	4	2	85%	37, 42–43, 52
token.py	80	0	8	0	100%
uri.py	6	0	0	0	100%
skore/src/skore/_plugins/hub/client
__init__.py	0	0	0	0	100%
client.py	88	10	18	3	88%	140, 187–189, 191–192, 194, 196, 198, 230
skore/src/skore/_plugins/hub/metric
__init__.py	10	0	0	0	100%
accuracy.py	35	0	0	0	100%
brier_score.py	35	0	0	0	100%
log_loss.py	35	0	0	0	100%
metric.py	55	4	2	1	92%	38, 77–78, 84
precision.py	57	0	0	0	100%
r2.py	35	0	0	0	100%
recall.py	59	0	0	0	100%
rmse.py	35	0	0	0	100%
roc_auc.py	35	0	0	0	100%
timing.py	76	4	0	0	94%	45–46, 104–105
skore/src/skore/_plugins/hub/project
__init__.py	0	0	0	0	100%
project.py	138	6	26	5	95%	84, 109, 124, 323, 403, 433
skore/src/skore/_plugins/hub/report
__init__.py	3	0	0	0	100%
cross_validation_report.py	121	2	28	3	98%	224, 264
estimator_report.py	10	0	0	0	100%
report.py	60	0	6	0	100%
skore/src/skore/_plugins/local
__init__.py	2	0	0	0	100%
metadata.py	81	3	8	1	96%	29, 141–142
project.py	93	1	30	1	98%	238
storage.py	42	2	6	1	95%	45, 189
skore/src/skore/_plugins/mlflow
__init__.py	5	0	0	0	100%
project.py	209	15	54	5	92%	202, 233–234, 380, 382, 398, 400–405, 407–409
reports.py	155	6	34	4	96%	129, 170, 207–208, 270, 283
skore/src/skore/_project
__init__.py	0	0	0	0	100%
_summary.py	80	1	38	3	98%	121
_widget.py	191	0	44	2	100%
dependencies.py	19	0	6	0	100%
git.py	25	0	4	0	100%
login.py	17	3	6	2	82%	62, 71–72
plugin.py	12	0	2	0	100%
project.py	56	2	16	3	96%	132, 141
types.py	3	0	0	0	100%
skore/src/skore/_sklearn
__init__.py	8	0	0	0	100%
_base.py	54	1	10	0	98%	44
compare.py	5	0	0	0	100%
evaluate.py	43	0	24	0	100%
feature_names.py	26	0	12	0	100%
find_ml_task.py	61	0	46	2	100%
metrics.py	330	0	74	0	100%
types.py	19	1	0	0	94%	31
skore/src/skore/_sklearn/_checks
__init__.py	3	0	0	0	100%
_utils.py	58	3	20	3	94%	75, 175, 183
accessor.py	33	1	14	0	96%	17
base.py	76	5	24	3	93%	128–129, 169, 257–258
model_checks.py	221	5	56	5	97%	359, 367, 403, 407, 516
skore/src/skore/_sklearn/_comparison
__init__.py	9	0	0	0	100%
inspection_accessor.py	27	1	2	0	96%	347
metrics_accessor.py	125	4	18	4	96%	259–260, 329, 1164
report.py	160	5	68	0	96%	577, 583, 641–643
skore/src/skore/_sklearn/_cross_validation
__init__.py	11	0	0	0	100%
data_accessor.py	36	2	12	2	94%	48, 74
inspection_accessor.py	27	1	2	0	96%	319
metrics_accessor.py	123	3	18	3	97%	214–215, 1129
report.py	202	10	48	6	95%	73, 78, 83, 324, 598, 642, 648, 734–736
skore/src/skore/_sklearn/_estimator
__init__.py	11	0	0	0	100%
data_accessor.py	48	2	20	1	95%	61, 178
inspection_accessor.py	37	1	8	2	97%	278
metrics_accessor.py	127	0	22	0	100%
report.py	329	14	92	10	95%	60, 74, 295, 368, 453, 710, 799, 818, 820, 826–827, 914–916
skore/src/skore/_sklearn/_plot
__init__.py	3	0	0	0	100%
base.py	61	2	14	1	96%	61–62
utils.py	145	3	66	3	97%	254–255, 428
skore/src/skore/_sklearn/_plot/data
__init__.py	2	0	0	0	100%
table_report.py	177	1	60	1	99%	670
skore/src/skore/_sklearn/_plot/inspection
__init__.py	0	0	0	0	100%
coefficients.py	181	0	88	1	100%
impurity_decrease.py	103	2	34	3	98%	423, 467
permutation_importance.py	198	1	90	1	99%	585
utils.py	32	0	10	0	100%
skore/src/skore/_sklearn/_plot/metrics
__init__.py	6	0	0	0	100%
confusion_matrix.py	198	0	66	2	100%
metrics_summary_display.py	152	1	74	2	99%	290
precision_recall_curve.py	113	0	32	1	100%
prediction_error.py	166	0	54	2	100%
roc_curve.py	119	0	34	2	100%
skore/src/skore/_sklearn/train_test_split
__init__.py	2	0	0	0	100%
train_test_split.py	71	0	34	2	100%
skore/src/skore/_sklearn/train_test_split/warning
__init__.py	8	0	0	0	100%
high_class_imbalance_too_few_examples_warning.py	19	1	6	1	94%	83
high_class_imbalance_warning.py	20	0	6	0	100%
random_state_unset_warning.py	10	0	2	0	100%
shuffle_true_warning.py	9	0	2	0	100%
stratify_is_set_warning.py	10	0	2	0	100%
time_based_column_warning.py	21	0	4	0	100%
train_test_split_warning.py	3	0	0	0	100%
skore/src/skore/_utils
__init__.py	6	2	0	0	66%	8, 13
_accessor.py	106	28	30	13	73%	12–13, 15, 36, 61–65, 68, 70–71, 76, 81, 83, 85, 92–94, 120–121, 123–124, 126, 132, 164, 218, 238
_cache.py	37	0	2	1	100%
_cache_key.py	35	5	22	5	85%	22, 24, 51, 59, 68
_callable_name.py	9	0	4	0	100%
_dataframe.py	43	4	18	4	90%	27, 46, 48, 63
_environment.py	33	2	10	3	93%	46, 49
_fixes.py	8	0	2	0	100%
_index.py	5	0	2	0	100%
_jupyter.py	8	2	0	0	75%	13–14
_measure_time.py	10	0	0	0	100%
_parallel.py	17	0	0	0	100%
_patch.py	21	12	8	8	42%	30, 35–39, 42–43, 46–47, 58, 60
_progress_bar.py	42	4	4	0	90%	53–54, 64–65
_show_versions.py	38	0	12	0	100%
_skrub.py	37	0	4	0	100%
_testing.py	128	14	12	2	89%	24, 33, 71–72, 94, 193, 202, 213–218, 220
skore/src/skore/_utils/repr
__init__.py	2	0	0	0	100%
base.py	54	0	4	0	100%
data.py	128	0	30	1	100%
html_repr.py	40	0	0	0	100%
rich_repr.py	80	0	22	3	100%
utils.py	11	0	2	0	100%
TOTAL	7393	229	1852	150	96%

Tests	Skipped	Failures	Errors	Time
2585	6 💤	0 ❌	0 🔥	8m 56s ⏱️

auguste-probabl

Some nitpicks but this is pretty much good to go!

GaetandeCast added 2 commits May 13, 2026 14:52

feat(skore/checks): Add baseline comparison checks

664b39a

fix doctest & sphinx

061dd51

GaetandeCast added 4 commits May 13, 2026 16:07

swallow skrub warning

3bafbae

Merge branch 'main' into cursor/474e10b7

a3da135

fix doctest

3d50073

iter

137f51b

GaetandeCast marked this pull request as ready for review May 20, 2026 13:21

GaetandeCast requested review from auguste-probabl and jeromedockes May 20, 2026 13:35

auguste-probabl previously approved these changes May 21, 2026

Comment thread sphinx/user_guide/automated_checks.rst

Comment thread skore/src/skore/_sklearn/_checks/model_checks.py Outdated

Comment thread skore/src/skore/_sklearn/_checks/_utils.py Outdated

GaetandeCast added 2 commits May 22, 2026 10:16

suggestions from review

52aa1f9

Merge branch 'main' into cursor/474e10b7

cad5b69

GaetandeCast dismissed auguste-probabl’s stale review via cad5b69 May 22, 2026 08:23

GaetandeCast requested a review from auguste-probabl May 22, 2026 08:23

GaetandeCast added 2 commits May 22, 2026 10:27

fix

9430a32

fix sphinx

d051986

jeromedockes approved these changes May 22, 2026

jeromedockes enabled auto-merge May 22, 2026 09:11

jeromedockes disabled auto-merge May 22, 2026 09:11

jeromedockes enabled auto-merge May 22, 2026 09:11

jeromedockes added this pull request to the merge queue May 22, 2026

Merged via the queue into probabl-ai:main with commit 4f3dddd May 22, 2026
42 of 63 checks passed

jeromedockes temporarily deployed to dev May 22, 2026 09:31 with GitHub Actions (inactive)

GaetandeCast deleted the cursor/474e10b7 branch May 22, 2026 09:31

jeromedockes merged 10 commits into probabl-ai:main from GaetandeCast:cursor/474e10b7

3 participants
