ENH Batch `GeneralizingEstimator`'s estimator and scoring by mathias-sm · Pull Request #13909 · mne-tools/mne-python

mathias-sm · 2026-05-22T16:18:55Z

Reference issue (if any)

First proposal to implement batching inGeneralizingEstimator as mentioned in #13906

What does this implement/fix?

When possible, this turn the nested loop over time of GeneralizingEstimator into one loop to fit, and then a single batched transform+score by reshaping. When not possible, this conservatively falls back to the original implementation.

In implementing this, I realized that the biggest gains are from the scoring part rather than the transform part, and unfortunately scoring cannot be batched straightforwardly "from the outside". Therefore, commits 2-6 (see below for the overall logic behind the commits) succesively refactor the scoring loop and add batch scoring functions for some candidates: accuracy, balanced_accuracy and roc_auc. Anything that is not clearly identified ends up being scored by the original nested loops implementation.

It's a bit more verbose than I was hopping for, sorry about this. This also forced me to learn more about __qualname__ and things like getattr(scoring, "_score_func", None) than I was hopping for 😅

Happy to take comments and iterate, I hope this is a good fit!

Additional information

I broke the problem into changes that I considered meaningful to review in isolation, with the idea that maybe only a subset would be considered useful to merge. The logic is as follows.

b2cbd649f Batch estimator and nothing else (the longest / most annoying to review)
3cef4a52a If score is accuracy, also batch that, if not rever to nested loops
7a66f157c Refactor the scoring logic to be able to add batched scorer when desired
32028955b Adds batched balanced_accuracy
7a30425fc Identifies when scoring=None can be interpreted as "accuracy" and batched
b23e72dfb Adds batched roc_auc (the only one that adds an import and touches tests)

I also used a little script that compares the result and timing _gl_score for each commit against the starting point: each step until 5 (included) give the same result exactly; 6 implements roc_auc differently from sklearn and is allclose but not identical. My script is here and its output is here.

Highlights: for X.shape of (100, 272, 200) (trials, sensors, timepoints), no cross-validation, on my laptop: accuracy goes from 7893.3ms to 182.1ms, balanced accuracy goes from 14055.4ms to 186.4ms, roc_auc from 18103.3ms to 282.7ms. To test unimplemented scoring functions I also measured f1: it goes from 21571.0ms to 17009.8ms, the gain being from the batching of the estimator but not the scoring.

Because of the roc_auc change, I made two changes to existing tests: in mne/decoding/tests/test_search_light.py I replaced two assert_array_equal with assert_allclose. I also timed all tests that use GeneralizingEstimator and compared the duration against the starting point ; there are virtually no gains there (I presume because by design the tests are short and there is a lot of overhead), still e.g. test_verbose_arg[1-False] improved like 20%...

Note on AI use

I broke the problem into the steps described above; then each step's first draft was obtained by prompting Opus 4.7 through Claude Code. I refined the proposal, added documentation or changed the code where things were not immediately obvious to me, etc. until I was satisfied. I similarly prompted it to generate the first draft of the script to do check each steps correctness and measure speed difference (not commited, reviewed and and updated to my needs as the project was going along).

`_gl_score` invoked the scorer's response method (`predict` / `predict_proba` / `decision_function`) `n_estimators * n_slices` times per fold. Stack X across slices and call the response method once per estimator, then apply the metric per slice on the resulting predictions. The batching saves on overhead and best utilises vectorized operations. Scorers without a recognised `_response_method` (e.g. `scoring=None` or custom callables) fall back to the original nested loop.

When the scorer is `accuracy_score` with default kwargs, 1d-output (but can be multi-class), and `response_method == "predict"`, replace the per-slice `accuracy_score(y, y_pred[:, jj])` calls with one numpy reduction per estimator: `(y_pred == y[:, None]).mean(axis=0)`. Other scorers, multi-output `y`, etc. keep nested-loop behavior.

Replace the `fast_accuracy` flag with a `batched_score` which is either a callable if we recognized the scorer, or otherwise set to `None`. The scoring loop then branches on `batched_score`: call `batched_score(y_pred)` if `batched_score` is set, otherwise fall back to the nested loop.

Adds `balanced_accuracy_score` to the `batched_score` dispatch by manually estimating accuracy per class and then averaging.

When `scoring=None`, sklearn wraps `estimator.score` in a scorer with no `_score_func` so previous code did not batch. But for stock classifiers, this is just `accuracy_score(y, predict(X))`: we now detect this and promote `scoring` to the named "accuracy" scorer which uses the existing batched path.

Add `roc_auc_score` to the `batched_score` dispatch via the Mann-Whitney U identity with average-rank tie correction (`scipy.stats.rankdata(..., method="average", axis=0)` ranks all slices at once). Binary classification only: multi-class, or non-default kwargs, revert to nested loops. The rank identity implemented when batching gives the same AUC as sklearn within floating point precision, but implements it with different operations. A bit-exact match would need a loop, defeating the batching.

for more information, see https://pre-commit.ci

mathias-sm · 2026-05-22T16:28:33Z

My plan for nice, self-contained commits thrown off by pre-commit.ci and my inability to spot typos :(

larsoner · 2026-05-22T17:10:19Z

We squash+merge in the end so don't worry about it!

No need to run with [circle full], just make some small modification to the examples that use GeneralizingEstimator and CircleCI will run those examples. Can be something useful like a wording improvement or something that doesn't matter like a little line-wrapping / whitespace change, as long as it's in the git diff CircleCI will run it.

mathias-sm · 2026-05-22T18:01:21Z

I tried to remove [circle full] from the PR's title but it still seems to be running a great many tests. Not sure how I can downgrade that now...

…gEstimator

mathias-sm · 2026-05-25T17:05:32Z

OK, I think the current version is ready for review and feedback at this point.
Looking forward,

larsoner

Changes look reasonable to me, but a bit complex. It looks unlikely that the modified tests touch all branches in the new code, and given the complexity, it would be great to hit all of them (or at least the vast majority).

Could you update or add some test so that these branches get hit (scoring modes, shapes, etc.)? Try to make it take as little time as possible while still testing the functionality, our test suite already takes ages to run so we want to have any new tests be as quick as possible 😅

My code partially re-implemented logic that was present in `_gl_transform` already. This new version, as much as possible, simply reuses it. This cleans up the scoring's logic and avoid duplication.

So far, this re-write implemented everything in the body of `_gl_score`, but this turned out to make reading the overall logic harder. Abstract away (i) defaulting to "accuracy" for default args, (ii) extracting response method and ability to batch, and (iii) creating vectorized scoring functions

In reusing `_gl_transform` in the new batched `_gl_score` I ended up with a significant slowdown, which it turns out was due to a transpose+reshape performed many times instead of once

mathias-sm · 2026-05-26T23:57:44Z

Adding tests made me realize there were a couple of places where refactoring my changes wouldn't hurt. I'll let this new version rest again a little bit before asking for feedback.

larsoner

Just a couple of minor suggestions, otherwise LGTM +1 for merge

Adding tests made me realize there were a couple of places where refactoring my changes wouldn't hurt

I find the code more readable now 👍 And I think it's well commented enough that the next person to dig through could fix / improve

larsoner · 2026-05-27T13:28:13Z

... if you push and you're done working on it, feel free to mark "ready for review" and ping me to start CIs, and I'll mark for merge-when-green

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

…gEstimator

mathias-sm and others added 8 commits May 22, 2026 00:59

ENH: Vectorise balanced_accuracy in GeneralizingEstimator.score

3202895

Adds `balanced_accuracy_score` to the `batched_score` dispatch by manually estimating accuracy per class and then averaging.

[pre-commit.ci] auto fixes from pre-commit.com hooks

aac4a90

for more information, see https://pre-commit.ci

Fix typo: computaion ==> computation...

df1cfdd

mathias-sm changed the title ~~ENH Batch GeneralizingEstimator's estimator and scoring [circle full]~~ ENH Batch GeneralizingEstimator's estimator and scoring May 22, 2026

Adds entry in doc/changes/dev/ + names.inc

d7faec4

mathias-sm added 5 commits May 25, 2026 13:35

Clarify comments + harmonize var names

b1830f2

Merge branch 'mne-tools:main' into enh-issue-13906-BatchedGeneralizin…

d3f94ec

…gEstimator

Match comment and docstring

f059622

Minimal comment change to re-trigger tests

86f9b3b

Merge branch 'main' into enh-issue-13906-BatchedGeneralizingEstimator

c475df7

mathias-sm marked this pull request as ready for review May 25, 2026 17:04

mathias-sm requested review from agramfort, drammock and larsoner as code owners May 25, 2026 17:04

larsoner reviewed May 26, 2026

View reviewed changes

mathias-sm added 6 commits May 27, 2026 00:54

Reuse _gl_transform where possible

e8341d0

My code partially re-implemented logic that was present in `_gl_transform` already. This new version, as much as possible, simply reuses it. This cleans up the scoring's logic and avoid duplication.

transpose + reshape once only in _gl_transform

e775c12

In reusing `_gl_transform` in the new batched `_gl_score` I ended up with a significant slowdown, which it turns out was due to a transpose+reshape performed many times instead of once

Variable initialization not used anymore

bd87068

Fix docstrings in _gl_score and _gl_transform

ed3ada8

Adds tests for the batched/vectorized _gl_score

98003f6

mathias-sm marked this pull request as draft May 26, 2026 23:57

Merge branch 'main' into enh-issue-13906-BatchedGeneralizingEstimator

5df991b

larsoner reviewed May 27, 2026

View reviewed changes

Comment thread mne/decoding/tests/test_search_light.py Outdated

Comment thread mne/decoding/search_light.py Outdated

Comment thread mne/decoding/search_light.py Outdated

mathias-sm and others added 5 commits May 28, 2026 10:39

More transparent reshaping syntax

321c05d

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

Use standard syntax for optional dims in docstring

7e0fa3a

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

Fix typo / syntax error in new reshaping syntax

a17fb0e

Uses functools.wraps instead of manual attr copy

5ca53eb

Merge branch 'mne-tools:main' into enh-issue-13906-BatchedGeneralizin…

82eb284

…gEstimator

mathias-sm marked this pull request as ready for review May 28, 2026 13:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ENH Batch `GeneralizingEstimator`'s estimator and scoring#13909

ENH Batch `GeneralizingEstimator`'s estimator and scoring#13909
mathias-sm wants to merge 26 commits into
mne-tools:mainfrom
mathias-sm:enh-issue-13906-BatchedGeneralizingEstimator

mathias-sm commented May 22, 2026

Uh oh!

mathias-sm commented May 22, 2026 •

edited

Loading

Uh oh!

larsoner commented May 22, 2026

Uh oh!

mathias-sm commented May 22, 2026

Uh oh!

mathias-sm commented May 25, 2026

Uh oh!

larsoner left a comment

Uh oh!

mathias-sm commented May 26, 2026

Uh oh!

larsoner left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

larsoner commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mathias-sm commented May 22, 2026

Reference issue (if any)

What does this implement/fix?

Additional information

Note on AI use

Uh oh!

mathias-sm commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

larsoner commented May 22, 2026

Uh oh!

mathias-sm commented May 22, 2026

Uh oh!

mathias-sm commented May 25, 2026

Uh oh!

larsoner left a comment

Choose a reason for hiding this comment

Uh oh!

mathias-sm commented May 26, 2026

Uh oh!

larsoner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

larsoner commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mathias-sm commented May 22, 2026 •

edited

Loading