Fix macro average calculation in geometric_mean_score for binary classification #1164

Open
shubhrai23 wants to merge 2 commits into scikit-learn-contrib:master from shubhrai23:fix-geometric-mean-binary

@shubhrai23

geometric_mean_score with average='macro' was incorrect for binary classification: it computed the arithmetic mean of the scores instead of the geometric mean of sensitivity and specificity.

Added a check to handle binary cases correctly while preserving multiclass behavior.

Added a regression test.

immu4989 commented Jun 7, 2026

Contributor

Thanks for tackling #1096 — the diagnosis (binary macro returning sqrt(mean_tpr * mean_tnr) instead of mean(sqrt(tpr*tnr))) is correct. But I think the fix is incomplete, and I wanted to flag it constructively before it lands.

The bug isn't binary-specific — multiclass macro is wrong too. This PR only switches to the per-class computation when is_binary is true, so the multiclass path still returns the old (incorrect) value. On the reporter's own multiclass fixture:

y_true = [0, 1, 2, 0, 1, 2]; y_pred = [0, 2, 1, 0, 1, 2]
geometric_mean_score(y_true, y_pred, average=None)     # [1. , 0.612, 0.612]
np.mean(...)                                            # 0.7416  <- expected
geometric_mean_score(y_true, y_pred, average="macro")  # 0.7454  <- still wrong with this PR

I verified this by checking out the branch locally: after the patch, macro still differs from the mean of the average=None scores for multiclass.
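For anyone who wants to check the two numbers without installing the branch, here is a minimal standard-library sketch that recomputes them from one-vs-rest counts. The helper per_class_rates is hypothetical (it is not imblearn code), and it assumes every class appears in y_true and that no class covers all samples, so neither denominator is zero.

```python
from math import sqrt
from statistics import mean


def per_class_rates(y_true, y_pred):
    """Hypothetical helper: one-vs-rest sensitivity and specificity per class.

    Assumes every class occurs in y_true and no class covers all samples,
    so neither denominator below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    sens, spec = [], []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sens.append(tp / (tp + fn))  # true positive rate for class c
        spec.append(tn / (tn + fp))  # true negative rate for class c
    return sens, spec


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 1, 2]
sens, spec = per_class_rates(y_true, y_pred)

# Per-class G-mean: sqrt(sensitivity * specificity) for each class.
per_class = [sqrt(s * p) for s, p in zip(sens, spec)]

# Mean of the per-class G-means (the expected macro value).
mean_of_gmeans = mean(per_class)

# G-mean of the averaged rates (what the multiclass path still returns).
gmean_of_means = sqrt(mean(sens) * mean(spec))

print([round(g, 3) for g in per_class])  # [1.0, 0.612, 0.612]
print(round(mean_of_gmeans, 4))          # 0.7416
print(round(gmean_of_means, 4))          # 0.7454
```

The gap comes only from the order of averaging and taking the square root; the per-class rates are identical in both computations.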

Related concern: the existing parametrized tests (test_geometric_mean_average, test_geometric_mean_score_prediction) still assert the old macro/weighted numbers, and they pass here only because the multiclass path is unchanged. So those expected values are themselves part of the bug and should have been updated — the fact that they still pass is a signal the multiclass case wasn't addressed.

The root cause is the same for binary and multiclass (the square root of the product of the averaged sensitivity and specificity vs. the average of per-class G-means), so it can be fixed unconditionally without the is_binary branch. PR #1173 takes that approach: it computes per-class G-means and aggregates them for both macro and weighted, updates the now-incorrect expected values, and adds an invariant test that macro == mean(per_class). It might be cleanest to consolidate on that one, or to drop the is_binary gate here and refresh the multiclass expectations.
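To illustrate why no binary special case should be needed, here is a rough sketch of unconditional aggregation. The function aggregate_gmean is a hypothetical helper and not the code from either PR; weighting by class support for average="weighted" is an assumption borrowed from the usual scikit-learn convention. In the binary case each class's sensitivity is the other class's specificity, so both per-class G-means equal sqrt(tpr * tnr) and any average of them does too.

```python
from math import isclose, sqrt


def aggregate_gmean(sens, spec, support, average="macro"):
    """Hypothetical helper: aggregate per-class G-means with no binary branch."""
    per_class = [sqrt(s * p) for s, p in zip(sens, spec)]
    if average is None:
        return per_class
    if average == "macro":
        return sum(per_class) / len(per_class)
    if average == "weighted":
        # Assumption: weights are the per-class supports.
        return sum(g * n for g, n in zip(per_class, support)) / sum(support)
    raise ValueError(f"unsupported average: {average!r}")


# Binary example with illustrative (made-up) rates.
tpr, tnr = 0.9, 0.5
sens = [tnr, tpr]    # class 0 treated as positive, then class 1
spec = [tpr, tnr]
support = [30, 10]   # deliberately imbalanced

macro = aggregate_gmean(sens, spec, support, "macro")
weighted = aggregate_gmean(sens, spec, support, "weighted")

# Square root of the product of the averaged rates, for comparison.
sqrt_of_means = sqrt((sum(sens) / 2) * (sum(spec) / 2))

# Invariant: macro equals the mean of the per-class scores.
per_class = aggregate_gmean(sens, spec, support, None)
assert isclose(macro, sum(per_class) / len(per_class))

# In binary, macro and weighted both reduce to the classic G-mean.
assert isclose(macro, sqrt(tpr * tnr))
assert isclose(weighted, sqrt(tpr * tnr))

# Averaging the rates first collapses to the arithmetic mean of tpr and tnr.
assert isclose(sqrt_of_means, (tpr + tnr) / 2)
```

The last assertion shows that the two descriptions of the binary bug in this thread agree: taking the square root of the product of the averaged rates is the same as returning the arithmetic mean of sensitivity and specificity.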

Either way, removing the special-casing and fixing the multiclass expected values would make this complete.
