fix(dif): avoid double min-max normalization in fit (closes #546) #675
Merged
yzhao062 merged 1 commit on May 12, 2026
`DIF.fit` min-max-scaled `X` then called `self.decision_function(X)` on the already-scaled `X`. `decision_function` transforms the input again, so `self.decision_scores_` was computed on a re-scaled (effectively in-range constant) view of the data and disagreed with `decision_function(X_train)` for the same `X`. Fix: keep the raw `X` around as `X_raw` and call `self.decision_function(X_raw)` at the end of `fit`, with a comment explaining why so a future refactor can't re-introduce the bug. Regression test (`test_train_scores_match_decision_function`) asserts `decision_scores_ == decision_function(X_train)` on the fitted detector. AI disclosure: drafted with assistance from Claude (Anthropic). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
yzhao062 added a commit that referenced this pull request on May 13, 2026
Bump version to 3.5.1 and add CHANGES.txt entry summarizing the patch release: the jbbqqf bundle (#673/#674/#675/#676/#677), the tuanaiseo GAAL torch-optional fix (#660) with our follow-up across mo_gaal/so_gaal/so_gaal_new, and the NSF POSE Phase II funding acknowledgment. Issues closed: #502 #546 #635 #638 #640. No new public API and no breaking changes.
Summary
`DIF.fit` min-max-scaled `X` and then called `self.decision_function(X)` on the already-scaled `X`. `decision_function` transforms the input again, so `decision_scores_` was computed on data that had been pushed through `MinMaxScaler.transform` twice. The values therefore disagreed with `decision_function(X_train)` for the very same input, a violation of the `BaseDetector` contract that several downstream utilities rely on (e.g. `predict_proba`'s `'unify'` mode).

Fixes #546 (DIF model: duplicate normalization)
Context
The reporter traced it to three lines:
- `pyod/models/dif.py` line 178: `X = self.minmax_scaler.transform(X)` inside `fit`
- `fit` then calls `self.decision_function(X)` on that scaled `X`
- `decision_function` (line 245) re-applies `self.minmax_scaler.transform(X)`

That second transform shifts an already `[0, 1]`-mapped array by another min-max scaling, which collapses the dynamic range and produces ensemble scores that no other code path reproduces.
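The arithmetic of the second transform can be sketched on a single feature. This is a dependency-free illustration, not pyod code: `fit_min_max`, `transform`, and the sample values are hypothetical, and the map is the usual per-feature `(x - min) / (max - min)`.

```python
def fit_min_max(column):
    """Return (lo, hi) learned from a training column."""
    return min(column), max(column)


def transform(column, lo, hi):
    """Apply x -> (x - lo) / (hi - lo), the per-feature min-max map."""
    span = hi - lo
    return [(x - lo) / span for x in column]


# Hypothetical single feature whose training range is [10, 20].
x_train = [10.0, 12.5, 15.0, 20.0]
lo, hi = fit_min_max(x_train)

once = transform(x_train, lo, hi)  # one pass: values land in [0, 1]
twice = transform(once, lo, hi)    # second pass with the SAME fitted lo/hi

print("once :", once)
print("twice:", twice)
print("spread once :", max(once) - min(once))
print("spread twice:", max(twice) - min(twice))
```

For this feature the second pass moves the values out of `[0, 1]` (to roughly `[-1.0, -0.9]`) and shrinks their spread by a factor of `hi - lo`, so a model scoring the twice-scaled array sees very different inputs from one scoring the once-scaled array.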
The minimal-blast-radius fix is to retain the raw input as `X_raw` and hand it to `decision_function`, which does its own (single) transform.
Changes
`pyod/models/dif.py`
- `X_raw = X` before the in-place reassignment to the scaled view, with a comment explaining why so a future refactor cannot silently re-introduce the bug.
- `fit`: `self.decision_scores_ = self.decision_function(X_raw)` (was `self.decision_function(X)` where `X` had already been scaled).

`pyod/test/test_dif.py`
- `test_train_scores_match_decision_function`: asserts `decision_scores_ ≈ decision_function(X_train)` for the fitted detector. Fails on `master`, passes on this branch.

Reproduce BEFORE/AFTER yourself (copy-paste)
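For readers without pyod installed, the same before/after comparison can be approximated with a dependency-free toy. This is a separate illustration and not the pyod reproducer used during development: `ToyDetector`, its `reuse_scaled_input` flag, and the distance-from-0.5 scoring rule are all hypothetical stand-ins that only mimic the shape of `fit` / `decision_function`.

```python
class ToyDetector:
    """Hypothetical stand-in for DIF: min-max scale, then score by distance from 0.5."""

    def __init__(self, reuse_scaled_input):
        # True mimics the old fit; False mimics the fix that keeps X_raw.
        self.reuse_scaled_input = reuse_scaled_input

    def _transform(self, X):
        return [(x - self.lo_) / (self.hi_ - self.lo_) for x in X]

    def fit(self, X):
        X_raw = X  # keep the unscaled input; decision_function scales it itself
        self.lo_, self.hi_ = min(X), max(X)
        X = self._transform(X)  # scaled view, as used for (omitted) model training
        scored_input = X if self.reuse_scaled_input else X_raw
        self.decision_scores_ = self.decision_function(scored_input)
        return self

    def decision_function(self, X):
        X = self._transform(X)  # the single transform every caller should get
        return [abs(x - 0.5) for x in X]


def max_gap(detector, X):
    """Largest |decision_scores_ - decision_function(X)| over the training set."""
    fresh = detector.decision_function(X)
    return max(abs(a - b) for a, b in zip(detector.decision_scores_, fresh))


X_train = [10.0, 12.5, 15.0, 20.0]
before = max_gap(ToyDetector(reuse_scaled_input=True).fit(X_train), X_train)
after = max_gap(ToyDetector(reuse_scaled_input=False).fit(X_train), X_train)
print("BEFORE gap:", before)
print("AFTER  gap:", after)
```

With the old behavior the stored training scores and a fresh `decision_function(X_train)` call disagree; with `X_raw` they are identical. The size of the gap here is specific to the toy data and says nothing about the magnitude seen with DIF.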
What I ran locally
- `pytest pyod/test/test_dif.py -q` -> 16 passed (15 prior + 1 new regression test).
- On `master`, the new test fails with a ~10⁻¹ scale mismatch (verified by `git stash`-ing the `dif.py` change).

Edge cases tested
- `decision_scores_` agrees with `decision_function`: `DIF().fit(X).decision_function(X)`, `1e-5` rel tol (`test_train_scores_match_decision_function`)
- `DIF().fit(X_train).decision_function(X_test)` (`test_prediction_scores`)
- `predict` thresholding still works: `DIF().fit(X_train).predict(X_test)` (`test_prediction_labels`)

Risk / blast radius
The change only affects which `X` is fed to `decision_function` at the tail of `fit`. The transform-once-then-score path is exactly the one already exercised by all `decision_function` / `predict` callers on test data, so no behavior change there. `decision_scores_` itself now lives on the same scale as `decision_function(X_train)`, which is the documented invariant of `BaseDetector`. `predict_proba(method='unify')` becomes meaningful for DIF (it relied on that invariant); other modes were unaffected because they renormalize.
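A rough sketch of why the scale matters for `'unify'`. It assumes this mode standardizes incoming scores with the mean and standard deviation of `decision_scores_`, passes them through `erf`, and clips to `[0, 1]`; that is an assumption about pyod's internals, and `unify_proba` and both score lists are hypothetical.

```python
import math
from statistics import fmean, pstdev


def unify_proba(train_scores, scores):
    """Outlier probability via erf of scores standardized by the training scores."""
    mu, sigma = fmean(train_scores), pstdev(train_scores)
    probs = []
    for s in scores:
        z = (s - mu) / (sigma * math.sqrt(2))
        probs.append(min(1.0, max(0.0, math.erf(z))))  # clip to [0, 1]
    return probs


# Hypothetical scores that decision_function returns for four points.
fresh_scores = [0.5, 0.25, 0.0, 0.5]
# The same four points as stored by a fit that scored them on a different scale.
mismatched_train_scores = [1.5, 1.475, 1.45, 1.4]

consistent = unify_proba(fresh_scores, fresh_scores)
inconsistent = unify_proba(mismatched_train_scores, fresh_scores)
print("consistent  :", consistent)
print("inconsistent:", inconsistent)
```

When the stored training scores sit on a different scale from what `decision_function` returns, the standardized values fall far into one tail and the clipped probabilities saturate (here, every point gets 0), which is the sense in which the mode was not meaningful before the fix.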
Release note
PR template checklist (per `PULL_REQUEST_TEMPLATE.md`)
- Regression test (`test_train_scores_match_decision_function`).
- `pytest pyod/test/test_dif.py -q` -> 16 passed.

PR drafted with assistance from Claude Code (Anthropic). The change was reviewed manually against pyod's source. The reproducer block above is the one I used during development; reviewers can paste it verbatim.