preprocessing: handle constant columns in minmax_scaling (#1167) #1169
Open
jbbqqf wants to merge 1 commit into
Constant columns have zero range, so the naive (x - min) / (max - min) inside minmax_scaling computes 0 / 0 and silently writes NaN into the output (with only a low-level numpy `RuntimeWarning: invalid value encountered in divide`). The sibling function `standardize` already collapses constant columns to 0.0; this commit aligns `minmax_scaling` with the same contract. Force the per-column denominator to 1 for constant columns: the numerator is identically zero in that case, so the column collapses to 0.0 (i.e. `min_val`) without raising a warning. Behaviour for non-constant columns is unchanged. Adds three regression tests covering the numpy path, the pandas DataFrame path, and the custom `(min_val, max_val)` range. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
`mlxtend.preprocessing.minmax_scaling` silently returns `NaN` for any column whose values are all identical: the per-column `denominator = max - min` is `0`, so `numerator / denominator` becomes `0 / 0` and numpy only emits a low-level `RuntimeWarning: invalid value encountered in divide`. The sibling function `standardize` in the same module already collapses constant columns to `0.0` (and documents it). This PR aligns `minmax_scaling` with that contract.
Fixes #1167: minmax_scaling returns NaN silently for constant columns
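To make the failure mode concrete, here is a minimal sketch of the naive formula applied to a hypothetical array with one constant column. It uses plain numpy rather than calling mlxtend, so the variable names are illustrative and not taken from the library source; the warning is suppressed with `np.errstate` only to keep the output quiet.

```python
import numpy as np

# Hypothetical input: column 0 is constant, column 1 varies.
X = np.array([[5.0, 1.0], [5.0, 2.0], [5.0, 3.0]])

numerator = X - X.min(axis=0)
denominator = X.max(axis=0) - X.min(axis=0)  # zero for the constant column

# Suppress the "invalid value encountered" warning that 0 / 0 would emit.
with np.errstate(invalid="ignore"):
    naive = numerator / denominator

print(np.isnan(naive[:, 0]).all())  # the constant column is all NaN
print(naive[:, 1])                  # the varying column scales normally
```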
Context
The bug report, filed on 2026-05-21, points out the asymmetry with `standardize`: constant columns get NaN with no error path other than the numpy warning, which is easy to miss in a larger pipeline. The reporter's expected output (constant column at `0.0`, other columns scaled normally) matches what `standardize` already does, so the fix here is to make the two functions agree.
Changes
- `mlxtend/preprocessing/scaling.py`: in `minmax_scaling`, replace any zero entry in `denominator` with `1` before the divide. The numerator for those columns is identically zero, so the column collapses to `0.0` (i.e. `min_val`) instead of `NaN`. Behaviour for non-constant columns is unchanged. The `Notes` section of the docstring now records the contract explicitly. The substitution uses `np.where(denominator == 0, 1, denominator)`: a 1-line guard with a 5-line comment explaining the invariant, so a reviewer reading the diff cold doesn't have to re-derive it.
- `mlxtend/preprocessing/tests/test__scaling__minmax_scaling.py`: three new regression tests covering the numpy path with the default `(0, 1)` range, the pandas DataFrame path, and a custom `(50, 100)` range. All three fail on `origin/master` and pass on this branch.
- `docs/sources/CHANGELOG.md`: one entry under "Version 0.25.0 (TBD)".
Reproduce BEFORE/AFTER yourself (copy-paste)
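The guard and the custom-range case from the change list can be illustrated with a small stand-in. This is a sketch under assumptions, not the PR's diff or its reproducer: `minmax_scale_sketch` is a hypothetical helper written for this example, and only the `np.where(denominator == 0, 1, denominator)` expression is taken from the description above.

```python
import numpy as np


def minmax_scale_sketch(ary, min_val=0, max_val=1):
    """Hypothetical stand-in for the guarded scaling step (not the mlxtend source)."""
    ary = np.asarray(ary, dtype=float)
    col_min = ary.min(axis=0)
    numerator = ary - col_min
    denominator = ary.max(axis=0) - col_min
    # A constant column has zero range, and its numerator is all zeros too,
    # so dividing by 1 instead of 0 yields 0.0 rather than NaN.
    denominator = np.where(denominator == 0, 1, denominator)
    scaled = numerator / denominator
    # Map the [0, 1] result onto [min_val, max_val].
    return scaled * (max_val - min_val) + min_val


default_out = minmax_scale_sketch([[5, 1], [5, 2], [5, 3]])
custom_out = minmax_scale_sketch([[7, 1], [7, 2], [7, 3]], min_val=50, max_val=100)
print(default_out)
print(custom_out)
```

With the guard in place the constant column lands on the lower bound of the target range (`0.0` by default, `50.0` for `(50, 100)`), while the varying column is scaled as before.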
What I ran locally
The three new regression tests fail on `origin/master`:
Edge cases tested
- `[[5,1],[5,2],[5,3]]`: column 0 → `0.0`, column 1 → `0/0.5/1` (`test_minmax_scaling_constant_column_numpy`)
- `{"const":[5,5,5],"var":[1,2,3]}`: `.loc` indexing path (`test_minmax_scaling_constant_column_pandas`)
- `(min_val, max_val)`: `[[7,1],[7,2],[7,3]]`, `(50, 100)`: `50.0` (the lower bound) (`test_minmax_scaling_constant_column_custom_range`)
- `test_pandas_minmax_scaling` / `test_numpy_minmax_scaling` inputs
Risk / blast radius
Additive guard inside a single function. The new code path is only exercised when at least one selected column has zero range: previously that path produced NaN, now it produces `min_val`. Callers that relied on the NaN as a downstream sentinel will see a behavioural change, but that path was undocumented and inconsistent with `standardize`; the new behaviour matches the function's already-documented "rescaled column" contract.
Release note
PR drafted with assistance from Claude Code. The change was reviewed manually against the existing `standardize` implementation in the same file (which has handled constant columns since at least v0.18) and against the reporter's expected output in #1167. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.