
Modernize for scikit-learn 1.6+, NumPy 2.0+, scipy 1.17+, matplotlib 3.7+ #1336

Open
jg585Username wants to merge 3 commits into DistrictDataLabs:develop from jg585Username:feature/modernize-sklearn-numpy-compat




jg585Username commented May 12, 2026

Summary

Fixes 337 test failures caused by breaking API changes introduced in recent versions of scikit-learn, NumPy, scipy, and matplotlib. The project now supports:

  • scikit-learn ≥ 1.6 (tested against 1.8.0)
  • NumPy ≥ 1.24 (tested against 2.4.4)
  • scipy ≥ 1.7 (tested against 1.17.1)
  • matplotlib ≥ 3.5 (tested against 3.10.9)
  • Python 3.9 – 3.13

Result: 1127 passed, 0 failed (was 337 failures before this PR).


scikit-learn compatibility

_estimator_type removed (sklearn 1.6+)

yellowbrick/utils/types.py — The most widespread breakage. sklearn 1.6 removed the _estimator_type class attribute from many estimators. Added _get_estimator_type() which falls back through: legacy attribute → __sklearn_tags__() API → Mixin subclass inspection. Rewrote is_classifier(), is_regressor(), is_clusterer() on top of it.

Pipeline.__sklearn_is_fitted__ (sklearn 1.8+)

yellowbrick/base.py — sklearn 1.8's Pipeline.score() calls check_is_fitted(last_step) on the final pipeline step. ModelVisualizer has no trailing-underscore attributes of its own, so this raised NotFittedError even after a successful fit(). Added __sklearn_is_fitted__() delegating to check_is_fitted(self.estimator).

__sklearn_tags__() for ContribEstimator (sklearn 1.6+)

yellowbrick/contrib/wrapper.py — Added __sklearn_tags__() so wrapped contrib estimators correctly report their type through the new Tags API.

_check_targets() returns 4-tuple (sklearn 1.8)

yellowbrick/classifier/class_prediction_error.py — The result was unpacked as a 3-tuple; it is now accessed by index.
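Indexing instead of tuple unpacking keeps the call working whichever shape the private helper returns. A minimal sketch with two hypothetical stand-ins for the old 3-tuple and the new 4-tuple return shapes; the meaning of the fourth value is deliberately not assumed here.

```python
def targets_3(y_true, y_pred):
    """Hypothetical stand-in for the pre-1.8 return shape."""
    return "binary", y_true, y_pred


def targets_4(y_true, y_pred):
    """Hypothetical stand-in for the 1.8 return shape (one extra value)."""
    return "binary", y_true, y_pred, None


def first_three(result):
    # `a, b, c = result` raises ValueError on a 4-tuple; indexing does not.
    return result[0], result[1], result[2]
```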

multi_class="auto" removed (sklearn 1.7)

yellowbrick/classifier/threshold.py — Removed the multi_class argument.

store_cv_values / cv_values_ renamed to store_cv_results / cv_results_ (old names removed in sklearn 1.7)

yellowbrick/regressor/alphas.py — Checks for the new names first and falls back to the old ones for backward compatibility.
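One way to express the dual check, as a sketch: cv_store_kwarg and get_cv_results are hypothetical helper names, not the functions in alphas.py, and the PR may detect the names differently. The constructor flag is found by inspecting the estimator's __init__ signature, and the fitted attribute by trying the new name before the old one.

```python
import inspect


def cv_store_kwarg(estimator_cls):
    """Name of the constructor flag that keeps per-alpha CV results, or None."""
    params = inspect.signature(estimator_cls.__init__).parameters
    for name in ("store_cv_results", "store_cv_values"):  # new name first
        if name in params:
            return name
    return None


def get_cv_results(model):
    """Fetch stored CV results under the new attribute name, else the old one."""
    for name in ("cv_results_", "cv_values_"):
        if hasattr(model, name):
            return getattr(model, name)
    raise AttributeError("model has neither cv_results_ nor cv_values_")
```

With scikit-learn installed, usage would look like RidgeCV(alphas=alphas, **{cv_store_kwarg(RidgeCV): True}). Trying the new name first also avoids touching the old attribute on versions where it still exists but may emit a deprecation warning.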

np.matrix rejected (sklearn 1.7)

yellowbrick/cluster/elbow.py — Sparse matrix .mean() returns np.matrix; wrapped with np.asarray().
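A small NumPy-only illustration of the np.asarray() wrap. To avoid a scipy dependency it builds the np.matrix directly with np.asmatrix, as a stand-in for what a scipy.sparse matrix's .mean(axis=0) returns; the variable names are hypothetical.

```python
import numpy as np

# Stand-in for a scipy.sparse matrix's .mean(axis=0): a 1 x n np.matrix.
center_matrix = np.asmatrix([[1.0, 2.0, 3.0]])

# np.asarray drops the matrix subclass but keeps the data and the shape,
# so downstream sklearn code receives a plain ndarray.
center = np.asarray(center_matrix)
```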

TSNE perplexity validation (sklearn 1.7+)

yellowbrick/cluster/icdm.py — perplexity must be < n_samples; added dynamic cap before fit_transform.
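A sketch of the dynamic cap: capped_perplexity is a hypothetical helper, and capping at n_samples - 1 is an assumption about the exact bound the PR uses. sklearn's TSNE defaults to perplexity=30.0, so without a cap any input with 30 or fewer samples fails the validation.

```python
def capped_perplexity(perplexity, n_samples):
    """Hypothetical helper: keep t-SNE perplexity strictly below n_samples."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two samples")
    # Any value below n_samples passes; n_samples - 1 keeps it as large as possible
    # among whole numbers.
    return min(perplexity, n_samples - 1)
```

For example, TSNE(perplexity=capped_perplexity(30.0, X.shape[0])) would be built just before calling fit_transform(X); inputs with more than 30 samples keep the default unchanged.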


NumPy compatibility

| File | Change | Reason |
|---|---|---|
| yellowbrick/utils/helpers.py | np.in1d → np.isin | Deprecated in NumPy 2.0 |
| yellowbrick/contrib/missing/dispersion.py | np.string_ → np.bytes_, np.unicode_ → np.str_ | Removed in NumPy 2.0 |
| yellowbrick/text/dispersion.py | Wrapped generators in list() for np.stack() | NumPy 2.0 requires sequences |
| yellowbrick/cluster/icdm.py | interpolation= → method= in np.percentile | Renamed in NumPy 1.22 |
| yellowbrick/classifier/base.py | Extract KeyError key via e.args[0] | NumPy 2.0 changed np.str_ repr |
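The replacement calls from the table, side by side, as a quick NumPy-only check. The sample arrays are made up for illustration; np.str_, np.bytes_, np.isin and the method= keyword all exist across the supported NumPy range (1.24 and later), so the replacements do not need version branches.

```python
import numpy as np

labels = np.array(["a", "b", "c", "a"])

# np.in1d(labels, ...) -> np.isin(labels, ...): elementwise membership test.
mask = np.isin(labels, ["a", "c"])

# np.unicode_ / np.string_ -> np.str_ / np.bytes_ for dtype checks.
is_text = np.issubdtype(labels.dtype, np.str_)
is_raw = np.issubdtype(np.array([b"x"]).dtype, np.bytes_)

# np.stack() needs a sequence, so materialize the generator with list().
stacked = np.stack(list(np.arange(3) * i for i in range(2)))

# interpolation="lower" -> method="lower" in np.percentile.
median_low = np.percentile([1, 2, 3, 4], 50, method="lower")

# Read the missing key from e.args[0]; str(e) shows the key's repr,
# which NumPy 2.0 renders as np.str_('z') instead of 'z'.
try:
    {"a": 0}[np.str_("z")]
except KeyError as e:
    missing = e.args[0]
```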

matplotlib compatibility

yellowbrick/regressor/influence.py — Removed use_line_collection=True from ax.stem() (argument removed in matplotlib 3.7).


Test suite updates

  • 13 test files updated for deprecated APIs, incorrect mixin order (MRO), removed parameters, and new expected numerical values (sklearn 1.8 / scipy 1.17 produce slightly different scores)
  • ~96 baseline PNG images regenerated to match matplotlib 3.10 rendering
  • tests/test_compat.py — 25 new targeted regression tests, one per fix, so future package upgrades immediately surface regressions:
    • TestEstimatorTypeDetection — verifies is_classifier/regressor/clusterer via Mixin-only classes (no _estimator_type)
    • TestModelVisualizerFittedState — verifies __sklearn_is_fitted__ and Pipeline.score() don't raise after fit
    • TestNumpyCompat — documents and verifies all NumPy 2.0 API removals
    • TestMatplotlibCompat — verifies ax.stem() without use_line_collection
  • requirements.txt / setup.py — bumped minimums and added Python 3.9–3.13 classifiers

… matplotlib 3.7+

Fixes 337 test failures caused by breaking API changes in upstream packages.

## scikit-learn compatibility
- types.py: replace all _estimator_type checks with _get_estimator_type() helper
  that falls back to __sklearn_tags__() and Mixin subclass inspection (sklearn 1.6+)
- base.py: add ModelVisualizer.__sklearn_is_fitted__() so Pipeline.score() correctly
  detects fitted state of wrapped visualizers (sklearn 1.8+)
- contrib/wrapper.py: add __sklearn_tags__() to ContribEstimator (sklearn 1.6+)
- classifier/class_prediction_error.py: unpack _check_targets() by index (4-tuple in 1.8)
- classifier/base.py: extract KeyError key via e.args[0] for np.str_ repr change
- classifier/threshold.py: remove multi_class="auto" (removed in 1.7)
- regressor/alphas.py: support store_cv_results/cv_results_ with fallback to old names
- cluster/elbow.py: wrap sparse matrix center with np.asarray() (np.matrix rejected in 1.7)
- cluster/icdm.py: cap TSNE perplexity dynamically (must be < n_samples)

## NumPy compatibility
- helpers.py: np.in1d → np.isin (deprecated in NumPy 2.0)
- contrib/missing/dispersion.py: np.string_ → np.bytes_, np.unicode_ → np.str_ (removed in NumPy 2.0)
- text/dispersion.py: wrap generators in list() for np.stack() (NumPy 2.0)
- cluster/icdm.py: interpolation= → method= in np.percentile (renamed in 1.22)

## matplotlib compatibility
- regressor/influence.py: remove use_line_collection=True from ax.stem() (removed in 3.7)

## Test suite updates
- Updated 13 test files for deprecated APIs and new sklearn/scipy numerical values
- Regenerated ~96 baseline PNG images for matplotlib 3.10 rendering
- Added tests/test_compat.py with 25 targeted regression tests for each fix
- Updated requirements.txt and setup.py: sklearn>=1.6, numpy>=1.24, scipy>=1.7,
  matplotlib>=3.5, Python 3.9–3.13

Result: 1127 passed, 0 failed (vs 337 failures before)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jg585Username
Author

Relationship to Other Open Modernization PRs

While working on this PR, I noticed several other open attempts at the same problem. Here's how they relate:

#1322 and #1325 (lwgray) — Both are subsets of this PR. #1322 only removes the use_line_collection parameter from ax.stem(). #1325 covers that plus a handful of other matplotlib/NumPy fixes, but stops well short of full compatibility.

#1332 and #1333 (PythonCharmers) — These fix types.py and contrib/wrapper.py only, targeting the sklearn 1.6 tags API. Their approach for types.py (delegating directly to sklearn's own is_classifier/regressor/clusterer) is arguably simpler than ours, but neither PR addresses the sklearn 1.8 Pipeline.__sklearn_is_fitted__ breakage or any of the NumPy/matplotlib issues.

#1329 (lwgray) — The most comprehensive existing attempt, targeting sklearn 1.7, NumPy 2.0, and matplotlib 3.10. Covers roughly 70% of the same ground as this PR. The key differences are:

  • They skip the pipeline validation tests rather than fixing the root cause; this PR adds __sklearn_is_fitted__() to ModelVisualizer so Pipeline.score() works correctly on sklearn 1.8
  • They don't address _check_targets() returning a 4-tuple (sklearn 1.8), TSNE perplexity validation, np.matrix rejection, or multi_class="auto" removal
  • This PR targets sklearn 1.8 specifically and achieves 1127 passed, 0 failed
  • This PR adds tests/test_compat.py with 25 targeted regression tests so future package upgrades surface breakages immediately

In short: this PR is a strict superset of all the above. If the maintainers prefer a smaller, more incremental approach, merging #1329 first and then layering the remaining sklearn 1.8 fixes on top would also be a viable path.

jg585Username reopened this May 12, 2026
lwgray self-assigned this May 22, 2026
@lwgray
Contributor

lwgray commented May 22, 2026

@jg585Username I will take a look at this this weekend.

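Excerpt from _get_estimator_type() in yellowbrick/utils/types.py under review:

```python
try:
    etype = estimator.__sklearn_tags__().estimator_type
    if etype is not None:
        return etype
except Exception:
    pass  # fall through to the next detection strategy
```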
Review thread on tests/test_compat.py (outdated):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification, make_regression
```
- Remove unused make_classification import in test_compat.py
- Hoist sklearn imports in base.py to module top (called by Pipeline)
- Simplify ContribEstimator.__sklearn_tags__: drop dead _HAS_SKLEARN_TAGS
  flag since sklearn>=1.6 is now a hard requirement
- Add explanatory comment to bare except in _get_estimator_type so the
  defensive fallback is clear to readers and CodeQL
- Format the four files modified in this PR with the project's pinned black 22.6.0
- Shorten docstrings on __sklearn_is_fitted__ and test_percentile_method_kwarg
- Rewrap error message in ContribEstimator.__getattr__
- Replace long sklearn source URLs with short function references in
  is_classifier / is_regressor docstrings

3 participants