Modernize for scikit-learn 1.6+, NumPy 2.0+, scipy 1.17+, matplotlib 3.7+ #1336
… matplotlib 3.7+

Fixes 337 test failures caused by breaking API changes in upstream packages.

## scikit-learn compatibility
- types.py: replace all _estimator_type checks with _get_estimator_type() helper that falls back to __sklearn_tags__() and Mixin subclass inspection (sklearn 1.6+)
- base.py: add ModelVisualizer.__sklearn_is_fitted__() so Pipeline.score() correctly detects fitted state of wrapped visualizers (sklearn 1.8+)
- contrib/wrapper.py: add __sklearn_tags__() to ContribEstimator (sklearn 1.6+)
- classifier/class_prediction_error.py: unpack _check_targets() by index (4-tuple in 1.8)
- classifier/base.py: extract KeyError key via e.args[0] for np.str_ repr change
- classifier/threshold.py: remove multi_class="auto" (removed in 1.7)
- regressor/alphas.py: support store_cv_results/cv_results_ with fallback to old names
- cluster/elbow.py: wrap sparse matrix center with np.asarray() (np.matrix rejected in 1.7)
- cluster/icdm.py: cap TSNE perplexity dynamically (must be < n_samples)

## NumPy compatibility
- helpers.py: np.in1d → np.isin (removed in NumPy 2.0)
- contrib/missing/dispersion.py: np.string_ → np.bytes_, np.unicode_ → np.str_ (removed in 1.24)
- text/dispersion.py: wrap generators in list() for np.stack() (NumPy 2.0)
- cluster/icdm.py: interpolation= → method= in np.percentile (renamed in 1.22)

## matplotlib compatibility
- regressor/influence.py: remove use_line_collection=True from ax.stem() (removed in 3.7)

## Test suite updates
- Updated 13 test files for deprecated APIs and new sklearn/scipy numerical values
- Regenerated ~96 baseline PNG images for matplotlib 3.10 rendering
- Added tests/test_compat.py with 25 targeted regression tests for each fix
- Updated requirements.txt and setup.py: sklearn>=1.6, numpy>=1.24, scipy>=1.7, matplotlib>=3.5, Python 3.9–3.13

Result: 1127 passed, 0 failed (vs 337 failures before)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relationship to Other Open Modernization PRs
While working on this PR, I noticed several other open attempts at the same problem. Here's how they relate:
#1322 and #1325 (lwgray): Both are subsets of this PR. #1322 only removes the
#1332 and #1333 (PythonCharmers): These fix
#1329 (lwgray): The most comprehensive existing attempt, targeting sklearn 1.7, NumPy 2.0, and matplotlib 3.10. Covers roughly 70% of the same ground as this PR. The key differences are:
In short: this PR is a strict superset of all the above. If the maintainers prefer a smaller, more incremental approach, merging #1329 first and then layering the remaining sklearn 1.8 fixes on top would also be a viable path.
@jg585Username I will take a look at this this weekend.
    etype = estimator.__sklearn_tags__().estimator_type
    if etype is not None:
        return etype
except Exception:
from sklearn.naive_bayes import GaussianNB
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification, make_regression
- Remove unused make_classification import in test_compat.py
- Hoist sklearn imports in base.py to module top (called by Pipeline)
- Simplify ContribEstimator.__sklearn_tags__: drop dead _HAS_SKLEARN_TAGS flag since sklearn>=1.6 is now a hard requirement
- Add explanatory comment to bare except in _get_estimator_type so the defensive fallback is clear to readers and CodeQL
- Format four files modified in this PR with project's pinned black 22.6.0
- Shorten docstrings on __sklearn_is_fitted__ and test_percentile_method_kwarg
- Rewrap error message in ContribEstimator.__getattr__
- Replace long sklearn source URLs with short function references in is_classifier / is_regressor docstrings
Summary
Fixes 337 test failures caused by breaking API changes introduced in recent versions of scikit-learn, NumPy, scipy, and matplotlib. The project now supports:
Result: 1127 passed, 0 failed (was 337 failures before this PR).
scikit-learn compatibility
- _estimator_type removed (sklearn 1.6+) in yellowbrick/utils/types.py: The most widespread breakage. sklearn 1.6 removed the _estimator_type class attribute from many estimators. Added _get_estimator_type(), which falls back through: legacy attribute → __sklearn_tags__() API → Mixin subclass inspection. Rewrote is_classifier(), is_regressor(), is_clusterer() on top of it.
- Pipeline.__sklearn_is_fitted__ (sklearn 1.8+) in yellowbrick/base.py: sklearn 1.8's Pipeline.score() calls check_is_fitted(last_step) on the final pipeline step. ModelVisualizer has no trailing-underscore attributes of its own, so this raised NotFittedError even after a successful fit(). Added __sklearn_is_fitted__() delegating to check_is_fitted(self.estimator).
- __sklearn_tags__() for ContribEstimator (sklearn 1.6+) in yellowbrick/contrib/wrapper.py: Added __sklearn_tags__() so wrapped contrib estimators correctly report their type through the new Tags API.
- _check_targets() returns 4-tuple (sklearn 1.8) in yellowbrick/classifier/class_prediction_error.py: Was unpacked as 3-tuple; switched to indexed access.
- multi_class="auto" removed (sklearn 1.7) in yellowbrick/classifier/threshold.py: Removed the parameter.
- store_cv_values / cv_values_ renamed (sklearn 1.7) in yellowbrick/regressor/alphas.py: Added dual-check with fallback to old names for backward compatibility.
- np.matrix rejected (sklearn 1.7) in yellowbrick/cluster/elbow.py: Sparse matrix .mean() returns np.matrix; wrapped with np.asarray().
- TSNE perplexity validation (sklearn 1.7+) in yellowbrick/cluster/icdm.py: perplexity must be < n_samples; added dynamic cap before fit_transform.

NumPy compatibility
- yellowbrick/utils/helpers.py: np.in1d → np.isin
- yellowbrick/contrib/missing/dispersion.py: np.string_ → np.bytes_, np.unicode_ → np.str_
- yellowbrick/text/dispersion.py: wrap generators in list() for np.stack()
- yellowbrick/cluster/icdm.py: interpolation= → method= in np.percentile
- yellowbrick/classifier/base.py: extract KeyError key via e.args[0] for np.str_ repr change

matplotlib compatibility
- yellowbrick/regressor/influence.py: Removed use_line_collection=True from ax.stem() (argument removed in matplotlib 3.7).

Test suite updates
tests/test_compat.py: 25 new targeted regression tests, one per fix, so future package upgrades immediately surface regressions:
- TestEstimatorTypeDetection: verifies is_classifier/regressor/clusterer via Mixin-only classes (no _estimator_type)
- TestModelVisualizerFittedState: verifies __sklearn_is_fitted__ and Pipeline.score() don't raise after fit
- TestNumpyCompat: documents and verifies all NumPy 2.0 API removals
- TestMatplotlibCompat: verifies ax.stem() without use_line_collection
requirements.txt / setup.py: bumped minimums and added Python 3.9–3.13 classifiers
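
For reviewers, the fallback order described for _get_estimator_type() (legacy attribute, then __sklearn_tags__(), then Mixin inspection) can be sketched roughly as below. This is an illustration, not the code in yellowbrick/utils/types.py: the function name get_estimator_type, the three Mixin classes, and the demo estimators are hypothetical stand-ins so the snippet runs without scikit-learn installed.

```python
from types import SimpleNamespace


# Hypothetical stand-ins for sklearn.base.ClassifierMixin and friends,
# defined here only so the sketch has no third-party dependency.
class ClassifierMixin:
    pass


class RegressorMixin:
    pass


class ClusterMixin:
    pass


_MIXIN_TYPES = (
    (ClassifierMixin, "classifier"),
    (RegressorMixin, "regressor"),
    (ClusterMixin, "clusterer"),
)


def get_estimator_type(estimator):
    """Return "classifier", "regressor", "clusterer", or None (illustrative)."""
    # 1. Legacy attribute, still set by some older or third-party estimators.
    etype = getattr(estimator, "_estimator_type", None)
    if etype is not None:
        return etype

    # 2. Tags API: the tags object exposes an estimator_type field.
    try:
        etype = estimator.__sklearn_tags__().estimator_type
        if etype is not None:
            return etype
    except Exception:
        # Defensive fallback: the object may not implement tags at all,
        # or its tags method may raise; fall through to Mixin inspection.
        pass

    # 3. Mixin subclass inspection.
    for mixin, name in _MIXIN_TYPES:
        if isinstance(estimator, mixin):
            return name
    return None


# Hypothetical demo estimators, one per fallback step.
class LegacyRegressor:
    _estimator_type = "regressor"


class TaggedClusterer:
    def __sklearn_tags__(self):
        return SimpleNamespace(estimator_type="clusterer")


class MixinOnlyClassifier(ClassifierMixin):
    pass


for est in (LegacyRegressor(), TaggedClusterer(), MixinOnlyClassifier(), object()):
    print(type(est).__name__, get_estimator_type(est))
```

Checking the legacy attribute first keeps older estimators working, while the Mixin step covers classes that expose neither the attribute nor usable tags.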
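
The NumPy replacements listed in the NumPy compatibility section can be tried in isolation. The arrays and variable names below are made up for illustration and do not come from the yellowbrick sources; the snippet assumes a NumPy version that accepts method= in np.percentile.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# np.isin replaces np.in1d for membership tests.
mask = np.isin(values, [2, 4])

# np.str_ and np.bytes_ replace np.unicode_ and np.string_ in dtype checks.
labels = np.array(["a", "b"])
is_text = np.issubdtype(labels.dtype, np.str_)
is_bytes = np.issubdtype(labels.dtype, np.bytes_)

# np.stack expects a sequence of arrays, so a generator is wrapped in list().
rows = (np.arange(3) * i for i in range(2))
stacked = np.stack(list(rows))

# np.percentile takes method= where interpolation= was used before.
median = np.percentile(values, 50, method="linear")

print(mask, is_text, is_bytes, stacked.shape, median)
```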
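
The "dual-check with fallback to old names" mentioned for yellowbrick/regressor/alphas.py might look something like the following. The helper name get_cv_results and the SimpleNamespace stand-ins are hypothetical; only the attribute names cv_results_ and cv_values_ come from the description above.

```python
from types import SimpleNamespace


def get_cv_results(estimator):
    """Return stored cross-validation results under the new or the old name."""
    # Prefer the new attribute name, then fall back to the old one.
    for name in ("cv_results_", "cv_values_"):
        if hasattr(estimator, name):
            return getattr(estimator, name)
    raise AttributeError(
        "estimator has neither cv_results_ nor cv_values_; "
        "was it fitted with CV results stored?"
    )


# Stand-ins for a fitted estimator on a newer and an older scikit-learn.
newer = SimpleNamespace(cv_results_=[0.1, 0.2])
older = SimpleNamespace(cv_values_=[0.3, 0.4])

print(get_cv_results(newer), get_cv_results(older))
```

Looking the attribute up by name keeps one code path working on both sides of the rename, at the cost of a slightly less direct error when neither attribute exists.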