Make pandas an optional dependency by zeel2104 · Pull Request #1831 · online-ml/river

zeel2104 · 2026-04-19T12:54:10Z

Summary

This PR removes pandas as a hard dependency.

Changes made:

move pandas from core dependencies to an optional extra
lazy-load pandas in batch-related code paths
raise a clear ImportError when learn_many, predict_many, transform_many, and related pandas-dependent operations are used without pandas
remove unnecessary internal pandas usage in places where it was not required
add tests covering behavior when pandas is not installed

Validation

Tested without pandas installed:

python -m pytest river\utils\test_pandas.py

Tested with pandas and scikit-learn installed:

python -m pytest river\compose\test_.py river\compose\test_product.py river\preprocessing\test_scale.py river\multiclass\test_ovr.py river\naive_bayes\test_naive_bayes.py river\proba\test_gaussian.py river\covariance\test_emp.py river\compat\test_sklearn.py river\anomaly\test_lof.py

MaxHalford

Nearly there! I just have a couple of questions.

MaxHalford · 2026-04-27T09:25:35Z

        super().learn_one(x, y=1)

    def learn_many(self, X):
+        pd = utils.pandas.import_pandas()


This is a good pattern, I like it

MaxHalford · 2026-04-27T09:27:32Z

    def predict_proba_many(self, X):
+        pd = utils.pandas.import_pandas()
        try:
-            return pd.Series(self.estimator.predict_proba(self._align_df(X)), columns=self.classes)


Ok that fixes a bug I guess, I'm surprised it didn't get caught before

MaxHalford · 2026-04-27T09:29:37Z

+def requires_pandas(method):
+    @functools.wraps(method)
+    def wrapper(*args, **kwargs):
+        import_pandas()
+        return method(*args, **kwargs)
+
+    return wrapper


Is this actually used?

Good catch — the requires_pandas decorator was unused. Removed in 072e2f7.

MaxHalford · 2026-04-27T09:33:12Z

+]
+
+[project.optional-dependencies]
+pandas = [


Could you add some instructions to the installation docs to explain that mini-batch is an opt-in feature and requires pandas?

Added an opt-in section to the README, the installation docs, and the mini-batching recipe explaining that mini-batch methods require pip install "river[pandas]". Also added a changelog entry under packaging in docs/releases/unreleased.md.

Resolve conflicts in pyproject.toml (keep main's modernized dev deps plus pandas) and river/anomaly/lof.py (drop unused functools/utils imports replaced by euclidean_distance_dict on main, keep lazy pandas import).

MaxHalford · 2026-05-28T10:56:35Z

Hello @zeel2104! If that's ok I'm going to finish your PR, as I'd like to get it shipped for the next version :)

- Drop `base.utils.*` references in naive_bayes/{bernoulli,multinomial,complement}.py — mypy --strict does not treat re-exposed submodules as exported. Import `utils` directly instead. - Add `TYPE_CHECKING` pandas import to river/tree/base.py so `pd.DataFrame` annotation resolves. - Add type annotations to river/utils/pandas.py helpers and river/utils/test_pandas.py tests. - Sync uv.lock with new optional-pandas dependency layout.

- Resolve conflict in river/cluster/textclust.py by combining the PR's numpy-only distance matrix (no pandas) with main's `_macro_distance` refactor and snake_case names. - Address review: drop unused `requires_pandas` decorator in river/utils/pandas.py. - Address review: document `pandas` as an opt-in extra in the README, the installation docs, the mini-batching recipe, and the unreleased changelog. - Auto-fix ruff (drop unused TYPE_CHECKING pandas imports left over by the PR; sort imports) and add `import pandas as pd` to doctests that used to rely on the module-level eager pandas import (compose/pipeline.py, compose/union.py, multiclass/ovr.py). - Update `MultivariateGaussian.__repr__` doctest expected output: the PR's pure-Python formatting drops pandas's positive-sign padding.

- Add a no-pandas CI job (.github/workflows/code-quality.yml) that installs the dev environment, uninstalls pandas, and runs the full pytest suite. A conftest hook auto-skips test modules and doctest sources whose text mentions `import pandas`, `>>> pd.`, or `fetch_openml` (the latter routes through pandas inside scikit-learn). - Lazy-load pandas in river.neural_net.mlp so importing river no longer pulls pandas in. The MLP estimator still requires pandas at call time; it now goes through utils.pandas.import_pandas() in learn_one / learn_many / __call__ / predict_one / predict_many. - Update river.checks to skip the mini-batch consistency checks when pandas is not installed (mini-batch methods inherently need pandas). - river.utils.pandas.import_pandas now raises an ImportError that points the user at `pip install "river[pandas]"`. - Consolidate the pandas check in river.compat.river_to_sklearn to use river.utils.pandas.PANDAS_INSTALLED instead of its own duplicate. - Drop the obsolete cibuildwheel TODO in pyproject.toml — pandas is now optional, so the comment's premise is gone.

Without this step, dataset-using doctests print 'Downloading...' lines that the doctest framework reports as unexpected output. Mirror the existing 'ubuntu' job's cache + 'make download-datasets' steps.

- Cache `import_pandas()` with `functools.cache` so repeated calls in hot paths (mlp.py, naive_bayes, compose) collapse to a single dict lookup after the first call. - Inline the install-hint string into the ImportError; the named constant added no reuse. - Drop the "Mini-batch checks are skipped..." comment in checks; the `if utils.pandas.PANDAS_INSTALLED:` already says it. - In compat.river_to_sklearn, import pandas directly inside the `PANDAS_INSTALLED` branch instead of going through `import_pandas()`. This is a module-load-time block already gated by the flag, so the indirection only obscured the intent.

Make pandas an optional dependency

a233292

zeel2104 requested review from Dennis1989, MaxHalford, hoanganhngo610 and smastelini as code owners April 19, 2026 12:54

MaxHalford reviewed Apr 27, 2026

View reviewed changes

Merge branch 'main' into remove-hard-pandas-dependency

e10d6b5

Resolve conflicts in pyproject.toml (keep main's modernized dev deps plus pandas) and river/anomaly/lof.py (drop unused functools/utils imports replaced by euclidean_distance_dict on main, keep lazy pandas import).

MaxHalford added 5 commits May 28, 2026 12:59

Cache and pre-download datasets in the no-pandas CI job

a774c38

Without this step, dataset-using doctests print 'Downloading...' lines that the doctest framework reports as unexpected output. Mirror the existing 'ubuntu' job's cache + 'make download-datasets' steps.

MaxHalford merged commit 942fb7b into online-ml:main May 28, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make pandas an optional dependency#1831

Make pandas an optional dependency#1831
MaxHalford merged 7 commits into
online-ml:mainfrom
zeel2104:remove-hard-pandas-dependency

zeel2104 commented Apr 19, 2026

Uh oh!

MaxHalford left a comment

Uh oh!

MaxHalford Apr 27, 2026

Uh oh!

MaxHalford Apr 27, 2026

Uh oh!

MaxHalford Apr 27, 2026

Uh oh!

MaxHalford May 28, 2026 •

edited

Loading

Uh oh!

MaxHalford Apr 27, 2026

Uh oh!

MaxHalford May 28, 2026

Uh oh!

MaxHalford commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zeel2104 commented Apr 19, 2026

Summary

Validation

Uh oh!

MaxHalford left a comment

Choose a reason for hiding this comment

Uh oh!

MaxHalford Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

MaxHalford Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

MaxHalford Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

MaxHalford May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaxHalford Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

MaxHalford May 28, 2026

Choose a reason for hiding this comment

Uh oh!

MaxHalford commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MaxHalford May 28, 2026 •

edited

Loading