Speed up preprocessing scalers and OneHotEncoder #1864

Open
MaxHalford wants to merge 1 commit into main from perf/preprocessing-scalers-onehot

@MaxHalford
Member

Summary

Profiled and optimized the three preprocessing estimators in scope:

  • OneHotEncoder.transform_one: ~8.2× faster. The previous implementation rebuilt the all-zeros dict via {f"{i}_{v}": 0 ...} on every call. The encoder now maintains that zero-dict incrementally in learn_one / learn_many, and transform_one only copies it with .copy() before setting the 1s.
  • StandardScaler: ~15% faster learn_one. Hoisted the self.counts/self.means/self.vars references into locals, split the with_std=True/with_std=False branches out of the loop, and inlined the safe_div call in transform_one (safe_div was called 1M times for 100k samples × 10 features). Welford's update formula is unchanged.
  • MinMaxScaler / MaxAbsScaler: ~1.3× faster transform_one. Each feature's .get() result is now cached in a local (self.min[i].get() was previously called twice per feature) and safe_div is inlined. stats.Min/stats.Max/stats.AbsMax are still updated and read only via their .update() / .get() methods, so no internal abstractions were bypassed.
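
As an illustration of the zero-dict cache described in the first bullet, here is a minimal standalone sketch. It is not the code from this PR: the class name CachedOneHot, the _zeros attribute, and the transform_one_baseline method are hypothetical, and options such as drop_zeros and drop_first are left out.

```python
from collections import defaultdict


class CachedOneHot:
    """Toy one-hot encoder (hypothetical, not river's implementation)."""

    def __init__(self):
        self.values = defaultdict(set)  # feature name -> categories seen so far
        self._zeros = {}                # cached all-zeros output dict

    def learn_one(self, x):
        for i, v in x.items():
            if v not in self.values[i]:
                self.values[i].add(v)
                # Extend the cache once, the first time a category appears.
                self._zeros[f"{i}_{v}"] = 0

    def transform_one_baseline(self, x):
        # Old approach: format every key again on every call.
        oh = {f"{i}_{v}": 0 for i, vals in self.values.items() for v in vals}
        for i, v in x.items():
            oh[f"{i}_{v}"] = 1
        return oh

    def transform_one(self, x):
        # New approach: shallow-copy the cache, then set the active keys.
        oh = self._zeros.copy()
        for i, v in x.items():
            oh[f"{i}_{v}"] = 1
        return oh


enc = CachedOneHot()
enc.learn_one({"color": "red"})
enc.learn_one({"color": "blue"})
encoded = enc.transform_one({"color": "red"})
```

The copy matters: writing the 1s into the cached dict itself would leak them into later calls. A shallow copy is enough here because the values are plain integers.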

Benchmarks (best of 5, 100k samples)

| Op | Baseline | After | Speedup |
|---|---|---|---|
| StandardScaler.learn_one (10 feat) | 232.8 ms | 204.7 ms | 1.14× |
| StandardScaler.transform_one | 112.3 ms | 96.3 ms | 1.17× |
| StandardScaler.learn+transform | 345.6 ms | 305.3 ms | 1.13× |
| MinMaxScaler.transform_one (10 feat) | 127.7 ms | 99.2 ms | 1.29× |
| MinMaxScaler.learn+transform | 196.9 ms | 166.1 ms | 1.19× |
| OneHotEncoder.transform_one (5 feat, k=20) | 566.5 ms | 69.1 ms | 8.20× |
| OneHotEncoder.learn+transform | 607.6 ms | 109.5 ms | 5.55× |
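
The benchmark harness is not included in this PR. A "best of 5" measurement could be taken roughly as follows, using only the standard library; the helper names best_of and speedup are hypothetical.

```python
import timeit


def best_of(fn, repeat=5, number=1):
    """Return the fastest wall-clock time in seconds over `repeat` runs of fn."""
    return min(timeit.repeat(fn, repeat=repeat, number=number))


def speedup(baseline, after):
    """Ratio of baseline time to optimized time (same unit for both)."""
    return baseline / after


# Reproduce two ratios from the reported timings (milliseconds).
onehot_ratio = round(speedup(566.5, 69.1), 2)
scaler_ratio = round(speedup(232.8, 204.7), 2)
```

Taking the minimum rather than the mean reduces the influence of background noise on the machine, which is why "best of N" is a common convention for micro-benchmarks.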

Correctness

Outputs are bit-identical to baseline across a 500-row parity test for StandardScaler, MinMaxScaler, MaxAbsScaler, and OneHotEncoder (including drop_zeros=True, drop_first=True).
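
The parity script itself is not part of this PR. One way such a check might look, shown for a min-max style transform: everything here is a stand-in, including the two scale_* functions and the zero-denominator behavior of this local safe_div, which is an assumption rather than the library's definition.

```python
import random


def safe_div(a, b):
    # Stand-in helper; returning 0.0 on a zero denominator is an assumption.
    return a / b if b else 0.0


def scale_baseline(x, lo, hi):
    # Looks up lo[i] twice per feature and goes through safe_div.
    return {i: safe_div(xi - lo[i], hi[i] - lo[i]) for i, xi in x.items()}


def scale_optimized(x, lo, hi):
    out = {}
    for i, xi in x.items():
        lo_i = lo[i]            # cached lookup
        span = hi[i] - lo_i
        out[i] = (xi - lo_i) / span if span else 0.0  # inlined guard
    return out


def bits(d):
    # float.hex() is exact, so equal strings mean identical bit patterns.
    return {k: v.hex() for k, v in d.items()}


def parity(n_rows=500, n_features=10, seed=42):
    rng = random.Random(seed)
    lo, hi = {}, {}
    for _ in range(n_rows):
        x = {i: rng.gauss(0, 1) for i in range(n_features)}
        for i, xi in x.items():
            lo[i] = min(lo.get(i, xi), xi)
            hi[i] = max(hi.get(i, xi), xi)
        if bits(scale_baseline(x, lo, hi)) != bits(scale_optimized(x, lo, hi)):
            return False
    return True


ok = parity()
```

Comparing hex representations instead of using == avoids treating 0.0 and -0.0 as equal, which is the stricter reading of "bit-identical".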

Test plan

  • uv run pytest river/preprocessing — 32 passed
  • uv run pytest river/test_estimators.py -k 'StandardScaler or MinMaxScaler or MaxAbsScaler or OneHotEncoder' — 312 check_estimator framework checks passed
  • uv run pytest river/compose — pipeline integration still passes
  • Parity check: encoded outputs identical to baseline for all four classes
  • Pickle roundtrip and categories=... / learn_many paths for OneHotEncoder exercised manually

🤖 Generated with Claude Code

- OneHotEncoder: maintain an incremental cache of the all-zeros dict so
  transform_one copies it instead of rebuilding {f"{i}_{v}": 0 ...}
  every call. ~8x faster transform_one, ~5.5x faster learn+transform on
  100k rows x 5 features (cardinality 20).
- StandardScaler: hoist self.counts/self.means/self.vars out of the
  inner loop, split the with_std branch, inline safe_div in
  transform_one. ~15% faster learn_one. Welford formula unchanged.
- MinMaxScaler / MaxAbsScaler: cache each feature's .get() result in
  transform_one (self.min[i].get() was called twice per feature),
  inline safe_div, hoist self.min/self.max out of the loop. ~1.3x
  faster transform_one. stats.Min/Max/AbsMax are still updated and
  read only via their .update()/.get() methods.
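
For readers unfamiliar with the StandardScaler changes listed above, the following is a rough sketch of a Welford-style running mean and variance with the attribute lookups hoisted and the with_std branch split out of the loop. ToyStandardScaler is a hypothetical class, not the code in this commit, and its zero-variance fallback of 0.0 is an assumption.

```python
from collections import defaultdict


class ToyStandardScaler:
    """Hypothetical running standardizer illustrating the optimizations."""

    def __init__(self, with_std=True):
        self.with_std = with_std
        self.counts = defaultdict(int)
        self.means = defaultdict(float)
        self.vars = defaultdict(float)

    def learn_one(self, x):
        counts, means = self.counts, self.means  # hoisted attribute lookups
        if self.with_std:                        # branch decided once per call
            vars_ = self.vars
            for i, xi in x.items():
                n = counts[i] + 1
                counts[i] = n
                old_mean = means[i]
                new_mean = old_mean + (xi - old_mean) / n
                means[i] = new_mean
                # Running population variance, Welford-style update.
                vars_[i] += ((xi - old_mean) * (xi - new_mean) - vars_[i]) / n
        else:
            for i, xi in x.items():
                n = counts[i] + 1
                counts[i] = n
                means[i] += (xi - means[i]) / n

    def transform_one(self, x):
        means = self.means
        if not self.with_std:
            return {i: xi - means[i] for i, xi in x.items()}
        vars_ = self.vars
        out = {}
        for i, xi in x.items():
            std = vars_[i] ** 0.5
            # Inlined division guard instead of a helper call per feature.
            out[i] = (xi - means[i]) / std if std else 0.0
        return out


scaler = ToyStandardScaler()
for value in (1.0, 2.0, 3.0):
    scaler.learn_one({"a": value})
```

After the three updates the running mean for "a" is 2.0 and the running population variance is 2/3, matching the batch values for 1, 2, 3.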

Outputs are bit-identical to baseline across a 500-row parity test for
all four classes. All 32 preprocessing tests and 312 check_estimator
framework checks pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxHalford requested a review from smastelini as a code owner on May 12, 2026, 15:14