Speed up HalfSpaceTrees with an iterative walk #1859
Merged
Replace the generic recursive `tree.base.Branch.walk` traversal in `anomaly.HalfSpaceTrees.learn_one` and `score_one` with HST-specific iterative loops that inline the split logic, cache `size_limit` and the tree height as locals, and pivot node masses through a precomputed flat node list instead of `iter_dfs` each window.

Output is unchanged (doctest scores byte-identical, full estimator-check suite for `MinMaxScaler | HalfSpaceTrees` passes).

Pure HST (no scaler, synthetic 10-feature stream):

- score+learn: 27.9k -> 85.2k obs/s (~3.05x)
- learn_one: 64.2k -> 169k obs/s (~2.63x)
- score_one: 49.6k -> 188k obs/s (~3.79x)

`MinMaxScaler | HalfSpaceTrees` on CreditCard:

- score+learn: 20.5k -> 40.5k obs/s (~1.98x)
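The traversal change described above can be illustrated with a minimal sketch. It is not river's code: `Node`, `build`, `walk_recursive`, `learn_iterative`, and `pivot` are hypothetical stand-ins, and the two-mass layout (`l_mass` for the latest window, `r_mass` for the reference window) is an assumption about how a half-space tree node keeps its counts. The sketch only shows the shape of the idea: a generator-based recursive walk versus a plain loop with the split test inlined, plus a flat node list that lets the window pivot avoid a tree traversal.

```python
import random


class Node:
    """Hypothetical stand-in for an HST node; not river's actual class."""

    __slots__ = ("feature", "threshold", "left", "right", "l_mass", "r_mass")

    def __init__(self, feature=None, threshold=None, left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.l_mass = 0  # assumed: mass counted in the latest window
        self.r_mass = 0  # assumed: mass kept from the reference window


def build(height, n_features, rng, flat):
    """Build a complete random tree and record every node in `flat`."""
    if height == 0:
        node = Node()
    else:
        node = Node(
            feature=rng.randrange(n_features),
            threshold=rng.random(),
            left=build(height - 1, n_features, rng, flat),
            right=build(height - 1, n_features, rng, flat),
        )
    flat.append(node)
    return node


def walk_recursive(node, x):
    """Generic recursive traversal: yields each node on the path for x."""
    yield node
    if node.left is not None:
        child = node.left if x[node.feature] < node.threshold else node.right
        yield from walk_recursive(child, x)


def learn_iterative(root, x):
    """Iterative walk with the split test inlined in the loop body."""
    node = root
    while True:
        node.l_mass += 1
        left = node.left
        if left is None:  # reached a leaf
            return
        node = left if x[node.feature] < node.threshold else node.right


def pivot(flat):
    """End-of-window pivot over the flat list, no tree traversal needed."""
    for node in flat:
        node.r_mass = node.l_mass
        node.l_mass = 0
```

Both walks visit the same root-to-leaf path, so the masses they produce are identical. The iterative version avoids one generator frame per level, which is the kind of per-observation overhead the change targets.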
Summary

Replaces the recursive generic `tree.base.Branch.walk` traversal in `anomaly.HalfSpaceTrees.learn_one` / `score_one` with HST-specific iterative loops that:

- Inline the split logic ([…] `HSTBranch.next` call).
- Cache `size_limit = 0.1 * window_size` and `self.height` as locals (was a property recomputed ~1.1M times in profile).
- Pivot node masses through a precomputed flat node list instead of calling `iter_dfs()` every `window_size` observations.

The public API is unchanged: no constructor changes, no behaviour changes. The docstring scores remain byte-identical and all 50 `check_estimator` cases for `MinMaxScaler | HalfSpaceTrees` pass.

Benchmarks

macOS arm64, Python 3.13, default `HalfSpaceTrees(n_trees=10, height=8, window_size=250)`, best of 3.

Pure HST on a synthetic 10-feature uniform stream (no scaler):

| Operation | Before (obs/s) | After (obs/s) | Speedup |
|---|---|---|---|
| score+learn | 27.9k | 85.2k | ~3.05x |
| learn_one | 64.2k | 169k | ~2.63x |
| score_one | 49.6k | 188k | ~3.79x |

`MinMaxScaler | HalfSpaceTrees` on `datasets.CreditCard()` (10k samples):

| Operation | Before (obs/s) | After (obs/s) | Speedup |
|---|---|---|---|
| score+learn | 20.5k | 40.5k | ~1.98x |

In the post-fix profile, HST itself (`_walk_learn` + `_walk_score`) is no longer the dominant cost; `MinMaxScaler.transform_one` / `learn_one` is. Further wins would need a Rust port (cf. the recent ADWIN/Mondrian/VectorDict efforts) or speeding up `MinMaxScaler`.

Test plan

- `uv run pytest --doctest-modules river/anomaly/hst.py`: passes, docstring outputs byte-identical
- `uv run pytest river/anomaly/ -x`: all 16 tests pass
- `uv run pytest river/test_estimators.py -k "HalfSpaceTrees"`: all 50 estimator checks pass

🤖 Generated with Claude Code
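For anyone wanting to reproduce numbers of the kind reported under Benchmarks, a best-of-N throughput harness might look like the sketch below. It is not the script used for this PR: `throughput` and `synthetic_stream` are hypothetical helpers, and the only assumption about the model is that it exposes `score_one(x)` and `learn_one(x)`.

```python
import random
import time


def synthetic_stream(n_samples, n_features=10, seed=42):
    """Uniform random feature dicts, materialised up front so that
    data generation is excluded from the timing."""
    rng = random.Random(seed)
    return [
        {i: rng.random() for i in range(n_features)}
        for _ in range(n_samples)
    ]


def throughput(make_model, stream, repeats=3):
    """Return the best observations-per-second figure over `repeats` runs."""
    best = 0.0
    for _ in range(repeats):
        model = make_model()  # fresh model per run so runs are comparable
        start = time.perf_counter()
        for x in stream:
            model.score_one(x)
            model.learn_one(x)
        # Guard against a zero reading on very short streams.
        elapsed = max(time.perf_counter() - start, 1e-12)
        best = max(best, len(stream) / elapsed)
    return best
```

With river installed, a call such as `throughput(lambda: anomaly.HalfSpaceTrees(n_trees=10, height=8, window_size=250), synthetic_stream(10_000))` would measure the combined score+learn path. Absolute figures depend on the machine and Python version, so only before/after ratios on the same setup are meaningful.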