Skip to content

Reduce quantile sketch safety factor#12167

Merged
RAMitchell merged 2 commits into
dmlc:masterfrom
RAMitchell:quantile-kfactor-2
Apr 17, 2026
Merged

Reduce quantile sketch safety factor#12167
RAMitchell merged 2 commits into
dmlc:masterfrom
RAMitchell:quantile-kfactor-2

Conversation

@RAMitchell
Copy link
Copy Markdown
Member

Summary

Reduce the weighted quantile sketch internal safety factor from 8.0 to 2.0.

The user-facing epsilon remains the rank-error target. This change only reduces the internal oversampling factor used when computing the retained summary size.

Rationale

The smaller factor is viable now that the sketch-size calculation includes the required log(n) term and adapts to the amount of data being sketched. The previous larger constant compensated for missing growth in the CPU and GPU sketch sizing paths. With the current adaptive sizing, the epsilon rank-error bounds are maintained with a smaller constant.

Validation

  • Built CPU testxgboost
  • Ran ./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'

@RAMitchell
Copy link
Copy Markdown
Member Author

I updated the affected tests to avoid assuming bitwise exactness between distributed/batched and single-machine sketching paths.

The sketch rank guarantees are still the relevant correctness condition, but the exact cut values, compressed indices, metric histories, or training trajectories are not guaranteed to be identical across these execution paths. These tests happened not to fail before, but that was relying on an implementation detail rather than a contract of quantile sketching.

The updated tests now check the actual invariants we want: valid sketch/cut rank behavior, structural consistency, finite training histories, prediction shape, and prediction/contribution consistency.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Reduces the internal oversampling “safety factor” used by the weighted quantile sketch while adjusting several tests that previously enforced stricter equivalence/metric assertions.

Changes:

  • Lowered WQuantileSketch internal kFactor from 8.0 to 2.0.
  • Relaxed/rewrote Spark and Dask distributed Python tests to assert finiteness and basic shape/history invariants instead of close metric equivalence.
  • Updated C++ data/ellpack/gradient-index tests to validate cut consistency via ValidateCuts and removed some byte/exact-index equivalence checks.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/test_distributed/test_with_spark/test_spark.py Removes single-node sklearn baseline comparisons; asserts basic training-summary invariants and shapes.
tests/cpp/data/test_iterative_dmatrix.cu Replaces some exact internal-structure checks with ValidateCuts and lighter invariants for ellpack host buffers.
tests/cpp/data/test_gradient_index.cc Replaces gidx/serialization equivalence checks with ValidateCuts.
src/common/quantile.h Reduces internal quantile sketch safety factor constant.
python-package/xgboost/testing/dask.py Removes history equality check; now only asserts finiteness of metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_distributed/test_with_spark/test_spark.py
Comment thread tests/test_distributed/test_with_spark/test_spark.py
Comment thread python-package/xgboost/testing/dask.py
Comment thread tests/cpp/data/test_iterative_dmatrix.cu
Comment thread tests/cpp/data/test_iterative_dmatrix.cu
Comment thread tests/cpp/data/test_gradient_index.cc
Comment thread src/common/quantile.h
Comment thread tests/test_distributed/test_with_spark/test_spark.py
Comment thread tests/test_distributed/test_with_spark/test_spark.py
@RAMitchell RAMitchell marked this pull request as ready for review April 16, 2026 11:55
@RAMitchell RAMitchell requested a review from trivialfis April 16, 2026 11:55
@RAMitchell RAMitchell merged commit 45861e7 into dmlc:master Apr 17, 2026
84 of 85 checks passed
@RAMitchell RAMitchell deleted the quantile-kfactor-2 branch April 17, 2026 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants