Reduce quantile sketch safety factor by RAMitchell · Pull Request #12167 · dmlc/xgboost

RAMitchell · 2026-04-16T10:16:37Z

Summary

Reduce the weighted quantile sketch internal safety factor from 8.0 to 2.0.

The user-facing epsilon remains the rank-error target. This change only reduces the internal oversampling factor used when computing the retained summary size.

Rationale

The smaller factor is viable now that the sketch-size calculation includes the required log(n) term and adapts to the amount of data being sketched. The previous larger constant compensated for missing growth in the CPU and GPU sketch sizing paths. With the current adaptive sizing, the epsilon rank-error bounds are maintained with a smaller constant.

Validation

Built CPU testxgboost
Ran ./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'

RAMitchell · 2026-04-16T11:29:54Z

I updated the affected tests to avoid assuming bitwise exactness between distributed/batched and single-machine sketching paths.

The sketch rank guarantees are still the relevant correctness condition, but the exact cut values, compressed indices, metric histories, or training trajectories are not guaranteed to be identical across these execution paths. These tests happened not to fail before, but that was relying on an implementation detail rather than a contract of quantile sketching.

The updated tests now check the actual invariants we want: valid sketch/cut rank behavior, structural consistency, finite training histories, prediction shape, and prediction/contribution consistency.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Reduces the internal oversampling “safety factor” used by the weighted quantile sketch while adjusting several tests that previously enforced stricter equivalence/metric assertions.

Changes:

Lowered WQuantileSketch internal kFactor from 8.0 to 2.0.
Relaxed/rewrote Spark and Dask distributed Python tests to assert finiteness and basic shape/history invariants instead of close metric equivalence.
Updated C++ data/ellpack/gradient-index tests to validate cut consistency via ValidateCuts and removed some byte/exact-index equivalence checks.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/test_distributed/test_with_spark/test_spark.py	Removes single-node sklearn baseline comparisons; asserts basic training-summary invariants and shapes.
tests/cpp/data/test_iterative_dmatrix.cu	Replaces some exact internal-structure checks with `ValidateCuts` and lighter invariants for ellpack host buffers.
tests/cpp/data/test_gradient_index.cc	Replaces gidx/serialization equivalence checks with `ValidateCuts`.
src/common/quantile.h	Reduces internal quantile sketch safety factor constant.
python-package/xgboost/testing/dask.py	Removes history equality check; now only asserts finiteness of metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

RAMitchell added 2 commits April 15, 2026 23:40

Reduce quantile sketch safety factor

f8b0106

Remove exact sketch equality assumptions from tests

1f6c9aa

RAMitchell requested a review from Copilot April 16, 2026 11:30

Copilot started reviewing on behalf of RAMitchell April 16, 2026 11:35 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

RAMitchell marked this pull request as ready for review April 16, 2026 11:55

RAMitchell requested a review from trivialfis April 16, 2026 11:55

trivialfis approved these changes Apr 16, 2026

View reviewed changes

RAMitchell merged commit 45861e7 into dmlc:master Apr 17, 2026
84 of 85 checks passed

RAMitchell deleted the quantile-kfactor-2 branch April 17, 2026 08:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Reduce quantile sketch safety factor#12167

Reduce quantile sketch safety factor#12167
RAMitchell merged 2 commits into
dmlc:masterfrom
RAMitchell:quantile-kfactor-2

RAMitchell commented Apr 16, 2026

Uh oh!

RAMitchell commented Apr 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

RAMitchell commented Apr 16, 2026

Summary

Rationale

Validation

Uh oh!

RAMitchell commented Apr 16, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants