fix(tree): solve the TreeSHAP Vandermonde systems exactly at every depth #547

Open: 42logos wants to merge 3 commits into mmschlk:main from FabianK-Dev:wu/fix-vandermonde-conditioning

@42logos (Contributor) commented Jun 10, 2026

Fixes #545.

What was wrong

The polynomial TreeSHAP machinery (TreeSHAPIQ, LinearTreeSHAP) builds its interpolation N matrices by explicitly inverting Vandermonde systems, inv(np.vander(D[:i]).T), over prefixes of the interpolation grid D. These systems are severely ill-conditioned in double precision: the conditioning is non-monotonic in the prefix length $i$ and peaks at interior prefixes ($i \approx n/2$). On the default Chebyshev grids the explicit inverse drifts at the $\sim 10^{-7}$ level from interpolation degree ~20, returns silently wrong values from degree ~32 (where the system is rank-deficient to machine precision), and crashes with an unexplained LinAlgError near degree 60, all without a single warning.
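For readers who want to see the drift without installing anything, here is a minimal standard-library sketch. It compares a float Gaussian elimination against the same elimination in exact rational arithmetic on a prefix of a Chebyshev grid. The names chebpts2, solve_dual and drift are illustrative, not shapiq functions; the grid definition (Chebyshev points of the second kind), the orientation of the system and the right-hand side are assumptions made for the demonstration.

```python
import math
from fractions import Fraction


def chebpts2(n):
    """Chebyshev points of the second kind on [-1, 1], ascending (n >= 2)."""
    return [-math.cos(math.pi * k / (n - 1)) for k in range(n)]


def solve_dual(nodes, rhs):
    """Solve sum_j nodes[j]**i * z[j] = rhs[i] by Gaussian elimination
    with partial pivoting. Exact for Fractions, rounded for floats."""
    n = len(nodes)
    rows = [[x ** i for x in nodes] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    z = [0] * n
    for i in reversed(range(n)):
        acc = rows[i][n] - sum(rows[i][j] * z[j] for j in range(i + 1, n))
        z[i] = acc / rows[i][i]
    return z


def drift(grid_size, prefix):
    """Largest float-vs-exact discrepancy, relative to the largest exact entry."""
    nodes = chebpts2(grid_size)[:prefix]
    rhs = [1.0 / (i + 1) for i in range(prefix)]  # illustrative right-hand side
    approx = solve_dual(nodes, rhs)
    # Fraction(float) is exact, so both solves see identical inputs.
    exact = solve_dual([Fraction(x) for x in nodes], [Fraction(b) for b in rhs])
    largest = max(abs(z) for z in exact)
    worst = max(abs(Fraction(a) - e) for a, e in zip(approx, exact))
    return float(worst / largest)


if __name__ == "__main__":
    for size in (8, 16, 24):
        print(size, drift(size, size // 2))
```

The printed numbers depend on the grid and right-hand side chosen here, so they are not expected to reproduce the degree thresholds quoted above; the point is only that the discrepancy grows quickly with the grid size while the rational solve stays exact.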

Fix: solve the systems exactly

The ill-conditioning is purely a floating-point artifact — the interpolation nodes are distinct, so the systems are exactly solvable over the rationals. The new solve_vandermonde therefore returns the float64 rounding of the exact solution at every depth, using only the standard library:

  • $O(n^2)$ Björck–Pereyra dual recursion (Golub & Van Loan, Alg. 4.6.2) instead of an $O(n^3)$ inverse, executed in scaled-integer fixed-point arithmetic (int, scale $2^{128}$ and up). A depth-100 grid's full prefix workload takes ~0.3 s.
  • Certificate by independent agreement: the fixed-point result is accepted when it agrees bitwise with a plain float64 LAPACK solve (cheap corroboration on the well-conditioned common path) or with the next precision rung. Nodes that collapse onto the same scaled integer at a coarse rung climb the ladder instead of being misreported as coincident; only genuinely degenerate custom nodes exhaust it, which raises a documented ValueError.
  • Memoized on (grid, rhs): every tree of equal depth in an ensemble issues identical solves, so a forest pays one tree's worth of work.
  • A shared build_n_matrix helper replaces the four duplicated construction loops and checks each row against the representation limit as soon as it is computed.
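The core recursion from the first bullet can be sketched as follows. This is not the PR's implementation: it swaps the scaled-integer fixed-point arithmetic for fractions.Fraction to keep the example short, and it assumes the Golub & Van Loan orientation of the dual system, $\sum_j x_j^i z_j = b_i$. The function names are hypothetical.

```python
from fractions import Fraction


def bjorck_pereyra_dual(nodes, rhs):
    """Solve sum_j nodes[j]**i * z[j] = rhs[i] exactly in O(n^2) operations."""
    if len(nodes) != len(rhs):
        raise ValueError("nodes and rhs must have the same length")
    x = [Fraction(v) for v in nodes]  # Fraction(float) is exact
    b = [Fraction(v) for v in rhs]
    if len(set(x)) != len(x):
        raise ValueError("interpolation nodes must be distinct")
    n = len(x) - 1
    # Stage 1: apply the bidiagonal factors that depend on one node each.
    for k in range(n):
        for i in range(n, k, -1):
            b[i] -= x[k] * b[i - 1]
    # Stage 2: divided-difference style back substitution.
    for k in range(n - 1, -1, -1):
        for i in range(k + 1, n + 1):
            b[i] /= x[i] - x[i - k - 1]
        for i in range(k, n):
            b[i] -= b[i + 1]
    return b


def solve_exact_to_float(nodes, rhs):
    """Round the exact rational solution to float64 (correctly rounded)."""
    return [float(z) for z in bjorck_pereyra_dual(nodes, rhs)]
```

For example, solve_exact_to_float([0, 1, 2], [6, 8, 14]) recovers the weights 1, 2, 3, since 1 + 2 + 3 = 6, 0·1 + 1·2 + 2·3 = 8 and 0·1 + 1·2 + 4·3 = 14. Rational arithmetic is simpler to read than fixed point but grows denominators quickly, which is presumably one reason the PR prefers scaled integers.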

This removes all conditioning error from the solves: up to the representation limit below, the returned coefficients are the correctly rounded exact solutions (values for degrees above ~20 may therefore shift at the $\sim 10^{-7}$ level relative to previous releases).

The remaining limit is representational — and is now enforced honestly

The exact coefficients grow exponentially with the interpolation degree (e.g. $\max|N| \approx 10^{13}$ at degree 36), and the downstream float64 pipeline consumes them in inner products whose cancellation error tracks $\max|N| \cdot 10^{-13}$ (measured on chain trees of depth 20–40, within one order of magnitude). Beyond an empirically calibrated bound (max|N| > 3e10, expected loss in the 0.3–3 % range) construction raises RepresentationLimitError (a ValueError) instead of returning silently wrong values. On the default grids the accept/refuse boundary is exactly degree 29/30 for LinearTreeSHAP and 25/26 for TreeSHAPIQ (whose identity N matrix, built for every index, saturates first) — both pinned by tests.
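A minimal sketch of such a gate, assuming the threshold and the error model quoted above. RepresentationLimitError is redefined locally as a stand-in, and check_row, MAX_ABS_N and LOSS_PER_UNIT are hypothetical names rather than the identifiers used in _numerics.py.

```python
class RepresentationLimitError(ValueError):
    """Local stand-in for the exception named in this PR."""


MAX_ABS_N = 3e10       # calibrated bound quoted above
LOSS_PER_UNIT = 1e-13  # downstream cancellation error per unit of max|N|


def check_row(row, degree):
    """Return max|N| for one row, or refuse if it exceeds the bound."""
    largest = max((abs(v) for v in row), default=0.0)
    if largest > MAX_ABS_N:
        raise RepresentationLimitError(
            f"max|N| = {largest:.3g} at interpolation degree {degree} exceeds "
            f"{MAX_ABS_N:.0e}; float64 cancellation would lose roughly "
            f"{largest * LOSS_PER_UNIT:.1%} downstream"
        )
    return largest
```

Checking each row as soon as it is computed means a refusal costs no more than the rows built so far. At the bound itself the model gives $3 \cdot 10^{10} \cdot 10^{-13} = 0.3\,\%$, the lower end of the range stated above.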

TreeExplainer handles the limit transparently where possible: LinearTreeSHAP's degree is the full tree depth, while TreeSHAPIQ's is min(depth, features in the tree), so when only the former trips the limit (deep trees over few features), order-1 explanations are re-routed to TreeSHAPIQ, which computes the same Shapley values at a feasible degree. When both degrees exceed the limit (e.g. a depth-30 tree that actually uses 30+ features), the error propagates: the user gets an explanatory exception instead of the silently wrong numbers previous releases returned.
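The routing rule can be sketched as a pure function of the two degrees. This is an illustration only: route_order1 is a hypothetical name, the constants are the default-grid boundaries quoted above, and the real TreeExplainer may well decide by catching RepresentationLimitError rather than by comparing degrees up front.

```python
LINEAR_MAX_DEGREE = 29      # LinearTreeSHAP boundary on the default grids
TREESHAPIQ_MAX_DEGREE = 25  # TreeSHAPIQ boundary on the default grids


def route_order1(depth, n_features_in_tree):
    """Pick the algorithm for an order-1 explanation, or refuse."""
    linear_degree = depth
    treeshapiq_degree = min(depth, n_features_in_tree)
    if linear_degree <= LINEAR_MAX_DEGREE:
        return "LinearTreeSHAP"
    if treeshapiq_degree <= TREESHAPIQ_MAX_DEGREE:
        return "TreeSHAPIQ"  # deep tree over few features
    raise ValueError(
        f"degrees {linear_degree} (LinearTreeSHAP) and {treeshapiq_degree} "
        "(TreeSHAPIQ) both exceed the representation limit"
    )
```

Under these assumptions a depth-40 tree over 5 features is routed to TreeSHAPIQ at degree 5, while a depth-30 tree that uses 30 features is refused.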

Curing the representation limit itself requires rewriting the polynomial algebra in a value-space (barycentric) basis — out of scope here and tracked as follow-up work.

How this resolves the symptoms reported in #545

| Symptom in #545 | Behavior on this branch |
| --- | --- |
| Degrees ~27–31: silent precision loss | Solved exactly (bitwise equal to the exact rational solution) |
| Degrees ~32–59: silently wrong values | Solve layer: exact at every degree. End to end: degrees ≤ 29 return exact coefficients; ≥ 30 raise RepresentationLimitError (with the re-route above) |
| Degrees ~60+: unexplained LinAlgError crash | Solve layer: exact, no crash. End to end: explanatory RepresentationLimitError |
| End-to-end sklearn scenarios (depths 30/40/55/None), wrong values with zero warnings | All raise RepresentationLimitError; nothing remains in the silent-failure class |

Note that the issue proposed a weaker fix (certify the grid, degrade to least squares, emit a coalesced RuntimeWarning). This PR deliberately goes further: exact-or-raise instead of warn-and-approximate — within the representable range the values carry no conditioning error at all, and beyond it no approximate values are returned.

Performance

Measured on the N-matrix build for chebpts2 grids (min of 7 runs):

| depth | previous inv | exact, first tree | exact, further trees (memoized) |
| --- | --- | --- | --- |
| 12 | 0.34 ms | 1.86 ms | 0.26 ms |
| 20 | 0.71 ms | 4.87 ms | 0.54 ms |
| 28 | 1.23 ms | 10.4 ms | 0.94 ms |

Single trees pay a one-time millisecond-scale cost; for ensembles the memoized path is slightly faster than the old inverse, so a 100-tree forest builds its N matrices in about the same time as before (~34 ms vs ~38 ms at depth 12).

How has this been tested

test_tree_numerics.py (34 tests, all passing; the full tests_tree_explainer suite passes with 253 passed / 1 skipped):

  • Exactness oracle: bitwise agreement with exact rational Gaussian elimination on Chebyshev grids of sizes 12–45 — including the interior prefixes where the old code was rank-deficient — and on a clustered non-Chebyshev custom grid.
  • Differential regression guards: the replaced inv(vander)-based construction is recomputed inline at grid size 32 with the library's own right-hand side and shown to be off by an absolute error of ~2e2 (fatal after downstream cancellation) where the new solver is bitwise exact; and an end-to-end sklearn scenario (a one-hot-style fit forcing a depth-39 chain over 39 features) must raise RepresentationLimitError where previous releases silently returned values with a completeness error of ~2e5. Both tests fail on the pre-fix code by construction.
  • Agreement with the previous formulation on well-conditioned grids (sizes 2–20).
  • Boundary pinning: degree 29 constructs and satisfies completeness, degree 30 is refused (LinearTreeSHAP); degree 25 constructs and 26 is refused (TreeSHAPIQ) — the figures quoted in the error message and docs cannot silently decouple from behavior.
  • Deep-chain completeness at depths 20/24/28 with tolerances matching the documented downstream bound, and explicit refusal (RepresentationLimitError) at depths 35/60 — both on a deterministic chain-tree generator that cannot silently skip. The gate is exercised on the LinearTreeSHAP path and inside TreeSHAPIQ's own N matrices (higher-order indices).
  • Re-routing: a depth-40 tree over 5 features is refused by LinearTreeSHAP but explained exactly through TreeExplainer (falls back to TreeSHAPIQ, completeness to $10^{-9}$; measured $\sim 10^{-16}$) — verified for index="SV" and order-1 "SII".
  • Precision-ladder climbing: tiny-magnitude nodes that collapse at the first rung's resolution are solved exactly at a finer rung instead of being misreported as coincident.
  • Cache isolation: mutating a returned solution does not corrupt the memoized value.
  • Degenerate inputs: coincident or non-finite nodes raise instead of returning garbage.
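The memoization and cache-isolation behavior exercised by these tests can be sketched with functools.lru_cache. The solver body below is a deliberately trivial placeholder, not a Vandermonde solve, and all names are hypothetical; only the caching pattern is the point.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive path actually runs


def _solve_uncached(grid, rhs):
    """Placeholder for the expensive exact solve (NOT a Vandermonde solve)."""
    CALLS["count"] += 1
    return tuple(b / (1.0 + x * x) for x, b in zip(grid, rhs))


@lru_cache(maxsize=None)
def _solve_cached(grid, rhs):
    # Arguments must be hashable, so callers pass tuples.
    return _solve_uncached(grid, rhs)


def solve_memoized(grid, rhs):
    """Key on immutable tuples and hand every caller a fresh list."""
    return list(_solve_cached(tuple(grid), tuple(rhs)))
```

Because the cache stores an immutable tuple and each call returns a new list, a caller that mutates its result cannot corrupt what later trees of the same depth receive.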

Behavioral notes (also in the CHANGELOG under "Changed")

  • Degrees ~32–59 previously returned silently wrong values and ~60+ crashed with LinAlgError; both now raise RepresentationLimitError at construction (with the TreeExplainer re-routing above).
  • Values for degrees above ~20 may shift at the $10^{-7}$ level relative to previous releases: the coefficients are now the correctly rounded exact solutions.

…TreeSHAP-IQ

The polynomial TreeSHAP machinery computed inv(vander(points).T) @ rhs at four
call sites (TreeSHAPIQ N/N_cii/N_id matrices and LinearTreeSHAP's N_v2). The
interpolation nodes are the first i entries of a depth-sized Chebyshev grid, so
the Vandermonde systems are ill-conditioned even for moderate sizes: precision
degrades silently from roughly size 30 (residuals > 1e-6 around 29, > 1e-3
around 37, O(1) around 45 on a depth-grid basis) and the matrix becomes exactly
singular for very deep trees, which crashed with an unexplained LinAlgError
(observed for trees of depth ~55-60).

This change centralises the solve in tree/_numerics.solve_vandermonde():
- np.linalg.solve instead of forming an explicit inverse,
- a RuntimeWarning when the condition number exceeds 1e12 (precision loss),
- a least-squares fallback instead of a hard crash when the system is singular,
  with a warning that values may be inaccurate.

Deep trees now explain (approximately) instead of crashing, and users are told
when the exactness contract can no longer be honoured. Minimal reproduction:
fit DecisionTreeRegressor(max_depth=60) on noise and construct
LinearTreeSHAP(tree) - previously LinAlgError, now a warning.

codecov Bot commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 97.29730% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/shapiq/tree/_numerics.py | 96.34% | 3 Missing ⚠️ |

Copilot AI left a comment

Pull request overview

This PR hardens the polynomial interpolation (Vandermonde) solves used by TreeSHAPIQ and LinearTreeSHAP to avoid silent numerical corruption and prevent LinAlgError crashes on deep trees, adding coalesced diagnostics warnings and unit test coverage.

Changes:

  • Introduces src/shapiq/tree/_numerics.py to centralize guarded Vandermonde solving with grid certification + diagnostics aggregation.
  • Switches TreeSHAPIQ and LinearTreeSHAP N-matrix construction to use solve_vandermonde(...) and emit one summary RuntimeWarning per matrix.
  • Adds unit tests covering fast-path equivalence, warning behavior, least-squares fallback, and an end-to-end deep-tree regression; updates CHANGELOG.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/shapiq/tree/_numerics.py | New numerics helper for certified/checked Vandermonde solves plus coalesced warning emission. |
| src/shapiq/tree/treeshapiq.py | Replaces explicit inverses with solve_vandermonde and coalesced diagnostics emission for TreeSHAPIQ N-matrices. |
| src/shapiq/tree/linear/explainer.py | Replaces explicit inverse in get_N_v2 with guarded solve + coalesced diagnostics emission. |
| tests/shapiq/tests_unit/tests_explainer/tests_tree_explainer/test_tree_numerics.py | New tests validating numerical behavior, warnings, and deep-tree regression coverage. |
| CHANGELOG.md | Documents the behavior change (new warnings, least-squares fallback) under Unreleased. |

Comment thread src/shapiq/tree/_numerics.py Outdated
Comment thread src/shapiq/tree/_numerics.py Outdated
@42logos 42logos marked this pull request as draft June 11, 2026 08:43
@42logos 42logos force-pushed the wu/fix-vandermonde-conditioning branch from c91e070 to 49742ed Compare June 11, 2026 10:32
@42logos 42logos force-pushed the wu/fix-vandermonde-conditioning branch from 49742ed to c053595 Compare June 11, 2026 10:43
@42logos 42logos changed the title from "fix(tree): guard the ill-conditioned Vandermonde solves in TreeSHAP-IQ" to "fix(tree): solve the TreeSHAP Vandermonde systems exactly at every depth" Jun 11, 2026
@42logos 42logos force-pushed the wu/fix-vandermonde-conditioning branch 2 times, most recently from 82effe4 to 30073e2 Compare June 12, 2026 12:41
42logos added 2 commits June 12, 2026 15:29
Replaces the guarded float64 solves with exact ones, removing all warning and
least-squares fallback paths:

- O(n^2) Bjorck-Pereyra dual recursion in scaled-integer fixed-point arithmetic
  (standard-library int), with a convergence certificate: each system is solved
  at increasing precision until two independent computations agree bitwise, so
  the result is the float64 rounding of the exact rational solution at any
  depth (a depth-100 grid's full prefix workload takes ~0.3 s). Nodes that
  collapse onto the same scaled integer at the first precision rung climb the
  ladder instead of being misreported as coincident.
- The previous inversion drifted at the ~1e-7 level from interpolation degree
  ~20, returned silently wrong values from ~32, and crashed (LinAlgError) at
  ~60+; the exact solves carry no conditioning error at any degree.
- The one remaining limit is representational: the monomial-basis N entries
  grow exponentially with the interpolation degree and the downstream float64
  pipeline cancels them, so beyond a measured magnitude bound (degree ~29 on
  the default grids, pinned by a boundary test) construction raises an
  explanatory RepresentationLimitError instead of returning silently wrong
  values; for order-1 Shapley values TreeExplainer re-routes affected trees to
  TreeSHAPIQ when its feature-bounded degree still fits.
- Tests: bitwise agreement with an exact rational-elimination oracle (including
  the previously rank-deficient sizes 28-45 and a clustered custom grid), the
  exact 29/30 accept/refuse boundary, deep-chain completeness at depths
  20/24/28, refusal at 35/60 on both the LinearTreeSHAP and TreeSHAPIQ paths,
  re-routing for SV and order-1 SII, precision-ladder climbing for nodes finer
  than the first rung's resolution, and cache isolation.
@42logos 42logos force-pushed the wu/fix-vandermonde-conditioning branch from 30073e2 to 73618de Compare June 12, 2026 14:11
@42logos 42logos marked this pull request as ready for review June 12, 2026 14:27

Development

Successfully merging this pull request may close these issues.

TreeSHAP-IQ Vandermonde solves: silent wrong values for interpolation degrees ~32-59, LinAlgError crash at ~60+

2 participants