
Implement LeverageSHAP approximator #524

Open
FabianK-Dev wants to merge 54 commits into mmschlk:main from FabianK-Dev:leverageSHAP


FabianK-Dev commented May 22, 2026

Motivation and Context

In this PR we add the implementation of the LeverageSHAP approximator, based on the paper by Musco & Witter (2024). We (1) built a custom sampler for leverage-score sampling (uniform size + paired sampling), (2) implemented the regression solver using the row-centering trick (Lemma 3.1), and (3) added test cases. We originally opened this PR in our fork (FabianK-Dev#1); I have closed it there and am reopening it here.
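The row-centering trick mentioned above can be illustrated with a minimal NumPy sketch. It is our own reconstruction under stated assumptions, not the code in leverageshap.py: Z is a 0/1 matrix of sampled interior coalitions, w are the regression weights, and centered_regression / full_design are hypothetical names. Writing φ = P x + c·1 with P = I - 11ᵀ/n and c = (v(N) - v(∅))/n turns the efficiency-constrained weighted least-squares problem into an unconstrained one on the row-centered matrix ZP, so efficiency holds by construction.

```python
import itertools
import math

import numpy as np


def centered_regression(Z, y, w, v_empty, v_grand):
    """Hypothetical sketch: efficiency-constrained weighted least squares via row-centering.

    Z: (m, n) 0/1 matrix of sampled coalitions (empty and grand coalition excluded)
    y: (m,) game values v(S) of those coalitions
    w: (m,) non-negative regression weights
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[1]
    c = (v_grand - v_empty) / n  # equal share of the total payout
    # Subtracting each row's mean is the same as multiplying Z by P = I - 11^T / n.
    Z_centered = Z - Z.mean(axis=1, keepdims=True)
    # phi = P x + c * 1, so move the fixed part Z @ (c * 1) to the right-hand side.
    target = np.asarray(y, dtype=float) - v_empty - c * Z.sum(axis=1)
    sqrt_w = np.sqrt(np.asarray(w, dtype=float))
    x, *_ = np.linalg.lstsq(Z_centered * sqrt_w[:, None], target * sqrt_w, rcond=None)
    # P x sums to zero, so sum(phi) == v_grand - v_empty up to rounding.
    return (x - x.mean()) + c


def full_design(n):
    """All 2**n - 2 interior coalitions with Shapley-kernel weights (for small checks)."""
    rows = [S for s in range(1, n) for S in itertools.combinations(range(n), s)]
    Z = np.zeros((len(rows), n))
    for r, S in enumerate(rows):
        Z[r, list(S)] = 1.0
    sizes = Z.sum(axis=1).astype(int).tolist()
    w = np.array([(n - 1) / (math.comb(n, s) * s * (n - s)) for s in sizes])
    return Z, w


# Usage: an additive game v(S) = sum of a_i over S has Shapley values exactly a.
Z, w = full_design(4)
a = np.array([1.0, -2.0, 0.5, 3.0])
phi = centered_regression(Z, Z @ a, w, v_empty=0.0, v_grand=a.sum())
```

With all 2**n - 2 interior coalitions and Shapley-kernel weights this reproduces the exact Shapley values (e.g. the 0.5/0.5 split of a two-player unanimity game); with sampled rows and importance weights it becomes an estimate.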

A few disclaimers / important notes:

    1. We have already committed our summary src/shapiq/approximator/regression/our_impl_progress.md (as requested in the project description), but this file is still work in progress and not yet ready for review.
    2. More unit tests are still to come: I estimate we have covered only around 50% of the planned unit tests.
    3. We will improve the existing docstrings and add more later on.

Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

Details: Added LeverageSHAP class to shapiq.approximator.regression.


How Has This Been Tested?

We added unit tests covering the following:

  • Budget constraints, exact recovery on additive games, and deterministic behavior.
  • Large n, to verify that the importance-sampling (IS) weight cancellation prevents float overflows.
  • Highly skewed interaction games, to ensure the lstsq solver maintains the efficiency axiom on ill-conditioned matrices.
  • The reproducibility requirement, by testing that the mean L2 error drops significantly when the budget increases on a synthetic game.

You can run all new unit tests using:
uv run pytest tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py

Tests are passing:

uv run pytest tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py
warning: `VIRTUAL_ENV=/home/fabian/Dokumente/Uni/LMU/Master/Semester2/Toolbox/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
====================================================================================== test session starts ======================================================================================
platform linux -- Python 3.13.5, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/fabian/Dokumente/Uni/LMU/Master/Semester2/Toolbox/shapiq
configfile: pyproject.toml
plugins: xdist-3.8.0, cov-7.0.0, anyio-4.12.1
collected 15 items                                                                                                                                                                              

tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py ...............                                                                                             [100%]

====================================================================================== 15 passed in 0.20s =======================================================================================

Checklist

We haven't completed all checklist items yet, as this is still an early PR: we are asking for feedback but don't plan to merge it into the upstream repository yet.

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

42logos added 3 commits May 11, 2026 02:46
Forward-looking spec for the 3 new SV approximators (LeverageSHAP,
PolySHAP, OddSHAP). Approximator classes are looked up dynamically by
name, so the file auto-skips classes that have not yet been registered
in shapiq.approximator. As each implementation lands, the corresponding
parametrizations activate.

- Interface conformance (always required): index='SV', n_players,
  max_order/min_order, values shape and dtype, interaction_lookup.
- Numerical convergence vs ExactComputer (xfail strict=False): atol
  schedule by budget percentage.
- Determinism: same (n, random_state, budget, game) -> bit-identical
  output.

75 tests, all currently SKIP on main. Will activate as classes land.
Honors the cross-method testing platform promised to the tutor:
unified harness covering every SV approximator in shapiq (the
existing 11 — KernelSHAP, SVARM, Permutation*, ProxySPEX, ... — and
the 3 new ones from this project) instead of only the new line-up.

Approximator list is sourced dynamically from
shapiq.approximator.SV_APPROXIMATORS (canonical registry) plus the
3 new project names, deduplicated. Future shapiq additions land in
the harness automatically.

Split into two scopes:

  * test_interface_conformance — strict shape/dtype/index/lookup
    contract from the API spec. Applied ONLY to the 3 new
    approximators (the contract is ours; existing methods have
    different default output conventions like ProxySPEX defaulting
    to FBII and max_order=n).

  * test_numerical_convergence_vs_exact + test_determinism — apply
    to ALL SV approximators. Cross-method comparison against
    ExactComputer ground truth on identical SOUM games. xfail with
    strict=False so methods that do converge surface as XPASS;
    methods still under development surface as XFAIL.

Two robustness helpers:

  * _construct_or_skip — tries (n=, index='SV', max_order=1,
    random_state=) first (covers multi-index methods like SPEX,
    ProxySPEX, ProxySHAP, MSRBiased, kADDSHAP), then falls back to
    minimal signature for SV-only methods (KernelSHAP, OwenSamplingSV).

  * _safe_approximate — skips on ValueError raised by approximators
    that explicitly refuse a regime (e.g. SPEX 'Insufficient budget
    to compute the transform' at low budgets).
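The two helpers can be sketched without pytest. The names construct_or_none and safe_approximate are hypothetical; in the test file the caller would turn a None into pytest.skip(...). We assume, as described above, that a signature mismatch surfaces as a TypeError and a refused budget regime as a ValueError.

```python
from typing import Any, Callable, Optional


def construct_or_none(cls: type, n: int, random_state: int) -> tuple[Optional[Any], Optional[Exception]]:
    """Try the explicit SV signature first, then the minimal one."""
    signatures = (
        # multi-index methods (SPEX, ProxySPEX, kADDSHAP, ...) need index and order
        {"n": n, "index": "SV", "max_order": 1, "random_state": random_state},
        # SV-only methods (KernelSHAP, OwenSamplingSV, ...) do not take those kwargs
        {"n": n, "random_state": random_state},
    )
    last_error: Optional[Exception] = None
    for kwargs in signatures:
        try:
            return cls(**kwargs), None
        except TypeError as error:  # signature mismatch: try the next signature
            last_error = error
    return None, last_error


def safe_approximate(approximator: Any, budget: int, game: Callable) -> Optional[Any]:
    """Return None instead of raising when a method refuses this budget regime."""
    try:
        return approximator.approximate(budget, game)
    except ValueError:  # e.g. "Insufficient budget to compute the transform"
        return None
```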

Results: 10 passed, 95 skipped, 90 xfailed, 23 xpassed. The 23
xpassed are existing shapiq SV methods that converge cleanly at
full budget on small SOUM — a useful baseline for the upcoming
benchmark report.
Drop-in framework that any teammate can merge into their feature branch
to run head-to-head benchmarks against ExactComputer across every SV
approximator in shapiq, then plot the standard SHAP-literature metric
curves. No source files are modified — adds a top-level benchmark/
package, a single test file, and a small in-place test-helper sys.path
hook. Does not touch pyproject.toml or any other upstream config.

Files added:

  * benchmark/__init__.py: makes the runner a proper Python package so
    invocation is 'python -m benchmark.performance'.

  * benchmark/_discovery.py: single source of truth for SV approximator
    discovery + SV-mode construction. Holds:
      - PROJECT_APPROXIMATOR_NAMES: LeverageSHAP, PolySHAP,
        PolySHAPKAdd / Partial / Prior, OddSHAP.
      - _SV_CONSTRUCT_OVERRIDES: per-class kwargs for non-standard
        constructors (PolySHAP variants need max_order /
        n_explanation_terms / q_prior).
      - construct_for_sv(): three-stage construction (override ->
        explicit SV signature -> minimal signature), returning
        (estimator, exc) so the caller can report the most informative
        exception. A ValueError from inside a matched signature wins
        over a TypeError from a signature mismatch.
      - safe_approximate(): catches ValueError and RuntimeError so
        sparse approximators that refuse a budget regime (SPEX,
        ProxySPEX, ...) skip the cell cleanly instead of crashing.

  * benchmark/performance.py: CLI runner that consumes _discovery,
    sweeps (method, game, budget, seed), records every cell in a
    long-format CSV, and emits one PNG per (game, metric) plus a
    runtime PNG. Seven metrics chosen from the union of LeverageSHAP,
    PolySHAP, OddSHAP and shapiq.benchmark.metrics literature:
    MSE / MAE / SSE / SAE / Precision@5 / Precision@10 / KendallTau.
    Includes a '--check' interface-probe mode that prints a
    constructibility table without running a sweep.

  * benchmark/README.md: usage doc covering merge workflow, --check,
    sweep CLI, output layout, CSV format, metric definitions, plot
    conventions, and notes on the multi-index approximators that need
    explicit (index='SV', max_order=1).
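The metric columns listed for benchmark/performance.py could look roughly like this for one (method, game, budget, seed) cell. This is a hypothetical sketch with our own definitions: Precision@k is taken as the overlap of the top-k players by absolute value, and Kendall's tau is the simple tau-a without tie correction (scipy.stats.kendalltau would be the usual library choice); shapiq.benchmark.metrics may define these differently.

```python
import numpy as np


def kendall_tau_a(x: np.ndarray, y: np.ndarray) -> float:
    """Kendall's tau-a rank correlation (no tie correction), O(n^2)."""
    n = len(x)
    score = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant one, 0 for ties
            score += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(score / (n * (n - 1) / 2))


def sv_error_metrics(estimate: np.ndarray, exact: np.ndarray, k: int = 5) -> dict:
    """Hypothetical per-cell metrics comparing estimated and exact Shapley values."""
    estimate = np.asarray(estimate, dtype=float)
    exact = np.asarray(exact, dtype=float)
    diff = estimate - exact
    # Precision@k: overlap of the k players with the largest |value|
    top_est = set(np.argsort(-np.abs(estimate))[:k].tolist())
    top_exact = set(np.argsort(-np.abs(exact))[:k].tolist())
    return {
        "MSE": float(np.mean(diff**2)),
        "MAE": float(np.mean(np.abs(diff))),
        "SSE": float(np.sum(diff**2)),
        "SAE": float(np.sum(np.abs(diff))),
        f"Precision@{k}": len(top_est & top_exact) / k,
        "KendallTau": kendall_tau_a(estimate, exact),
    }
```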

Files modified:

  * tests/shapiq/tests_unit/tests_approximators/test_approximators_vs_exact.py:
    now imports the shared helpers from benchmark._discovery via a
    tightly-scoped sys.path hook at the top of the file. Picks up the
    ValueError-priority construction and the RuntimeError-catch that
    the test file previously did not have. Interface conformance is
    now applied to the project's six new approximator names
    (LeverageSHAP, PolySHAP + 3 variants, OddSHAP), so Matthias's
    PolySHAP variants are no longer silently skipped by the contract
    check.

Verified locally:

  * pytest test_approximators_vs_exact.py: 10 passed, 170 skipped,
    87 xfailed, 26 xpassed. No failures.
  * python -m benchmark.performance --check: surfaces all 17 method
    names (11 existing on main + 6 project additions) correctly.
  * Drop-in compatibility verified by temporary merge into all three
    feature branches (oddshap_approximator, leverageSHAP, PolySHAP) —
    clean merge in each, --check picks up the local approximator.
@FabianK-Dev

Author

Finally, all tests pass again after:

  • Refactoring leverageshap.py to move its _solve() method into the base regression class in 274207e and 66c4e6c
  • Updating the tests in 433fb7a to run against a list of pre-defined seeds, preventing accidental overfitting to particular seeds, and fixing the tests accordingly in 348db3b
========================================== 1270 passed, 166 skipped, 124 xfailed, 32 xpassed, 5181 warnings in 956.40s (0:15:56) ==========================================

real    16m2,138s
user    119m51,705s
sys     0m37,665s

Theresa Geber and others added 24 commits June 8, 2026 13:57
…tions set, Z_list and probs_list and create all-true and all-false coalitions
…ferent game variables to avoid access counters interfering; Also compare metadata
…d tiny-n edge case and add comments to document and explain the test
…use its core claim was not reliable

With n = 6, a budget of 100 is above 2^n = 64, so the implementation enters the full-budget/exact regime.  In that regime, the result should be identical no matter which seed you use, so asserting that different seeds must differ is false and will fail even though the code is correct.
=> I lowered the budget to budget=20
…o test_exact_regime_seed_independence and test_stochastic_regime_seed_variability
…est_exact_matches_multiple_small_games, test_null_player_axiom and test_minimal_budget_sweep
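The reasoning in the budget commit above boils down to a one-line check. This is a hypothetical helper; we assume, as that commit message implies, that the exact regime starts once the budget covers all 2**n coalitions.

```python
def in_exact_regime(n_players: int, budget: int) -> bool:
    """Hypothetical check: does the budget cover every one of the 2**n coalitions?"""
    return budget >= 2 ** n_players


# n = 6 gives 2**6 = 64 coalitions: budget=100 is exact (seed-independent),
# while budget=20 stays stochastic, so different seeds can give different results.
```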

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a new regression-based Shapley value approximator, LeverageSHAP, and refactors the regression solver to support a more robust SVD-backed path, with extensive unit tests validating numerical stability and reproducibility.

Changes:

  • Implement LeverageSHAP (Musco & Witter, 2024) with leverage-score-guided paired coalition sampling and centered regression.
  • Refactor regression solving into a shared solve_regression(..., use_svd=...) utility + Regression.solve_regression() wrapper, adding numerical guards and fallback behavior.
  • Add comprehensive unit tests for LeverageSHAP behavior and for regression-solver edge cases.

Reviewed changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/shapiq/tests_unit/tests_approximators/test_approximator_regression_base.py` | Adds targeted tests for new solve_regression(..., use_svd=...) behavior and guards. |
| `tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py` | Adds a full unit test suite for LeverageSHAP (accuracy, axioms, seeds, numerical stability). |
| `src/shapiq/approximator/regression/leverageshap.py` | New LeverageSHAP approximator implementation including custom sampling and IS reweighting. |
| `src/shapiq/approximator/regression/base.py` | Introduces solve_regression(..., use_svd=...) + class wrapper and updates internal call sites. |
| `src/shapiq/approximator/regression/__init__.py` | Exports LeverageSHAP from the regression approximators package. |
| `src/shapiq/approximator/__init__.py` | Exposes LeverageSHAP at the top-level approximator API and lists it in SV_APPROXIMATORS. |
| `notebooks/data/communities.names` | Adds UCI Communities & Crime metadata used by an existing notebook. |
| `.gitignore` | Ignores benchmark result output files. |


Comment thread src/shapiq/approximator/regression/leverageshap.py Outdated
Comment thread src/shapiq/approximator/regression/leverageshap.py Outdated
Comment thread tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py Outdated

Advueu963 left a comment

Collaborator

Overall, a very nice implementation with nice safeguards for large n. Yet I think we can (a) get rid of the log procedure you introduced, as _find_c is already the bottleneck for large n, if I see this correctly^^.

I have also left a comment on how the CoalitionSampler can be used to produce a very similar sampling procedure. See Figure 9 of the LeverageSHAP paper for where the differences lie: the CoalitionSampler in our code essentially does the same without Bernoulli sampling.

Comment thread notebooks/data/communities.data Outdated

Collaborator

Is this file really necessary? Can't you use the load_communities_and_crime function in shapiq_games?

FabianK-Dev (Author), Jun 18, 2026

Good point, thanks for pointing this out! I implemented your suggestion in b98d617. Changes:

  • Replace custom dataset loading with shapiq_games.datasets.load_communities_and_crime() and refactor all code cells accordingly
  • Also load the Communities dataset and extend the code cells to skip computing exact Shapley values for n > 15 (for the Communities dataset)
  • Skip experiments with the Communities dataset where exact_svs is None
  • Add a fallback to src/shapiq/approximator/regression/leverageshap._sample_without_replacement() for astronomically large binomial pools

I committed the ruff style fixes separately (ad74f97) so that a single commit's diff doesn't blow up. I suppressed some ruff rules via a comment in the first cell, as I believe ignoring them makes sense in a scientific notebook environment:

# ruff: noqa: T201, RUF001, RUF002, RUF003, E402
# Justification for rule suppressions:
# - T201 (print found): Standard print statements are intentionally utilized for inline
#   execution logging. Standard logging modules would introduce unnecessary verbosity,
#   thereby reducing the readability of the notebook's experimental flow.
# - RUF001 / RUF002 / RUF003 (Ambiguous characters): The inclusion of specific typographic
#   symbols (such as mathematical multiplication or minus signs) is intentional to maintain
#   standard notation and ensure formal clarity within text cells and documentation strings.
# - E402 (Import not at top): In an interactive notebook environment, contextualizing
#   imports within specific cells ensures logical modularity and encapsulation. This prevents
#   unnecessary global scope clutter and allows for isolated cell execution during
#   iterative experimentation without re-running the initial setup.

Could you please review the changes? Thank you.

Comment thread notebooks/data/communities.names Outdated

Collaborator

I think this is not necessary either, as shapiq_games should already be able to load communities_and_crime.

FabianK-Dev (Author), Jun 18, 2026

Please see my comment above: #524 (comment)

Comment on lines +182 to +192
```python
for i, s in enumerate(sizes):  # for each sampled coalition of size s
    log_w = (
        math.lgamma(s) + math.lgamma(n - s) - math.lgamma(n + 1)
    )  # log Shapley kernel w(s)
    log_C = (
        math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
    )  # log C(n,s)
    log_p = log_2c - log_C  # log(2c * l_z) = log(2c / C(n,s))
    log_min_p = min(0.0, log_p)  # cap probability at 1 (log 1 = 0)
    log_weights[i] = log_w - log_min_p  # IS weight = w(s) / min(1, 2c*l_z)
log_weights -= log_weights.max()  # shift so exp doesn't overflow
```

Collaborator

I understand why you are guarding against large s, given the explosion with high numbers. But the algorithm will actually already fail in _find_c, since there you also need to construct all the binomial terms.
So I would argue that this safeguard might not even be needed here and could be removed for clarity. But I am open to other opinions on this.
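Along the lines of this suggestion, the same importance weights can be computed directly, without the log domain, as long as the binomial coefficients fit in a float (which a float-based _find_c needs anyway). The helper is_weights is hypothetical; dividing by the largest weight plays the role of log_weights -= log_weights.max() in the snippet above, and the constant factor n - 1 that the snippet's log_w omits cancels in that normalisation. Note that 2 * c / math.comb(n, s) raises OverflowError once C(n, s) no longer fits in a float.

```python
import math


def is_weights(sizes: list[int], n: int, c: float) -> list[float]:
    """Hypothetical direct IS weights w(s) / min(1, 2c / C(n, s)), scaled so the max is 1."""
    raw = []
    for s in sizes:
        binom = math.comb(n, s)
        kernel = (n - 1) / (binom * s * (n - s))  # Shapley kernel weight w(s)
        inclusion = min(1.0, 2 * c / binom)  # Bernoulli inclusion probability of a size-s coalition
        raw.append(kernel / inclusion)
    largest = max(raw)
    return [r / largest for r in raw]
```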

Author

Thank you. I implemented the changes here: c28409b

Could you please review the changes?

```python
if target <= 0:
    return 0.0  # nothing left to sample beyond empty + grand

binoms = [float(math.comb(n, s)) for s in range(1, n)]  # C(n,s) for each interior size
```

Collaborator

This is the part where the algorithm will already collapse for large n.
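Copilot's later autofix suggestion (keep the binomials as Python ints and find an upper bound for c by exponential search) can be sketched as follows. This is a hypothetical stand-alone find_c, not the PR's _find_c: we assume _find_c solves Σ_{s=1}^{n-1} min(C(n, s), 2c) = target, i.e. the expected number of sampled interior coalitions under the inclusion probability min(1, 2c / C(n, s)) from the weight snippet above. In CPython, float(math.comb(n, s)) raises OverflowError once a coefficient exceeds the float range (the middle coefficient does so at roughly n ≈ 1030), whereas comparing a large int with a float is exact and safe.

```python
import math


def find_c(n: int, target: float, iters: int = 100) -> float:
    """Hypothetical integer-safe solver for sum_{s=1}^{n-1} min(C(n, s), 2c) = target."""
    if target <= 0:
        return 0.0  # nothing left to sample beyond empty + grand
    binoms = [math.comb(n, s) for s in range(1, n)]  # exact Python ints, never converted to float
    if target >= sum(binoms):
        raise ValueError("target covers all interior coalitions; use the exact regime")

    def expected(c: float) -> float:
        # int/float comparison is exact in Python, so huge binomials are safe here
        return sum(min(b, 2 * c) for b in binoms)

    lo, hi = 0.0, 1.0
    while expected(hi) < target:  # exponential search keeps expected(lo) < target
        lo, hi = hi, 2 * hi
    for _ in range(iters):  # bisection on the non-decreasing function expected(c)
        mid = (lo + hi) / 2
        if expected(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For n = 10 and target = 100 the sizes s = 1 and s = 9 are fully included (C(10, 1) = 10) and 7 · 2c = 80 for the rest, so c = 40/7.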

Collaborator

I have seen that you have done quite some work reproducing Algorithms 2 and 3 from the paper. Really nice job! Yet I would like to point out that the CoalitionSampler already inherently supports sampling based on leverage scores via sampling_weights=np.ones(n_players + 1). Most importantly, this should be very similar to the approach you have implemented here, although you are somewhat more efficient in how you deal with duplicates (you avoid them completely; the CoalitionSampler accounts for them via sampling adjustment weights). But as n increases, it becomes quite difficult to tell the two apart overall. So before merging, I would be interested in the difference between your implementation and one using the CoalitionSampler with the weights described above. It should then come down to something very similar to what is depicted in Figure 9 of the LeverageSHAP paper.

Author

Great catch! In e98266a I wrote a new Jupyter NB that reproduces what you are asking for. In the NB I compare four setups: plain KernelSHAP; LeverageSHAP with our custom Bernoulli sampling implementation; LeverageSHAP without Bernoulli sampling, using a class override inside the NB that forces the CoalitionSampler instead of our custom sampler; and KernelSHAP with sampling_weights=np.ones(n_players + 1). From my understanding, KernelSHAP with sampling_weights=np.ones(n_players + 1) should behave exactly like "LeverageSHAP without Bernoulli" (assuming the same fixed seed), but I plot both for illustration purposes.
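Why np.ones(n_players + 1) corresponds to leverage-score sampling can be seen from the induced distribution over coalition sizes. A minimal sketch, assuming sampling_weights holds one weight per coalition size 0..n (hence n_players + 1 entries) and that coalitions of a given size are then chosen uniformly; size_distributions is a hypothetical helper, not a shapiq function.

```python
def size_distributions(n: int) -> tuple[list[float], list[float]]:
    """Probability of drawing an interior coalition of size s = 1..n-1 under two schemes."""
    # Shapley-kernel sampling: C(n, s) coalitions of size s, each weighted by
    # w(s) = (n-1) / (C(n, s) * s * (n-s)), so the total mass of size s is (n-1) / (s * (n-s)).
    kernel = [(n - 1) / (s * (n - s)) for s in range(1, n)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    # Leverage-score sampling: every interior size is equally likely, so a single
    # coalition of size s is drawn with probability proportional to 1 / C(n, s).
    leverage = [1 / (n - 1)] * (n - 1)
    return kernel, leverage
```

The kernel distribution piles its mass on the smallest and largest sizes, while uniform size weights spread the budget evenly across sizes.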

Below you can find the results (you can also view them directly inside the committed NB). As you can see, from a certain budget onwards (consistently for m >= 42) our custom implementation beats KernelSHAP + np.ones / LeverageSHAP w/o Bernoulli. Because of that, I believe keeping the custom Bernoulli sampling is worth the extra lines of code. What do you think/suggest?

The plot: [image: median ℓ₂-norm error vs. budget m for the four setups]

The plot in words:

Median ℓ₂-norm error (Lower is better):
----------------------------------------------------------------------------------------------------
            KernelSHAP (Standard)  LeverageSHAP w/o Bernoulli (Override)  KernelSHAP + np.ones (Reviewer Setup)  LeverageSHAP (Custom Bernoulli)
Budget (m)                                                                                                                                      
2                         0.90230                                0.90230                                0.90230                          0.90230
8                         0.77130                                0.77382                                0.77382                          0.77667
15                        0.45012                                0.27952                                0.28783                          0.12703
22                        0.04356                                0.04092                                0.04092                          0.04499
29                        0.06454                                0.05521                                0.05521                          0.02961
36                        0.02105                                0.02224                                0.02224                          0.02422
42                        0.01673                                0.02007                                0.02007                          0.01984
49                        0.03694                                0.03514                                0.03514                          0.01719
56                        0.01262                                0.01592                                0.01592                          0.01398
63                        0.02596                                0.02355                                0.02355                          0.01203
70                        0.01104                                0.01159                                0.01159                          0.01040
77                        0.02186                                0.01517                                0.01517                          0.00896
83                        0.01464                                0.01680                                0.01680                          0.00897
90                        0.00835                                0.00917                                0.00917                          0.00890
97                        0.01366                                0.01357                                0.01357                          0.00786
104                       0.00740                                0.00845                                0.00845                          0.00712
111                       0.01297                                0.01079                                0.01079                          0.00661
118                       0.00647                                0.00769                                0.00769                          0.00667
124                       0.00605                                0.00705                                0.00705                          0.00600
131                       0.00841                                0.00997                                0.00997                          0.00521
138                       0.00559                                0.00650                                0.00650                          0.00450
145                       0.00964                                0.00817                                0.00817                          0.00459
152                       0.00429                                0.00603                                0.00603                          0.00433
159                       0.00688                                0.00645                                0.00645                          0.00364
165                       0.00591                                0.00551                                0.00551                          0.00323
172                       0.00377                                0.00363                                0.00363                          0.00329
179                       0.00499                                0.00565                                0.00565                          0.00308
186                       0.00329                                0.00340                                0.00340                          0.00276
193                       0.00448                                0.00467                                0.00467                          0.00248
200                       0.00286                                0.00289                                0.00289                          0.00249

====================================================================================================
DIRECT COMPARISON: Custom Bernoulli vs. Reviewer Setup (np.ones)
====================================================================================================
m=2  : Reviewer Setup WINS! Error is  0.0% lower.
m=8  : Reviewer Setup WINS! Error is  0.4% lower.
m=15 : Custom Sampling WINS! Error is 55.9% lower.
m=22 : Reviewer Setup WINS! Error is  9.9% lower.
m=29 : Custom Sampling WINS! Error is 46.4% lower.
m=36 : Reviewer Setup WINS! Error is  8.9% lower.
m=42 : Custom Sampling WINS! Error is  1.1% lower.
m=49 : Custom Sampling WINS! Error is 51.1% lower.
m=56 : Custom Sampling WINS! Error is 12.2% lower.
m=63 : Custom Sampling WINS! Error is 48.9% lower.
m=70 : Custom Sampling WINS! Error is 10.3% lower.
m=77 : Custom Sampling WINS! Error is 41.0% lower.
m=83 : Custom Sampling WINS! Error is 46.6% lower.
m=90 : Custom Sampling WINS! Error is  2.9% lower.
m=97 : Custom Sampling WINS! Error is 42.1% lower.
m=104: Custom Sampling WINS! Error is 15.8% lower.
m=111: Custom Sampling WINS! Error is 38.8% lower.
m=118: Custom Sampling WINS! Error is 13.3% lower.
m=124: Custom Sampling WINS! Error is 14.9% lower.
m=131: Custom Sampling WINS! Error is 47.7% lower.
m=138: Custom Sampling WINS! Error is 30.8% lower.
m=145: Custom Sampling WINS! Error is 43.8% lower.
m=152: Custom Sampling WINS! Error is 28.2% lower.
m=159: Custom Sampling WINS! Error is 43.6% lower.
m=165: Custom Sampling WINS! Error is 41.4% lower.
m=172: Custom Sampling WINS! Error is  9.4% lower.
m=179: Custom Sampling WINS! Error is 45.5% lower.
m=186: Custom Sampling WINS! Error is 18.8% lower.
m=193: Custom Sampling WINS! Error is 46.8% lower.
m=200: Custom Sampling WINS! Error is 13.9% lower.

@FabianK-Dev

Author

> Overall a very nice implementation and nice safeguards implemented for large n. Yet I think we can (a) get rid of the log procedure you introduced, as the _find_c is already the bottleneck for large n, if I see this correctly^^.
>
> Also I have left a comment on how the CoalitionSampler can be used, to produce very similar sampling procedures. See Figure 9 of the Leverageshape paper, of where the differences lie. The CoalitionSampler in our code is somewhat doing without BernoulliSampling.

Dear @Advueu963, thank you very much for the feedback! I will implement your suggestions asap.

FabianK-Dev and others added 8 commits June 14, 2026 22:59
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- I accidentally only accepted one out of three autofix suggestions by Copilot; the other two are now marked as outdated and can't be committed inside GitHub anymore which is why I commit them manually
- "_find_c() converts binomial coefficients to float (`float(math.comb(...))`). For large `n`, `math.comb(n, s)` can exceed the maximum finite float and become `inf`, which then makes `hi` infinite and the bisection never meaningfully converges (returning `inf` for `c`). This can break sampling/weight computation for large-player games. Keep binomials as Python ints and choose an upper bound for `c` via exponential search (or another integer-safe strategy) instead of `max_binom/2` in float space."
- "_sample_without_replacement() uses rejection sampling for `total >= 10**6`. This is only efficient when `total >> k`, but in this implementation `k` can be a large fraction of `total` (e.g., when `prob` is close to 1 but still < 1). In that case the `while len(seen) < k:` loop can take an extremely long time due to heavy collisions. Python's `random.sample()` supports sampling directly from `range(total)` without materializing it and handles the `k` vs `total` regime robustly, so it's safer to use it for the large-pool case too."
- mmschlk#524
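The large-pool fallback mentioned in these commit messages can be sketched as below. The helper name and signature are hypothetical and only illustrate the idea. random.Random.sample accepts a range and draws from it without materializing it, but it calls len() on the population, and len(range(total)) raises OverflowError once total exceeds sys.maxsize; in that regime k (a budget) is negligible next to total, so plain rejection sampling with randrange, which accepts arbitrarily large ints, has practically no collisions.

```python
import random
import sys


def sample_without_replacement(total: int, k: int, rng: random.Random) -> list[int]:
    """Hypothetical sketch: draw k distinct indices from range(total)."""
    if not 0 <= k <= total:
        raise ValueError("k must satisfy 0 <= k <= total")
    if total <= sys.maxsize:
        # sample() indexes into the range lazily and handles k close to total well
        return rng.sample(range(total), k)
    # len(range(total)) would overflow here; with k << total collisions are rare
    seen: set[int] = set()
    out: list[int] = []
    while len(out) < k:
        idx = rng.randrange(total)  # randrange works with arbitrarily large ints
        if idx not in seen:
            seen.add(idx)
            out.append(idx)
    return out
```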
…ities_and_crime() and added fallback mechanism in the random sampling method

- Implement feedback from mmschlk#524
- Replace custom dataset loading with shapiq_games.datasets.load_communities_and_crime() and refactor all code cells accordingly
- Also load communities data set and extend codes cells to skip computing exact shapley values for n > 15 (for Communities dataset)
- Skip experiments with communities dataset where exact_svs is None
- Add fallback to src/shapiq/approximator/regression/leverageshap._sample_without_replacement() for astronomically large binomial pools
…ges into this separate commit) for easier reviewing
…s, boolean traps, and df naming in SOUM notebook
…AP (with our custom bernoulli sampling), LeverageSHAP w/o bernoulli using class override and KernelSHAP + np.ones
@FabianK-Dev

Author

Dear @mmschlk, as discussed in our meeting, I refactored the solve_regression() method in src/shapiq/approximator/regression/base.py and removed an old (now outdated/unused) unit test in e70f9d6, and resolved all remaining ruff style warnings in 67d05fc.

The method is now cleaned up and more readable. It introduces the "safe" try-except block as well as the use_svd parameter, which is used exclusively by LeverageSHAP. I removed the unsafe code paths that would return np.nan; the method now raises errors instead.
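A minimal sketch of a weighted least-squares solve with an optional SVD path and a try-except fallback that raises instead of returning np.nan. The name and signature solve_regression(X, y, weights, use_svd=False) here are our own assumptions for illustration; the actual shapiq function may differ. np.linalg.lstsq is SVD-based and copes with rank-deficient designs, while np.linalg.solve on the normal equations raises LinAlgError for an exactly singular system.

```python
import numpy as np


def solve_regression(X, y, weights, use_svd=False):
    """Hypothetical weighted least squares: argmin_b sum_i w_i * (y_i - X_i @ b) ** 2."""
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    Xw = np.asarray(X, dtype=float) * sqrt_w[:, None]
    yw = np.asarray(y, dtype=float) * sqrt_w
    if use_svd:
        # SVD-based least squares: robust for rank-deficient / ill-conditioned designs
        solution, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    else:
        try:
            # normal equations: cheap, but fail on an exactly singular Gram matrix
            solution = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
        except np.linalg.LinAlgError:
            solution, *_ = np.linalg.lstsq(Xw, yw, rcond=None)  # fall back to SVD
    if not np.all(np.isfinite(solution)):
        # raise instead of silently returning NaN/inf coefficients
        raise ValueError("regression produced non-finite coefficients")
    return solution
```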

Could you please review the changes? Thank you.

I've also run the following commands locally to ensure that all code-quality checks (pre-commit), unit tests and coverage pass (the same commands as in the GitHub CI pipeline):

uv sync --group lint --all-extras
uv run pre-commit run --all-files --show-diff-on-failure

uv sync --all-extras
uv run pytest "tests/shapiq" --cov=shapiq --cov-report=term -n logical

uv sync --no-dev --all-extras --group all_ml
uv run --no-sync python -c "import shapiq; print('✅ shapiq imported successfully')"
uv run --no-sync python -c "import shapiq_games; print('✅ shapiq_games imported successfully')"

If you're okay with the changes, please feel free to re-run the workflow on GitHub.

@FabianK-Dev FabianK-Dev requested a review from Advueu963 June 18, 2026 15:35
codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/shapiq/approximator/regression/leverageshap.py | 95.10% | 7 Missing ⚠️ |


@FabianK-Dev

Author

Thanks for running the CI. I see that the documentation is not building correctly and that some lines are still missing coverage. I'll have a look at it ASAP.

@FabianK-Dev

Author

@Advueu963 As requested in the last meeting, I have updated and expanded the benchmark notebook to evaluate our custom LeverageSHAP implementation (Algorithm 2) against the uniform KernelSHAP baseline across 25 configurations and 4 distinct datasets. The results show that our method outperforms KernelSHAP + np.ones: the custom implementation achieves a lower absolute L2 error and wins 22 to 28 out of 30 budget steps in almost every configuration (please see the table below). Our Bernoulli sampling mechanism also resolves the "zig-zag" pattern seen in 9 out of 25 baseline plots for KernelSHAP + np.ones. I assume the "zig-zag" occurs when the KernelSHAP + np.ones setup runs out of budget mid-layer.

Some configurations show extreme negative percentages in the relative "Avg Improvement" column. I think this is caused when the baseline randomly hits a near-zero error in its symmetry valleys, which then distorts the relative average. The fully cleaned and documented notebook is now ready for review. I also added an "Empirical Evaluation: Custom LeverageSHAP vs. Uniform Weighting Baseline" section that describes and tries to explain the observations. We also plot every single experiment (i.e., each row of the table below) in the Jupyter NB.
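The distortion described here is easy to reproduce. A minimal sketch with a hypothetical relative_improvement helper, assuming "Avg Improvement" is the mean of per-budget percentages (baseline - custom) / baseline; the first two error pairs are the m=15 and m=63 values (reviewer setup vs. custom) from the earlier table, while the third pair is made up to mimic a near-zero baseline.

```python
from statistics import mean, median


def relative_improvement(baseline_error: float, custom_error: float) -> float:
    """Percent by which custom_error is below baseline_error (negative if it is above)."""
    return 100.0 * (baseline_error - custom_error) / baseline_error


# (baseline, custom) errors per budget step; the last step is hypothetical and
# mimics the baseline landing in a near-zero "symmetry valley".
steps = [(0.28783, 0.12703), (0.02355, 0.01203), (0.00010, 0.00300)]
improvements = [relative_improvement(b, c) for b, c in steps]  # about 55.9, 48.9, -2900
avg_improvement = mean(improvements)  # dominated by the single outlier step
median_improvement = median(improvements)  # barely affected by it
```

One tiny baseline error is enough to flip the average strongly negative even though the custom method wins two of the three steps; a median is one less sensitive summary.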

[image: results table with one row per configuration]

You can also find the Jupyter NB exported in the PDF here:
reproduce_figure9_sampling_architecture.ipynb.pdf

…NBs will not be pushed to the shapiq repository according to the last meeting)
@FabianK-Dev

Author

@Advueu963 I moved the notebooks in the notebooks/ folder to a PR on our fork because, if I understood correctly, as discussed with @pwhofman, the notebooks, discussion files and 1-page summary will not be pushed to mmschlk/shapiq. I'm therefore cleaning up this PR to get it ready to be merged.


Labels

None yet

Projects

Status: No status


6 participants