Implement LeverageSHAP approximator by FabianK-Dev · Pull Request #524 · mmschlk/shapiq

FabianK-Dev · 2026-05-22T13:19:51Z

Motivation and Context

In this PR we add an implementation of the LeverageSHAP approximator, based on the paper by Musco & Witter (2024). We (1) built a custom sampler for leverage score sampling (uniform size + paired sampling), (2) implemented the regression solver using the row-centering trick (Lemma 3.1), and (3) added test cases. We originally opened a PR in our fork (FabianK-Dev#1), which I have now closed; I am reopening it here.

A few disclaimers / important notes:

1. We have already committed our summary src/shapiq/approximator/regression/our_impl_progress.md (as requested in the project description), but this file is still a work in progress and not ready for review yet.
2. There are still more unit tests to come: I estimate we have only covered around 50 % of the unit tests.
3. We will improve the existing docstrings and add more later on.

Public API Changes

No Public API changes
Yes, Public API changes (Details below)

Details: Added LeverageSHAP class to shapiq.approximator.regression.

How Has This Been Tested?

We added many unittests to cover the following things:

Budget constraints, exact recovery on additive games and deterministic behavior.
Tested with large n to verify the IS weight cancellation prevents float overflows.
Tested with highly skewed interaction games to ensure the lstsq solver maintains the efficiency axiom on ill-conditioned matrices.
Verified the reproducibility requirement by testing that the mean L2 error drops significantly when increasing the budget on a synthetic game.

You can run all new unittests using:
uv run pytest tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py

Tests are passing:

uv run pytest tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py
warning: `VIRTUAL_ENV=/home/fabian/Dokumente/Uni/LMU/Master/Semester2/Toolbox/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
====================================================================================== test session starts ======================================================================================
platform linux -- Python 3.13.5, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/fabian/Dokumente/Uni/LMU/Master/Semester2/Toolbox/shapiq
configfile: pyproject.toml
plugins: xdist-3.8.0, cov-7.0.0, anyio-4.12.1
collected 15 items                                                                                                                                                                              

tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py ...............                                                                                             [100%]

====================================================================================== 15 passed in 0.20s =======================================================================================

Checklist

We haven't completed all points on the checklist yet, as this is still an early PR where we ask for feedback but don't plan to merge it into the upstream repository, yet.

The changes have been tested locally.
~~Documentation has been updated (if the public API or usage changes).~~
~~An entry has been added to CHANGELOG.md (if relevant for users).~~
The code follows the project's style guidelines.
I have considered the impact of these changes on the public API.

Forward-looking spec for the 3 new SV approximators (LeverageSHAP, PolySHAP, OddSHAP). Approximator classes are looked up dynamically by name, so the file auto-skips classes that have not yet been registered in shapiq.approximator. As each implementation lands, the corresponding parametrizations activate.

- Interface conformance (always required): index='SV', n_players, max_order/min_order, values shape and dtype, interaction_lookup.
- Numerical convergence vs ExactComputer (xfail strict=False): atol schedule by budget percentage.
- Determinism: same (n, random_state, budget, game) -> bit-identical output.

75 tests, all currently SKIP on main. Will activate as classes land.

Honors the cross-method testing platform promised to the tutor: unified harness covering every SV approximator in shapiq (the existing 11, i.e. KernelSHAP, SVARM, Permutation*, ProxySPEX, ..., and the 3 new ones from this project) instead of only the new line-up. Approximator list is sourced dynamically from shapiq.approximator.SV_APPROXIMATORS (canonical registry) plus the 3 new project names, deduplicated. Future shapiq additions land in the harness automatically.

Split into two scopes:

* test_interface_conformance: strict shape/dtype/index/lookup contract from the API spec. Applied ONLY to the 3 new approximators (the contract is ours; existing methods have different default output conventions like ProxySPEX defaulting to FBII and max_order=n).
* test_numerical_convergence_vs_exact + test_determinism: apply to ALL SV approximators. Cross-method comparison against ExactComputer ground truth on identical SOUM games. xfail with strict=False so methods that do converge surface as XPASS; methods still under development surface as XFAIL.

Two robustness helpers:

* _construct_or_skip: tries (n=, index='SV', max_order=1, random_state=) first (covers multi-index methods like SPEX, ProxySPEX, ProxySHAP, MSRBiased, kADDSHAP), then falls back to minimal signature for SV-only methods (KernelSHAP, OwenSamplingSV).
* _safe_approximate: skips on ValueError raised by approximators that explicitly refuse a regime (e.g. SPEX 'Insufficient budget to compute the transform' at low budgets).

Results: 10 passed, 95 skipped, 90 xfailed, 23 xpassed. The 23 xpassed are existing shapiq SV methods that converge cleanly at full budget on small SOUM, a useful baseline for the upcoming benchmark report.

Drop-in framework that any teammate can merge into their feature branch to run head-to-head benchmarks against ExactComputer across every SV approximator in shapiq, then plot the standard SHAP-literature metric curves. No source files are modified: this adds a top-level benchmark/ package, a single test file, and a small in-place test-helper sys.path hook. Does not touch pyproject.toml or any other upstream config.

Files added:

* benchmark/__init__.py: makes the runner a proper Python package so invocation is 'python -m benchmark.performance'.
* benchmark/_discovery.py: single source of truth for SV approximator discovery + SV-mode construction. Holds:
  - PROJECT_APPROXIMATOR_NAMES: LeverageSHAP, PolySHAP, PolySHAPKAdd / Partial / Prior, OddSHAP.
  - _SV_CONSTRUCT_OVERRIDES: per-class kwargs for non-standard constructors (PolySHAP variants need max_order / n_explanation_terms / q_prior).
  - construct_for_sv(): three-stage construction (override -> explicit SV signature -> minimal signature), returning (estimator, exc) so the caller can report the most informative exception. A ValueError from inside a matched signature wins over a TypeError from a signature mismatch.
  - safe_approximate(): catches ValueError and RuntimeError so sparse approximators that refuse a budget regime (SPEX, ProxySPEX, ...) skip the cell cleanly instead of crashing.
* benchmark/performance.py: CLI runner that consumes _discovery, sweeps (method, game, budget, seed), records every cell in a long-format CSV, and emits one PNG per (game, metric) plus a runtime PNG. Seven metrics chosen from the union of LeverageSHAP, PolySHAP, OddSHAP and shapiq.benchmark.metrics literature: MSE / MAE / SSE / SAE / Precision@5 / Precision@10 / KendallTau. Includes a '--check' interface-probe mode that prints a constructibility table without running a sweep.
* benchmark/README.md: usage doc covering merge workflow, --check, sweep CLI, output layout, CSV format, metric definitions, plot conventions, and notes on the multi-index approximators that need explicit (index='SV', max_order=1).

Files modified:

* tests/shapiq/tests_unit/tests_approximators/test_approximators_vs_exact.py: now imports the shared helpers from benchmark._discovery via a tightly-scoped sys.path hook at the top of the file. Picks up the ValueError-priority construction and the RuntimeError-catch that the test file previously did not have. Interface conformance is now applied to the project's six new approximator names (LeverageSHAP, PolySHAP + 3 variants, OddSHAP), so Matthias's PolySHAP variants are no longer silently skipped by the contract check.

Verified locally:

* pytest test_approximators_vs_exact.py: 10 passed, 170 skipped, 87 xfailed, 26 xpassed. No failures.
* python -m benchmark.performance --check: surfaces all 17 method names (11 existing on main + 6 project additions) correctly.
* Drop-in compatibility verified by temporary merge into all three feature branches (oddshap_approximator, leverageSHAP, PolySHAP): clean merge in each, --check picks up the local approximator.
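The three-stage construction described for construct_for_sv() might look roughly like the sketch below. This is a hypothetical re-implementation written only from the commit message, not the code in benchmark/_discovery.py; the dummy classes, the `overrides` argument and the exact keyword names are assumptions made for illustration.

```python
def construct_for_sv(cls, n, random_state, overrides=None):
    """Hypothetical sketch: override -> explicit SV signature -> minimal signature.

    Returns (estimator, exc); exactly one of the two is None.
    """
    attempts = []
    if overrides and cls.__name__ in overrides:
        attempts.append({"n": n, "random_state": random_state, **overrides[cls.__name__]})
    attempts.append({"n": n, "index": "SV", "max_order": 1, "random_state": random_state})
    attempts.append({"n": n, "random_state": random_state})

    best_exc = None
    for kwargs in attempts:
        try:
            return cls(**kwargs), None
        except TypeError as exc:
            # Likely a signature mismatch: keep it only if nothing better was seen.
            if best_exc is None:
                best_exc = exc
        except ValueError as exc:
            # The signature matched but the class refused the arguments,
            # which is the more informative failure to report.
            best_exc = exc
    return None, best_exc


# Dummy classes (invented for this example) standing in for real approximators.
class MultiIndexDummy:
    def __init__(self, n, index, max_order, random_state):
        self.n, self.index, self.max_order = n, index, max_order


class SVOnlyDummy:
    def __init__(self, n, random_state):
        self.n = n


class PickyDummy:
    def __init__(self, n, random_state):
        raise ValueError("n too small")


multi, multi_exc = construct_for_sv(MultiIndexDummy, n=5, random_state=0)
sv_only, sv_exc = construct_for_sv(SVOnlyDummy, n=5, random_state=0)
picky, picky_exc = construct_for_sv(PickyDummy, n=5, random_state=0)
```

One limitation of this sketch: a TypeError raised inside a constructor body cannot be told apart from a signature mismatch, so a real harness may want to inspect the signature instead.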

FabianK-Dev · 2026-05-30T23:32:08Z

Finally all tests pass again after:

Refactoring leverageshap.py to move its _solve() method into the base regression class in 274207e and 66c4e6c
Updating tests in 433fb7a to test a list of pre-defined seeds to prevent accidental overfitting on seeds, and fixing the tests accordingly in 348db3b

========================================== 1270 passed, 166 skipped, 124 xfailed, 32 xpassed, 5181 warnings in 956.40s (0:15:56) ==========================================

real    16m2,138s
user    119m51,705s
sys     0m37,665s

…use its core claim was not reliable With n = 6, a budget of 100 is above 2^n = 64, so the implementation enters the full-budget/exact regime. In that regime, the result should be identical no matter which seed you use, so asserting that different seeds must differ is false and will fail even though the code is correct. => I lowered the budget to budget=20
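The regime check behind this fix can be written as a tiny helper. The function name `is_exact_regime` is hypothetical and only illustrates the arithmetic in the commit message (a game with n players has 2^n coalitions); it is not taken from the implementation.

```python
def is_exact_regime(n_players: int, budget: int) -> bool:
    """Return True when the budget covers all 2**n_players coalitions."""
    return budget >= 2**n_players


# n = 6 gives 2**6 = 64 coalitions.
old_setting = is_exact_regime(6, 100)  # budget used by the original test
new_setting = is_exact_regime(6, 20)   # budget after the fix
```

With the old budget the estimator has nothing left to sample, so seed-dependent output can only be expected with the lowered budget.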

Copilot

Pull request overview

This PR adds a new regression-based Shapley value approximator, LeverageSHAP, and refactors the regression solver to support a more robust SVD-backed path, with extensive unit tests validating numerical stability and reproducibility.

Changes:

Implement LeverageSHAP (Musco & Witter, 2024) with leverage-score-guided paired coalition sampling and centered regression.
Refactor regression solving into a shared solve_regression(..., use_svd=...) utility + Regression.solve_regression() wrapper, adding numerical guards and fallback behavior.
Add comprehensive unit tests for LeverageSHAP behavior and for regression-solver edge cases.

Reviewed changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 3 comments.

File	Description
tests/shapiq/tests_unit/tests_approximators/test_approximator_regression_base.py	Adds targeted tests for new `solve_regression(..., use_svd=...)` behavior and guards.
tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py	Adds a full unit test suite for LeverageSHAP (accuracy, axioms, seeds, numerical stability).
src/shapiq/approximator/regression/leverageshap.py	New LeverageSHAP approximator implementation including custom sampling and IS reweighting.
src/shapiq/approximator/regression/base.py	Introduces `solve_regression(..., use_svd=...)` + class wrapper and updates internal call sites.
src/shapiq/approximator/regression/__init__.py	Exports `LeverageSHAP` from the regression approximators package.
src/shapiq/approximator/__init__.py	Exposes `LeverageSHAP` at the top-level approximator API and lists it in `SV_APPROXIMATORS`.
notebooks/data/communities.names	Adds UCI Communities & Crime metadata used by an existing notebook.
.gitignore	Ignores benchmark result output files.

Advueu963

Overall a very nice implementation and nice safeguards implemented for large n. Yet I think we can (a) get rid of the log procedure you introduced, as the _find_c is already the bottleneck for large n, if I see this correctly^^.

Also, I have left a comment on how the CoalitionSampler can be used to produce very similar sampling procedures. See Figure 9 of the LeverageSHAP paper for where the differences lie. The CoalitionSampler in our code essentially works without Bernoulli sampling.

Advueu963 · 2026-06-13T17:44:36Z

Is this file really necessary? Can you not use the load_communities_and_crime function in shapiq_games?

Good point, thanks for pointing this out! I implemented your suggestion in b98d617. Changes:

Replace custom dataset loading with shapiq_games.datasets.load_communities_and_crime() and refactor all code cells accordingly

Also load the Communities dataset and extend the code cells to skip computing exact Shapley values for n > 15 (for the Communities dataset)

Skip experiments with communities dataset where exact_svs is None

Add fallback to src/shapiq/approximator/regression/leverageshap._sample_without_replacement() for astronomically large binomial pools

I committed the ruff style-fixes in a separate commit (ad74f97) so that the diff doesn't blow up in one commit. I had to ignore some ruff rules in a comment in the first cell as I believe it makes sense to ignore those rules in a scientific notebook environment:

# ruff: noqa: T201, RUF001, RUF002, RUF003, E402
# Justification for rule suppressions:
# - T201 (print found): Standard print statements are intentionally utilized for inline
#   execution logging. Standard logging modules would introduce unnecessary verbosity,
#   thereby reducing the readability of the notebook's experimental flow.
# - RUF001 / RUF002 / RUF003 (Ambiguous characters): The inclusion of specific typographic
#   symbols (such as mathematical multiplication or minus signs) is intentional to maintain
#   standard notation and ensure formal clarity within text cells and documentation strings.
# - E402 (Import not at top): In an interactive notebook environment, contextualizing
#   imports within specific cells ensures logical modularity and encapsulation. This prevents
#   unnecessary global scope clutter and allows for isolated cell execution during
#   iterative experimentation without re-running the initial setup.

Could you please review the changes? Thank you.

Advueu963 · 2026-06-13T17:45:16Z

I think this is not necessary, as shapiq_games should already be able to load communities_and_crime.

Please see my comment above: #524 (comment)

Advueu963 · 2026-06-13T19:05:38Z

+            for i, s in enumerate(sizes):  # for each sampled coalition of size s
+                log_w = (
+                    math.lgamma(s) + math.lgamma(n - s) - math.lgamma(n + 1)
+                )  # log Shapley kernel w(s)
+                log_C = (
+                    math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
+                )  # log C(n,s)
+                log_p = log_2c - log_C  # log(2c * l_z) = log(2c / C(n,s))
+                log_min_p = min(0.0, log_p)  # cap probability at 1 (log 1 = 0)
+                log_weights[i] = log_w - log_min_p  # IS weight = w(s) / min(1, 2c*l_z)
+            log_weights -= log_weights.max()  # shift so exp doesn't overflow


I understand why you are guarding against the size of s, given the explosion for large values. But the algorithm will actually already fail in _find_c, as you also need to construct all the binomial terms there.
So I would argue that this guard may not even be appropriate here and could be removed for clarity. But I am open to other opinions on this.

Thank you. I implemented the changes here: c28409b

Could you please review the changes?
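For reference, the log-space computation from the diff in this thread can be compared against a direct computation on a small game. The sketch below restates the formulas from the diff comments (Shapley kernel term, C(n, s), and the cap min(1, 2c / C(n, s))); the function names and the value of `c` are assumptions for illustration, and both variants normalise by the largest weight so that they are comparable.

```python
import math


def log_space_weights(n, sizes, c):
    """Importance weights w(s) / min(1, 2c / C(n, s)), computed via lgamma."""
    log_2c = math.log(2.0 * c)
    log_weights = []
    for s in sizes:
        # log of (s-1)! (n-s-1)! / n!, the Shapley kernel up to a constant factor
        log_w = math.lgamma(s) + math.lgamma(n - s) - math.lgamma(n + 1)
        # log C(n, s)
        log_binom = math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
        # cap the sampling probability at 1 (log 1 = 0)
        log_p = min(0.0, log_2c - log_binom)
        log_weights.append(log_w - log_p)
    shift = max(log_weights)  # shift so that exp() cannot overflow
    return [math.exp(lw - shift) for lw in log_weights]


def direct_weights(n, sizes, c):
    """Same quantity computed with plain factorials and binomials."""
    weights = []
    for s in sizes:
        w = math.factorial(s - 1) * math.factorial(n - s - 1) / math.factorial(n)
        p = min(1.0, 2.0 * c / math.comb(n, s))
        weights.append(w / p)
    top = max(weights)
    return [w / top for w in weights]


log_result = log_space_weights(n=10, sizes=[1, 3, 5, 9], c=4.0)
direct_result = direct_weights(n=10, sizes=[1, 3, 5, 9], c=4.0)
```

For small n both routes agree to floating-point precision, which is consistent with the reviewer's point that the log-space guard only matters in a range where other parts of the code fail first.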

Advueu963 · 2026-06-13T19:06:57Z

+        if target <= 0:
+            return 0.0  # nothing left to sample beyond empty + grand
+
+        binoms = [float(math.comb(n, s)) for s in range(1, n)]  # C(n,s) for each interior size


This is the part where the algorithm will already collapse for large n.
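A short illustration of the failure mode, under the assumption that it is the float conversion of the binomial that breaks first: in CPython, converting an integer that exceeds the float range raises OverflowError, whereas the logarithm of the same binomial stays small. The helper `log_binom` is hypothetical and not part of the PR.

```python
import math


def log_binom(n: int, s: int) -> float:
    """log C(n, s) via lgamma, which never forms the huge integer."""
    return math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)


n, s = 2000, 1000
try:
    float(math.comb(n, s))  # C(2000, 1000) is far beyond the largest finite float
    overflowed = False
except OverflowError:
    overflowed = True

log_value = log_binom(n, s)
```

Keeping the binomials as Python ints, or working with their logarithms, avoids the conversion entirely.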

Advueu963 · 2026-06-13T19:20:40Z

I have seen that you have done quite some work in reproducing Algorithms 2 and 3 from the paper. Really nice job! Yet I would like to point out that the CoalitionSampler already inherently supports sampling based on leverage scores using sampling_weights=np.ones(n_players + 1). Most importantly, this should be very similar to the approach you have implemented here, while you are somewhat more efficient in how you deal with duplicates (you avoid them completely; the CoalitionSampler accounts for them in the sampling adjustment weights). But as n increases, it becomes quite difficult to tell the two apart overall. So I would (before merging) be interested in the difference between your implementation and an implementation using the CoalitionSampler with the weights described above. It should then come down to something very similar to what is depicted in Figure 9 of the LeverageSHAP paper.
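The link between uniform size weights and leverage scores can be sketched without shapiq. The diff earlier in this review uses l_z = 1 / C(n, s) for a coalition of size s; if one weight is assigned per coalition size and all weights are equal, every interior size is equally likely and each coalition of size s is drawn with probability proportional to 1 / C(n, s). The functions below are hypothetical stand-ins that use only the standard library and do not call the CoalitionSampler.

```python
import math
import random


def sample_coalition_uniform_size(n, rng):
    """Pick an interior size uniformly, then a uniform subset of that size."""
    s = rng.randint(1, n - 1)  # sizes 0 and n (empty and grand coalition) excluded
    return tuple(sorted(rng.sample(range(n), s)))


def coalition_probability(n, s):
    """Probability of one specific coalition of size s under the scheme above."""
    return 1.0 / ((n - 1) * math.comb(n, s))


n = 8
rng = random.Random(0)
draws = [sample_coalition_uniform_size(n, rng) for _ in range(5)]
total_mass = sum(math.comb(n, s) * coalition_probability(n, s) for s in range(1, n))
```

This sketch samples with replacement and ignores pairing, so it only illustrates the marginal distribution, not the duplicate handling discussed in the comment.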

Great catch! I wrote a new Jupyter NB in e98266a that reproduces what you are asking for. In the NB I compare four setups: plain KernelSHAP, LeverageSHAP (with our custom Bernoulli sampling implementation), LeverageSHAP without Bernoulli sampling (using a class override inside the NB to force the CoalitionSampler instead of our custom sampler), and KernelSHAP with sampling_weights=np.ones(n_players + 1) set. From my understanding, KernelSHAP with sampling_weights=np.ones(n_players + 1) should behave exactly the same as "LeverageSHAP without Bernoulli" (assuming the same fixed seed), but for illustration purposes I plot both.

Below you can find the results (you can also directly view them inside the committed NB). As you can see our custom implementation consistently beats KernelSHAP + np.ones / LeverageSHAP w/o Bernoulli after a certain budget count (for m >= 42 consistently). Because of that, I believe keeping the custom Bernoulli sampling is worth the extra lines of code. What do you think/suggest?

The plot:

The plot in words:

Median ℓ₂-norm error (Lower is better):

Budget (m)	KernelSHAP (Standard)	LeverageSHAP w/o Bernoulli (Override)	KernelSHAP + np.ones (Reviewer Setup)	LeverageSHAP (Custom Bernoulli)
2	0.90230	0.90230	0.90230	0.90230
8	0.77130	0.77382	0.77382	0.77667
15	0.45012	0.27952	0.28783	0.12703
22	0.04356	0.04092	0.04092	0.04499
29	0.06454	0.05521	0.05521	0.02961
36	0.02105	0.02224	0.02224	0.02422
42	0.01673	0.02007	0.02007	0.01984
49	0.03694	0.03514	0.03514	0.01719
56	0.01262	0.01592	0.01592	0.01398
63	0.02596	0.02355	0.02355	0.01203
70	0.01104	0.01159	0.01159	0.01040
77	0.02186	0.01517	0.01517	0.00896
83	0.01464	0.01680	0.01680	0.00897
90	0.00835	0.00917	0.00917	0.00890
97	0.01366	0.01357	0.01357	0.00786
104	0.00740	0.00845	0.00845	0.00712
111	0.01297	0.01079	0.01079	0.00661
118	0.00647	0.00769	0.00769	0.00667
124	0.00605	0.00705	0.00705	0.00600
131	0.00841	0.00997	0.00997	0.00521
138	0.00559	0.00650	0.00650	0.00450
145	0.00964	0.00817	0.00817	0.00459
152	0.00429	0.00603	0.00603	0.00433
159	0.00688	0.00645	0.00645	0.00364
165	0.00591	0.00551	0.00551	0.00323
172	0.00377	0.00363	0.00363	0.00329
179	0.00499	0.00565	0.00565	0.00308
186	0.00329	0.00340	0.00340	0.00276
193	0.00448	0.00467	0.00467	0.00248
200	0.00286	0.00289	0.00289	0.00249

DIRECT COMPARISON: Custom Bernoulli vs. Reviewer Setup (np.ones)

m=2  : Reviewer Setup WINS! Error is 0.0% lower.
m=8  : Reviewer Setup WINS! Error is 0.4% lower.
m=15 : Custom Sampling WINS! Error is 55.9% lower.
m=22 : Reviewer Setup WINS! Error is 9.9% lower.
m=29 : Custom Sampling WINS! Error is 46.4% lower.
m=36 : Reviewer Setup WINS! Error is 8.9% lower.
m=42 : Custom Sampling WINS! Error is 1.1% lower.
m=49 : Custom Sampling WINS! Error is 51.1% lower.
m=56 : Custom Sampling WINS! Error is 12.2% lower.
m=63 : Custom Sampling WINS! Error is 48.9% lower.
m=70 : Custom Sampling WINS! Error is 10.3% lower.
m=77 : Custom Sampling WINS! Error is 41.0% lower.
m=83 : Custom Sampling WINS! Error is 46.6% lower.
m=90 : Custom Sampling WINS! Error is 2.9% lower.
m=97 : Custom Sampling WINS! Error is 42.1% lower.
m=104: Custom Sampling WINS! Error is 15.8% lower.
m=111: Custom Sampling WINS! Error is 38.8% lower.
m=118: Custom Sampling WINS! Error is 13.3% lower.
m=124: Custom Sampling WINS! Error is 14.9% lower.
m=131: Custom Sampling WINS! Error is 47.7% lower.
m=138: Custom Sampling WINS! Error is 30.8% lower.
m=145: Custom Sampling WINS! Error is 43.8% lower.
m=152: Custom Sampling WINS! Error is 28.2% lower.
m=159: Custom Sampling WINS! Error is 43.6% lower.
m=165: Custom Sampling WINS! Error is 41.4% lower.
m=172: Custom Sampling WINS! Error is 9.4% lower.
m=179: Custom Sampling WINS! Error is 45.5% lower.
m=186: Custom Sampling WINS! Error is 18.8% lower.
m=193: Custom Sampling WINS! Error is 46.8% lower.
m=200: Custom Sampling WINS! Error is 13.9% lower.

FabianK-Dev · 2026-06-14T20:57:42Z

Overall a very nice implementation and nice safeguards implemented for large n. Yet I think we can (a) get rid of the log procedure you introduced, as the _find_c is already the bottleneck for large n, if I see this correctly^^.

Also, I have left a comment on how the CoalitionSampler can be used to produce very similar sampling procedures. See Figure 9 of the LeverageSHAP paper for where the differences lie. The CoalitionSampler in our code essentially works without Bernoulli sampling.

Dear @Advueu963, thank you very much for the feedback! I will implement your suggestions asap.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

- I accidentally only accepted one out of three autofix suggestions by Copilot; the other two are now marked as outdated and can't be committed inside GitHub anymore, which is why I commit them manually
- "_find_c() converts binomial coefficients to float (`float(math.comb(...))`). For large `n`, `math.comb(n, s)` can exceed the maximum finite float and become `inf`, which then makes `hi` infinite and the bisection never meaningfully converges (returning `inf` for `c`). This can break sampling/weight computation for large-player games. Keep binomials as Python ints and choose an upper bound for `c` via exponential search (or another integer-safe strategy) instead of `max_binom/2` in float space."
- "_sample_without_replacement() uses rejection sampling for `total >= 10**6`. This is only efficient when `total >> k`, but in this implementation `k` can be a large fraction of `total` (e.g., when `prob` is close to 1 but still < 1). In that case the `while len(seen) < k:` loop can take an extremely long time due to heavy collisions. Python's `random.sample()` supports sampling directly from `range(total)` without materializing it and handles the `k` vs `total` regime robustly, so it's safer to use it for the large-pool case too."
- mmschlk#524
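The second suggestion can be tried in isolation. The sketch below draws a few distinct indices from a very large pool with `random.sample()` on a `range` object; the helper name and the pool size are illustrative assumptions and not the code of `_sample_without_replacement()`.

```python
import random


def sample_indices_without_replacement(total, k, seed):
    """Draw k distinct indices from range(total) using the standard library."""
    rng = random.Random(seed)
    return rng.sample(range(total), k)


pool_size = 10**18
first = sample_indices_without_replacement(pool_size, 5, seed=0)
second = sample_indices_without_replacement(pool_size, 5, seed=0)
```

Note that `random.sample()` needs `len()` of the population, so in CPython this only works while the pool size fits into a machine-sized integer; for larger pools a separate fallback is still required.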

…ities_and_crime() and added fallback mechanism in the random sampling method
- Implement feedback from mmschlk#524
- Replace custom dataset loading with shapiq_games.datasets.load_communities_and_crime() and refactor all code cells accordingly
- Also load the Communities dataset and extend the code cells to skip computing exact Shapley values for n > 15 (for the Communities dataset)
- Skip experiments with the Communities dataset where exact_svs is None
- Add fallback to src/shapiq/approximator/regression/leverageshap._sample_without_replacement() for astronomically large binomial pools

FabianK-Dev · 2026-06-18T15:35:50Z

Dear @mmschlk, I refactored the solve_regression() method in src/shapiq/approximator/regression/base.py as discussed in our meeting, removed a (now outdated/unused) old unit test in e70f9d6, and resolved all remaining ruff style warnings in 67d05fc.

The method is now cleaned up and more readable, and it introduces the "safe" try-except code block as well as the use_svd parameter, which will be used exclusively by LeverageSHAP. I removed the unsafe code blocks that would return np.nan; the method now raises errors instead.

Could you please review the changes? Thank you.

I've also run the following commands locally to ensure that all code quality checks (pre-commit), unit tests and coverage pass (the same commands as in the GitHub CI pipeline):

uv sync --group lint --all-extras
uv run pre-commit run --all-files --show-diff-on-failure

uv sync --all-extras
uv run pytest "tests/shapiq" --cov=shapiq --cov-report=term -n logical

uv sync --no-dev --all-extras --group all_ml
uv run --no-sync python -c "import shapiq; print('✅ shapiq imported successfully')"
uv run --no-sync python -c "import shapiq_games; print('✅ shapiq_games imported successfully')"

If you're okay with the changes, please feel free to re-run the workflow on GitHub.

codecov · 2026-06-30T08:14:03Z

Codecov Report

❌ Patch coverage is 95.65217% with 7 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/shapiq/approximator/regression/leverageshap.py	95.10%	7 Missing ⚠️

📢 Thoughts on this report? Let us know!

FabianK-Dev · 2026-06-30T09:19:18Z

Thanks for running the CI. I see that the documentation is not building correctly and that some lines are still missing coverage. I'll have a look at it ASAP.

FabianK-Dev · 2026-07-03T15:22:57Z

@Advueu963 As requested in the last meeting, I have updated and expanded the benchmark notebook to evaluate our custom LeverageSHAP implementation (Algorithm 2) against the uniform KernelSHAP baseline across 25 configurations and 4 distinct datasets. The results demonstrate that our method surpasses KernelSHAP + np.zeros, with the custom implementation achieving a lower absolute L2 error and winning in 22 to 28 out of 30 budget steps across almost every configuration (please see the table below). Also, our Bernoulli sampling mechanism resolves the "zig-zag" pattern seen in 9 out of 25 baseline plots for KernelSHAP + np.zeros. I assume the "zig-zag" occurs when the KernelSHAP + np.zeros setup runs out of budget mid-layer.

Some configurations show extreme negative percentages in the relative "Avg Improvement" column. I think this is caused by the baseline randomly hitting a near-zero error in its symmetry valleys, which then distorts the relative average. The fully cleaned and documented notebook is now ready for review. I also added an "Empirical Evaluation: Custom LeverageSHAP vs. Uniform Weighting Baseline" section, which describes and tries to explain the observations. We also plot every single experiment (i.e. each line in the table below) in the Jupyter NB.
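The distortion described above is easy to reproduce with made-up numbers. The values below are purely illustrative and are not taken from the notebook; they only show how a single near-zero baseline error can dominate a mean of relative improvements while the win count and the median stay stable.

```python
from statistics import mean, median


def relative_improvement(baseline, ours):
    """Per-budget improvement over the baseline in percent (positive = we win)."""
    return [100.0 * (b - o) / b for b, o in zip(baseline, ours)]


# Illustrative errors only: the third baseline value is a near-zero "valley".
baseline_errors = [0.020, 0.018, 0.0001, 0.015]
our_errors = [0.012, 0.011, 0.0100, 0.009]

improvements = relative_improvement(baseline_errors, our_errors)
wins = sum(o < b for b, o in zip(baseline_errors, our_errors))
avg_improvement = mean(improvements)
median_improvement = median(improvements)
```

Here the method wins three of four budget steps, yet the mean is strongly negative because of the one valley; reporting the median or the absolute error difference alongside the mean would be less sensitive to this effect.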

You can also find the Jupyter NB exported as a PDF here:
reproduce_figure9_sampling_architecture.ipynb.pdf

FabianK-Dev · 2026-07-03T19:26:14Z

@Advueu963 I moved the notebooks from the notebooks/ folder to a PR on our fork. If I understood correctly from the discussion with @pwhofman, the notebooks, discussion files and 1-page summary will not be pushed to mmschlk/shapiq, so I'm cleaning up this PR to get it ready to be merged.

42logos added 3 commits May 11, 2026 02:46

github-project-automation Bot added this to shapiq development May 22, 2026

feat/gitignore: Add benchmark/results/* to .gitignore

6bede21

Theresa Geber and others added 24 commits June 8, 2026 13:57

leverageSHAP sceleton

2447992

feat/SG-20/leverageshap: Add budget validation, initialize seen_coali…

67400f8

…tions set, Z_list and probs_list and create all-true and all-false coalitions

feat/SG-20/leverageshap: Add leverage score sampling loop

80f4248

feat/SG-20/leverageshap: Run uv run pre-commit run --all-files and fi…

f02e5ed

…x pre-commit errors

feat/SG-20/leverageshap: Add comments to document and explain code

e4d2ff6

implementation of _solver function + first tests

0f8a940

feat/SG-20/leverageshap: Add reference for Lemma 3.2

6a1bb98

feat/SG-22/testing: Add basic test_reproducibility test => Tests for …

44ba153

…determinism on LeverageSHAP()

feat/SG-22/testing: Improve test_reproducibility(): Split up into dif…

bfa15ee

…ferent game variables to avoid access counters interfering; Also compare metadata

feat/SG-22/testing: Test whether different seeds of identical dummy g…

aecacdf

…ames produce (slightly) different outputs

feat/SG-22/testing: Test whether approximation error decreases with i…

109d14a

…ncreased budget

feat/SG-22/testing: Add comments to make test more understandable

2fbabae

Fix: DRY principle applied

030aa00

fix: solve_regression for rank-deficient matrices

1337b00

feat/SG-22/testing: Add tests for exact matches with ExactComputer an…

30acd7b

…d tiny-n edge case and add comments to document and explain the test

feat/SG-22/testing: Split test_reproducibility_different_seeds up int…

058e5d7

…o test_exact_regime_seed_independence and test_stochastic_regime_seed_variability

feat/SG-22/testing: Add test for pairing trick variance reduction in …

e09f1f0

…stochastic regime

feat/SG-22/testing: Add test_leverageshap_vs_kernelshap_mean_error, t…

7267e63

…est_exact_matches_multiple_small_games, test_null_player_axiom and test_minimal_budget_sweep

reproduceability test and changes necessary to actually reproduce pap…

4cb7602

…er results

feat/SG-22/testing: Update test_empirical_convergence_rate to use hig…

6ff9ad3

…her n to avoid minimal floating errors

feat/gitignore: Add benchmark/results/* to .gitignore

097de27

feat/Add DISCUSSION.md (WIP)

0d87b9c

feat/SG-69/refactor: Refactor leverageshap.py to move solve method in…

a772261

…to base regression class

Copilot started reviewing on behalf of mmschlk June 10, 2026 08:40

Copilot AI reviewed Jun 10, 2026

Comment thread src/shapiq/approximator/regression/leverageshap.py Outdated

Comment thread src/shapiq/approximator/regression/leverageshap.py Outdated

Comment thread tests/shapiq/tests_unit/tests_approximators/test_approximator_leverageshap.py Outdated

Advueu963 reviewed Jun 13, 2026

FabianK-Dev and others added 8 commits June 14, 2026 22:59

fix: typo in comment: "ovefitting" -> "overfitting"

34b9a26

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ruff: Add ruff changes in seperate commit (i.e. ruff split style chan…

ad74f97

…ges into this separate commit) for easier reviewing

refactor: Refactor faulty solve_regression() method and remove old un…

e70f9d6

…it test

ruff: Apply automatic ruff format changes and fix remaining type hint…

67d05fc

…s, boolean traps, and df naming in SOUM notebook

refactor: simplify IS weight math and optimize solver

c28409b

test: reproduce figure 9 of the paper: compare KernelSHAP, LeverageSH…

e98266a

…AP (with our custom bernoulli sampling), LeverageSHAP w/o bernoulli using class override and KernelSHAP + np.ones

FabianK-Dev requested a review from Advueu963 June 18, 2026 15:35

fix: correct LaTeX formula rendering

cd7c0b6

FabianK-Dev added 3 commits July 3, 2026 16:07

feat(benchmark): Extend benchmark with more instances, different data…

e3e4841

…sets and different n

docs: Add empirical evaluation of Custom LeverageSHAP vs. Uniform Wei…

a0083f9

…ghting Baseline

feat: remove empirical evaluation comments where I'm unsure (I don't …

6b020e4

…want to make wrong assumptions)

fix: Remove NBs from current branch and move to branch 'submission' (…

49394e6

…NBs will not be pushed to the shapiq repository according to the last meeting)

FabianK-Dev added 7 commits July 3, 2026 21:26

Merge branch 'main' into leverageSHAP

bd6d472

Merge branch 'main' of github.com:mmschlk/shapiq

dfd68ce

Merge branch 'main' into leverageSHAP

598ef5f

ruff: auto format

079ab89

Revert "Merge remote-tracking branch 'origin/wu/conformance-test' int…

3b40fc1

…o leverageshap-docs" This reverts commit 4ac5580, reversing changes made to 6ede9b5.

test: add testing to add coverage for remaining code lines

a141ebb

fix: Fix docstring to ensure sphinx builds

6c4d22a

6 participants
