
Fix DiceGenetic.compute_proximity_loss for all-categorical datasets (#276) #471

Open
jbbqqf wants to merge 1 commit into interpretml:main from jbbqqf:fix/276-proximity-loss-no-continuous


jbbqqf commented May 9, 2026

Summary

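DiceGenetic.compute_proximity_loss divides by sum(feature_weights),
where feature_weights is restricted to the continuous feature indexes.
For an all-categorical dataset (e.g. survey responses, or inputs that are
one-hot encoded only), that array is empty, and the original code hit one
of two bugs depending on input shape:

  • proximity_loss / sum(feature_weights) divides by zero, raising ZeroDivisionError or emitting a RuntimeWarning and producing NaN losses (the symptom reported in #276), or
  • product.reshape(-1, product.shape[-1]) raises a ValueError on a 0-sized array.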

Both paths poison compute_loss with NaNs or exceptions and break the
genetic search for legitimate categorical-only use cases.
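Both failure modes can be reproduced with plain NumPy, outside DiCE. The arrays below are illustrative stand-ins (a population of four candidates and no continuous features), not the exact shapes DiCE builds internally.

```python
import warnings

import numpy as np

# Illustrative stand-ins for an all-categorical dataset (not DiCE objects).
population_size = 4
feature_weights = np.array([])              # no continuous feature indexes
proximity_loss = np.zeros(population_size)  # nothing to sum, so all zeros

# Failure mode 1: builtin sum() over an empty array returns 0, so the
# normalization step computes 0 / 0 and every loss becomes NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # silence "invalid value"
    normalized = proximity_loss / sum(feature_weights)

# Failure mode 2: a weighted-distance product with zero columns cannot be
# reshaped with an inferred (-1) leading dimension.
product = np.empty((population_size, 0))
try:
    product.reshape(-1, product.shape[-1])
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)

print(normalized)     # every entry is NaN
print(reshape_error)  # a "cannot reshape array of size 0 ..." message
```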

Why

Proximity is conceptually undefined when there are no continuous distances
to weigh, so the function short-circuits with a zero loss vector matching
the population shape. The categorical_penalty term in compute_loss
already handles categorical sparsity, so dropping the proximity
contribution is the correct semantic — and matches what users expect when
they explicitly set up a categorical-only dice_ml.Data.

The change is a single early return when len(feature_weights) == 0; a
comment in the code explains the choice for future reviewers.
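A minimal sketch of the early return, written as a standalone NumPy function. `proximity_loss_sketch`, its parameters, and the L1 weighting are hypothetical stand-ins modeled on the description above, not DiCE's actual implementation.

```python
import numpy as np


def proximity_loss_sketch(x_hat, query, continuous_idx, weight_by_index):
    """Weighted, normalized L1 distance over continuous features (hypothetical).

    x_hat: (population_size, n_features) array of normalized candidates
    query: (n_features,) normalized query instance
    continuous_idx: indexes of the continuous features (may be empty)
    weight_by_index: mapping from feature index to its weight
    """
    x_hat = np.asarray(x_hat, dtype=float)
    query = np.asarray(query, dtype=float)
    feature_weights = np.array(
        [weight_by_index[i] for i in continuous_idx], dtype=float
    )

    # The early return: with no continuous features there is no distance to
    # weigh, so return one zero per candidate instead of dividing by zero.
    if len(feature_weights) == 0:
        return np.zeros(x_hat.shape[0])

    # Per-feature absolute distance, restricted to the continuous columns.
    diffs = np.abs(x_hat - query)[:, list(continuous_idx)]
    proximity_loss = (diffs * feature_weights).sum(axis=1)

    # Normalize by the total weight, as described for the original function.
    return proximity_loss / feature_weights.sum()


population = np.array([[0.0, 1.0, 0.5],
                       [1.0, 0.0, 0.5]])
query = np.zeros(3)

print(proximity_loss_sketch(population, query, [], {}))                     # [0. 0.]
print(proximity_loss_sketch(population, query, [0, 2], {0: 1.0, 2: 1.0}))  # [0.25 0.75]
```

Only the empty-weights case takes the new branch; any non-empty set of continuous indexes follows the original weighted-distance path.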

Reproduce BEFORE/AFTER yourself (copy-paste)

set -e
cd /tmp && rm -rf DiCE-276 && git clone https://github.com/interpretml/DiCE.git DiCE-276
cd DiCE-276
pip install -q -e . pytest scikit-learn

git fetch https://github.com/jbbqqf/DiCE.git fix/276-proximity-loss-no-continuous
git checkout FETCH_HEAD -- tests/test_dice_interface/test_dice_genetic.py

# --- BEFORE: production code from origin/main ---
git checkout origin/main -- dice_ml/explainer_interfaces/dice_genetic.py
python -m pytest "tests/test_dice_interface/test_dice_genetic.py::TestComputeProximityLossNoContinuousFeatures::test_compute_proximity_loss_returns_zero_when_no_continuous_features" -q || echo "BEFORE: ValueError 'cannot reshape array of size 0' (expected)"

# --- AFTER: fix applied ---
git checkout FETCH_HEAD -- dice_ml/explainer_interfaces/dice_genetic.py
python -m pytest "tests/test_dice_interface/test_dice_genetic.py::TestComputeProximityLossNoContinuousFeatures::test_compute_proximity_loss_returns_zero_when_no_continuous_features" -q
# Expected: 1 passed

What I ran locally

  • New test passes on this branch
  • Test fails on origin/main with ValueError: cannot reshape array of size 0 into shape (0). (The proximity_loss / sum(...) path from the original snippet in #276 is reached only with a slightly different input shape; both failures share the same root cause.)
  • Ran the full tests/test_dice_interface/test_dice_genetic.py: the same 4 pre-existing failures as on origin/main (the TestDiceGeneticRegressionMethods regression suite is broken on main for unrelated reasons) and 0 new failures

Edge cases

| Dataset shape | Path taken | Result |
|---|---|---|
| Mixed continuous + categorical | unchanged | unchanged |
| All categorical (this fix) | new early return | np.zeros(population) |
| All continuous | unchanged | unchanged |
| proximity_weight = 0 | already short-circuited in compute_loss | unchanged |
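To illustrate why a zero proximity vector is a safe result while a NaN one is not, here is a hypothetical objective that adds a per-candidate proximity term to other loss terms. `total_loss_sketch`, `yloss`, and `other_penalty` are placeholder names, and the way the terms combine is an assumption for illustration, not DiCE's compute_loss.

```python
import numpy as np


def total_loss_sketch(yloss, proximity, other_penalty, proximity_weight):
    """Hypothetical per-candidate objective: lower is better."""
    # Skip the proximity term when its weight is zero, mirroring the
    # short-circuit the edge-case table attributes to compute_loss.
    if proximity_weight == 0:
        return yloss + other_penalty
    return yloss + proximity_weight * proximity + other_penalty


yloss = np.array([0.4, 0.0, 0.1])
other_penalty = np.array([0.3, 0.5, 0.0])

# After the fix: proximity is all zeros, so the ranking comes entirely
# from the remaining terms.
fixed = total_loss_sketch(yloss, np.zeros(3), other_penalty, proximity_weight=0.5)

# Before the fix: a NaN proximity vector makes every total NaN, so sorting
# the population by loss can no longer tell candidates apart.
broken = total_loss_sketch(yloss, np.full(3, np.nan), other_penalty, proximity_weight=0.5)

# proximity_weight = 0: the proximity term is never touched.
skipped = total_loss_sketch(yloss, np.full(3, np.nan), other_penalty, proximity_weight=0)

print(np.argsort(fixed))        # [2 1 0]
print(np.isnan(broken).all())   # True
print(np.isnan(skipped).any())  # False
```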

AI disclosure

This change was prepared with the assistance of Claude (Anthropic).
The author reviewed every line and is responsible for the final result.

When the dataset has no continuous features,
`continuous_feature_indexes` is empty, so `feature_weights` is an empty
np.array and the original implementation hit either:

* `proximity_loss / sum(feature_weights)` ⇒ ZeroDivisionError /
  RuntimeWarning + NaN losses (the symptom @kburchfiel reported in interpretml#276
  with the original quoted snippet), or
* `product.reshape(-1, product.shape[-1])` ⇒ ValueError on a 0-sized
  array, depending on input shape.

Both paths poison `compute_loss` with NaN/exceptions and break the
genetic search for legitimate all-categorical use cases.

Proximity is conceptually undefined when there are no continuous
distances to weigh, so short-circuit with a zero loss vector matching
the population shape. The categorical penalty in `compute_loss`
already accounts for categorical sparsity, so dropping the proximity
contribution is the correct semantic — and it matches what users
expect when they explicitly set up a categorical-only `dice_ml.Data`.

Adds `TestComputeProximityLossNoContinuousFeatures` covering the
all-categorical path. The test fails on `origin/main` with the
ValueError reshape variant of this bug.

Closes interpretml#276.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jbbqqf requested review from amit-sharma and gaugup as code owners May 9, 2026 22:52
