fix(bandit): return neutral reward for non-finite fitness inputs #2

Open
GrigoryEvko wants to merge 1 commit into FusionBrainLab:main from GrigoryEvko:fix/bandit-finite-guard

@GrigoryEvko GrigoryEvko commented May 15, 2026

compute_bandit_reward(NaN, _) returns NaN. That feeds into the SlidingWindowUCB1 mean and silently bricks routing — every arm's score becomes NaN, score > best_score is always False, the first arm in dict order always wins. Trigger: any path producing non-finite fitness, e.g. a crashed validity stage emitting NaN.
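
The failure mode described above can be reproduced in isolation. The sketch below is not the project's code: `_MAX` is given an arbitrary value, `unguarded_reward_core` only mirrors the `exp(min(max(x, 0), _MAX))` expression quoted in the commit message, and `select_arm` is a hypothetical argmax loop standing in for the UCB selection step.

```python
import math
from typing import Dict, Optional

_MAX = 10.0  # assumed value; the real constant is not shown in this PR


def unguarded_reward_core(x: float) -> float:
    """Mirror of the expression quoted in the commit message."""
    # max(nan, 0) keeps nan because `0 > nan` is False; min() then keeps it too.
    return math.exp(min(max(x, 0), _MAX))


def select_arm(scores: Dict[str, float]) -> Optional[str]:
    """Hypothetical argmax: keep the first arm unless a later score is strictly greater."""
    best_arm, best_score = None, -math.inf
    for arm, score in scores.items():
        if best_arm is None or score > best_score:
            best_arm, best_score = arm, score
    return best_arm


nan_reward = unguarded_reward_core(float("nan"))

# One NaN sample is enough to poison a windowed mean.
window_mean = sum([0.4, 0.6, nan_reward]) / 3

# With NaN scores every `score > best_score` comparison is False,
# so the first arm in dict order is returned.
scores = {"arm_a": window_mean, "arm_b": window_mean, "arm_c": window_mean}
chosen = select_arm(scores)
```

Under these assumptions `chosen` is always the first key, regardless of how the other arms have performed, which matches the "exploration silently bricked" symptom.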

Fix: return 0.0 when either input is non-finite. Tests: finite-input parity, NaN child, +inf parent.

compute_bandit_reward used to compute exp(min(max(NaN, 0), _MAX)) → NaN
when either input was non-finite. The NaN then flowed into
SlidingWindowUCB1.update_reward → mean_reward → UCB score, after which
every arm's score is NaN, "score > best_score" is False, and the first
arm in dict iteration order is always selected (exploration silently
bricked). Non-finite inputs do occur in practice: validity stage crashes
yield sentinel-or-NaN fitness depending on the acceptor.

Add a finite-input fast-guard returning 0.0 — the neutral reward — so the
sliding-window mean stays well-defined.

Tests cover positive (finite path unchanged), NaN child, and +inf parent.
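
A minimal sketch of the guard and the three test cases, under stated assumptions: the PR places the check inside `compute_bandit_reward`, but since that function's finite-path formula is not shown here, the sketch applies the guard as a wrapper. `with_finite_guard`, `stand_in_reward`, the `(child, parent)` argument order, and the `_MAX` value are all hypothetical.

```python
import math
from typing import Callable

_MAX = 10.0  # assumed value; the real constant is not shown in this PR

RewardFn = Callable[[float, float], float]


def with_finite_guard(reward_fn: RewardFn, neutral: float = 0.0) -> RewardFn:
    """Return `neutral` when either fitness is NaN or +/-inf; otherwise defer to reward_fn."""

    def guarded(child_fitness: float, parent_fitness: float) -> float:
        if not (math.isfinite(child_fitness) and math.isfinite(parent_fitness)):
            return neutral
        return reward_fn(child_fitness, parent_fitness)

    return guarded


def stand_in_reward(child_fitness: float, parent_fitness: float) -> float:
    """Placeholder formula for illustration only, not the project's reward."""
    return math.exp(min(max(child_fitness - parent_fitness, 0), _MAX))


guarded_reward = with_finite_guard(stand_in_reward)

finite_case = guarded_reward(2.0, 1.0)             # finite path: same as the unguarded call
nan_child_case = guarded_reward(float("nan"), 1.0)  # non-finite child
inf_parent_case = guarded_reward(1.0, math.inf)     # non-finite parent
```

`math.isfinite` rejects NaN and both infinities in a single call, so one check per input covers all the non-finite cases named in the commit message.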
GrigoryEvko added a commit to GrigoryEvko/gigaevo-core that referenced this pull request May 15, 2026
``_StructuredOutputRouter._maybe_fire_failure_hook`` used to swallow
hook exceptions silently (``except Exception: pass``). The hook is
observability-only, so re-raising would mask the real LLM failure —
but a *silent* swallow loses telemetry whenever the hook itself has a
bug, hiding bandit-side regressions. Replace ``pass`` with
``logger.warning(...)`` so the original exception still propagates,
yet a broken hook is visible in operator logs.

Audit item FusionBrainLab#2 from the PR FusionBrainLab#13 bug hunt.
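
The logging change in the referenced commit follows a common pattern, sketched below. This is not the body of `_StructuredOutputRouter._maybe_fire_failure_hook`; `fire_failure_hook`, its signature, and the log message are hypothetical.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fire_failure_hook(
    hook: Optional[Callable[[BaseException], None]],
    error: BaseException,
) -> None:
    """Call an observability-only hook without letting its own bugs mask `error`."""
    if hook is None:
        return
    try:
        hook(error)
    except Exception:
        # Previously `pass`: a broken hook vanished without a trace.
        # exc_info=True attaches the hook's traceback to the log record.
        logger.warning("failure hook raised; continuing", exc_info=True)
```

Because the helper never re-raises, the caller can still propagate the original LLM failure, while a buggy hook now shows up as a warning in operator logs.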