fix(bandit): skip reward updates for unknown arm names #12

Open
GrigoryEvko wants to merge 1 commit into FusionBrainLab:main from GrigoryEvko:fix/bandit-canonicalize-mutation-model

@GrigoryEvko

on_mutation_outcome forwards program.get_metadata('mutation_model') directly to self._bandit.update_reward, which indexes self.arms[arm_name]. The normal flow keeps producer and router in step, but the metadata can also arrive from a Program loaded out of a Redis snapshot written by an older router, from a custom mutation operator, or from a hand-built test fixture. In those cases update_reward raises KeyError and aborts the agent callback.
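To make the failure mode concrete without the gigaevo imports, here is a minimal sketch using hypothetical stand-ins (`ToyBandit` and `forward_unchecked` are illustrative names, not the project's classes). It assumes only what the description above states: the arms live in a dict keyed by name, and the reward update indexes that dict directly.

```python
# Hypothetical stand-ins for illustration; these are NOT the gigaevo classes.
class ToyBandit:
    def __init__(self, arm_names):
        # One running reward total per known arm.
        self.arms = {name: 0.0 for name in arm_names}

    def update_reward(self, arm_name, reward):
        # Direct indexing: raises KeyError for a name that is not an arm.
        self.arms[arm_name] += reward


def forward_unchecked(bandit, metadata, reward):
    # Mirrors the unguarded pattern: metadata goes straight to the bandit.
    bandit.update_reward(metadata.get("mutation_model"), reward)


bandit = ToyBandit(["llama", "qwen"])

# Known arm: the update goes through.
forward_unchecked(bandit, {"mutation_model": "llama"}, 0.3)

# Stale or foreign name: the lookup fails before any update happens.
try:
    forward_unchecked(bandit, {"mutation_model": "gpt-4-not-in-router"}, 0.3)
    raised = False
except KeyError:
    raised = True
```

Any caller that does not catch the exception is aborted at this point, which is the behaviour the reproducer exercises against the real router.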

Minimal reproducer against current main:

from unittest.mock import MagicMock
from gigaevo.llm.bandit import BanditModelRouter, MutationOutcome
from gigaevo.programs.program import Program

def _mock(name):
    # Stand-in model object carrying the attributes this reproducer sets.
    m = MagicMock()
    m.model_name = name
    m.with_structured_output = MagicMock(return_value=MagicMock())
    return m

# Router with two arms: 'llama' and 'qwen'.
router = BanditModelRouter([_mock('llama'), _mock('qwen')], [0.5, 0.5], fitness_key='score')

# Child tagged with a model name that is not one of the router's arms.
child = Program(code='x=1')
child.set_metadata('mutation_model', 'gpt-4-not-in-router')
child.metrics['score'] = 0.8

parent = Program(code='x=0')
parent.metrics['score'] = 0.5

router.on_mutation_outcome(child, [parent], outcome=MutationOutcome.ACCEPTED)
# KeyError: 'gpt-4-not-in-router'

The change adds a single membership check against self._bandit.arms at the top of on_mutation_outcome. When the name is unknown, the function logs one debug-level message and returns without updating any arm. Three new tests cover the unknown-arm path on the ACCEPTED branch, on the REJECTED_ACCEPTOR branch, and combined with missing child fitness.
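The shape of such a guard can be sketched as follows. This is a simplified illustration under stated assumptions, not the diff itself: `ToyBandit`, `ToyRouter`, the logger name, the `(metadata, reward)` signature, and the boolean return value are all hypothetical and exist only to make the skip observable here.

```python
import logging

logger = logging.getLogger("toy_router")  # hypothetical logger name


class ToyBandit:
    """Hypothetical stand-in: arms stored in a dict keyed by name."""

    def __init__(self, arm_names):
        self.arms = {name: 0.0 for name in arm_names}

    def update_reward(self, arm_name, reward):
        self.arms[arm_name] += reward  # KeyError if arm_name is unknown


class ToyRouter:
    """Hypothetical stand-in for a router that guards its bandit."""

    def __init__(self, arm_names):
        self._bandit = ToyBandit(arm_names)

    def on_mutation_outcome(self, metadata, reward):
        arm_name = metadata.get("mutation_model")
        # Membership check up front: unknown or missing names are skipped.
        if arm_name not in self._bandit.arms:
            logger.debug("Skipping reward update for unknown arm %r", arm_name)
            return False  # assumption: the return value is for illustration only
        self._bandit.update_reward(arm_name, reward)
        return True


router = ToyRouter(["llama", "qwen"])
updated_known = router.on_mutation_outcome({"mutation_model": "llama"}, 0.3)
updated_unknown = router.on_mutation_outcome(
    {"mutation_model": "gpt-4-not-in-router"}, 0.3
)
updated_missing = router.on_mutation_outcome({}, 0.3)
```

Placing the check before any other work means every outcome branch shares the same skip path, which fits the description of tests exercising it on more than one branch.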

on_mutation_outcome forwarded program.get_metadata('mutation_model')
directly to self._bandit.update_reward, which raised KeyError when
the metadata did not match a current arm (Redis snapshot replay,
custom operator, hand-built test fixture). Membership check at the
top of the function turns the crash into a debug-level skip.