Add exploration-max-score-picker to address stale scores. #30

Open
Mohammad-nassar10 wants to merge 4 commits into ms-llmd:main from Mohammad-nassar10:exploration-picker

@Mohammad-nassar10

Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds an exploration-max-score-picker that wraps max-score exploitation with controlled exploration. On each request, with probability explorationRatio, it probes a model whose score has been unchanged for longer than stalenessThreshold instead of picking the top-scored model.
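
For illustration only, the selection rule described above could be sketched roughly as follows. This is not the code in this PR: the class name, the `pick` signature, the injected `now` timestamp, and the choice to pick uniformly among stale models while excluding the current top scorer are all assumptions made for the example. Only `explorationRatio` and `stalenessThreshold` come from the description.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExplorationMaxScorePicker:
    """Hypothetical sketch of an exploration-aware max-score picker."""

    exploration_ratio: float    # probability of probing a stale model (explorationRatio)
    staleness_threshold: float  # seconds a score may stay unchanged (stalenessThreshold)
    rng: random.Random = field(default_factory=random.Random)
    # model name -> (last observed score, time that score was first observed)
    _last_change: dict = field(default_factory=dict, init=False)

    def _observe(self, scores, now):
        # Record the time at which each model's score last changed.
        for model, score in scores.items():
            prev = self._last_change.get(model)
            if prev is None or prev[0] != score:
                self._last_change[model] = (score, now)

    def pick(self, scores, now):
        """Return the model to route to, given current scores and a timestamp."""
        self._observe(scores, now)
        best = max(scores, key=scores.get)
        # Assumption: the top-scored model is never an exploration target.
        stale = [
            m for m in scores
            if m != best
            and now - self._last_change[m][1] > self.staleness_threshold
        ]
        # Explore with probability exploration_ratio, but only if something is stale.
        if stale and self.rng.random() < self.exploration_ratio:
            return self.rng.choice(stale)
        return best
```

With `exploration_ratio=0.0` this sketch degenerates to plain max-score picking, and a model stops being a probe candidate as soon as its score changes.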

Which issue(s) this PR fixes:

Fixes #18.

Release note (write NONE if no user-facing change):

NONE

Signed-off-by: Mohammad Nassar <mohammad.nassar@ibm.com>
Mohammad-nassar10 and others added 3 commits June 10, 2026 11:38
Signed-off-by: Mohammad Nassar <mohammad.nassar@ibm.com>
Signed-off-by: Mohammad-nassar10 <79787844+Mohammad-nassar10@users.noreply.github.com>
Signed-off-by: Mohammad Nassar <mohammad.nassar@ibm.com>
Development

Successfully merging this pull request may close these issues.

Add exploration-exploitation score picker to address stale scores