fix(weave): populate structured scorer feedback for runnable scores #6986

Open
benjreinhart wants to merge 1 commit into 05-27-feat_weave_support_scorer__fields_for_wandb.runnable_scorers from 05-27-fix_weave_populate_structured_scorer_feedback_for_runnable_scores

Conversation

@benjreinhart benjreinhart commented May 27, 2026

Description

For older runnable call scorers that use LLM-as-judge: if the LLM output matches the shape expected for the typed outputs (the scorer_* columns), populate those columns in the feedback row that is about to be inserted.

This should let runnable scorers start adopting the new typed columns while staying backwards compatible with the scorers that already exist.
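
A minimal sketch of the behavior described above, not the code in this PR. The payload shape (`output.tags` as a list of `name`/`reason`/`confidence` entries), the column value types, and the helper names `derive_scorer_fields` and `build_feedback_row` are all assumptions made for illustration; only the `scorer_*` column names come from this PR.

```python
from typing import Any


def derive_scorer_fields(llm_output: Any) -> dict[str, Any]:
    """Map a judge output onto scorer_* columns, or return {} if it does not match.

    Assumed (hypothetical) shape:
        {"tags": [{"name": str, "reason": str, "confidence": float}, ...]}
    """
    if not isinstance(llm_output, dict):
        return {}
    tags = llm_output.get("tags")
    if not isinstance(tags, list) or not tags:
        return {}

    names: list[str] = []
    reasons: dict[str, str] = {}
    confidences: dict[str, float] = {}
    for tag in tags:
        # One malformed entry rejects the whole output, so a legacy scorer
        # with free-form output keeps its old, untyped behavior.
        if not isinstance(tag, dict) or not isinstance(tag.get("name"), str):
            return {}
        name = tag["name"]
        names.append(name)
        if isinstance(tag.get("reason"), str):
            reasons[name] = tag["reason"]
        conf = tag.get("confidence")
        if isinstance(conf, (int, float)) and not isinstance(conf, bool):
            confidences[name] = float(conf)

    return {
        "scorer_tags": names,
        "scorer_tag_reasons": reasons,
        "scorer_tag_confidences": confidences,
    }


def build_feedback_row(payload: dict[str, Any]) -> dict[str, Any]:
    """Always keep the raw payload; add typed columns only when derivable."""
    row: dict[str, Any] = {"payload": payload}
    row.update(derive_scorer_fields(payload.get("output")))
    return row
```

Under these assumptions, a scorer whose output is a plain string produces a row with only `payload`, while a scorer that already emits the structured shape gets the typed columns filled in without any change on its side.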

Testing

Unit tests

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 91.04478% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| weave/trace_server/feedback.py | 91.04% | 4 Missing and 2 partials ⚠️ |

@benjreinhart benjreinhart force-pushed the 05-27-fix_weave_populate_structured_scorer_feedback_for_runnable_scores branch from 71d6e01 to f919f8b on May 27, 2026 23:32

wandbot-3000 Bot commented May 27, 2026

@benjreinhart benjreinhart force-pushed the 05-27-fix_weave_populate_structured_scorer_feedback_for_runnable_scores branch from f919f8b to 89517a8 on May 27, 2026 23:42
@benjreinhart benjreinhart marked this pull request as ready for review May 27, 2026 23:44
@benjreinhart benjreinhart requested a review from a team as a code owner May 27, 2026 23:44

@jtschoonhoven jtschoonhoven left a comment

We should DRY this up with the code in this branch of agent_scoring_types.py

Apologies if you weren't aware of that branch, hate to duplicate work.

Comment on lines +135 to +138
def _derive_scorer_fields_from_payload(
    feedback_req: tsi.FeedbackCreateReq,
    processed_payload: dict[str, Any],
) -> dict[str, Any]:

You should be able to replace most of this by reusing code from https://github.com/wandb/core/pull/44328/changes#diff-daee47bf3fb4333f5265212ed054429a7a15ad728e17225fa7a09993400d04f0

E.g. this is similar to ScorerLlmOutputGroup.from_agent_scorer_output()

It probably makes sense to pull some of those classes out of the agent scoring worker and into a shared module.

Comment on lines +377 to +384
request_scorer_fields = {
    "scorer_tags": feedback_req.scorer_tags,
    "scorer_tag_reasons": feedback_req.scorer_tag_reasons,
    "scorer_tag_confidences": feedback_req.scorer_tag_confidences,
    "scorer_ratings": feedback_req.scorer_ratings,
    "scorer_rating_reasons": feedback_req.scorer_rating_reasons,
    "scorer_rating_confidences": feedback_req.scorer_rating_confidences,
}

In https://github.com/wandb/core/pull/44328/changes#diff-daee47bf3fb4333f5265212ed054429a7a15ad728e17225fa7a09993400d04f0 there's a pydantic model ScorerColumns you could use to parse and validate these.
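
To illustrate the suggestion, here is a rough sketch of collecting the six request fields through one typed container instead of a hand-built dict. This is not the `ScorerColumns` model from the linked PR: that model is pydantic and its fields and rules are not shown in this thread. `ScorerColumnsSketch` is a hypothetical stdlib dataclass stand-in, and the field types and the "reasons and confidences must refer to known names" rule are assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Iterable


@dataclass
class ScorerColumnsSketch:
    """Hypothetical stand-in for ScorerColumns; field types are assumed."""

    scorer_tags: list[str] = field(default_factory=list)
    scorer_tag_reasons: dict[str, str] = field(default_factory=dict)
    scorer_tag_confidences: dict[str, float] = field(default_factory=dict)
    scorer_ratings: dict[str, float] = field(default_factory=dict)
    scorer_rating_reasons: dict[str, str] = field(default_factory=dict)
    scorer_rating_confidences: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed rule: reasons and confidences may only name known tags/ratings.
        self._check_keys("scorer_tag_reasons", self.scorer_tags)
        self._check_keys("scorer_tag_confidences", self.scorer_tags)
        self._check_keys("scorer_rating_reasons", self.scorer_ratings)
        self._check_keys("scorer_rating_confidences", self.scorer_ratings)

    def _check_keys(self, attr: str, allowed: Iterable[str]) -> None:
        unknown = set(getattr(self, attr)) - set(allowed)
        if unknown:
            raise ValueError(f"{attr} refers to unknown names: {sorted(unknown)}")

    @classmethod
    def from_request(cls, req: Any) -> "ScorerColumnsSketch":
        """Read the six scorer_* attributes off a request-like object, skipping None."""
        values = {f.name: getattr(req, f.name, None) for f in fields(cls)}
        return cls(**{k: v for k, v in values.items() if v is not None})
```

With something like this, the call site reduces to `ScorerColumnsSketch.from_request(feedback_req)`, and an inconsistent request fails in one place rather than at insert time.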

@benjreinhart

@jtschoonhoven I was not aware of that branch, but sounds good to me. Excited to simplify and reuse. I'll update this PR once that one is merged.
