feat: support structured reward outputs and grouped reward aggregation #1200
Wangxiaoxiaoa wants to merge 1 commit into inclusionAI:main
Conversation
Code Review
This pull request updates the reward API to support both float and dictionary-based reward types and introduces a mechanism in the inference engine to aggregate group results through a workflow method. Feedback was provided to refine a type hint from `Any` to a more specific union type to maintain consistency with the updated documentation.
```diff
     return None

-    async def __call__(self, *args, **kwargs) -> float:
+    async def __call__(self, *args, **kwargs) -> Any:
```
The return type hint `Any` is too generic. Since the `reward_fn` docstring at line 60 has been updated to specify `float | dict[str, float]`, it is better to use the same specific type hint here to maintain consistency and improve type checking.
```diff
-    async def __call__(self, *args, **kwargs) -> Any:
+    async def __call__(self, *args, **kwargs) -> float | dict[str, float]:
```
Force-pushed babf3ad to b708fef
Force-pushed b708fef to 7365bca
This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days. Please add a comment or push new commits to keep it active. Thank you for your contribution!
Description
This PR adds support for structured reward outputs in the reward path for multi-reward RL workflows.
Today, the reward interface naturally supports only a single scalar reward, which makes it hard to represent multiple reward components for one sample.
This PR extends the reward path so reward functions can return either:

- a plain `float` (a single scalar reward), or
- a `dict[str, float]` mapping reward-component names to their values,

while keeping existing scalar-only behavior unchanged.
This is useful for reproducing multi-reward RL setups such as GDPO. With this change, GDPO-style logic can be implemented at the user level.
This PR does not implement GDPO itself in AReaL core. It provides the reward representation needed to build GDPO-style and other multi-reward workflows on top of AReaL.
Related Issue
Fixes #1196
Type of Change
Checklist
- `pre-commit run --all-files`
- `./docs/build_all.sh`

Breaking Change Details (if applicable):
N/A