
Feature Request: Implementation of Agent Lightning for Agent Reinforcement Learning (RL) #5163

@Dronanaik

Description

🚀 Feature Request: Agent Lightning for Agent RL

Summary

This issue proposes the integration of Agent Lightning β€” a structured training and optimization framework for agents using Reinforcement Learning (RL) β€” within Google ADK (Agent Development Kit). The goal is to enable agents built on ADK to be trained end-to-end using RL feedback loops, allowing them to self-improve based on environment signals, reward functions, and multi-turn interaction traces.


Motivation

As LLM-based agents become more capable, there is growing demand to move beyond static prompting and tool definitions toward adaptive agents that can learn from interaction. Agent RL is a paradigm where an agent is:

  • Rewarded for successful task completion
  • Penalized for tool misuse, hallucinations, or failed sub-tasks
  • Iteratively improved via policy gradient or RLHF-style feedback
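To make the reward/penalty scheme above concrete, here is a minimal standalone sketch. The per-step dict fields (`task_completed`, `tool_error`, `hallucinated_call`) are hypothetical stand-ins for signals that would in practice be derived from ADK events:

```python
# Minimal sketch of the reward scheme described above: +1 for task
# success, -1 for tool misuse or hallucinated calls. A plain dict per
# step stands in for a real ADK event (hypothetical field names).

def score_episode(steps: list[dict]) -> float:
    """Sum per-step rewards over one episode trace."""
    reward = 0.0
    for step in steps:
        if step.get("task_completed"):
            reward += 1.0
        if step.get("tool_error") or step.get("hallucinated_call"):
            reward -= 1.0
    return reward

episode = [
    {"tool_error": False},
    {"hallucinated_call": True},   # penalized
    {"task_completed": True},      # rewarded
]
print(score_episode(episode))  # 0.0
```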

Agent Lightning refers to a fast, scalable RL training framework (similar in spirit to PyTorch Lightning for deep learning) that abstracts boilerplate, manages rollout collection, reward computation, and policy optimization in a clean interface β€” tailored specifically for agentic workflows.


Does ADK Currently Support This?

The current Google ADK (adk-python) provides several building blocks that partially support RL-style training:

| ADK Feature | Relevance to Agent RL |
| --- | --- |
| `LlmAgent` with callbacks | Can be used to intercept and log intermediate steps for reward computation |
| `BaseAgent` lifecycle hooks (`before_agent_callback`, `after_agent_callback`) | Useful for trajectory collection |
| `Event` and `Content` tracing | Can feed into reward-labeling pipelines |
| Tool definitions with structured outputs | Enable per-step correctness evaluation |
| `InMemorySessionService` / `DatabaseSessionService` | Can persist rollout trajectories for offline RL |
| Multi-agent orchestration (`SequentialAgent`, `ParallelAgent`) | Can represent environment and policy-agent roles |

However, there is no native support for:

  • Defining reward functions at the agent or session level
  • Running rollout loops (environment resets, episode collection)
  • Policy optimization hooks (e.g., calling a fine-tuning endpoint after trajectory collection)
  • A training loop abstraction analogous to Agent Lightning

Proposed Implementation

1. RewardFunction Interface

```python
from abc import ABC, abstractmethod

from google.adk.events import Event


class RewardFunction(ABC):
    @abstractmethod
    def compute(self, trajectory: list[Event]) -> float:
        """Compute a scalar reward from a completed agent trajectory."""
        ...
```

2. RolloutSession β€” Episode Collection

```python
from google.adk.agents import BaseAgent
from google.adk.events import Event
from google.adk.tools import BaseTool


class RolloutSession:
    def __init__(self, agent: BaseAgent, env_tool: BaseTool, reward_fn: RewardFunction):
        self.agent = agent
        self.env_tool = env_tool
        self.reward_fn = reward_fn

    async def run_episode(self, initial_prompt: str) -> tuple[list[Event], float]:
        """Run one full episode and return the trajectory plus its reward."""
        ...
```
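The body of `run_episode` could look roughly like the following. A plain async callable stands in for the ADK agent here so the sketch is self-contained; the real version would drive a `Runner` and accumulate `google.adk.events.Event` objects:

```python
import asyncio


# Sketch of the episode loop. `agent` is any async callable yielding
# events (hypothetical stand-in for running an ADK agent via a Runner).
async def run_episode(agent, reward_fn, initial_prompt: str):
    trajectory = []
    async for event in agent(initial_prompt):
        trajectory.append(event)
    return trajectory, reward_fn(trajectory)


async def fake_agent(prompt):
    # Pretend the agent emits three events per episode.
    for step in ("plan", "call_tool", "answer"):
        yield {"step": step}


async def main():
    traj, reward = await run_episode(
        fake_agent, lambda t: float(len(t)), "write a SQL query"
    )
    print(len(traj), reward)  # 3 3.0


asyncio.run(main())
```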

3. AgentLightningTrainer β€” Training Loop

```python
class AgentLightningTrainer:
    def __init__(self, rollout_session: RolloutSession, optimizer_endpoint: str):
        ...

    async def train(self, num_episodes: int = 100, batch_size: int = 10):
        """Collect rollouts and call the optimizer (e.g., Vertex AI fine-tuning)."""
        ...
```
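The batching logic inside `train` amounts to: collect `batch_size` episodes, submit them to the optimizer, and flush any final partial batch. A synchronous sketch with injected `collect_episode` / `submit_batch` callables (both hypothetical) keeps that logic visible without depending on any real endpoint:

```python
# Sketch of the batched training loop. `submit_batch` would in practice
# post trajectories to a fine-tuning endpoint (e.g., a Vertex AI tuning
# job); here it is an injected callable so the sketch runs standalone.

def train(collect_episode, submit_batch, num_episodes=100, batch_size=10):
    batch = []
    for _ in range(num_episodes):
        batch.append(collect_episode())
        if len(batch) == batch_size:
            submit_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        submit_batch(batch)


calls = []
train(lambda: ("trajectory", 1.0), calls.append, num_episodes=25, batch_size=10)
print([len(b) for b in calls])  # [10, 10, 5]
```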

Example Use Case

An ADK agent that helps users write SQL queries gets rewarded (+1) when the generated query executes successfully against a test DB, and penalized (-1) when it causes errors. After 500 episodes, the agent is fine-tuned using collected trajectories via Vertex AI.

```python
sql_agent = LlmAgent(name="SQLAgent", model="gemini-2.0-flash", tools=[sql_executor_tool])
reward_fn = SQLExecutionReward(db_connection=test_db)
rollout = RolloutSession(agent=sql_agent, env_tool=sql_executor_tool, reward_fn=reward_fn)
trainer = AgentLightningTrainer(rollout_session=rollout, optimizer_endpoint=VERTEX_FINETUNE_URL)
await trainer.train(num_episodes=500)
```

Reference: Current ADK Hooks That Could Be Leveraged

Callbacks on `LlmAgent` already allow intercepting the agent lifecycle:

```python
from google.adk.agents import LlmAgent
from google.adk.agents.callback_context import CallbackContext


def my_after_agent_callback(callback_context: CallbackContext) -> None:
    trajectory = callback_context.state.get("trajectory", [])
    reward = reward_fn.compute(trajectory)
    callback_context.state["reward"] = reward


agent = LlmAgent(
    name="TrainableAgent",
    model="gemini-2.0-flash",
    after_agent_callback=my_after_agent_callback,
)
```

This pattern can be extended natively within ADK to formalize a training loop.
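The state-based collection pattern can be exercised end to end with paired before/after callbacks. A minimal object stands in for `CallbackContext` here (the real ADK object exposes a mutable `state` mapping, among other things):

```python
# Sketch of trajectory collection via paired callbacks. FakeContext is a
# hypothetical stand-in for google.adk CallbackContext, which exposes a
# mutable `state` mapping.

class FakeContext:
    def __init__(self):
        self.state = {}


def before_agent(ctx):
    # Initialize the trajectory buffer at episode start.
    ctx.state.setdefault("trajectory", [])


def after_agent(ctx, reward_fn):
    # Score the collected trajectory at episode end.
    traj = ctx.state.get("trajectory", [])
    ctx.state["reward"] = reward_fn(traj)


ctx = FakeContext()
before_agent(ctx)
ctx.state["trajectory"].extend(["step1", "step2"])  # events logged mid-run
after_agent(ctx, lambda t: float(len(t)))
print(ctx.state["reward"])  # 2.0
```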


Questions for Maintainers

  1. Is there an internal roadmap for RL-based agent training within ADK?
  2. Are callback hooks (before_agent_callback, after_agent_callback) intended to be used for trajectory collection, or is there a better-supported mechanism?
  3. Would the team be open to a community contribution adding a RewardFunction interface and a RolloutSession abstraction as an optional module?

Suggested labels: enhancement, feature-request, reinforcement-learning, agent-training

Metadata

Labels

- `eval` [Component]: This issue is related to evaluation
- `needs review` [Status]: The PR/issue is awaiting review from the maintainer
