Feature Request: Implementation of Agent Lightning for Agent Reinforcement Learning (RL) #5163
Description
🚀 Feature Request: Agent Lightning for Agent RL
Summary
This issue proposes integrating Agent Lightning, a structured training and optimization framework for agents using Reinforcement Learning (RL), into Google ADK (Agent Development Kit). The goal is to enable agents built on ADK to be trained end-to-end with RL feedback loops, allowing them to self-improve based on environment signals, reward functions, and multi-turn interaction traces.
Motivation
As LLM-based agents become more capable, there is growing demand to move beyond static prompting and tool definitions toward adaptive agents that can learn from interaction. Agent RL is a paradigm where an agent is:
- Rewarded for successful task completion
- Penalized for tool misuse, hallucinations, or failed sub-tasks
- Iteratively improved via policy gradient or RLHF-style feedback
Agent Lightning refers to a fast, scalable RL training framework (similar in spirit to PyTorch Lightning for deep learning) that abstracts away boilerplate and manages rollout collection, reward computation, and policy optimization behind a clean interface tailored specifically for agentic workflows.
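To make the reward/penalty loop above concrete, here is a toy, framework-free sketch of the paradigm: each episode produces a trajectory, a reward scores it, and the policy is nudged toward rewarded behavior. Everything here is a stand-in for illustration; the "policy" is just a preference table over two actions, where a real setup would update LLM weights or prompts.

```python
import random

random.seed(0)  # reproducibility for the demo run below

def run_episode(policy: dict[str, float]) -> tuple[str, float]:
    """Sample an action from the policy and return (trajectory, reward).

    Toy stand-in: the "trajectory" is a single action, and the environment
    rewards correct tool use (+1) and penalizes hallucination (-1).
    """
    actions = list(policy)
    weights = [policy[a] for a in actions]
    action = random.choices(actions, weights=weights)[0]
    reward = 1.0 if action == "use_tool_correctly" else -1.0
    return action, reward

def train(policy: dict[str, float], num_episodes: int = 200, lr: float = 0.1) -> None:
    """Crude policy-gradient-flavored update: reinforce rewarded actions,
    attenuate penalized ones (floored so sampling weights stay positive)."""
    for _ in range(num_episodes):
        action, reward = run_episode(policy)
        policy[action] = max(0.01, policy[action] + lr * reward)

policy = {"use_tool_correctly": 1.0, "hallucinate": 1.0}
train(policy)
# After training, the rewarded action should dominate the preference table.
```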
Does ADK Currently Support This?
The current Google ADK (adk-python) provides several building blocks that partially support RL-style training:
| ADK Feature | Relevance to Agent RL |
|---|---|
| `LlmAgent` with callbacks | Can be used to intercept and log intermediate steps for reward computation |
| `BaseAgent` lifecycle hooks (`before_agent_callback`, `after_agent_callback`) | Useful for trajectory collection |
| `Event` and `Content` tracing | Can feed into reward labeling pipelines |
| Tool definitions with structured outputs | Enables evaluation of correctness per step |
| `InMemorySessionService` / `DatabaseSessionService` | Can persist rollout trajectories for offline RL |
| Multi-agent orchestration (`SequentialAgent`, `ParallelAgent`) | Can represent environment and policy agent roles |
However, there is no native support for:
- Defining reward functions at the agent or session level
- Running rollout loops (environment resets, episode collection)
- Policy optimization hooks (e.g., calling a fine-tuning endpoint after trajectory collection)
- A training loop abstraction analogous to Agent Lightning
Proposed Implementation
1. RewardFunction Interface
```python
from abc import ABC, abstractmethod

from google.adk.events import Event


class RewardFunction(ABC):
    @abstractmethod
    def compute(self, trajectory: list[Event]) -> float:
        """Compute a scalar reward from a completed agent trajectory."""
        ...
```

2. RolloutSession: Episode Collection
```python
class RolloutSession:
    def __init__(self, agent: BaseAgent, env_tool: BaseTool, reward_fn: RewardFunction):
        self.agent = agent
        self.env_tool = env_tool
        self.reward_fn = reward_fn

    async def run_episode(self, initial_prompt: str) -> tuple[list[Event], float]:
        """Run one full episode and return the trajectory and its reward."""
        ...
```

3. AgentLightningTrainer: Training Loop
```python
class AgentLightningTrainer:
    def __init__(self, rollout_session: RolloutSession, optimizer_endpoint: str):
        ...

    async def train(self, num_episodes: int = 100, batch_size: int = 10):
        """Collect rollouts and call the optimizer (e.g., Vertex AI fine-tuning)."""
        ...
```

Example Use Case
An ADK agent that helps users write SQL queries gets rewarded (+1) when the generated query executes successfully against a test DB, and penalized (-1) when it causes errors. After 500 episodes, the agent is fine-tuned using collected trajectories via Vertex AI.
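For illustration, here is a hedged sketch of what the `SQLExecutionReward` used in this scenario might look like, backed by an in-memory SQLite database as the test DB. It assumes, for simplicity, that the final trajectory entry is the generated SQL string; a real implementation would extract the query from ADK `Event` objects instead.

```python
import sqlite3


class SQLExecutionReward:
    """Reward +1 when the generated SQL executes against a test DB, -1 on error.

    Sketch only: the trajectory is treated as a list whose last entry is the
    SQL text, a simplification of the richer ADK Event structure.
    """

    def __init__(self, db_connection: sqlite3.Connection):
        self.db = db_connection

    def compute(self, trajectory: list) -> float:
        sql = trajectory[-1]  # simplified: final entry is the SQL string
        try:
            self.db.execute(sql)
            self.db.rollback()  # undo any data changes made by the probe
            return 1.0
        except sqlite3.Error:
            return -1.0


# Hypothetical test fixture for the usage snippet below.
test_db = sqlite3.connect(":memory:")
test_db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
reward_fn = SQLExecutionReward(db_connection=test_db)
```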
```python
sql_agent = LlmAgent(name="SQLAgent", model="gemini-2.0-flash", tools=[sql_executor_tool])
reward_fn = SQLExecutionReward(db_connection=test_db)
rollout = RolloutSession(agent=sql_agent, env_tool=sql_executor_tool, reward_fn=reward_fn)
trainer = AgentLightningTrainer(rollout_session=rollout, optimizer_endpoint=VERTEX_FINETUNE_URL)
await trainer.train(num_episodes=500)
```

Reference: Current ADK Hooks That Could Be Leveraged
Callbacks in `LlmAgent` already allow interception of the agent lifecycle:
```python
def my_after_agent_callback(callback_context: CallbackContext) -> None:
    trajectory = callback_context.state.get("trajectory", [])
    reward = reward_fn.compute(trajectory)
    callback_context.state["reward"] = reward


agent = LlmAgent(
    name="TrainableAgent",
    model="gemini-2.0-flash",
    after_agent_callback=my_after_agent_callback,
)
```

This pattern can be extended natively within ADK to formalize a training loop.
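To show how the proposed pieces could compose end to end, here is a self-contained toy sketch. No ADK or Vertex AI calls are made: the agent, reward, and optimizer step are stand-ins, and the `env_tool` / `optimizer_endpoint` parameters are omitted for brevity; only the class and method names mirror the proposal above.

```python
import asyncio


class ToyAgent:
    async def run(self, prompt: str) -> list[str]:
        # Stand-in for an LlmAgent run; returns a trajectory of event strings.
        return [f"user: {prompt}", "agent: SELECT 1"]


class LengthReward:
    def compute(self, trajectory: list[str]) -> float:
        # Stand-in reward: favor short, direct trajectories.
        return 1.0 if len(trajectory) <= 2 else -1.0


class RolloutSession:
    def __init__(self, agent, reward_fn):
        self.agent = agent
        self.reward_fn = reward_fn

    async def run_episode(self, initial_prompt: str):
        trajectory = await self.agent.run(initial_prompt)
        return trajectory, self.reward_fn.compute(trajectory)


class AgentLightningTrainer:
    def __init__(self, rollout_session):
        self.rollout_session = rollout_session
        self.batches_sent = 0  # stand-in for calls to a fine-tuning endpoint

    async def train(self, num_episodes: int = 10, batch_size: int = 5):
        batch = []
        for _ in range(num_episodes):
            batch.append(await self.rollout_session.run_episode("write a query"))
            if len(batch) == batch_size:
                # A real version would POST the batch of (trajectory, reward)
                # pairs to an optimizer endpoint here.
                self.batches_sent += 1
                batch.clear()


trainer = AgentLightningTrainer(RolloutSession(ToyAgent(), LengthReward()))
asyncio.run(trainer.train(num_episodes=10, batch_size=5))
```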
References
- Google ADK Documentation
- Google ADK Callbacks
- Vertex AI Fine-Tuning
- Agent RL: RLHF for LLM Agents (Research)
- PyTorch Lightning (Inspiration for Agent Lightning pattern)
Questions for Maintainers
- Is there an internal roadmap for RL-based agent training within ADK?
- Are the callback hooks (`before_agent_callback`, `after_agent_callback`) intended to be used for trajectory collection, or is there a better-supported mechanism?
- Would the team be open to a community contribution adding a `RewardFunction` interface and a `RolloutSession` abstraction as an optional module?
Suggested labels: enhancement, feature-request, reinforcement-learning, agent-training