Feature Request: Implementation of Agent Lightning for Agent Reinforcement Learning (RL) #5163
Description
🚀 Feature Request: Agent Lightning for Agent RL
Summary
This issue proposes integrating Agent Lightning, a structured training and optimization framework for agents using Reinforcement Learning (RL), into Google ADK (Agent Development Kit). The goal is to enable agents built on ADK to be trained end-to-end with RL feedback loops, allowing them to self-improve based on environment signals, reward functions, and multi-turn interaction traces.
Motivation
As LLM-based agents become more capable, there is growing demand to move beyond static prompting and tool definitions toward adaptive agents that can learn from interaction. Agent RL is a paradigm where an agent is:
- Rewarded for successful task completion
- Penalized for tool misuse, hallucinations, or failed sub-tasks
- Iteratively improved via policy gradient or RLHF-style feedback
Agent Lightning refers to a fast, scalable RL training framework (similar in spirit to PyTorch Lightning for deep learning) that abstracts away boilerplate and manages rollout collection, reward computation, and policy optimization behind a clean interface tailored specifically for agentic workflows.
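To make the reward/penalty loop above concrete, here is a toy, framework-free sketch of the paradigm: each episode produces a trajectory, a reward scores it, and the policy is nudged toward rewarded behavior. Everything here is a stand-in for illustration; the "policy" is just a preference table over two actions, where a real setup would update LLM weights or prompts.

```python
import random

random.seed(0)  # reproducibility for the demo run below

def run_episode(policy: dict[str, float]) -> tuple[str, float]:
    """Sample an action from the policy and return (trajectory, reward).

    Toy stand-in: the "trajectory" is a single action, and the environment
    rewards correct tool use (+1) and penalizes hallucination (-1).
    """
    actions = list(policy)
    weights = [policy[a] for a in actions]
    action = random.choices(actions, weights=weights)[0]
    reward = 1.0 if action == "use_tool_correctly" else -1.0
    return action, reward

def train(policy: dict[str, float], num_episodes: int = 200, lr: float = 0.1) -> None:
    """Crude policy-gradient-flavored update: reinforce rewarded actions,
    attenuate penalized ones (floored so sampling weights stay positive)."""
    for _ in range(num_episodes):
        action, reward = run_episode(policy)
        policy[action] = max(0.01, policy[action] + lr * reward)

policy = {"use_tool_correctly": 1.0, "hallucinate": 1.0}
train(policy)
# After training, the rewarded action should dominate the preference table.
```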
Does ADK Currently Support This?
The current Google ADK (adk-python) provides several building blocks that partially support RL-style training:
| ADK Feature | Relevance to Agent RL |
|---|---|
| `LlmAgent` with callbacks | Can be used to intercept and log intermediate steps for reward computation |
| `BaseAgent` lifecycle hooks (`before_agent_callback`, `after_agent_callback`) | Useful for trajectory collection |
| `Event` and `Content` tracing | Can feed into reward labeling pipelines |
| Tool definitions with structured outputs | Enables evaluation of correctness per step |
| `InMemorySessionService` / `DatabaseSessionService` | Can persist rollout trajectories for offline RL |
| Multi-agent orchestration (`SequentialAgent`, `ParallelAgent`) | Can represent environment and policy agent roles |
However, there is no native support for:
- Defining reward functions at the agent or session level
- Running rollout loops (environment resets, episode collection)
- Policy optimization hooks (e.g., calling a fine-tuning endpoint after trajectory collection)
- A training loop abstraction analogous to Agent Lightning
Proposed Implementation
1. RewardFunction Interface
```python
from abc import ABC, abstractmethod

from google.adk.events import Event


class RewardFunction(ABC):
    @abstractmethod
    def compute(self, trajectory: list[Event]) -> float:
        """Compute a scalar reward from a completed agent trajectory."""
        ...
```

2. RolloutSession: Episode Collection
```python
class RolloutSession:
    def __init__(self, agent: BaseAgent, env_tool: BaseTool, reward_fn: RewardFunction):
        self.agent = agent
        self.env_tool = env_tool
        self.reward_fn = reward_fn

    async def run_episode(self, initial_prompt: str) -> tuple[list[Event], float]:
        """Run one full episode and return the trajectory and its reward."""
        ...
```

3. AgentLightningTrainer: Training Loop
```python
class AgentLightningTrainer:
    def __init__(self, rollout_session: RolloutSession, optimizer_endpoint: str):
        ...

    async def train(self, num_episodes: int = 100, batch_size: int = 10):
        """Collect rollouts and call the optimizer (e.g., Vertex AI fine-tuning)."""
        ...
```

Example Use Case
An ADK agent that helps users write SQL queries gets rewarded (+1) when the generated query executes successfully against a test DB, and penalized (-1) when it causes errors. After 500 episodes, the agent is fine-tuned using collected trajectories via Vertex AI.
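For illustration, here is a hedged sketch of what the `SQLExecutionReward` used in this scenario might look like, backed by an in-memory SQLite database as the test DB. It assumes, for simplicity, that the final trajectory entry is the generated SQL string; a real implementation would extract the query from ADK `Event` objects instead.

```python
import sqlite3


class SQLExecutionReward:
    """Reward +1 when the generated SQL executes against a test DB, -1 on error.

    Sketch only: the trajectory is treated as a list whose last entry is the
    SQL text, a simplification of the richer ADK Event structure.
    """

    def __init__(self, db_connection: sqlite3.Connection):
        self.db = db_connection

    def compute(self, trajectory: list) -> float:
        sql = trajectory[-1]  # simplified: final entry is the SQL string
        try:
            self.db.execute(sql)
            self.db.rollback()  # undo any data changes made by the probe
            return 1.0
        except sqlite3.Error:
            return -1.0


# Hypothetical test fixture for the usage snippet below.
test_db = sqlite3.connect(":memory:")
test_db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
reward_fn = SQLExecutionReward(db_connection=test_db)
```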
```python
sql_agent = LlmAgent(name="SQLAgent", model="gemini-2.0-flash", tools=[sql_executor_tool])
reward_fn = SQLExecutionReward(db_connection=test_db)
rollout = RolloutSession(agent=sql_agent, env_tool=sql_executor_tool, reward_fn=reward_fn)
trainer = AgentLightningTrainer(rollout_session=rollout, optimizer_endpoint=VERTEX_FINETUNE_URL)
await trainer.train(num_episodes=500)
```

Reference: Current ADK Hooks That Could Be Leveraged
Callbacks in `LlmAgent` already allow interception of the agent lifecycle:
```python
def my_after_agent_callback(callback_context: CallbackContext) -> None:
    trajectory = callback_context.state.get("trajectory", [])
    reward = reward_fn.compute(trajectory)
    callback_context.state["reward"] = reward


agent = LlmAgent(
    name="TrainableAgent",
    model="gemini-2.0-flash",
    after_agent_callback=my_after_agent_callback,
)
```

This pattern can be extended natively within ADK to formalize a training loop.
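To show how the proposed pieces could compose end to end, here is a self-contained toy sketch. No ADK or Vertex AI calls are made: the agent, reward, and optimizer step are stand-ins, and the `env_tool` / `optimizer_endpoint` parameters are omitted for brevity; only the class and method names mirror the proposal above.

```python
import asyncio


class ToyAgent:
    async def run(self, prompt: str) -> list[str]:
        # Stand-in for an LlmAgent run; returns a trajectory of event strings.
        return [f"user: {prompt}", "agent: SELECT 1"]


class LengthReward:
    def compute(self, trajectory: list[str]) -> float:
        # Stand-in reward: favor short, direct trajectories.
        return 1.0 if len(trajectory) <= 2 else -1.0


class RolloutSession:
    def __init__(self, agent, reward_fn):
        self.agent = agent
        self.reward_fn = reward_fn

    async def run_episode(self, initial_prompt: str):
        trajectory = await self.agent.run(initial_prompt)
        return trajectory, self.reward_fn.compute(trajectory)


class AgentLightningTrainer:
    def __init__(self, rollout_session):
        self.rollout_session = rollout_session
        self.batches_sent = 0  # stand-in for calls to a fine-tuning endpoint

    async def train(self, num_episodes: int = 10, batch_size: int = 5):
        batch = []
        for _ in range(num_episodes):
            batch.append(await self.rollout_session.run_episode("write a query"))
            if len(batch) == batch_size:
                # A real version would POST the batch of (trajectory, reward)
                # pairs to an optimizer endpoint here.
                self.batches_sent += 1
                batch.clear()


trainer = AgentLightningTrainer(RolloutSession(ToyAgent(), LengthReward()))
asyncio.run(trainer.train(num_episodes=10, batch_size=5))
```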
References
- Google ADK Documentation
- Google ADK Callbacks
- Vertex AI Fine-Tuning
- Agent RL: RLHF for LLM Agents (Research)
- PyTorch Lightning (Inspiration for Agent Lightning pattern)
Questions for Maintainers
- Is there an internal roadmap for RL-based agent training within ADK?
- Are the callback hooks (`before_agent_callback`, `after_agent_callback`) intended to be used for trajectory collection, or is there a better-supported mechanism?
- Would the team be open to a community contribution adding a `RewardFunction` interface and a `RolloutSession` abstraction as an optional module?
Suggested labels: enhancement, feature-request, reinforcement-learning, agent-training