Commit 014dfa7

fixing and debugging the green agent
1 parent e96b528 commit 014dfa7

9 files changed

Lines changed: 354 additions & 209 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ wheels/
 
 # Virtual environments
 .venv
+*.log

README.md

Lines changed: 107 additions & 44 deletions
@@ -1,81 +1,144 @@
-# Code Translator Green Agent (Evaluator)
+# Code Translator Green Agent (Judge)
 
-This repository contains the **Green Agent** for the Code Translator system. Built with the [Google Agent Development Kit (ADK)](https://google.github.io/adk-docs/), this agent acts as the evaluator and orchestrator for code translation scenarios.
+This repository contains the implementation of the **Green Agent**, a judge agent designed for the Code Translator competition. Its primary role is to evaluate code translations performed by other agents (specifically the **Purple Agent**).
 
 ## Overview
 
-The Green Agent is responsible for:
-1. **Orchestrating** the interaction between participant agents (Purple Agents).
-2. **Evaluating** the quality of code translations provided by participants.
-3. **Scoring** the submissions based on specific criteria.
+The Green Agent acts as an orchestrator and evaluator. When it receives a request to evaluate a code translation task:
+1. **Orchestration**: It requests the **Purple Agent** (Participant) to translate a given snippet of code from a source language to a target language.
+2. **Evaluation**: Upon receiving the translation, it uses **Google GenAI (Gemini)** to act as a judge. The judge evaluates the translation based on execution correctness, style, conciseness, and relevance.
+3. **Reporting**: It returns a structured evaluation containing scores, reasoning, and a winner determination.
 
-### Evaluation Criteria
-The agent uses `gemini-2.5-flash` to judge translations based on:
-* **Execution Correctness**: The code must run without errors.
-* **Style & Documentation**: Adherence to the target language's style guides and proper commenting.
-* **Conciseness**: Efficient code without unnecessary boilerplate.
-* **Relevance**: Logical and structural equivalence to the original code.
+## Repository Structure
 
-## Architecture
+- **`src/`**: Source code for the agent.
+  - **`agent.py`**: Contains `TranslationGreenAgent`. This is the core logic that handles the evaluation workflow: validating requests, communicating with the participant agent, and invoking the Gemini model for judging.
+  - **`server.py`**: The entry point for the application. It initializes the `TranslationGreenAgent`, wraps it in a `GreenExecutor`, and sets up the **A2A (Agent-to-Agent)** Starlette server.
+  - **`common.py`**: Defines shared data structures and Pydantic models (e.g., `EvalRequest`, `TranslatorEval`) and the Agent Card configuration.
+  - **`executor.py`**: Handles the execution context for the agent, providing the sandbox or environment for running the agent logic.
+  - **`tool_provider.py`**: Provides utilities for the agent to interact with external services or other agents (e.g., `talk_to_agent` implementation).
+  - **`client.py`**: Client-side utilities or helpers for interacting with the agent.
+- **`tests/`**: Test suite.
+  - **`test_agent.py`**: Contains integration tests and A2A conformance tests to ensure the agent behaves correctly, validates schemas, and adheres to the protocol.
+  - **`conftest.py`**: Pytest configuration and fixtures.
+- **`Dockerfile`**: Configuration to containerize the application for deployment.
+- **`pyproject.toml`**: Project configuration and dependencies.
 
-* **Framework**: Google ADK (`google-adk[a2a]`)
-* **Model**: Gemini 2.5 Flash
-* **Communication**: Agent-to-Agent (A2A) Protocol
-* **Server**: Uvicorn + FastAPI (exposed via ADK)
+## Setup & Installation
 
-## Prerequisites
+### Prerequisites
 
-* Python 3.11+
-* [uv](https://github.com/astral-sh/uv) (recommended) or pip
-* Google GenAI API Key
+- Python 3.11+
+- A **Google GenAI API Key** (Gemini)
+- (Optional) Docker
 
-## Setup & Installation
+### Installation
 
-1. **Clone the repository:**
+1. **Clone the repository**:
     ```bash
     git clone <repository-url>
     cd code_translator_green_agent
     ```
 
-2. **Configure Environment:**
-    Create a `.env` file in the root directory:
+2. **Create a virtual environment** (optional but recommended):
     ```bash
-    GOOGLE_API_KEY=your_api_key_here
+    python -m venv .venv
+    source .venv/bin/activate
     ```
 
-3. **Install Dependencies:**
-    Using `uv`:
+3. **Install dependencies**:
     ```bash
-    uv sync
+    pip install .
+    # Or install specific requirements
+    pip install python-dotenv uvicorn httpx google-genai pydantic "google-adk[a2a]"
+    ```
+
+4. **Environment Variables**:
+    Create a `.env` file in the root directory (or ensure relevant environment variables are set) containing your Google API key:
+    ```env
+    GOOGLE_API_KEY=your_google_api_key_here
     ```
 
 ## Running the Agent
 
-### Local Execution
-To run the agent server locally:
+### Locally
+
+To start the agent server:
+
+```bash
+python src/server.py
+```
+
+By default, the server runs on `http://127.0.0.1:9009`.
+You can customize the host and port using arguments:
 
 ```bash
-uv run src/server.py --host 0.0.0.0 --port 9009
+python src/server.py --host 0.0.0.0 --port 8080
 ```
 
-The agent will be available at `http://localhost:9009`.
+### Using Docker
 
-### Docker Execution
-To build and run using Docker:
+1. **Build the image**:
+    ```bash
+    docker build -t green-agent .
+    ```
 
-1. **Build the image:**
+2. **Run the container**:
     ```bash
-    docker build -t code-translator-green .
+    docker run -p 9009:9009 --env GOOGLE_API_KEY=your_api_key green-agent
     ```
 
-2. **Run the container:**
+## Usage as a Judge
+
+The agent is designed to be called by an orchestration layer or directly via A2A protocol. It expects a JSON payload (Evaluator Request) with the following structure:
+
+```json
+{
+  "participants": {
+    "researcher_translator": "http://url-to-purple-agent"
+  },
+  "config": {
+    "code_to_translate": "print('Hello World')",
+    "source_language": "python",
+    "target_language": "javascript"
+  }
+}
+```
+
+**The Workflow:**
+1. The Green Agent contacts the participant agent at the provided URL (`http://url-to-purple-agent`).
+2. It sends the `code_to_translate`, `source_language`, and `target_language` to the participant.
+3. It waits for the participant to return the translated code.
+4. Once received, the Green Agent constructs a prompt for the Gemini model (Judge), instructing it to evaluate the translation.
+5. It returns a result resembling:
+
+```json
+{
+  "winner": "researcher_translator",
+  "scores": [
+    {
+      "participant": "researcher_translator",
+      "score": 9
+    }
+  ],
+  "reasoning": "The translation is syntactically correct and preserves functionality..."
+}
+```
+
+## Testing
+
+To ensure the agent is functioning correctly, you can run the provided tests.
+
+1. **Install test dependencies** (if not already installed):
     ```bash
-    docker run -p 9009:9009 --env-file .env code-translator-green
+    pip install pytest pytest-asyncio
     ```
 
-## Project Structure
+2. **Run tests**:
+    ```bash
+    pytest tests/
+    ```
 
-* `src/agent.py`: Defines the ADK Agent, system prompt, and evaluation logic.
-* `src/server.py`: Entry point for the HTTP server.
-* `src/tool_provider.py`: Tools for the agent (e.g., A2A communication).
-* `src/common.py`: Shared data models (e.g., `TranslatorEval` schema).
+The `test_agent.py` contains:
+- **Conformance Tests**: Verifies the Agent Card and A2A protocol structure (e.g., proper message formats, capabilities).
+- **Message Validation**: Ensures that request and response payloads adhere to the defined schemas.
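
The request checks that the README describes (and that `validate_request` in `src/agent.py` performs) can be tried out without the project's Pydantic models. The following is a minimal sketch, assuming the Evaluator Request arrives as a plain dict; `validate_payload` and `REQUIRED_CONFIG_KEYS` are hypothetical names used only for illustration and are not part of the repository.

```python
import json

# Hypothetical constant: the config keys that validate_request checks for.
REQUIRED_CONFIG_KEYS = ("code_to_translate", "source_language", "target_language")


def validate_payload(payload: dict) -> tuple[bool, str]:
    """Mirror the checks in validate_request on a plain dict (illustrative only)."""
    participants = payload.get("participants") or {}
    config = payload.get("config") or {}
    if not participants:
        return False, "No participants provided in the evaluation request."
    if len(participants) > 1:
        return False, "Only one participant is supported per evaluation."
    for key in REQUIRED_CONFIG_KEYS:
        if key not in config:
            return False, f"Missing '{key}' in config."
    return True, ""


# The example request from the README.
request_json = """
{
  "participants": {"researcher_translator": "http://url-to-purple-agent"},
  "config": {
    "code_to_translate": "print('Hello World')",
    "source_language": "python",
    "target_language": "javascript"
  }
}
"""

ok, message = validate_payload(json.loads(request_json))
print(ok, repr(message))
```

The error strings are copied from the commit so that a failing payload reads the same way here as it would in the agent. The loop over keys is only a compaction of the three separate `if` checks.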

src/agent.py

Lines changed: 141 additions & 14 deletions
@@ -1,10 +1,18 @@
 from google.adk.agents import Agent
 from google.adk.tools import FunctionTool
-from src.common import TranslatorEval
+from src.common import TranslatorEval, EvalRequest
 from src.tool_provider import ToolProvider
+from src.executor import GreenAgent
+from a2a.utils import new_agent_text_message
+from a2a.server.tasks import TaskUpdater
+from a2a.server.tasks import TaskUpdater
+import json
+import os
+from google import genai
+from google.genai import types
 
 SYSTEM_PROMPT = '''
-you are an expert evaluation agent specialized in evaluating code and programming languages translation and
+you are an expert evaluation agent specialized in evaluating code and programming languages translation and
 how efficient it is to run without errors, and judging a successful translation requires the following
 considerations:
 
@@ -24,15 +32,134 @@
 in general the translation needs to be clear, clean and error free.
 '''
 
-def create_judge_agent(tool_provider: ToolProvider) -> Agent:
-    return Agent(
-        name="translator_judge_adk",
-        model="gemini-2.5-flash",
-        description=(
-            "assess the quality of the programming language translation given and which one is better meeting the criteria"
-        ),
-        instruction=SYSTEM_PROMPT,
-        tools=[FunctionTool(func=tool_provider.talk_to_agent)],
-        output_schema=TranslatorEval,
-        after_agent_callback=lambda callback_context: tool_provider.reset()
-    )
+class TranslationGreenAgent(GreenAgent):
+    def __init__(self, tool_provider: ToolProvider):
+        self._tool_provider = tool_provider
+        # Initialize Gemini Client
+        self.client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
+
+    # Removed _create_judge_agent as we use genai.Client directly
+
+    def validate_request(self, request: EvalRequest) -> tuple[bool, str]:
+        if not request.participants:
+            return False, "No participants provided in the evaluation request."
+        if len(request.participants) > 1:
+            return False, "Only one participant is supported per evaluation."
+        if "code_to_translate" not in request.config:
+            return False, "Missing 'code_to_translate' in config."
+        if "source_language" not in request.config:
+            return False, "Missing 'source_language' in config."
+        if "target_language" not in request.config:
+            return False, "Missing 'target_language' in config."
+        return True, ""
+
+    async def run_eval(self, request: EvalRequest, updater: TaskUpdater) -> None:
+        # Extract the single participant
+        role, endpoint = next(iter(request.participants.items()))
+        code_to_translate = request.config["code_to_translate"]
+        source_language = request.config["source_language"]
+        target_language = request.config["target_language"]
+
+        # Step 1: Request translation from the participant agent
+        await updater.update_status(
+            "working",
+            new_agent_text_message(f"Requesting translation from participant '{role}'...")
+        )
+        try:
+            # Send the code to translate to the participant agent
+            print(f"[DEBUG] Sending message to Purple Agent at {endpoint}", flush=True)
+            response = await self._tool_provider.talk_to_agent(
+                url=endpoint,
+                message=json.dumps({
+                    "code_to_translate": code_to_translate,
+                    "source_language": source_language,
+                    "target_language": target_language
+                })
+            )
+            print(f"[DEBUG] Received response from Purple Agent: '{response}'", flush=True)
+            # The response is expected to be a JSON string with the translated code
+            translated_code_data = json.loads(response)
+            translated_code = translated_code_data.get("translated_code", "")
+
+            if not translated_code:
+                await updater.failed(new_agent_text_message("Participant did not return translated code."))
+                return
+
+        except Exception as e:
+            print(f"[DEBUG] Exception communicating with participant: {e}", flush=True)
+            await updater.failed(new_agent_text_message(f"Error communicating with participant: {e}"))
+            return
+
+        await updater.update_status(
+            "working",
+            new_agent_text_message("Received translated code. Evaluating...")
+        )
+
+        # Step 2: Use the judge agent to evaluate the translated code
+        prompt = f"""
+        {SYSTEM_PROMPT}
+
+        Please evaluate the following code translation based on the criteria:
+        - Execution Correctness
+        - Style & Documentation
+        - Conciseness
+        - Relevance
+
+        Original {source_language} code:
+        ```
+        {code_to_translate}
+        ```
+
+        Translated {target_language} code (from participant '{role}'):
+        ```
+        {translated_code}
+        ```
+
+        Provide your evaluation in the TranslatorEval schema, including reasoning, winner (the participant's role if it's a good translation, or 'N/A' otherwise), and scores.
+        """
+        models_to_try = [
+            "gemini-2.5-flash",
+            "gemini-2.0-flash",
+            "gemini-flash-latest",
+            "gemini-pro-latest",
+            "gemini-2.5-pro"
+        ]
+
+        last_error = None
+        for model in models_to_try:
+            try:
+                print(f"[DEBUG] Trying evaluation with model: {model}")
+                response = await self.client.aio.models.generate_content(
+                    model=model,
+                    contents=prompt,
+                    config=types.GenerateContentConfig(
+                        response_mime_type='application/json',
+                        response_schema=TranslatorEval
+                    )
+                )
+                eval_result: TranslatorEval = response.parsed
+
+                # If parsed is None (should not happen with structured output)
+                if not eval_result:
+                    raise ValueError("Model failed to return structured output")
+
+                await updater.update_status(
+                    "completed",
+                    new_agent_text_message(f"Evaluation complete. Winner: {eval_result.winner}, Scores: {eval_result.scores}")
+                )
+                # You might want to store the full eval_result or just the scores in the task result
+                await updater.update_result(eval_result.model_dump())
+                return # Assessment successful, exit function
+
+            except Exception as e:
+                print(f"[DEBUG] Model {model} failed: {e}")
+                last_error = e
+                # Check for resource exhausted and wait if needed
+                if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
+                    print("[DEBUG] Quota exhausted. Waiting 30 seconds before trying next model...", flush=True)
+                    import asyncio
+                    await asyncio.sleep(30)
+                # Continue to next model
+
+        # If all models failed
+        await updater.failed(new_agent_text_message(f"All evaluation models failed. Last error: {last_error}"))
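
The model fallback loop at the end of `run_eval` can be looked at in isolation. The sketch below is one possible way to factor it, not the commit's implementation: `first_successful`, `is_quota_error`, and `fake_call` are hypothetical names, and `fake_call` stands in for the real Gemini request so the example runs with the standard library alone.

```python
import asyncio
from typing import Awaitable, Callable, Optional


def is_quota_error(error: Exception) -> bool:
    """Same string check as the commit: '429' or 'RESOURCE_EXHAUSTED' in the message."""
    text = str(error)
    return "429" in text or "RESOURCE_EXHAUSTED" in text


async def first_successful(
    models: list[str],
    call_model: Callable[[str], Awaitable[str]],
    quota_wait_seconds: float = 30.0,
) -> tuple[Optional[str], Optional[Exception]]:
    """Try each model in order; return (result, None) or (None, last_error)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return await call_model(model), None
        except Exception as error:  # broad on purpose, as in the commit
            last_error = error
            if is_quota_error(error):
                # Wait before moving on to the next model.
                await asyncio.sleep(quota_wait_seconds)
    return None, last_error


async def fake_call(model: str) -> str:
    """Hypothetical stand-in for the Gemini call; only 'model-c' succeeds."""
    if model == "model-a":
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    if model == "model-b":
        raise ValueError("Model failed to return structured output")
    return f"evaluated by {model}"


result, error = asyncio.run(
    first_successful(["model-a", "model-b", "model-c"], fake_call, quota_wait_seconds=0.01)
)
print(result, error)
```

Passing the wait time as a parameter keeps the 30-second default from the commit while letting a test use a much shorter delay, and importing `asyncio` at module level avoids the import inside the `except` block.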
