Samir-atra
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 107 additions & 44 deletions
diff --git a/src/agent.py b/src/agent.py
Lines changed: 141 additions & 14 deletions
@@ -12,3 +12,4 @@ wheels/
 
 # Virtual environments
 .venv
+*.log
@@ -1,81 +1,144 @@
-# Code Translator Green Agent (Evaluator)
+# Code Translator Green Agent (Judge)
 
-This repository contains the **Green Agent** for the Code Translator system. Built with the [Google Agent Development Kit (ADK)](https://google.github.io/adk-docs/), this agent acts as the evaluator and orchestrator for code translation scenarios.
+This repository contains the implementation of the **Green Agent**, a judge agent designed for the Code Translator competition. Its primary role is to evaluate code translations performed by other agents (specifically the **Purple Agent**).
 
 ## Overview
 
-The Green Agent is responsible for:
-1.  **Orchestrating** the interaction between participant agents (Purple Agents).
-2.  **Evaluating** the quality of code translations provided by participants.
-3.  **Scoring** the submissions based on specific criteria.
+The Green Agent acts as an orchestrator and evaluator. When it receives a request to evaluate a code translation task:
+1.  **Orchestration**: It requests the **Purple Agent** (Participant) to translate a given snippet of code from a source language to a target language.
+2.  **Evaluation**: Upon receiving the translation, it uses **Google GenAI (Gemini)** to act as a judge. The judge evaluates the translation based on execution correctness, style, conciseness, and relevance.
+3.  **Reporting**: It returns a structured evaluation containing scores, reasoning, and a winner determination.
 
-### Evaluation Criteria
-The agent uses `gemini-2.5-flash` to judge translations based on:
-*   **Execution Correctness**: The code must run without errors.
-*   **Style & Documentation**: Adherence to the target language's style guides and proper commenting.
-*   **Conciseness**: Efficient code without unnecessary boilerplate.
-*   **Relevance**: Logical and structural equivalence to the original code.
+## Repository Structure
 
-## Architecture
+-   **`src/`**: Source code for the agent.
+    -   **`agent.py`**: Contains `TranslationGreenAgent`. This is the core logic that handles the evaluation workflow: validating requests, communicating with the participant agent, and invoking the Gemini model for judging.
+    -   **`server.py`**: The entry point for the application. It initializes the `TranslationGreenAgent`, wraps it in a `GreenExecutor`, and sets up the **A2A (Agent-to-Agent)** Starlette server.
+    -   **`common.py`**: Defines shared data structures and Pydantic models (e.g., `EvalRequest`, `TranslatorEval`) and the Agent Card configuration.
+    -   **`executor.py`**: Handles the execution context for the agent, providing the sandbox or environment for running the agent logic.
+    -   **`tool_provider.py`**: Provides utilities for the agent to interact with external services or other agents (e.g., `talk_to_agent` implementation).
+    -   **`client.py`**: Client-side utilities or helpers for interacting with the agent.
+-   **`tests/`**: Test suite.
+    -   **`test_agent.py`**: Contains integration tests and A2A conformance tests to ensure the agent behaves correctly, validates schemas, and adheres to the protocol.
+    -   **`conftest.py`**: Pytest configuration and fixtures.
+-   **`Dockerfile`**: Configuration to containerize the application for deployment.
+-   **`pyproject.toml`**: Project configuration and dependencies.
 
-*   **Framework**: Google ADK (`google-adk[a2a]`)
-*   **Model**: Gemini 2.5 Flash
-*   **Communication**: Agent-to-Agent (A2A) Protocol
-*   **Server**: Uvicorn + FastAPI (exposed via ADK)
+## Setup & Installation
 
-## Prerequisites
+### Prerequisites
 
-*   Python 3.11+
-*   [uv](https://github.com/astral-sh/uv) (recommended) or pip
-*   Google GenAI API Key
+-   Python 3.11+
+-   A **Google GenAI API Key** (Gemini)
+-   (Optional) Docker
 
-## Setup & Installation
+### Installation
 
-1.  **Clone the repository:**
+1.  **Clone the repository**:
     ```bash
     git clone <repository-url>
     cd code_translator_green_agent
     ```
 
-2.  **Configure Environment:**
-    Create a `.env` file in the root directory:
+2.  **Create a virtual environment** (optional but recommended):
     ```bash
-    GOOGLE_API_KEY=your_api_key_here
+    python -m venv .venv
+    source .venv/bin/activate
     ```
 
-3.  **Install Dependencies:**
-    Using `uv`:
+3.  **Install dependencies**:
     ```bash
-    uv sync
+    pip install .
+    # Or install specific requirements
+    pip install python-dotenv uvicorn httpx google-genai pydantic "google-adk[a2a]"
+    ```
+    
+4.  **Environment Variables**:
+    Create a `.env` file in the root directory (or ensure relevant environment variables are set) containing your Google API key:
+    ```env
+    GOOGLE_API_KEY=your_google_api_key_here
     ```
 
 ## Running the Agent
 
-### Local Execution
-To run the agent server locally:
+### Locally
+
+To start the agent server:
+
+```bash
+python src/server.py
+```
+
+By default, the server runs on `http://127.0.0.1:9009`.
+You can customize the host and port using arguments:
 
 ```bash
-uv run src/server.py --host 0.0.0.0 --port 9009
+python src/server.py --host 0.0.0.0 --port 8080
 ```
 
-The agent will be available at `http://localhost:9009`.
+### Using Docker
 
-### Docker Execution
-To build and run using Docker:
+1.  **Build the image**:
+    ```bash
+    docker build -t green-agent .
+    ```
 
-1.  **Build the image:**
+2.  **Run the container**:
     ```bash
-    docker build -t code-translator-green .
+    docker run -p 9009:9009 --env GOOGLE_API_KEY=your_api_key green-agent
     ```
 
-2.  **Run the container:**
+## Usage as a Judge
+
+The agent is designed to be called by an orchestration layer or directly via the A2A protocol. It expects a JSON payload (Evaluator Request) with the following structure:
+
+```json
+{
+  "participants": {
+    "researcher_translator": "http://url-to-purple-agent"
+  },
+  "config": {
+    "code_to_translate": "print('Hello World')",
+    "source_language": "python",
+    "target_language": "javascript"
+  }
+}
+```
+
+**The Workflow:**
+1.  The Green Agent contacts the participant agent at the provided URL (`http://url-to-purple-agent`).
+2.  It sends the `code_to_translate`, `source_language`, and `target_language` to the participant.
+3.  It waits for the participant to return the translated code.
+4.  Once received, the Green Agent constructs a prompt for the Gemini model (Judge), instructing it to evaluate the translation.
+5.  It returns a result resembling:
+
+```json
+{
+  "winner": "researcher_translator",
+  "scores": [
+    {
+      "participant": "researcher_translator",
+      "score": 9
+    }
+  ],
+  "reasoning": "The translation is syntactically correct and preserves functionality..."
+}
+```
+
+## Testing
+
+To ensure the agent is functioning correctly, you can run the provided tests.
+
+1.  **Install test dependencies** (if not already installed):
     ```bash
-    docker run -p 9009:9009 --env-file .env code-translator-green
+    pip install pytest pytest-asyncio
     ```
 
-## Project Structure
+2.  **Run tests**:
+    ```bash
+    pytest tests/
+    ```
 
-*   `src/agent.py`: Defines the ADK Agent, system prompt, and evaluation logic.
-*   `src/server.py`: Entry point for the HTTP server.
-*   `src/tool_provider.py`: Tools for the agent (e.g., A2A communication).
-*   `src/common.py`: Shared data models (e.g., `TranslatorEval` schema).
+    `test_agent.py` contains:
+    -   **Conformance Tests**: Verify the Agent Card and A2A protocol structure (e.g., proper message formats, capabilities).
+    -   **Message Validation**: Ensure that request and response payloads adhere to the defined schemas.
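Review note on the README's "Usage as a Judge" payload: the sketch below shows one way a caller might check the payload shape before sending it. It assumes only the fields shown in the README example. `check_eval_payload` and `REQUIRED_CONFIG_KEYS` are hypothetical names, not part of this repository, and the function works on a plain dict rather than the `EvalRequest` model.

```python
import json

# Config keys the README example lists; treated as required here (assumption).
REQUIRED_CONFIG_KEYS = ("code_to_translate", "source_language", "target_language")


def check_eval_payload(payload: dict) -> tuple[bool, str]:
    """Hypothetical pre-flight check on a dict shaped like the README example."""
    participants = payload.get("participants") or {}
    config = payload.get("config") or {}
    if not participants:
        return False, "No participants provided."
    if len(participants) > 1:
        return False, "Only one participant is supported per evaluation."
    for key in REQUIRED_CONFIG_KEYS:
        if key not in config:
            return False, f"Missing '{key}' in config."
    return True, ""


# The payload from the README, parsed from JSON.
payload = json.loads("""
{
  "participants": {"researcher_translator": "http://url-to-purple-agent"},
  "config": {
    "code_to_translate": "print('Hello World')",
    "source_language": "python",
    "target_language": "javascript"
  }
}
""")

ok, message = check_eval_payload(payload)
print(ok, message)
```

Returning a `(bool, str)` pair keeps the caller free to decide whether a bad payload is an error or just a skipped request.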
@@ -1,10 +1,18 @@
 from google.adk.agents import Agent
 from google.adk.tools import FunctionTool
-from src.common import TranslatorEval
+from src.common import TranslatorEval, EvalRequest
 from src.tool_provider import ToolProvider
+from src.executor import GreenAgent
+from a2a.utils import new_agent_text_message
+from a2a.server.tasks import TaskUpdater
+from a2a.server.tasks import TaskUpdater
+import json
+import os
+from google import genai
+from google.genai import types
 
 SYSTEM_PROMPT = '''
-you are an expert evaluation agent specialized in evaluating code and programming languages translation and 
+you are an expert evaluation agent specialized in evaluating code and programming languages translation and
 how efficient it is to run without errors, and judging a successful translation requires the following
 considerations:
 
@@ -24,15 +32,134 @@
 in general the translation needs to be clear, clean and error free.
 '''
 
-def create_judge_agent(tool_provider: ToolProvider) -> Agent:
-    return Agent(
-        name="translator_judge_adk",
-        model="gemini-2.5-flash",
-        description=(
-            "assess the quality of the programming language translation given and which one is better meeting the criteria"
-        ),
-        instruction=SYSTEM_PROMPT,
-        tools=[FunctionTool(func=tool_provider.talk_to_agent)],
-        output_schema=TranslatorEval,
-        after_agent_callback=lambda callback_context: tool_provider.reset()
-    )
+class TranslationGreenAgent(GreenAgent):
+    def __init__(self, tool_provider: ToolProvider):
+        self._tool_provider = tool_provider
+        # Initialize Gemini Client
+        self.client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
+
+    # Removed _create_judge_agent as we use genai.Client directly
+
+    def validate_request(self, request: EvalRequest) -> tuple[bool, str]:
+        if not request.participants:
+            return False, "No participants provided in the evaluation request."
+        if len(request.participants) > 1:
+            return False, "Only one participant is supported per evaluation."
+        if "code_to_translate" not in request.config:
+            return False, "Missing 'code_to_translate' in config."
+        if "source_language" not in request.config:
+            return False, "Missing 'source_language' in config."
+        if "target_language" not in request.config:
+            return False, "Missing 'target_language' in config."
+        return True, ""
+
+    async def run_eval(self, request: EvalRequest, updater: TaskUpdater) -> None:
+        # Extract the single participant
+        role, endpoint = next(iter(request.participants.items()))
+        code_to_translate = request.config["code_to_translate"]
+        source_language = request.config["source_language"]
+        target_language = request.config["target_language"]
+
+        # Step 1: Request translation from the participant agent
+        await updater.update_status(
+            "working",
+            new_agent_text_message(f"Requesting translation from participant '{role}'...")
+        )
+        try:
+            # Send the code to translate to the participant agent
+            print(f"[DEBUG] Sending message to Purple Agent at {endpoint}", flush=True)
+            response = await self._tool_provider.talk_to_agent(
+                url=endpoint,
+                message=json.dumps({
+                    "code_to_translate": code_to_translate,
+                    "source_language": source_language,
+                    "target_language": target_language
+                })
+            )
+            print(f"[DEBUG] Received response from Purple Agent: '{response}'", flush=True)
+            # The response is expected to be a JSON string with the translated code
+            translated_code_data = json.loads(response)
+            translated_code = translated_code_data.get("translated_code", "")
+
+            if not translated_code:
+                await updater.failed(new_agent_text_message("Participant did not return translated code."))
+                return
+
+        except Exception as e:
+            print(f"[DEBUG] Exception communicating with participant: {e}", flush=True)
+            await updater.failed(new_agent_text_message(f"Error communicating with participant: {e}"))
+            return
+
+        await updater.update_status(
+            "working",
+            new_agent_text_message("Received translated code. Evaluating...")
+        )
+
+        # Step 2: Use the Gemini judge model to evaluate the translated code
+        prompt = f"""
+{SYSTEM_PROMPT}
+
+Please evaluate the following code translation based on the criteria:
+- Execution Correctness
+- Style & Documentation
+- Conciseness
+- Relevance
+
+Original {source_language} code:
+```
+{code_to_translate}
+```
+
+Translated {target_language} code (from participant '{role}'):
+```
+{translated_code}
+```
+
+Provide your evaluation in the TranslatorEval schema, including reasoning, winner (the participant's role if it's a good translation, or 'N/A' otherwise), and scores.
+"""
+        models_to_try = [
+            "gemini-2.5-flash",
+            "gemini-2.0-flash",
+            "gemini-flash-latest",
+            "gemini-pro-latest",
+            "gemini-2.5-pro"
+        ]
+        
+        last_error = None
+        for model in models_to_try:
+            try:
+                print(f"[DEBUG] Trying evaluation with model: {model}")
+                response = await self.client.aio.models.generate_content(
+                    model=model,
+                    contents=prompt,
+                    config=types.GenerateContentConfig(
+                        response_mime_type='application/json',
+                        response_schema=TranslatorEval
+                    )
+                )
+                eval_result: TranslatorEval = response.parsed
+                
+                # If parsed is None (should not happen with structured output)
+                if not eval_result:
+                    raise ValueError("Model failed to return structured output")
+    
+                await updater.update_status(
+                    "completed",
+                    new_agent_text_message(f"Evaluation complete. Winner: {eval_result.winner}, Scores: {eval_result.scores}")
+                )
+                # You might want to store the full eval_result or just the scores in the task result
+                await updater.update_result(eval_result.model_dump())
+                return # Assessment successful, exit function
+
+            except Exception as e:
+                print(f"[DEBUG] Model {model} failed: {e}")
+                last_error = e
+                # Check for resource exhausted and wait if needed
+                if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
+                    print("[DEBUG] Quota exhausted. Waiting 30 seconds before trying next model...", flush=True)
+                    import asyncio
+                    await asyncio.sleep(30)
+                # Continue to next model
+        
+        # If all models failed
+        await updater.failed(new_agent_text_message(f"All evaluation models failed. Last error: {last_error}"))
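Review note on the model loop in `run_eval`: the fallback logic can be exercised without a Gemini key if the model call is injected. The sketch below is one possible shape, not the code in this PR. `evaluate_with_fallback`, `call_model`, and `fake_call` are hypothetical names, and the model names in the usage are placeholders.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def evaluate_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], Awaitable[dict]],
    quota_wait_seconds: float = 30.0,
) -> dict:
    """Try each model in order and return the first successful result.

    `call_model` is a hypothetical stand-in for the real Gemini request.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return await call_model(model)
        except Exception as exc:  # broad on purpose, as in the diff
            last_error = exc
            # Pause only on quota errors, matching the string check in the diff.
            if "429" in str(exc) or "RESOURCE_EXHAUSTED" in str(exc):
                await asyncio.sleep(quota_wait_seconds)
    raise RuntimeError(f"All evaluation models failed. Last error: {last_error}")


async def fake_call(model: str) -> dict:
    # Stand-in: the first model reports a quota error, the second succeeds.
    if model == "model-a":
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return {"winner": "researcher_translator", "model": model}


result = asyncio.run(
    evaluate_with_fallback(["model-a", "model-b"], fake_call, quota_wait_seconds=0)
)
print(result)
```

Like the loop in the diff, this version still waits after a quota error on the last model before giving up; skipping that final wait would be a small follow-up change.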