🌟 Purpose
This repository builds on version 4 of our multi-agent A2A + MCP architecture by adding a Gemini-based VisionAgent that can analyze images via file path or URL. You’ll see how a lightweight front-end client, a central Host OrchestratorAgent, and specialized agents (like VisionAgent, TellTimeAgent, and GreetingAgent) collaborate over Google’s Agent-to-Agent (A2A) protocol and Anthropic’s Model Context Protocol (MCP).
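Because the VisionAgent accepts an image as either a local file path or a URL, both forms have to become raw bytes plus a MIME type before a multimodal Gemini call. The sketch below is a stdlib-only illustration of that step, not the code in agents/vision_agent/agent.py; the helper name `load_image` and its error handling are hypothetical.

```python
import mimetypes
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def load_image(source: str, timeout: float = 10.0) -> tuple[bytes, str]:
    """Return (image_bytes, mime_type) for a local file path or an http(s) URL.

    Hypothetical helper: the VisionAgent's real loading code may differ.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Remote image: download it and use the server's Content-Type header
        with urllib.request.urlopen(source, timeout=timeout) as resp:
            data = resp.read()
            mime = resp.headers.get_content_type()
    else:
        # Local image: read the file and guess the MIME type from its extension
        path = Path(source).expanduser()
        data = path.read_bytes()
        mime, _ = mimetypes.guess_type(path.name)
    # Reject anything that does not look like an image before calling the model
    if not mime or not mime.startswith("image/"):
        raise ValueError(f"{source!r} does not look like an image (MIME type: {mime})")
    return data, mime
```

The returned pair is what a multimodal request needs. With the google-genai SDK, for example, it could be wrapped as `types.Part.from_bytes(data=data, mime_type=mime)` and sent to gemini-2.0-flash alongside the text question; whether agent.py uses that SDK is an assumption.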
- Gemini Vision Agent – Accepts text + image input and answers image-based questions using gemini-2.0-flash
- A2A Protocol – Agents discover and call each other over JSON-RPC
- MCP Integration – Automatically discovers and invokes tools hosted via MCP
- Orchestrator Agent – A central Gemini-powered router that connects everything
- Modular & Extensible – Add agents or tools by updating the JSON config files (agent_registry.json, mcp_config.json)
version_4p01_with_vision_agent/
├── .env                         # Contains GOOGLE_API_KEY (gitignored)
├── pyproject.toml               # Project metadata & dependencies
├── README.md                    # This file
├── utilities/
│   ├── a2a/
│   │   ├── agent_discovery.py   # Loads agent_registry.json
│   │   ├── agent_connect.py     # Calls agents using JSON-RPC
│   │   └── agent_registry.json  # Defines A2A agents (VisionAgent, etc.)
│   └── mcp/
│       ├── mcp_discovery.py     # Loads MCP servers
│       ├── mcp_connect.py       # Loads & calls tools via MCP
│       └── mcp_config.json      # Defines MCP servers & tools
├── agents/
│   ├── tell_time_agent/         # Returns the current time
│   ├── greeting_agent/          # Returns a poetic greeting based on time of day
│   ├── vision_agent/            # NEW: Accepts image + query
│   │   ├── __main__.py          # Starts the vision agent
│   │   ├── agent.py             # Loads Gemini + handles image input
│   │   └── task_manager.py      # Routes incoming tasks to the agent
│   └── host_agent/              # Central orchestrator
│       ├── entry.py             # Entry point: starts the host agent server
│       └── orchestrator.py      # Gemini-powered routing logic
├── server/
│   ├── server.py                # A2A JSON-RPC server
│   └── task_manager.py          # In-memory task tracking
├── client/
│   └── a2a_client.py            # Makes JSON-RPC task requests
├── app/
│   └── cmd/
│       └── cmd.py               # CLI to interact with host agent
└── models/
    ├── agent.py
    ├── json_rpc.py
    ├── request.py
    └── task.py

- Python 3.11+
- uv
- A valid GOOGLE_API_KEY with Gemini access
# Clone the repo
cd version_4p01_with_vision_agent
uv venv
source .venv/bin/activate
uv sync --all-groups
# Add your Gemini API Key
echo "GOOGLE_API_KEY=your_key_here" > .env

# VisionAgent
uv run python3 -m agents.vision_agent --host localhost --port 10003
# TellTimeAgent
uv run python3 -m agents.tell_time_agent --host localhost --port 10002
# GreetingAgent
uv run python3 -m agents.greeting_agent --host localhost --port 10001

# HostAgent (orchestrator)
uv run python3 -m agents.host_agent.entry --host localhost --port 10000

# CLI client
uv run python3 -m app.cmd.cmd --agent http://localhost:10000

- Client → Sends query + image to HostAgent
- HostAgent → Finds VisionAgent in the A2A registry and forwards the task
- VisionAgent → Uses Gemini to analyze the image and replies
- Also includes → TellTimeAgent, GreetingAgent, and MCP Tool support
- How to combine Gemini LLM with vision and external query routing
- How to architect multi-agent workflows using A2A
- How to wrap visual AI into callable services