Commit 762b11e

feat: real APIs, Bedrock LLM integration, and Travel Agent Control Panel (#273)
* feat: replace mock tools with real free APIs and add Travel Agent Control Panel

Replace dummy/hardcoded tool implementations in the multi-agent planner with real free APIs that require no API keys:
- Weather: Open-Meteo API (geocoding + forecast)
- Attractions: Wikipedia API (page summary + search)
- Currency: Frankfurter API (live ECB exchange rates)
- Flights: realistic simulated data (no free API exists)

Add a Travel Agent Control Panel (fault-panel) service that provides a web UI for toggling fault injection scenarios in real time. The canary polls the panel each cycle for live config updates.

Features:
- One-click presets: All Clean, Default, Chaos, Latency Spike, Cascading Failure, Deep Traces, Custom
- Read-only bar visualization for presets, editable sliders for Custom
- Each preset shows what to look for in the observability dashboards
- State persists to a JSON file across container restarts
- Master switch to pause/resume canary traffic

The orchestrator now calls 4 tools (weather, attractions, flights, currency) producing deeper and more varied trace waterfalls ideal for observability demos.

Signed-off-by: ps48 <pshenoy36@gmail.com>

* feat: add Amazon Bedrock LLM integration with live toggle

Add real LLM support via Amazon Bedrock Converse API (Claude Opus 4.8) with a live toggle in the Travel Agent Control Panel:
- New bedrock_client.py utility in each agent (tool_use, fallback handling)
- Orchestrator: Bedrock generates trip plan and synthesizes recommendation
- Weather Agent: full agentic tool_use loop (select tool → execute → respond)
- Events Agent: Bedrock reasons about attractions before MCP call
- Graceful fallback to mock mode if no AWS credentials or Bedrock errors

Also:
- Fix fault-panel internal port to 8085 (avoids conflict with OTel demo Envoy)
- Add use_real_llm toggle to control panel state + UI
- Pass AWS credentials + BEDROCK_MODEL_ID through docker-compose env vars
- Add boto3 + requests to all agent Dockerfiles
- Comprehensive README covering real APIs, Bedrock, control panel, faults

Signed-off-by: ps48 <pshenoy36@gmail.com>

* chore: switch default model to Haiku 4.5 with global. prefix

Address review feedback:
- Default to claude-haiku-4-5 (cheapest) instead of opus-4-8
- Use global. prefix for cross-region inference profile

Signed-off-by: ps48 <pshenoy36@gmail.com>

* fix: weather-agent NameError when Bedrock succeeds

The Bedrock success path never assigned llm_response, causing a NameError at the post-branch logging/metrics code that referenced llm_response["id"] and llm_response["model"]. Use conditional variables that work for both Bedrock and mock paths.

Signed-off-by: ps48 <pshenoy36@gmail.com>

---------

Signed-off-by: ps48 <pshenoy36@gmail.com>
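
The NameError fix described in the last message is an instance of a common pattern: code after an if/else reads a variable that only one branch assigned. The sketch below illustrates one way to structure it so that every path sets the same variables. It is not the weather-agent code; `run_llm`, `bedrock_call`, and `mock_call` are hypothetical names, and the response shape (`id`, `model`, `text`) is an assumption made for the example.

```python
from typing import Callable, Optional


def run_llm(prompt: str,
            bedrock_call: Optional[Callable[[str], dict]],
            mock_call: Callable[[str], dict]) -> dict:
    """Return text plus the id/model fields that post-branch logging needs."""
    response_id = None
    model = None
    text = None

    if bedrock_call is not None:
        try:
            result = bedrock_call(prompt)
            response_id, model, text = result["id"], result["model"], result["text"]
        except Exception:
            # Graceful fallback: any Bedrock error drops to the mock path below.
            text = None

    if text is None:
        result = mock_call(prompt)
        response_id, model, text = result["id"], result["model"], result["text"]

    # Code after the branches reads variables that every path has assigned,
    # instead of a dict that only the mock branch defined.
    return {"id": response_id, "model": model, "text": text}
```

Passing `None` for `bedrock_call` stands in for the "no AWS credentials" case, and a callable that raises stands in for a Bedrock error; both end up on the mock path.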
1 parent 6b9223e commit 762b11e

16 files changed

Lines changed: 1879 additions & 287 deletions

docker-compose.examples.yml

Lines changed: 45 additions & 2 deletions
@@ -27,12 +27,18 @@ services:
       # FastAPI server endpoint
       - "${WEATHER_AGENT_PORT}:8000"
     environment:
-      # Override OTLP endpoint to use docker network
       - OTEL_EXPORTER_OTLP_ENDPOINT=http://${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT_GRPC}
       - MCP_SERVER_URL=http://mcp-server:8003
+      - FAULT_PANEL_URL=http://fault-panel:8085
+      - AWS_REGION=${AWS_REGION:-us-west-2}
+      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
+      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-}
+      - AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN:-}
+      - BEDROCK_MODEL_ID=${BEDROCK_MODEL_ID:-global.anthropic.claude-haiku-4-5-20251001-v1:0}
     depends_on:
       - otel-collector
       - example-mcp-server
+      - example-fault-panel
     networks:
       - observability-stack-network
     restart: unless-stopped
@@ -42,6 +48,25 @@ services:
           memory: ${WEATHER_AGENT_MEMORY_LIMIT}
     logging: *logging

+  # Travel Agent Control Panel - UI for toggling fault injection scenarios
+  example-fault-panel:
+    build:
+      context: ./docker-compose/fault-panel
+      dockerfile: Dockerfile
+    container_name: fault-panel
+    ports:
+      - "${FAULT_PANEL_PORT:-8085}:8085"
+    volumes:
+      - fault-panel-data:/data
+    networks:
+      - observability-stack-network
+    restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          memory: 64M
+    logging: *logging
+
   # Canary Service - Periodically invokes travel-planner for testing
   example-canary:
     build:
@@ -52,11 +77,13 @@ services:
       - TRAVEL_PLANNER_URL=http://travel-planner:8000
       - WEATHER_AGENT_URL=http://weather-agent:8000
       - EVENTS_AGENT_URL=http://events-agent:8002
+      - FAULT_PANEL_URL=http://fault-panel:8085
       - CANARY_INTERVAL=${CANARY_INTERVAL}
     depends_on:
       - example-travel-planner
       - example-weather-agent
       - example-events-agent
+      - example-fault-panel
     networks:
       - observability-stack-network
     restart: unless-stopped
@@ -79,6 +106,12 @@ services:
       - WEATHER_AGENT_URL=http://weather-agent:8000
       - EVENTS_AGENT_URL=http://events-agent:8002
       - MCP_SERVER_URL=http://mcp-server:8003
+      - FAULT_PANEL_URL=http://fault-panel:8085
+      - AWS_REGION=${AWS_REGION:-us-west-2}
+      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
+      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-}
+      - AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN:-}
+      - BEDROCK_MODEL_ID=${BEDROCK_MODEL_ID:-global.anthropic.claude-haiku-4-5-20251001-v1:0}
     depends_on:
       - otel-collector
       - example-weather-agent
@@ -125,9 +158,16 @@ services:
     environment:
       - OTEL_EXPORTER_OTLP_ENDPOINT=http://${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT_GRPC}
       - MCP_SERVER_URL=http://mcp-server:8003
+      - FAULT_PANEL_URL=http://fault-panel:8085
+      - AWS_REGION=${AWS_REGION:-us-west-2}
+      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
+      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-}
+      - AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN:-}
+      - BEDROCK_MODEL_ID=${BEDROCK_MODEL_ID:-global.anthropic.claude-haiku-4-5-20251001-v1:0}
     depends_on:
       - otel-collector
       - example-mcp-server
+      - example-fault-panel
     networks:
       - observability-stack-network
     restart: unless-stopped
@@ -160,4 +200,7 @@ services:
       resources:
         limits:
           memory: ${CANARY_MEMORY_LIMIT}
-    logging: *logging
+    logging: *logging
+
+volumes:
+  fault-panel-data:
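
The `${VAR:-default}` form used above substitutes the default when the host variable is unset or empty, so `${AWS_ACCESS_KEY_ID:-}` hands the container an empty string rather than leaving the variable undefined. A minimal sketch of how an agent might read these values follows, assuming it wants to treat empty strings as missing. `bedrock_settings` is a hypothetical helper, not something taken from bedrock_client.py, and it only inspects static keys; boto3 can also find credentials through other providers.

```python
import os
from typing import Mapping

DEFAULT_MODEL_ID = "global.anthropic.claude-haiku-4-5-20251001-v1:0"


def bedrock_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Bedrock-related settings, treating empty values as unset."""
    # Compose's ${VAR:-} passes an empty string when the host variable is unset,
    # so empty values are handled the same way as missing ones here.
    access_key = env.get("AWS_ACCESS_KEY_ID", "").strip()
    secret_key = env.get("AWS_SECRET_ACCESS_KEY", "").strip()
    return {
        "region": env.get("AWS_REGION", "").strip() or "us-west-2",
        "model_id": env.get("BEDROCK_MODEL_ID", "").strip() or DEFAULT_MODEL_ID,
        "has_static_credentials": bool(access_key and secret_key),
    }
```

An agent could use `has_static_credentials` as one input when deciding between the Bedrock path and mock mode.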

docker-compose/canary/canary.py

Lines changed: 70 additions & 23 deletions
@@ -2,10 +2,13 @@
 """
 Canary Service - Periodic Travel Planner Invocation with Fault Injection

+Polls the Fault Control Panel for live configuration (fault weights, trace shapes,
+interval). Falls back to env-var defaults if the panel is unreachable.
+
 Generates traces with varying depths and shapes:
-- "normal": standard orchestrator call (37 spans, 4 services)
+- "normal": standard orchestrator call (40+ spans, 5 services, includes flights + currency)
 - "shallow": direct sub-agent call bypassing orchestrator (5-8 spans, 1-2 services)
-- "deep": multi-destination comparison via sequential orchestrator calls (70+ spans)
+- "deep": multi-destination comparison via sequential orchestrator calls (100+ spans)
 """

 import json
@@ -19,11 +22,13 @@
 TRAVEL_PLANNER_URL = os.getenv("TRAVEL_PLANNER_URL", "http://travel-planner:8000")
 WEATHER_AGENT_URL = os.getenv("WEATHER_AGENT_URL", "http://weather-agent:8000")
 EVENTS_AGENT_URL = os.getenv("EVENTS_AGENT_URL", "http://events-agent:8002")
+FAULT_PANEL_URL = os.getenv("FAULT_PANEL_URL", "http://fault-panel:8085")
 CANARY_INTERVAL = int(os.getenv("CANARY_INTERVAL", "30"))

 DESTINATIONS = ["Paris", "Tokyo", "London", "Berlin", "Sydney", "New York", "Mumbai", "Seattle"]
+ORIGINS = ["Portland", "Seattle", "San Francisco", "New York", "Chicago", "Denver", "Austin", "Boston"]

-# Fault weights (applied to normal traces only)
+# Defaults (used when fault panel is unreachable)
 DEFAULT_FAULT_WEIGHTS = {
     "none": 0.50,
     "weather_error": 0.10,
@@ -33,15 +38,12 @@
     "events_rate_limited": 0.07,
     "partial_failure": 0.10,
 }
-FAULT_WEIGHTS = json.loads(os.getenv("FAULT_WEIGHTS", json.dumps(DEFAULT_FAULT_WEIGHTS)))

-# Trace shape weights control the mix of shallow / normal / deep traces
 DEFAULT_TRACE_SHAPE_WEIGHTS = {
     "normal": 0.60,
     "shallow": 0.25,
     "deep": 0.15,
 }
-TRACE_SHAPE_WEIGHTS = json.loads(os.getenv("TRACE_SHAPE_WEIGHTS", json.dumps(DEFAULT_TRACE_SHAPE_WEIGHTS)))

 FAULT_CONFIGS = {
     "none": None,
@@ -60,8 +62,37 @@ def weighted_choice(weights_dict):
     return random.choices(keys, weights=weights, k=1)[0]


-def select_fault():
-    selected = weighted_choice(FAULT_WEIGHTS)
+def fetch_config():
+    """Poll fault panel for current config. Returns None if unreachable."""
+    try:
+        resp = requests.get(f"{FAULT_PANEL_URL}/config", timeout=2)
+        if resp.status_code == 200:
+            return resp.json()
+    except Exception:
+        pass
+    return None
+
+
+def get_config():
+    """Get active config from panel or fall back to defaults."""
+    panel_config = fetch_config()
+    if panel_config:
+        return {
+            "enabled": panel_config.get("enabled", True),
+            "fault_weights": panel_config.get("fault_weights", DEFAULT_FAULT_WEIGHTS),
+            "trace_shape_weights": panel_config.get("trace_shape_weights", DEFAULT_TRACE_SHAPE_WEIGHTS),
+            "canary_interval": panel_config.get("canary_interval", CANARY_INTERVAL),
+        }
+    return {
+        "enabled": True,
+        "fault_weights": json.loads(os.getenv("FAULT_WEIGHTS", json.dumps(DEFAULT_FAULT_WEIGHTS))),
+        "trace_shape_weights": json.loads(os.getenv("TRACE_SHAPE_WEIGHTS", json.dumps(DEFAULT_TRACE_SHAPE_WEIGHTS))),
+        "canary_interval": CANARY_INTERVAL,
+    }
+
+
+def select_fault(fault_weights):
+    selected = weighted_choice(fault_weights)
     return selected, FAULT_CONFIGS.get(selected)


@@ -76,20 +107,29 @@ def check_health():
         return False


-def invoke_normal(destination):
-    """Standard orchestrator call — produces normal-depth traces."""
-    fault_name, fault_config = select_fault()
-    payload = {"destination": destination}
+def invoke_normal(destination, fault_weights):
+    """Standard orchestrator call — produces normal-depth traces with flights + currency."""
+    fault_name, fault_config = select_fault(fault_weights)
+    origin = random.choice(ORIGINS)
+    payload = {"destination": destination, "origin": origin}
     if fault_config:
         payload["fault"] = fault_config

-    print(f" [normal] {destination} (fault: {fault_name})")
+    print(f" [normal] {origin}{destination} (fault: {fault_name})")
     response = requests.post(f"{TRAVEL_PLANNER_URL}/plan", json=payload, timeout=60)
     data = response.json()

     if response.status_code == 200:
         status = "partial" if data.get("partial") else "ok"
-        print(f" → {status}")
+        has_flights = "flights" in data and data["flights"]
+        has_currency = "currency" in data and data["currency"]
+        extras = []
+        if has_flights:
+            extras.append("flights")
+        if has_currency:
+            extras.append("currency")
+        extra_str = f" +{','.join(extras)}" if extras else ""
+        print(f" → {status}{extra_str}")
         return True
     print(f" → error: {response.status_code}")
     return False
@@ -118,10 +158,11 @@ def invoke_shallow(destination):

 def invoke_deep(destinations):
     """Sequential multi-destination calls — produces deep traces."""
-    print(f" [deep] comparing {len(destinations)} destinations: {', '.join(destinations)}")
+    origin = random.choice(ORIGINS)
+    print(f" [deep] comparing {len(destinations)} destinations from {origin}: {', '.join(destinations)}")
     results = 0
     for dest in destinations:
-        payload = {"destination": dest}
+        payload = {"destination": dest, "origin": origin}
         try:
             response = requests.post(f"{TRAVEL_PLANNER_URL}/plan", json=payload, timeout=60)
             if response.status_code == 200:
@@ -136,8 +177,8 @@ def main():
     print("=" * 50)
     print("Canary - Travel Planner with Fault Injection")
     print(f"URL: {TRAVEL_PLANNER_URL}")
-    print(f"Interval: {CANARY_INTERVAL}s")
-    print(f"Trace shapes: {json.dumps(TRACE_SHAPE_WEIGHTS)}")
+    print(f"Fault Panel: {FAULT_PANEL_URL}")
+    print(f"Default Interval: {CANARY_INTERVAL}s")
     print("=" * 50)

     # Wait for service
@@ -154,9 +195,16 @@ def main():

     while True:
         try:
+            config = get_config()
+
+            if not config["enabled"]:
+                print(f"[{datetime.now().strftime('%H:%M:%S')}] paused (disabled via panel)")
+                time.sleep(config["canary_interval"])
+                continue
+
             count += 1
             timestamp = datetime.now().strftime("%H:%M:%S")
-            shape = weighted_choice(TRACE_SHAPE_WEIGHTS)
+            shape = weighted_choice(config["trace_shape_weights"])
             destination = random.choice(DESTINATIONS)

             print(f"[{timestamp}] invocation #{count}")
@@ -167,21 +215,20 @@ def main():
                 dests = random.sample(DESTINATIONS, k=random.randint(2, 4))
                 ok = invoke_deep(dests)
             else:
-                ok = invoke_normal(destination)
+                ok = invoke_normal(destination, config["fault_weights"])

             if ok:
                 success += 1

             print(f" Success: {success}/{count} ({100*success/count:.0f}%)\n")

-            # Sleep after first invocation (not before) so data appears immediately on startup
-            time.sleep(CANARY_INTERVAL)
+            time.sleep(config["canary_interval"])

         except KeyboardInterrupt:
             break
         except Exception as e:
             print(f"Error: {e}")
-            time.sleep(CANARY_INTERVAL)
+            time.sleep(config.get("canary_interval", CANARY_INTERVAL))


 if __name__ == "__main__":
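
The poll-then-fall-back flow in this diff can be exercised without a running panel by injecting the fetch function. The sketch below is a simplified illustration, not the canary's code: `resolve_config`, `run_cycles`, and the trimmed `DEFAULTS` values are hypothetical. It also binds `config` before the loop, which matters because the new `except` branch reads `config`, and that name would be unbound if `get_config()` itself raised on the very first iteration.

```python
import random
from typing import Callable, Optional

# Illustrative defaults only; the real weight tables live in canary.py.
DEFAULTS = {
    "enabled": True,
    "fault_weights": {"none": 0.5, "weather_error": 0.5},
    "trace_shape_weights": {"normal": 0.60, "shallow": 0.25, "deep": 0.15},
    "canary_interval": 30,
}


def resolve_config(fetch: Callable[[], Optional[dict]]) -> dict:
    """Merge the panel response over the defaults; use defaults alone on failure."""
    try:
        panel = fetch()
    except Exception:
        panel = None
    if not isinstance(panel, dict):
        return dict(DEFAULTS)
    return {key: panel.get(key, default) for key, default in DEFAULTS.items()}


def weighted_choice(weights: dict, rng: random.Random) -> str:
    """Pick one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def run_cycles(fetch: Callable[[], Optional[dict]], cycles: int, seed: int = 0) -> list:
    """Re-read the config every cycle and record which trace shape was chosen."""
    rng = random.Random(seed)
    config = dict(DEFAULTS)  # bound before the loop so error handlers can always read it
    shapes = []
    for _ in range(cycles):
        config = resolve_config(fetch)
        if not config["enabled"]:
            shapes.append("paused")
            continue
        shapes.append(weighted_choice(config["trace_shape_weights"], rng))
    return shapes
```

For example, `run_cycles(lambda: {"enabled": False}, 3)` records three paused cycles, mirroring the master switch, while a fetch function that raises behaves like an unreachable panel and yields the defaults.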
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+COPY main.py .
+
+RUN pip install --no-cache-dir fastapi uvicorn
+
+EXPOSE 8085
+CMD ["python", "main.py"]
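
The panel's main.py is not part of this excerpt, but the commit message says its state persists to a JSON file across container restarts, and the compose file mounts a named volume at /data for it. A minimal sketch of that kind of persistence, assuming a single JSON file and an atomic write-then-rename, might look like the following. `load_state` and `save_state` are hypothetical names, and the file location is left to the caller.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path, defaults: dict) -> dict:
    """Read saved state, falling back to defaults if the file is missing or corrupt."""
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return dict(defaults)
    if not isinstance(saved, dict):
        return dict(defaults)
    # Saved keys win; keys added in newer versions keep their defaults.
    return {**defaults, **saved}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Writing to a temporary file first means a container stopped mid-write leaves the previous state file intact rather than a truncated one.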
