Skip to content

Commit 047314a

Browse files
Gkrumbach07claude
andauthored
fix: fix Gemini runner — Vertex auth, system prompt, custom commands, feedback MCP (#803)
## Summary - **Vertex AI auth fixed**: map platform env vars (`ANTHROPIC_VERTEX_PROJECT_ID` → `GOOGLE_CLOUD_PROJECT`, `CLOUD_ML_REGION` → `GOOGLE_CLOUD_LOCATION`) in Gemini CLI subprocess env; validate credentials file exists at startup with a clear error - **System prompt**: write `.gemini/system.md` using `${SubAgents}`/`${AgentSkills}`/`${AvailableTools}` variable substitutions to preserve Gemini's default instructions, then append platform context (workspace paths, repos, git, workflow, rubric/correction command hints) - **Custom commands**: `/ambient:evaluate-rubric` and `/ambient:log-correction` written to `.gemini/commands/ambient/` at session setup - **Feedback MCP server**: minimal stdlib stdio MCP server (`feedback_server.py`) exposing `evaluate_rubric` and `log_correction` tools backed by existing Langfuse logging; registered in `.gemini/settings.json` with Langfuse credentials injected via `env` block (bypasses the Gemini CLI subprocess env blocklist) - **`add_dirs` bug fixed**: `resolve_workspace_paths` return value was silently dropped; now seeded into `include_directories` ## Platform refactoring Moved shared code out of individual bridges into the base layer: - `platform/auth.py`: `validate_vertex_credentials_file()` — shared Vertex credential validation used by both Claude and Gemini auth modules - `bridge.py` (base class): `set_context()`, `_ensure_ready()`, `_setup_platform()`, `_refresh_credentials_if_stale()` — identical in both bridges, now inherited; bridges only contain framework-specific code - `bridge.py`: `_async_safe_manager_shutdown()` — the async-safe fire-and-forget manager shutdown pattern shared by both `mark_dirty()` implementations ## Test plan - [ ] Create a Gemini session with Vertex AI enabled — should connect and respond - [ ] Verify `/ambient:evaluate-rubric` and `/ambient:log-correction` commands appear in Gemini CLI - [ ] Run `/ambient:log-correction` — should log to Langfuse without "credentials missing" error - [ ] Verify Claude sessions still work (no regression from base class refactor) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
1 parent 33e8bc4 commit 047314a

14 files changed

Lines changed: 881 additions & 120 deletions

File tree

.github/workflows/components-build-deploy.yml

Lines changed: 35 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -252,13 +252,13 @@ jobs:
252252
if [ "${{ needs.detect-changes.outputs.claude-runner }}" == "true" ]; then
253253
echo "runner_tag=${{ github.sha }}" >> $GITHUB_OUTPUT
254254
else
255-
echo "runner_tag=stage" >> $GITHUB_OUTPUT
255+
echo "runner_tag=latest" >> $GITHUB_OUTPUT
256256
fi
257257
258258
if [ "${{ needs.detect-changes.outputs.state-sync }}" == "true" ]; then
259259
echo "state_sync_tag=${{ github.sha }}" >> $GITHUB_OUTPUT
260260
else
261-
echo "state_sync_tag=stage" >> $GITHUB_OUTPUT
261+
echo "state_sync_tag=latest" >> $GITHUB_OUTPUT
262262
fi
263263
264264
if [ "${{ needs.detect-changes.outputs.public-api }}" == "true" ]; then
@@ -303,6 +303,27 @@ jobs:
303303
AMBIENT_CODE_RUNNER_IMAGE="quay.io/ambient_code/vteam_claude_runner:${{ steps.image-tags.outputs.runner_tag }}" \
304304
STATE_SYNC_IMAGE="quay.io/ambient_code/vteam_state_sync:${{ steps.image-tags.outputs.state_sync_tag }}"
305305
306+
- name: Update agent registry ConfigMap with pinned image tags
307+
if: needs.detect-changes.outputs.claude-runner == 'true' || needs.detect-changes.outputs.state-sync == 'true'
308+
run: |
309+
# Fetch live JSON from cluster so unchanged images keep their current tag
310+
REGISTRY=$(oc get configmap ambient-agent-registry -n ambient-code \
311+
-o jsonpath='{.data.agent-registry\.json}')
312+
313+
# Only replace images that were actually rebuilt this run.
314+
# Pattern [@:][^"]* matches both :tag and @sha256:digest refs.
315+
if [ "${{ needs.detect-changes.outputs.claude-runner }}" == "true" ]; then
316+
REGISTRY=$(echo "$REGISTRY" | sed \
317+
"s|quay.io/ambient_code/vteam_claude_runner[@:][^\"]*|quay.io/ambient_code/vteam_claude_runner:${{ github.sha }}|g")
318+
fi
319+
if [ "${{ needs.detect-changes.outputs.state-sync }}" == "true" ]; then
320+
REGISTRY=$(echo "$REGISTRY" | sed \
321+
"s|quay.io/ambient_code/vteam_state_sync[@:][^\"]*|quay.io/ambient_code/vteam_state_sync:${{ github.sha }}|g")
322+
fi
323+
324+
oc patch configmap ambient-agent-registry -n ambient-code --type=merge \
325+
-p "{\"data\":{\"agent-registry.json\":$(echo "$REGISTRY" | jq -Rs .)}}"
326+
306327
deploy-with-disptach:
307328
runs-on: ubuntu-latest
308329
needs: [detect-changes, build-and-push, update-rbac-and-crd]
@@ -357,5 +378,15 @@ jobs:
357378
- name: Update operator environment variables
358379
run: |
359380
oc set env deployment/agentic-operator -n ambient-code -c agentic-operator \
360-
AMBIENT_CODE_RUNNER_IMAGE="quay.io/ambient_code/vteam_claude_runner:stage" \
361-
STATE_SYNC_IMAGE="quay.io/ambient_code/vteam_state_sync:stage"
381+
AMBIENT_CODE_RUNNER_IMAGE="quay.io/ambient_code/vteam_claude_runner:${{ github.sha }}" \
382+
STATE_SYNC_IMAGE="quay.io/ambient_code/vteam_state_sync:${{ github.sha }}"
383+
384+
- name: Update agent registry ConfigMap with pinned image tags
385+
run: |
386+
REGISTRY=$(oc get configmap ambient-agent-registry -n ambient-code \
387+
-o jsonpath='{.data.agent-registry\.json}')
388+
REGISTRY=$(echo "$REGISTRY" | sed \
389+
-e "s|quay.io/ambient_code/vteam_claude_runner[@:][^\"]*|quay.io/ambient_code/vteam_claude_runner:${{ github.sha }}|g" \
390+
-e "s|quay.io/ambient_code/vteam_state_sync[@:][^\"]*|quay.io/ambient_code/vteam_state_sync:${{ github.sha }}|g")
391+
oc patch configmap ambient-agent-registry -n ambient-code --type=merge \
392+
-p "{\"data\":{\"agent-registry.json\":$(echo "$REGISTRY" | jq -Rs .)}}"

.github/workflows/prod-release-deploy.yaml

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,10 @@ on:
2222
required: false
2323
type: string
2424
default: ''
25+
concurrency:
26+
group: prod-release-deploy
27+
cancel-in-progress: false
28+
2529
jobs:
2630
release:
2731
runs-on: ubuntu-latest
@@ -273,3 +277,14 @@ jobs:
273277
oc set env deployment/agentic-operator -n ambient-code -c agentic-operator \
274278
AMBIENT_CODE_RUNNER_IMAGE="quay.io/ambient_code/vteam_claude_runner:${{ needs.release.outputs.new_tag }}" \
275279
STATE_SYNC_IMAGE="quay.io/ambient_code/vteam_state_sync:${{ needs.release.outputs.new_tag }}"
280+
281+
- name: Update agent registry ConfigMap with release image tags
282+
run: |
283+
RELEASE_TAG="${{ needs.release.outputs.new_tag }}"
284+
REGISTRY=$(oc get configmap ambient-agent-registry -n ambient-code \
285+
-o jsonpath='{.data.agent-registry\.json}')
286+
REGISTRY=$(echo "$REGISTRY" | sed \
287+
-e "s|quay.io/ambient_code/vteam_claude_runner[@:][^\"]*|quay.io/ambient_code/vteam_claude_runner:${RELEASE_TAG}|g" \
288+
-e "s|quay.io/ambient_code/vteam_state_sync[@:][^\"]*|quay.io/ambient_code/vteam_state_sync:${RELEASE_TAG}|g")
289+
oc patch configmap ambient-agent-registry -n ambient-code --type=merge \
290+
-p "{\"data\":{\"agent-registry.json\":$(echo "$REGISTRY" | jq -Rs .)}}"

components/runners/ambient-runner/ambient_runner/bridge.py

Lines changed: 62 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,10 @@ async def interrupt(self, thread_id=None) -> None:
2323
pass
2424
"""
2525

26+
import asyncio
2627
import logging
2728
import os
29+
import time
2830
from abc import ABC, abstractmethod
2931
from dataclasses import dataclass, field
3032
from typing import Any, AsyncIterator, Optional
@@ -43,6 +45,30 @@ async def interrupt(self, thread_id=None) -> None:
4345
TOOL_REFRESH_MIN_INTERVAL_SEC = 30
4446

4547

48+
def _async_safe_manager_shutdown(manager: Any) -> None:
49+
"""Fire-and-forget async shutdown of a session manager from sync context.
50+
51+
Used by ``mark_dirty()`` implementations in all bridges. Handles both
52+
cases: called from within a running event loop (schedules as a task)
53+
and called outside any loop (blocks via ``asyncio.run``).
54+
"""
55+
try:
56+
loop = asyncio.get_running_loop()
57+
task = loop.create_task(manager.shutdown())
58+
task.add_done_callback(
59+
lambda f: _bridge_logger.warning(
60+
"mark_dirty: session_manager shutdown error: %s", f.exception()
61+
)
62+
if f.exception()
63+
else None
64+
)
65+
except RuntimeError:
66+
try:
67+
asyncio.run(manager.shutdown())
68+
except Exception as exc:
69+
_bridge_logger.warning("mark_dirty: session_manager shutdown error: %s", exc)
70+
71+
4672
@dataclass
4773
class FrameworkCapabilities:
4874
"""Declares what a framework adapter supports.
@@ -77,6 +103,11 @@ class PlatformBridge(ABC):
77103
- ``get_error_context()`` — returns extra error info for failed runs
78104
"""
79105

106+
def __init__(self) -> None:
107+
self._context: Optional[RunnerContext] = None
108+
self._ready: bool = False
109+
self._last_creds_refresh: float = 0.0
110+
80111
# ------------------------------------------------------------------
81112
# Required (abstract)
82113
# ------------------------------------------------------------------
@@ -116,11 +147,39 @@ async def interrupt(self, thread_id: Optional[str] = None) -> None:
116147
# ------------------------------------------------------------------
117148

118149
def set_context(self, context: RunnerContext) -> None:
119-
"""Store the runner context for later use.
150+
"""Store the runner context (called from lifespan before any requests)."""
151+
self._context = context
120152

121-
Called by the platform lifespan before any requests are served.
122-
Override to capture the context for use in ``run()``.
153+
async def _refresh_credentials_if_stale(self) -> None:
154+
"""Refresh platform credentials if the refresh interval has elapsed.
155+
156+
Call this at the start of each ``run()`` to keep tokens fresh.
123157
"""
158+
now = time.monotonic()
159+
if now - self._last_creds_refresh > CREDS_REFRESH_INTERVAL_SEC:
160+
from ambient_runner.platform.auth import populate_runtime_credentials
161+
162+
await populate_runtime_credentials(self._context)
163+
self._last_creds_refresh = now
164+
165+
async def _ensure_ready(self) -> None:
166+
"""Run one-time platform setup on the first ``run()`` call.
167+
168+
Calls ``_setup_platform()`` the first time, then sets ``self._ready``.
169+
"""
170+
if self._ready:
171+
return
172+
if not self._context:
173+
raise RuntimeError("Context not set — call set_context() first")
174+
await self._setup_platform()
175+
self._ready = True
176+
_bridge_logger.info(
177+
"Platform ready — model: %s",
178+
getattr(self, "_configured_model", ""),
179+
)
180+
181+
async def _setup_platform(self) -> None:
182+
"""Framework-specific platform setup. Override in each bridge."""
124183
pass
125184

126185
async def shutdown(self) -> None:

components/runners/ambient-runner/ambient_runner/bridges/claude/auth.py

Lines changed: 2 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,8 @@
99

1010
import logging
1111
import os
12-
from pathlib import Path
1312

13+
from ambient_runner.platform.auth import validate_vertex_credentials_file
1414
from ambient_runner.platform.context import RunnerContext
1515
from ambient_runner.platform.utils import is_vertex_enabled
1616

@@ -38,26 +38,17 @@ def map_to_vertex_model(model: str) -> str:
3838

3939
async def setup_vertex_credentials(context: RunnerContext) -> dict:
4040
"""Set up Google Cloud Vertex AI credentials from service account."""
41-
service_account_path = context.get_env("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
41+
service_account_path = validate_vertex_credentials_file(context)
4242
project_id = context.get_env("ANTHROPIC_VERTEX_PROJECT_ID", "").strip()
4343
region = context.get_env("CLOUD_ML_REGION", "").strip()
4444

45-
if not service_account_path:
46-
raise RuntimeError(
47-
"GOOGLE_APPLICATION_CREDENTIALS must be set when USE_VERTEX is enabled"
48-
)
4945
if not project_id:
5046
raise RuntimeError(
5147
"ANTHROPIC_VERTEX_PROJECT_ID must be set when USE_VERTEX is enabled"
5248
)
5349
if not region:
5450
raise RuntimeError("CLOUD_ML_REGION must be set when USE_VERTEX is enabled")
5551

56-
if not Path(service_account_path).exists():
57-
raise RuntimeError(
58-
f"Service account key file not found at {service_account_path}"
59-
)
60-
6152
logger.info(f"Vertex AI configured: project={project_id}, region={region}")
6253
return {
6354
"credentials_path": service_account_path,

components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py

Lines changed: 4 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,6 @@
99
- Interrupt and graceful shutdown
1010
"""
1111

12-
import asyncio
1312
import logging
1413
import os
1514
import time
@@ -19,9 +18,9 @@
1918
from ag_ui_claude_sdk import ClaudeAgentAdapter
2019

2120
from ambient_runner.bridge import (
22-
CREDS_REFRESH_INTERVAL_SEC,
2321
FrameworkCapabilities,
2422
PlatformBridge,
23+
_async_safe_manager_shutdown,
2524
setup_bridge_observability,
2625
)
2726
from ambient_runner.bridges.claude.session import SessionManager
@@ -43,13 +42,12 @@ class ClaudeBridge(PlatformBridge):
4342
"""
4443

4544
def __init__(self) -> None:
45+
super().__init__()
4646
self._adapter: ClaudeAgentAdapter | None = None
4747
self._session_manager: SessionManager | None = None
4848
self._obs: Any = None
49-
self._context: RunnerContext | None = None
5049

5150
# Platform state (populated by _setup_platform)
52-
self._ready: bool = False
5351
self._first_run: bool = True
5452
self._configured_model: str = ""
5553
self._cwd_path: str = ""
@@ -58,7 +56,6 @@ def __init__(self) -> None:
5856
self._allowed_tools: list[str] = []
5957
self._system_prompt: dict = {}
6058
self._stderr_lines: list[str] = []
61-
self._last_creds_refresh: float = 0.0
6259

6360
# ------------------------------------------------------------------
6461
# PlatformBridge interface
@@ -89,14 +86,7 @@ async def run(self, input_data: RunAgentInput) -> AsyncIterator[BaseEvent]:
8986
"""Full run lifecycle: lazy setup → adapter → session worker → tracing."""
9087
# 1. Lazy platform setup
9188
await self._ensure_ready()
92-
93-
# Refresh credentials if stale (tokens may have expired)
94-
now = time.monotonic()
95-
if now - self._last_creds_refresh > CREDS_REFRESH_INTERVAL_SEC:
96-
from ambient_runner.platform.auth import populate_runtime_credentials
97-
98-
await populate_runtime_credentials(self._context)
99-
self._last_creds_refresh = now
89+
await self._refresh_credentials_if_stale()
10090

10191
# 2. Ensure adapter exists
10292
self._ensure_adapter()
@@ -153,10 +143,6 @@ async def interrupt(self, thread_id: Optional[str] = None) -> None:
153143
# Lifecycle methods
154144
# ------------------------------------------------------------------
155145

156-
def set_context(self, context: RunnerContext) -> None:
157-
"""Store the runner context (called from lifespan)."""
158-
self._context = context
159-
160146
async def shutdown(self) -> None:
161147
"""Graceful shutdown: persist sessions, finalise tracing."""
162148
if self._session_manager:
@@ -179,22 +165,7 @@ def mark_dirty(self) -> None:
179165
if self._session_manager:
180166
manager = self._session_manager
181167
self._session_manager = None
182-
try:
183-
asyncio.get_running_loop() # raises RuntimeError if no loop
184-
future = asyncio.ensure_future(manager.shutdown())
185-
future.add_done_callback(
186-
lambda f: logger.warning(
187-
"mark_dirty: session_manager shutdown error: %s", f.exception()
188-
)
189-
if f.exception()
190-
else None
191-
)
192-
except RuntimeError:
193-
# No running loop — safe to block
194-
try:
195-
asyncio.run(manager.shutdown())
196-
except Exception as e:
197-
logger.warning("mark_dirty: session_manager shutdown error: %s", e)
168+
_async_safe_manager_shutdown(manager)
198169
logger.info("ClaudeBridge: marked dirty — will reinitialise on next run")
199170

200171
def get_error_context(self) -> str:
@@ -300,18 +271,6 @@ def session_manager(self) -> SessionManager | None:
300271
# Private: platform setup (lazy, called on first run)
301272
# ------------------------------------------------------------------
302273

303-
async def _ensure_ready(self) -> None:
304-
"""Run one-time platform setup if not already done."""
305-
if self._ready:
306-
return
307-
if not self._context:
308-
raise RuntimeError("Context not set — call set_context() first")
309-
await self._setup_platform()
310-
self._ready = True
311-
logger.info(
312-
f"Platform ready — model: {self._configured_model}, cwd: {self._cwd_path}"
313-
)
314-
315274
async def _setup_platform(self) -> None:
316275
"""Full platform setup: auth, workspace, MCP, observability."""
317276
# Session manager

components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/auth.py

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
import logging
44

5+
from ambient_runner.platform.auth import validate_vertex_credentials_file
56
from ambient_runner.platform.context import RunnerContext
67
from ambient_runner.platform.utils import is_vertex_enabled
78

@@ -26,8 +27,19 @@ async def setup_gemini_cli_auth(context: RunnerContext) -> tuple[str, str, bool]
2627
use_vertex = is_vertex_enabled(legacy_var="GEMINI_USE_VERTEX", context=context)
2728

2829
if use_vertex:
29-
project = context.get_env("GOOGLE_CLOUD_PROJECT", "").strip()
30-
location = context.get_env("GOOGLE_CLOUD_LOCATION", "").strip()
30+
validate_vertex_credentials_file(context)
31+
32+
# Resolve project/location — may be set directly or via platform aliases
33+
# (ANTHROPIC_VERTEX_PROJECT_ID / CLOUD_ML_REGION). The subprocess mapping
34+
# in session.py handles the latter; here we just log what we found.
35+
project = (
36+
context.get_env("GOOGLE_CLOUD_PROJECT", "").strip()
37+
or context.get_env("ANTHROPIC_VERTEX_PROJECT_ID", "").strip()
38+
)
39+
location = (
40+
context.get_env("GOOGLE_CLOUD_LOCATION", "").strip()
41+
or context.get_env("CLOUD_ML_REGION", "").strip()
42+
)
3143

3244
logger.info(
3345
"Gemini CLI: Vertex AI mode (project=%s, location=%s, model=%s)",

0 commit comments

Comments
 (0)