-
Notifications
You must be signed in to change notification settings - Fork 25
Cross-SDK backend abstraction with copilot_sdk adapter #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
28 commits
Select commit
Hold shift + click to select a range
f35254a
sdk: add backend-neutral abstraction skeleton
anticomputer 222eb1d
sdk: add neutral TaskRunHooks/TaskAgentHooks in sdk.hooks
anticomputer bce76d4
sdk: add AgentBackend protocol and adapter registry
anticomputer c87f2e9
sdk: add openai-agents adapter + neutral error hierarchy
anticomputer 60b0c19
models: add optional backend field to ModelConfigDocument
anticomputer cf0a7db
pyproject: add 'copilot' optional dependency
anticomputer 40c90c3
runner: drive agent loop through the SDK abstraction
anticomputer 443ecc6
sdk: gate task config against active backend capabilities
anticomputer bce8ca1
Add copilot_sdk backend adapter
anticomputer c512f1f
Document backend selection and add parametrised cross-backend tests
anticomputer c05e062
Add live integration smoke tests for both backends
anticomputer a4a3377
Simplify SDK abstraction to what backends actually use
anticomputer fa51d86
Make copilot_sdk backend usable for real taskflows
anticomputer e2c3359
Use gpt-5-mini for the Responses API example
anticomputer c4accb1
Honour exclude_from_context on copilot_sdk
anticomputer f9d627e
Pin copilot session config and refuse empty model
anticomputer b264e84
Extract stream-driving helpers from runner
anticomputer dff2af1
Fix README capability list and re-export BackendSdk
anticomputer 974a83f
Pin httpx timeouts and clean up streaming resources
anticomputer 90deb8e
Add stream idle timeout and process watchdog
anticomputer 16c8a53
Force-exit at the CLI boundary
anticomputer 5beaa53
Fix CI lint failures and CodeQL noise
anticomputer f142fa5
Skip POSIX-only PATH-resolution test on Windows
anticomputer ac9456b
Merge branch 'main' into anticomputer/copilot-sdk-support
anticomputer 1ca3446
fix: deterministic backend selection, token resolution, logging flush
anticomputer 6eb8360
fix: address remaining PR review feedback
anticomputer 55d3c58
Merge branch 'main' into anticomputer/copilot-sdk-support
anticomputer 3f8a732
fix: remove duplicate logging.shutdown() call
anticomputer File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| # SPDX-FileCopyrightText: GitHub, Inc. | ||
| # SPDX-License-Identifier: MIT | ||
|
|
||
| """Stream-driving helpers for the runner. | ||
|
|
||
| This module owns the inner loop that consumes events from a backend | ||
| adapter (`TextDelta` / `ToolEnd`), renders text deltas to the user, and | ||
| bridges Copilot-side tool events into the run-hook callbacks that the | ||
| runner uses to capture MCP results for ``repeat_prompt`` and session | ||
| checkpointing. | ||
|
|
||
| Extracted from ``runner.py`` so the rate-limit/retry loop and the | ||
| backend-event translation are independently readable and testable. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| __all__ = ["STREAM_IDLE_TIMEOUT", "bridge_copilot_tool_event", "drive_backend_stream"] | ||
|
|
||
| import asyncio | ||
| import json | ||
| import logging | ||
| from types import SimpleNamespace | ||
| from typing import Any | ||
|
|
||
| from ._watchdog import watchdog_ping | ||
| from .render_utils import render_model_output | ||
| from .sdk import TextDelta, ToolEnd | ||
| from .sdk.errors import BackendRateLimitError, BackendTimeoutError | ||
|
|
||
| # Application-level backstop: if the backend's event stream goes silent | ||
| # for this long, surface a BackendTimeoutError so the retry loop can | ||
| # recover. This complements the TCP-level httpx timeouts in the | ||
| # openai-agents adapter — those catch dead sockets, this catches the | ||
| # subtler case where the connection stays open but nothing is flowing. | ||
| STREAM_IDLE_TIMEOUT = 1800 | ||
|
|
||
|
|
||
| async def bridge_copilot_tool_event(event: ToolEnd, run_hooks: Any) -> None: | ||
| """Forward a Copilot ``ToolEnd`` into the openai-agents-style hooks. | ||
|
|
||
| The runner captures MCP tool output via ``run_hooks.on_tool_end``, | ||
| which the openai-agents path drives natively. The Copilot adapter | ||
| surfaces tool completions as ``ToolEnd`` events instead, so we | ||
| invoke the same hooks here with: | ||
|
|
||
| * a ``SimpleNamespace(name=...)`` placeholder in lieu of the | ||
| openai-agents ``Tool`` object — the hooks only read ``.name``. | ||
| * a ``json.dumps({"text": ...})`` envelope around the result text, | ||
| matching the wire format openai-agents uses when serialising MCP | ||
| ``TextContent`` lists. ``_build_prompts_to_run`` in the runner | ||
| depends on that exact envelope shape, so both backends produce | ||
| identical entries in ``last_mcp_tool_results``. | ||
| """ | ||
| if run_hooks is None: | ||
| return | ||
| fake_tool = SimpleNamespace(name=event.tool_name) | ||
| payload = json.dumps({"text": event.text}) | ||
| await run_hooks.on_tool_start(None, None, fake_tool) | ||
| await run_hooks.on_tool_end(None, None, fake_tool, payload) | ||
|
anticomputer marked this conversation as resolved.
|
||
|
|
||
|
|
||
| async def drive_backend_stream( | ||
| *, | ||
| backend_impl: Any, | ||
| agent_handle: Any, | ||
| prompt: str, | ||
| max_turns: int, | ||
| run_hooks: Any, | ||
| async_task: bool, | ||
| task_id: str, | ||
| max_api_retry: int, | ||
| initial_rate_limit_backoff: int, | ||
| max_rate_limit_backoff: int, | ||
| ) -> None: | ||
| """Run the backend's event stream to completion with retry/backoff. | ||
|
|
||
| Renders ``TextDelta`` events to stdout, forwards ``ToolEnd`` events | ||
| to the run-hook bridge, retries up to *max_api_retry* times on | ||
| :class:`BackendTimeoutError`, and applies exponential backoff up to | ||
| *max_rate_limit_backoff* seconds on :class:`BackendRateLimitError` | ||
| before giving up with a :class:`BackendTimeoutError`. | ||
| """ | ||
| max_retry = max_api_retry | ||
| rate_limit_backoff = initial_rate_limit_backoff | ||
| last_rate_limit_exc: BackendRateLimitError | None = None | ||
|
|
||
| while rate_limit_backoff: | ||
| try: | ||
| stream = backend_impl.run_streamed( | ||
| agent_handle, prompt, max_turns=max_turns | ||
| ) | ||
| stream_iter = stream.__aiter__() | ||
| try: | ||
| while True: | ||
| try: | ||
| event = await asyncio.wait_for( | ||
| stream_iter.__anext__(), timeout=STREAM_IDLE_TIMEOUT | ||
| ) | ||
| except StopAsyncIteration: | ||
| break | ||
| except asyncio.TimeoutError as exc: | ||
| raise BackendTimeoutError( | ||
| f"Backend stream idle for {STREAM_IDLE_TIMEOUT}s" | ||
| ) from exc | ||
| watchdog_ping() | ||
| if isinstance(event, TextDelta): | ||
| await render_model_output( | ||
| event.text, async_task=async_task, task_id=task_id | ||
| ) | ||
| elif isinstance(event, ToolEnd): | ||
| await bridge_copilot_tool_event(event, run_hooks) | ||
| finally: | ||
| # Close the async generator so its finally block runs even | ||
| # if we abort early (timeout / consumer break) — the | ||
| # adapters use that to release backend-native resources. | ||
| aclose = getattr(stream_iter, "aclose", None) | ||
| if aclose is not None: | ||
| try: | ||
| await aclose() | ||
| except Exception: # noqa: BLE001 - best-effort cleanup | ||
| logging.exception("Failed to aclose backend stream iterator") | ||
| await render_model_output("\n\n", async_task=async_task, task_id=task_id) | ||
| return | ||
| except BackendTimeoutError: | ||
| if not max_retry: | ||
| logging.exception("Max retries for BackendTimeoutError reached") | ||
| raise | ||
| max_retry -= 1 | ||
| except BackendRateLimitError as exc: | ||
| last_rate_limit_exc = exc | ||
| if rate_limit_backoff == max_rate_limit_backoff: | ||
| raise BackendTimeoutError("Max rate limit backoff reached") from exc | ||
| if rate_limit_backoff > max_rate_limit_backoff: | ||
| rate_limit_backoff = max_rate_limit_backoff | ||
| else: | ||
| rate_limit_backoff += rate_limit_backoff | ||
| logging.exception(f"Hit rate limit ... holding for {rate_limit_backoff}") | ||
| await asyncio.sleep(rate_limit_backoff) | ||
|
|
||
| if last_rate_limit_exc is not None: # pragma: no cover - loop always returns/raises above | ||
| raise BackendTimeoutError("Rate limit backoff exhausted") from last_rate_limit_exc | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| # SPDX-FileCopyrightText: GitHub, Inc. | ||
| # SPDX-License-Identifier: MIT | ||
|
|
||
| """Process-level watchdog that force-exits if the event loop stops progressing. | ||
|
|
||
| The asyncio retry loop, the httpx client timeouts, and the per-stream | ||
| idle timeout already cover the cases we know how to recover from. This | ||
| module is the last-resort backstop for everything else (a stuck MCP | ||
| cleanup, an asyncio loop spinning on a leaked task, a kernel-level | ||
| socket pathology) — a daemon thread polls a monotonic timestamp that | ||
| the runtime updates from every interesting event and force-exits the | ||
| process if the timestamp ever goes stale for too long. | ||
|
|
||
| Sources of pings: | ||
|
|
||
| * :func:`drive_backend_stream` — every backend event. | ||
| * The runner's ``on_tool_start`` / ``on_tool_end`` hooks. | ||
| * The runner's MCP cleanup / backend ``aclose`` paths. | ||
|
|
||
| The default timeout is intentionally larger than every recoverable | ||
| timeout below it so the watchdog never fires before the asyncio layer | ||
| has had a chance to recover. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| __all__ = ["WATCHDOG_IDLE_TIMEOUT", "start_watchdog", "watchdog_ping"] | ||
|
|
||
| import logging | ||
| import os | ||
| import sys | ||
| import threading | ||
| import time | ||
|
|
||
| # 35 minutes by default — comfortably above the per-stream idle timeout | ||
| # (30 min) and the rate-limit backoff cap (2 min) so the watchdog only | ||
| # trips on hangs the asyncio path could not recover from. | ||
| WATCHDOG_IDLE_TIMEOUT = int(os.environ.get("WATCHDOG_IDLE_TIMEOUT", "2100")) | ||
|
|
||
| _last_activity = time.monotonic() | ||
| _lock = threading.Lock() | ||
| _started = False | ||
|
|
||
|
|
||
| def watchdog_ping() -> None: | ||
| """Record activity. Safe to call from any coroutine or callback.""" | ||
| global _last_activity | ||
| with _lock: | ||
| _last_activity = time.monotonic() | ||
|
|
||
|
|
||
| def _watchdog_loop(timeout: int) -> None: | ||
| check_interval = min(60, max(1, timeout // 5)) | ||
| while True: | ||
| time.sleep(check_interval) | ||
| with _lock: | ||
| idle = time.monotonic() - _last_activity | ||
| if idle > timeout: | ||
| logging.error( | ||
| "Watchdog: no activity for %.0fs (limit %ds) — force-exiting to prevent hang", | ||
| idle, | ||
| timeout, | ||
| ) | ||
| sys.stderr.flush() | ||
| sys.stdout.flush() | ||
| os._exit(2) | ||
|
|
||
|
|
||
| def start_watchdog(timeout: int = WATCHDOG_IDLE_TIMEOUT) -> None: | ||
| """Start the watchdog thread once per process (idempotent).""" | ||
| global _started | ||
| if _started: | ||
| return | ||
| _started = True | ||
|
anticomputer marked this conversation as resolved.
Dismissed
|
||
| watchdog_ping() # reset timestamp so a late call doesn't trip immediately | ||
| threading.Thread( | ||
| target=_watchdog_loop, args=(timeout,), daemon=True, name="seclab-watchdog" | ||
| ).start() | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.