This document explains what GuardLoop is, what has been implemented so far, how the current system works, and what the next development goals are.
GuardLoop is a Python library for adding production-style runtime guardrails to AI agents. Its main idea is simple: agent code can still own the reasoning loop, but GuardLoop owns enforcement around risky operations such as LLM calls and tool calls.
GuardLoop is currently published as version 0.4.1.
- GitHub repository:
awesome-pro/guardloop - PyPI package:
guardloop - Import package:
guardloop - Main public class:
GuardLoop - Compatibility aliases:
AgentRuntimeandAgentRuntimeError - Python support:
>=3.11 - Package layout:
src/guardloop - Build backend: Hatchling
- Package manager:
uv
The current portfolio story is:
- GuardLoop prevents runaway agent cost before an expensive LLM call is sent.
- GuardLoop prevents repeated calls to failing tools using circuit breakers.
- GuardLoop re-runs an agent against verifiers until the output passes, bounded by the shared budget.
- GuardLoop returns structured run results instead of leaving failures hidden.
- GuardLoop emits OpenTelemetry spans for agent runs, LLM calls, tool calls, and verifier runs.
- GuardLoop slots under an existing LangGraph graph (
guarded_graph) or OpenAI Agents SDKAgent(guarded_runner) without rewriting it. - GuardLoop is typed, tested, packaged, released on GitHub, and published on PyPI.
The four pillars from the original design — resource limits, circuit breakers, the self-healing verifier loop, and OpenTelemetry-native observability — are all implemented as of v0.3. v0.4 adds framework adapters (LangGraph in v0.4.0, the OpenAI Agents SDK in v0.4.1), each of which applies all four pillars inside a third-party agent.
AI agents usually run as loops around LLM calls and external tools. When those loops fail, they can:
- call expensive models repeatedly,
- burn through tokens and cost,
- retry broken tools again and again,
- run too long,
- fail without clear observability.
GuardLoop adds a runtime layer around the agent loop. The user still writes a
normal async Python agent function, but it receives a RunContext with protected
LLM clients and tool helpers.
The main entry point is GuardLoop.
from guardloop import GuardLoop, BudgetConfig, RunContext
runtime = GuardLoop(
budget=BudgetConfig(
cost_limit_usd="0.10",
token_limit=10_000,
time_limit_seconds=60,
tool_call_limit=20,
)
)
async def agent(ctx: RunContext, prompt: str) -> str:
response = await ctx.openai.responses.create(
model="gpt-5.2",
input=prompt,
max_output_tokens=300,
)
return str(response.output_text)
result = await runtime.run(agent, "research this topic")runtime.run() always returns a RunResult. Controlled guardrail stops are
reported as success=False with a terminated_reason, error metadata, cost,
token counts, duration, and tool-call count.
GuardLoop supports hard limits for:
- total estimated and actual cost,
- total tokens,
- wall-clock runtime,
- number of tool calls.
Before each LLM call, GuardLoop estimates input tokens, reserves the declared maximum output tokens, checks pricing, and blocks calls that would exceed the configured cap.
After each LLM response, GuardLoop reads provider usage metadata and records the actual input tokens, output tokens, and cost.
Important behavior:
- Cost math uses
Decimal, not float. - Time measurement uses
time.monotonic(). - Missing output-token limits are blocked before sending provider requests.
- Unknown model pricing is blocked unless custom pricing is supplied.
GuardLoop currently wraps direct provider SDK clients:
AsyncOpenAI.responses.createAsyncAnthropic.messages.create
The wrappers preserve the normal SDK calling style while adding budget checks, usage accounting, pricing, and tracing.
Example:
response = await ctx.openai.responses.create(
model="gpt-5.2",
input=prompt,
max_output_tokens=300,
)For Anthropic:
message = await ctx.anthropic.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=300,
messages=[{"role": "user", "content": prompt}],
)v0.2 adds per-tool circuit breakers.
Circuit breaker state lives on the GuardLoop instance, so it persists across
multiple runtime.run() calls without becoming global process state.
Supported states:
closed: tool calls are allowed.open: tool calls are rejected immediately.half_open: one or more trial calls are allowed after cooldown.
Default behavior:
- Circuit breakers are enabled by default.
- Default failure threshold is
3. - Default recovery timeout is
30seconds. - Default half-open success threshold is
1. - Per-tool overrides are supported.
Example:
from guardloop import CircuitBreakerConfig, CircuitBreakerPolicy, GuardLoop
runtime = GuardLoop(
circuit_breakers=CircuitBreakerConfig(
default=CircuitBreakerPolicy(
failure_threshold=3,
recovery_timeout_seconds=30,
),
tool_overrides={
"web_search": CircuitBreakerPolicy(
failure_threshold=2,
recovery_timeout_seconds=10,
)
},
)
)Tool calls are protected through:
await ctx.call_tool("tool_name", tool_function, *args, **kwargs)or:
protected_tool = ctx.wrap_tool("tool_name", tool_function)
await protected_tool(*args, **kwargs)Open breaker calls are rejected before the tool-call budget is incremented and before user tool code is invoked.
GuardLoop emits spans for:
- agent runs,
- LLM calls,
- tool calls.
Spans include useful attributes such as provider, model, input tokens, output tokens, estimated cost, actual cost, tool name, circuit breaker state, and termination reason.
OpenTelemetry export is optional. The core library depends only on
opentelemetry-api; exporters are available through the otel extra.
pip install "guardloop[otel]"GuardLoop has a public exception hierarchy for controlled runtime stops:
GuardLoopErrorBudgetExceededTokenLimitExceededToolCallLimitExceededTimeLimitExceededModelPricingMissingTokenLimitMissingCircuitBreakerOpenVerificationFailed(only in verifier strict mode)VerifierExecutionError(a verifier callable itself raised)
The runtime catches these controlled exceptions and converts them into
structured RunResult objects.
v0.3 adds the self-healing pillar: after an agent returns, GuardLoop runs a chain of verifiers against the output and, on rejection, feeds feedback back and retries the agent.
A verifier is any callable — sync or async — with the signature
(output, VerifierContext) -> VerifierResult | bool | None:
from guardloop import GuardLoop, RunContext, VerifierConfig, VerifierContext, VerifierResult
def no_todo(output: object, ctx: VerifierContext) -> VerifierResult:
if "TODO" in str(output):
return VerifierResult(passed=False, feedback="Replace the TODO placeholder.")
return VerifierResult(passed=True)
runtime = GuardLoop(verifiers=[no_todo], verifier_config=VerifierConfig(max_retries=2))
async def agent(ctx: RunContext, task: str) -> str:
# On a retry, ctx.retry_feedback holds the verifier's complaints, oldest first.
...Built-in rule-based verifier factories ship in guardloop:
non_empty(*, allow_whitespace=False)matches_regex(pattern, *, flags=0)is_json_object(*, required_keys=())
Behavior:
- Verifiers are configured per
GuardLoopinstance viaverifiers=[...]orruntime.add_verifier(fn); there is no persistent verifier state across runs. VerifierChainruns verifiers in order, fail-fast: the first failing verdict wins. Anything that isn't aVerifierResultis normalized (True/None→ passed,False→ failed with generated feedback).VerifierConfigcontrols the loop:max_retries(extra agent invocations after the first;0means no retry),raise_on_failure(strict mode),pass_feedback_to_agent, andenabled.- The retry loop reuses the same
RunContextandBudgetControlleracross attempts — cost, tokens, time, and tool calls accumulate, so a verifier loop cannot bypass any cap, and the run's singleasyncio.timeout()bounds the whole loop. RunResultreportsverification_passed: bool | None(Noneif no verifiers ran),verification_attempts: int, andverification_feedback: list[str].- When retries are exhausted: by default
success=False,terminated_reason="verification_failed",outputstill set to the last attempt. Withraise_on_failure=True, the runtime surfaces aVerificationFailedinstead (output=None, attempt count and feedback inmetadata). A verifier that itself raises becomes aVerifierExecutionError(terminated_reason="verifier_error") and is not retried.
No-key demo:
uv run python examples/verifier_retry_loop.pyv0.4 adds framework adapters: thin modules that produce a GuardLoop-compatible
async def agent(ctx, ...) callable wrapping a third-party agent, so all four
pillars apply inside it without rewriting it. Each adapter lives behind its own
optional extra and is intentionally not re-exported from the top-level guardloop
package, so import guardloop stays dependency-light.
LangGraph (guardloop.adapters.langgraph.guarded_graph, v0.4.0). LangGraph
nodes call LangChain chat models, which do not flow through GuardLoop's
ctx.openai / ctx.anthropic wrappers, so the adapter binds a LangChain callback
handler to the RunContext — pre-flight budget check before each LLM call, usage
recorded afterward, tool calls routed through the per-tool circuit breaker and the
tool-call budget. Cost / token / time caps, breakers, and llm_call / tool_call
OpenTelemetry spans all apply inside a LangGraph run; the verifier retry loop wraps
the whole graph run.
from langchain_core.messages import HumanMessage
from guardloop import GuardLoop, BudgetConfig
from guardloop.adapters.langgraph import guarded_graph
runtime = GuardLoop(budget=BudgetConfig(cost_limit_usd="0.10", token_limit=10_000))
agent = guarded_graph(my_compiled_graph, input_key="messages")
result = await runtime.run(agent, {"messages": [HumanMessage("...")]})guarded_graph(compiled_graph, *, input_key="messages", reserved_output_tokens=1024, feedback_to_state=None, output_from_state=None, config=None). Notes:
reserved_output_tokens is the pre-flight output reservation (LangChain chat models
often omit max_tokens); on a verifier retry the feedback is injected into a copy of
the input state (feedback_to_state to customise; never mutates the original);
tool-side enforcement is as hard as the graph's own error handling (a ToolNode with
the default handle_tool_errors=True turns a budget/breaker exception into a
ToolMessage — the breaker still records it, LLM-side caps still terminate; use
handle_tool_errors=False for hard tool-call enforcement); streaming (astream /
astream_events) is out of scope. Behind the langgraph extra
(pip install "guardloop[langgraph]"); exports guarded_graph and
GuardLoopCallbackHandler. No-key demo: uv run python examples/langgraph_guarded.py.
OpenAI Agents SDK (guardloop.adapters.openai_agents.guarded_runner, v0.4.1).
The SDK's Agent runs via Runner.run, whose model calls bypass GuardLoop's
provider wrappers too, so the adapter binds a GuardLoopRunHooks (a subclass of the
SDK's RunHooks) to the RunContext — pre-flight budget check on on_llm_start,
actual usage recorded on on_llm_end (from response.usage), tool calls routed
through before_call / record_tool_call_started / record_success on
on_tool_start / on_tool_end. Same caps, breakers, and llm_call / tool_call
spans, now inside Runner.run(...); the verifier retry loop wraps the whole run.
from agents import Agent
from guardloop import GuardLoop, BudgetConfig
from guardloop.adapters.openai_agents import guarded_runner
runtime = GuardLoop(budget=BudgetConfig(cost_limit_usd="0.10", token_limit=10_000))
agent = guarded_runner(Agent(name="researcher", model="gpt-5.2", instructions="..."))
result = await runtime.run(agent, "research agent runtime safety")guarded_runner(agent, *, reserved_output_tokens=1024, feedback_to_input=None, output_from_result=None, context=None, run_config=None, max_turns=None). Notes:
RunHooks methods are natively async (unlike LangChain's async callbacks, which run
in a detached loop that swallows exceptions — that's why the LangGraph handler had to
be synchronous), so guardrail exceptions propagate; the SDK wraps exceptions raised
from its tool lifecycle hooks in agents.exceptions.UserError, so guarded_runner
unwraps a GuardLoopError from the cause chain before re-raising; reserved_output_tokens
is the pre-flight output reservation when agent.model_settings.max_tokens is unset;
on a verifier retry the feedback is injected into a copy of the run input
(feedback_to_input to customise; output_from_result for the answer; default is
result.final_output). Known gap: the SDK has no on_tool_error hook and, by
default, turns a tool exception into an error string fed back to the model, so the
breaker records tool attempts and successes but not failures — a flaky
SDK-managed tool won't open the breaker on its own (an already-open breaker still
rejects the next call; route a tool body through ctx.call_tool(...) for full breaker
semantics). max_turns (the SDK's per-turn loop cap) and GuardLoop's pre-flight
budget compose; the SDK's own tracing is independent of GuardLoop's OpenTelemetry;
streaming (Runner.run_streamed) is out of scope for v0.4.1. Behind the
openai-agents extra (pip install "guardloop[openai-agents]"); exports
guarded_runner and GuardLoopRunHooks. No-key demo:
uv run python examples/openai_agents_guarded.py.
No-key demos:
uv run python examples/runaway_cost_prevention.py
uv run python examples/tool_circuit_breaker.py
uv run python examples/verifier_retry_loop.py
uv run python examples/langgraph_guarded.py
uv run python examples/openai_agents_guarded.pyThe runaway-cost demo proves that GuardLoop stops an agent before the next LLM request would exceed the configured budget.
The circuit-breaker demo proves that GuardLoop lets a flaky tool fail up to the configured threshold, then rejects the next attempt without invoking the tool.
The verifier-retry demo proves that GuardLoop rejects a bad answer, hands the
verifier's feedback back through ctx.retry_feedback, and accepts the corrected
answer on a later attempt.
The LangGraph and OpenAI Agents SDK demos prove that the budget / telemetry envelope
applies inside a real graph / Runner.run: the first run succeeds with cost and
tokens recorded, and the second run is stopped before the model call by a tiny token
budget.
Optional live provider demos:
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
uv run python examples/live_openai_basic.py
uv run python examples/live_anthropic_basic.pyflowchart LR
LG["LangGraph graph"] -. "guarded_graph(...)" .-> A
OA["OpenAI Agents SDK agent"] -. "guarded_runner(...)" .-> A
A["User agent function"] --> R["GuardLoop.run()"]
R --> C["RunContext"]
R --> B["BudgetController"]
R --> CB["CircuitBreakerRegistry"]
R --> V["VerifierChain"]
R --> T["Telemetry"]
C --> O["Wrapped OpenAI client"]
C --> AN["Wrapped Anthropic client"]
C --> TO["ToolRunner"]
C --> H["GuardLoopCallbackHandler (LangChain)"]
C --> RH["GuardLoopRunHooks (OpenAI Agents SDK)"]
O --> B
AN --> B
TO --> B
TO --> CB
H --> B
H --> CB
RH --> B
RH --> CB
O --> T
AN --> T
TO --> T
H --> T
RH --> T
V --> T
V -. "feedback on retry" .-> C
Important modules:
src/guardloop/runtime.py: mainGuardLoopexecution wrapper and retry loop.src/guardloop/context.py:RunContextpassed into user agents (incl.circuit_breakers).src/guardloop/budget.py: cost, token, time, and tool-call enforcement.src/guardloop/circuit_breaker.py: per-tool circuit breaker state machines.src/guardloop/verifier.py: verifier types, theVerifierChainrunner, and built-in verifiers.src/guardloop/providers/openai.py: OpenAI Responses wrapper.src/guardloop/providers/anthropic.py: Anthropic Messages wrapper.src/guardloop/tools.py: protected sync/async tool execution.src/guardloop/adapters/langgraph.py: LangGraph adapter —guarded_graphandGuardLoopCallbackHandler(behind thelanggraphextra).src/guardloop/adapters/openai_agents.py: OpenAI Agents SDK adapter —guarded_runnerandGuardLoopRunHooks(behind theopenai-agentsextra).src/guardloop/telemetry/: OpenTelemetry setup and attribute conventions.src/guardloop/models.py: Pydantic config/result models.src/guardloop/pricing.py: model pricing catalog and custom pricing support.src/guardloop/exceptions.py: public controlled exception hierarchy.
Current primary exports:
GuardLoopBudgetConfigTelemetryConfigRunContextRunResultModelPricingCircuitBreakerConfigCircuitBreakerPolicyCircuitBreakerStateCircuitBreakerSnapshotVerifierVerifierResultVerifierContextVerifierConfigVerifierChainnon_empty,matches_regex,is_json_object(built-in verifier factories)GuardLoopErrorBudgetExceededTokenLimitExceededToolCallLimitExceededTimeLimitExceededModelPricingMissingTokenLimitMissingCircuitBreakerOpenVerificationFailedVerifierExecutionError
Compatibility aliases:
AgentRuntime = GuardLoopAgentRuntimeError = GuardLoopError
Framework adapters are not part of the top-level guardloop namespace (so
import guardloop pulls only the base dependencies). Import them from their
submodule, behind the matching extra:
# pip install "guardloop[langgraph]"
from guardloop.adapters.langgraph import guarded_graph, GuardLoopCallbackHandler
# pip install "guardloop[openai-agents]"
from guardloop.adapters.openai_agents import guarded_runner, GuardLoopRunHooksCurrent test coverage includes:
- cost cap pre-flight blocking,
- token cap blocking,
- tool-call limit blocking,
- Decimal cost accounting,
- OpenAI usage accounting,
- Anthropic usage accounting,
- missing output token limit errors,
- missing model pricing errors,
- successful runtime results,
- runaway fake agent termination,
- timeout handling,
- tool exception handling,
- circuit breaker open/closed/half-open transitions,
- per-tool circuit breaker overrides,
- circuit breaker reset helpers,
- verifier passes / fails-then-passes / exhausts retries,
max_retries=0, fail-fast chains, sync and async verifiers,- bool-shorthand verdicts and generated feedback,
- verifier exceptions surface as
verifier_error, - budget shared across retry attempts (cannot be bypassed),
- run timeout bounds the whole retry loop,
ctx.retry_feedbackvisibility andverification_feedbackrecording,- strict mode, disabled verifiers, built-in verifiers,
- OpenTelemetry span attributes (LLM, tool, and
verifier_runspans), - LangGraph adapter: happy path (usage + output recorded), sync- and async-node
graphs, budget cap tripping inside the graph,
reserved_output_tokenspre-flight, tool-call limit inside aToolNodeloop, circuit breaker opening across runs, verifier loop wrapping the graph with feedback injection, caller state not mutated, customfeedback_to_state/output_from_statehooks,llm_call/tool_callspans parented underagent_run, model errors propagating with no recorded usage, and an unresolvable model name producing a clear error, - OpenAI Agents SDK adapter: happy path (usage + output recorded), budget cap
tripping inside
Runner.run,reserved_output_tokenspre-flight, tool-call limit across turns, an open breaker blocking an SDK tool call (and an SDK-managed tool failure not opening the breaker — the documented gap), verifier loop wrappingRunner.runwith feedback injection, caller input not mutated, customfeedback_to_input/output_from_resulthooks,llm_call/tool_callspans parented underagent_run, model errors propagating with no recorded usage, an unpriced model name producing aModelPricingMissing, an unresolvable model object producing a clearRuntimeError, andreserved_output_tokens<=0rejected.
CI (.github/workflows/ci.yml) runs the lint / type-check / test gates on every
push to main and every pull request, across Python 3.11, 3.12, and 3.13.
Quality gates:
uv sync --all-extras --group dev
uv run pytest
uv run pytest --cov=guardloop
uv run ruff check .
uv run ruff format --check .
uv run pyright
uv build
uvx twine check dist/guardloop-0.4.1.tar.gz dist/guardloop-0.4.1-py3-none-any.whl
unzip -l dist/guardloop-0.4.1-py3-none-any.whl | grep adapters # adapters subpackage shipped in the wheelGuardLoop is configured and published as a real Python package.
- PyPI project:
guardloop - Latest release:
v0.4.1 - Optional extras:
otel(OpenTelemetry exporters),langgraph(LangGraph adapter),openai-agents(OpenAI Agents SDK adapter) - Build artifacts: wheel and source distribution (both include
guardloop/adapters/) - Trusted Publishing: GitHub Actions to PyPI
- GitHub environment:
pypi - Changelog:
CHANGELOG.md
Workflows:
.github/workflows/ci.yml— lint / type-check / test on push and pull request (Python 3.11–3.13)..github/workflows/publish-pypi.yml— builds distributions withuv build, uploads artifacts, and publishes to PyPI usingpypa/gh-action-pypi-publishwith OIDC Trusted Publishing instead of a long-lived API token.
GuardLoop is intentionally focused. It does not yet include:
- LLM-based verifier ergonomics (budget-tracked clients on
VerifierContext), - streaming support in the framework adapters (
astream/astream_events,Runner.run_streamed), - circuit-breaker failure tracking from SDK-managed tools in the OpenAI Agents adapter (the SDK has no tool-error lifecycle hook; attempts and successes are tracked, and an already-open breaker still blocks calls),
- persistent circuit breaker state in Redis/database,
- provider-level circuit breakers,
- loop detection (repeated
tool + args), - UI dashboard,
- an OpenTelemetry metrics layer and a packaged trace-viewer stack (Jaeger / Phoenix),
- automated docs site,
- semantic versioning automation.
These are good future layers, but v0.4 already implements all four pillars plus two framework adapters (LangGraph, OpenAI Agents SDK), and is a complete portfolio-grade library foundation.
Shipped: deterministic and async verifier callables, a fail-fast VerifierChain,
built-in rule-based verifiers, a bounded retry loop that reuses the same budget
and run timeout, ctx.retry_feedback, structured RunResult.verification_*
fields, an opt-in strict mode, and verifier_run OpenTelemetry spans. The one
ergonomic gap deferred to later: budget-tracked LLM clients on VerifierContext
for LLM-based verifiers (today an LLM verifier must close over its own client).
Goal: integrate GuardLoop with common agent frameworks without changing the core runtime model or rewriting the user's agent.
- v0.4.0 — LangGraph (delivered).
guardloop.adapters.langgraph.guarded_graphreturns a GuardLoop-compatible agent; a synchronous LangChain callback handler bound to theRunContextruns the pre-flight budget check, records usage, and routes tool calls through the breaker and tool-call budget — so all four pillars apply inside a LangGraph run, and the verifier loop wraps the whole graph run. Behind thelanggraphextra; no-key demo atexamples/langgraph_guarded.py. Also addedRunContext.circuit_breakersand a CI workflow. - v0.4.1 — OpenAI Agents SDK (delivered).
guardloop.adapters.openai_agents.guarded_runnerreturns a GuardLoop-compatible agent that callsRunner.rununder the hood; aGuardLoopRunHooks(aRunHookssubclass) bound to theRunContextdoes the pre-flight budget check (on_llm_start), usage accounting (on_llm_end), and breaker + tool-call-budget routing (on_tool_start/on_tool_end). The verifier loop wraps the whole run; feedback is injected into a copy of the run input. Behind theopenai-agentsextra; no-key demo atexamples/openai_agents_guarded.py. Known gap: the SDK has no tool-error hook, so the breaker tracks tool attempts/successes but not failures from SDK-managed tools; streaming (Runner.run_streamed) is out of scope.
Portfolio value:
- shows practical ecosystem integration,
- makes GuardLoop easier to demonstrate with real agent workflows,
- proves the core design is framework-agnostic.
Goal: extend the OpenTelemetry layer past spans and make the whole trace tree inspectable end to end.
Planned capabilities:
- an OpenTelemetry metrics provider (counters/histograms for cost, tokens, tool calls, and verifier attempts, alongside the existing spans),
- per-attempt
agent_attemptspan nesting so a verifier retry loop reads as a tree rather than flat sibling LLM calls, - a one-command
docker-composestack (Jaeger + Arize Phoenix) the no-key demos export to, - an architecture write-up plus a recorded walkthrough of the
agent_run -> llm_call -> tool_call -> verifier_runtree as the proof artifact.
Portfolio value:
- demonstrates production observability beyond "we emit traces" — metrics, span hierarchy, and a working backend,
- produces one concrete, inspectable artifact (the trace tree) a reviewer can look at.
Goal: support longer-lived runtime state and reusable policy configuration.
Possible capabilities:
- pluggable state backends for circuit breakers (Redis/SQL),
- YAML/TOML policy loading and organization/team default policies,
- loop detection (repeated
tool + argswithin a run), - multi-model pricing (Gemini, Groq, Mistral) and a model pricing update workflow,
- LiteLLM integration for one wrapping path across providers.
Portfolio value:
- moves the project closer to production deployment,
- shows system design beyond in-memory library code.
Goal: define a stable public API and production readiness baseline.
Potential requirements:
- documented API stability policy,
- stronger compatibility tests,
- more provider models and pricing coverage,
- richer examples,
CHANGELOG.mdas the source of truth (started at v0.3),- auto-published docs site,
- release checklist,
- streaming-response accounting and Anthropic token-counting API.
Use this short explanation:
GuardLoop is a production runtime layer for AI agents. Instead of building another agent framework, it wraps the risky parts of an agent loop: model calls and tool calls. It enforces cost, token, time, and tool-call limits; stops runaway agent loops before sending expensive requests; opens circuit breakers around flaky tools; re-runs the agent against verifiers — feeding their feedback back in — until the output passes, all under the same shared budget; and emits OpenTelemetry traces so failures are observable. It is packaged as a typed Python library, published on PyPI, tested with fake clients, and — because the core stayed framework-agnostic — it now slots under both LangGraph and the OpenAI Agents SDK via one-file adapters, with no change to the runtime.
Good questions to be ready for:
- Why use pre-flight token and cost checks before an LLM call?
- Why use
Decimalfor cost? - Why does circuit breaker state live on
GuardLoopinstead of globally? - Why direct provider wrappers before framework adapters?
- Why are the adapters converter functions (
guarded_graph/guarded_runner) instead of subclasses or framework plugins? - Why is the LangGraph callback handler synchronous but the OpenAI Agents SDK
RunHookssubclassasync? (LangChain swallows exceptions from async callbacks; the SDK awaits hooks inline.) - How would you add Redis-backed circuit breaker state?
- How would verifier retries interact with budget limits?
- How would you keep model pricing accurate over time?
- What should happen when provider usage metadata is missing?
All four pillars plus two framework adapters (LangGraph in v0.4.0, the OpenAI Agents SDK in v0.4.1) are implemented. The best next engineering milestone is v0.5: observability depth — extending the OpenTelemetry layer past spans.
Build it narrowly:
- add an OpenTelemetry metrics provider (
Counter/Histogramfor cost, tokens, tool calls, and verifier attempts) alongside the existing spans, - land the deferred per-attempt
agent_attempt {n}span nesting so a verifier retry loop reads as a tree, not a flat list of sibling LLM calls, - add a
docker-compose.ymlrunning Jaeger + Arize Phoenix plus a script that runs the no-key demos (including both adapter demos) against it, - write the architecture write-up and capture a walkthrough of the
agent_run -> llm_call -> tool_call -> verifier_runtree as the proof artifact, - bump
Development Statusfrom3 - Alphato4 - Beta.
After v0.5: v0.6 (persistence + team policies + loop detection + multi-model
pricing + LiteLLM, with the deferred failure_error_function-wrapping option for
the OpenAI Agents adapter's breaker as a candidate) → v1.0 (stable API, docs
site, release checklist, hardened token story). The "wrapper, not framework"
thesis is already proven — the same budget / circuit breaker / verifier /
telemetry envelope works around two different agent frameworks with no change to
the core runtime — so the remaining work is about depth, not new shape.