feat(utils): add token budget enforcement for all LangGraph agents #332
adickinson72 wants to merge 7 commits
Force-pushed from 7ec005f to fdea07f.
Integration Analysis:
| Component | Location | Size |
|---|---|---|
| BaseLangGraphAgent | utils/a2a_common/base_langgraph_agent.py | ~112 KB |
| BaseLangGraphAgentExecutor | utils/a2a_common/base_langgraph_agent_executor.py | ~20 KB |

The PetStoreAgent in the template extends BaseLangGraphAgent using an abstract-method override pattern:

    class PetStoreAgent(BaseLangGraphAgent):
        def get_agent_name(self) -> str: ...
        def get_system_instruction(self) -> str: ...
        def get_mcp_config(self, server_path: str) -> dict: ...
        def get_mcp_http_config(self) -> dict | None: ...
        def get_tool_working_message(self) -> str: ...
        def get_tool_processing_message(self) -> str: ...

mas_agent_base proposes a constructor-injection pattern instead:

    class PetstoreAgent(BaseAgent):
        def __init__(self):
            super().__init__(
                agent_name="petstore",
                system_instruction=SYSTEM_PROMPT,
                mcp_config=MCPConfig(server_name="petstore", server_path="...", required_env_vars=["PETSTORE_API_KEY"]),
            )

Feature Gap Analysis
Features present in BaseLangGraphAgent that are absent from mas_agent_base.BaseAgent:
| Feature | BaseLangGraphAgent | mas_agent_base.BaseAgent |
|---|---|---|
| Persistent checkpointing (Redis / MongoDB / PostgreSQL) | ✅ via get_checkpointer() | ❌ InMemorySaver only |
| LangMem message summarization | ✅ auto-compression | ❌ not present |
| Context token management | ✅ provider-aware limits | ❌ not present |
| Custom CA bundle / TLS (CUSTOM_CA_BUNDLE, SSL_VERIFY) | ✅ _build_httpx_client_factory() | ❌ not present |
| Tool output chunking (>50 KB → temp file) | ✅ _chunk_large_output() | ❌ not present |
| Tool output truncation with refine-query guidance | ✅ _truncate_tool_output() | ❌ not present |
| ExceptionGroup recovery for Go MCP servers | ✅ handled | ❌ not present |
| Multi-server MCP config (pass full dict) | ✅ supported | ❌ single-server MCPConfig only |
| get_additional_tools() hook for non-MCP tools | ✅ supported | ❌ not present |
| Date injection in system prompt | ✅ auto-prepended | ❌ not present |
| Streaming artifact accumulation (partial chunks) | ✅ in executor | ❌ executor emits single-pass |

Features in mas_agent_base that are absent or weaker in BaseLangGraphAgent:
| Feature | BaseLangGraphAgent | mas_agent_base.BaseAgent |
|---|---|---|
| Per-request tool-call budget (hard cap) | ❌ no per-call limit | ✅ TokenBudgetManager + BudgetAwareTool |
| Bedrock prompt caching (create_cache_point) | ❌ not present | ✅ built-in |
| Lazy initialization / context-manager lifecycle | ❌ eagerly initializes | ✅ initialize() / async with |
| Clean dataclass MCP config (MCPConfig) | ❌ abstract methods | ✅ structured, validateable |

Integration Options
Option A — Full Migration: Template Adopts mas_agent_base
Replace BaseLangGraphAgent / BaseLangGraphAgentExecutor with BaseAgent / BaseAgentExecutor from this PR as the template's base.
What changes in the template:

    # Before (agent.py)
    from ai_platform_engineering.utils.a2a_common.base_langgraph_agent import BaseLangGraphAgent

    class PetStoreAgent(BaseLangGraphAgent):
        def get_mcp_config(self, server_path: str) -> dict:
            return {"command": "uv", "args": [...], "env": {...}, "transport": "stdio"}
        ...

    # After (agent.py)
    from ai_platform_engineering.agents.mas_agent_base import BaseAgent, MCPConfig

    class PetStoreAgent(BaseAgent):
        def __init__(self):
            super().__init__(
                agent_name="petstore",
                system_instruction=SYSTEM_PROMPT,
                mcp_config=MCPConfig(
                    server_name="petstore",
                    server_path="path/to/mcp_petstore/__main__.py",
                    required_env_vars=["PETSTORE_API_KEY"],
                    transport="stdio",
                ),
            )

    # Before (agent_executor.py)
    from ai_platform_engineering.utils.a2a_common.base_langgraph_agent_executor import BaseLangGraphAgentExecutor

    class PetStoreAgentExecutor(BaseLangGraphAgentExecutor):
        def __init__(self):
            super().__init__(PetStoreAgent())

    # After (agent_executor.py): nearly identical
    from ai_platform_engineering.agents.mas_agent_base import BaseAgentExecutor

    class PetStoreAgentExecutor(BaseAgentExecutor):
        def __init__(self):
            super().__init__(PetStoreAgent())

Trade-offs:
- ✅ Cleaner, more readable agent definitions
- ✅ MCPConfig is self-documenting and validateable
- ✅ Bedrock prompt caching for free
- ✅ Hard per-request tool-call budget
- ❌ Loses persistent checkpointing: sessions reset on pod restart
- ❌ Loses LangMem summarization: long conversations will hit context limits
- ❌ Loses TLS / CA bundle support: breaks enterprise proxies
- ❌ Loses tool output chunking: large API responses risk context overflow
- ❌ MCPConfig only handles single-server stdio or single HTTP endpoint

Verdict: Not recommended as a direct drop-in today. The feature regressions are too significant for production agents. Better suited as the base for new, simple agents that do not need persistent memory or complex MCP setups.
Option B — Selective Backport: Cherry-Pick Features Into Existing Base
Keep BaseLangGraphAgent as the production base, but pull specific innovations from mas_agent_base into it.
B1 — Adopt MCPConfig dataclass (replaces abstract get_mcp_config / get_mcp_http_config):

    # utils/mcp_config.py: extend with MCPConfig dataclass
    from dataclasses import dataclass

    @dataclass
    class MCPConfig:
        server_name: str
        required_env_vars: list[str]
        transport: str = "stdio"
        server_path: str | None = None
        http_url: str | None = None
        http_headers: dict[str, str] | None = None

        def validate_env_vars(self) -> None: ...
        def get_client_config(self) -> dict: ...

    # BaseLangGraphAgent: add optional hook
    def get_mcp_config_object(self) -> MCPConfig | None:
        return None  # subclasses opt in; existing abstract methods still work

This is backward-compatible: existing agents keep their abstract methods, and new agents use MCPConfig.
B2 — Adopt TokenBudgetManager + BudgetAwareTool (adds hard tool-call cap):

    # In BaseLangGraphAgent._setup_mcp_and_graph():
    if self.enable_token_budget:
        tools = self._wrap_tools_with_budget(tools)

This is additive: controlled by the AGENT_MAX_TOOL_CALLS env var and disabled by default.
B3 — Adopt prompt caching (Bedrock create_cache_point):

    # In BaseLangGraphAgent.__init__():
    if enable_prompt_caching and hasattr(self.model, "create_cache_point"):
        self.cache_point = self.model.create_cache_point()

B4 - Adopt lazy initialization pattern (decouple MCP startup from object construction):
Currently BaseLangGraphAgent initializes synchronously at import time for the graph, but the actual MCP tool loading is deferred. Aligning with BaseAgent's explicit initialize() + async with pattern would improve testability and allow agents to start without live MCP servers available.
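
The explicit initialize() + async with lifecycle can be sketched as follows. `LazyAgent`, its `tools` list, and the `asyncio.sleep(0)` placeholder are hypothetical stand-ins; the point is only that construction does no I/O, so the object can be built in tests without a live MCP server.

```python
import asyncio


class LazyAgent:
    """Hypothetical agent whose constructor performs no I/O."""

    def __init__(self) -> None:
        self.tools: list[str] = []
        self.initialized = False

    async def initialize(self) -> None:
        # Idempotent: a second call is a no-op.
        if self.initialized:
            return
        await asyncio.sleep(0)  # placeholder for MCP client startup
        self.tools = ["list_pets"]
        self.initialized = True

    async def close(self) -> None:
        self.tools = []
        self.initialized = False

    async def __aenter__(self) -> "LazyAgent":
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()


async def demo() -> list[str]:
    # Startup and shutdown are scoped to the with-block.
    async with LazyAgent() as agent:
        return list(agent.tools)
```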
Trade-offs:
- ✅ No regressions: all existing features preserved
- ✅ Each change is independently mergeable
- ✅ Aligns interfaces without rewriting ~150 KB of base class
- ❌ Does not achieve full consolidation: two base classes still exist
- ❌ More incremental, slower to clean up the overall architecture

Verdict: Recommended near-term path. B1 + B2 are the highest value changes and can be done in 2–3 focused PRs.
Option C — Parallel Base Classes: mas_agent_base as Lightweight Tier
Ship mas_agent_base alongside BaseLangGraphAgent as an explicitly lighter tier for agents that do not need persistent memory or advanced context management.
The template agent ships two variants:

    agents/template/
      agent_petstore/          ← current (uses BaseLangGraphAgent, full-featured)
      agent_petstore_simple/   ← new (uses BaseAgent from mas_agent_base, lighter)

Document the trade-offs clearly so agent authors choose the right tier. mas_agent_base becomes the entry point for new agents; BaseLangGraphAgent is the upgrade path when you need persistence or context management.
Trade-offs:
- ✅ Both tiers coexist; no migrations required
- ✅ Simpler onboarding for new agent authors using the lightweight path
- ✅ mas_agent_base is immediately useful without full parity
- ❌ Two diverging base classes increases maintenance surface
- ❌ Feature gaps between tiers create confusion about when to upgrade
- ❌ Requires clear docs + governance to prevent fragmentation

Verdict: Viable if paired with a clear migration guide. Requires agreement on which features will eventually be unified and which will remain tier-specific.
Recommended Path
Short term (this PR + 2 follow-ups):
- Merge mas_agent_base as-is, scoped to agents/mas_agent_base/ (Option C foundation).
- Add MCPConfig dataclass to utils/mcp_config.py and make BaseLangGraphAgent accept it as an alternative to abstract methods (Option B1).
- Port TokenBudgetManager + BudgetAwareTool into BaseLangGraphAgent as opt-in (Option B2).

Medium term:
- Port Bedrock prompt caching into BaseLangGraphAgent (Option B3).
- Update the template agent to demonstrate mas_agent_base as the lightweight path (Option C).
- Add persistent checkpointing and TLS support to mas_agent_base.BaseAgent to close the gap with BaseLangGraphAgent.

Long term:
- When mas_agent_base reaches feature parity, deprecate BaseLangGraphAgent in favor of the cleaner constructor-injection pattern.

Open Questions
- Persistent memory: Is InMemorySaver-only acceptable for agents using mas_agent_base? Or should BaseAgent.__init__ accept an optional checkpointer parameter?
- Multi-server MCP: MCPConfig currently models a single server. Several production agents (e.g. GitHub) use multi-server configs. Should MCPConfig support a list of servers, or should BaseAgent accept list[MCPConfig]?
- Streaming executor: BaseAgentExecutor._handle_agent_event emits one artifact per completion event, while BaseLangGraphAgentExecutor accumulates chunks and streams progressively. Which behavior do downstream consumers (UI, supervisor) require?
- Module location: Should mas_agent_base live under agents/ (current PR) or under utils/a2a_common/ where the other base classes live?

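
For the streaming question, the difference a downstream consumer would see can be illustrated with a toy sketch. Both function names are hypothetical, and whether the real executor emits accumulated text or deltas is an assumption here; the sketch only contrasts one emission at completion with one emission per chunk.

```python
from typing import Iterable, Iterator


def emit_single_pass(chunks: Iterable[str]) -> Iterator[str]:
    # One artifact, emitted only after every chunk has arrived.
    yield "".join(chunks)


def emit_progressively(chunks: Iterable[str]) -> Iterator[str]:
    # One artifact per chunk, each carrying the text accumulated so far.
    accumulated = ""
    for chunk in chunks:
        accumulated += chunk
        yield accumulated
```

A UI that renders partial answers needs the second shape; a supervisor that only reads the final artifact works with either.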
Force-pushed from fdea07f to b86ca67.
Add MAS Agent Core shared base classes and utilities for building A2A agents, including BaseAgent, BaseAgentExecutor, TokenBudgetManager, and MCP config. Signed-off-by: Adam Dickinson <adickinson@demandbase.com>
…conventions Rename mas_agent_base to mas-agent-base folder, add pyproject.toml, Makefile, uv.lock, and Docker compose entries to match existing agent conventions. Register in .github/agents.json for CI builds. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Assisted-by: Claude:claude-opus-4-6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from b86ca67 to 1ebd4cb.
✅ No proprietary content detected. This PR is clear for review!
Remove utils/ (logging, retry, temperature), mcp_config.py, and response_format.py — all duplicate existing repo infrastructure. Simplify base_agent.py (drop prompt caching, use plain dict for MCP config), token_budget.py (drop tiktoken, use char heuristic), and budget_aware_tool.py. 11 files / ~1640 lines -> 5 files / 540 lines. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Assisted-by: Claude:claude-opus-4-6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
….json This is a base library, not a production agent. Keep only the dev compose entry under its own profile (not all-agents). Remove from agents.json until ready for CI builds. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Assisted-by: Claude:claude-opus-4-6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move token_budget.py and budget_aware_tool.py to ai_platform_engineering/utils/ since these are the only novel additions — the rest (base_agent, base_executor) duplicated existing BaseLangGraphAgent and BaseLangGraphAgentExecutor. Remove the mas-agent-base agent directory, its docker-compose entry, pyproject.toml, and uv.lock entirely. Signed-off-by: Sri Aradhyula <sraradhy@cisco.com> Assisted-by: Claude:claude-opus-4-6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate TokenBudgetManager into the shared base agent so all LangGraph agents get optional token budget enforcement. Controlled by ENABLE_TOKEN_BUDGET=true (off by default, zero behavior change).
- Initialize TokenBudgetManager in __init__ when env var is set
- Inject budget checks (pre-call limit check + post-call tracking) into all three tool wrapper paths: safe_coroutine, safe_run, safe_arun
- Reset budget at start of each stream() query
- All budget operations are wrapped in try/except to never crash tool execution; budget failures log warnings and proceed

Env vars:
- ENABLE_TOKEN_BUDGET=true (opt in)
- AGENT_MAX_TOKENS=20000 (token ceiling, default)
- AGENT_MAX_TOOL_CALLS=8 (tool call ceiling, default)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
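
The pre-call check, post-call tracking, and fail-open behaviour described in this commit can be sketched roughly as below. `SketchBudget` and `run_with_budget` are hypothetical names rather than the PR's TokenBudgetManager API, and the four-characters-per-token ratio is an assumed heuristic.

```python
import logging
from typing import Callable

logger = logging.getLogger("budget_sketch")


class SketchBudget:
    """Hypothetical stand-in for TokenBudgetManager; not the PR's class."""

    def __init__(self, max_tokens: int = 20000, max_tool_calls: int = 8) -> None:
        self.max_tokens = max_tokens
        self.max_tool_calls = max_tool_calls
        self.reset()

    def reset(self) -> None:
        # Called at the start of each query.
        self.tokens_used = 0
        self.tool_calls = 0

    def exceeded(self) -> bool:
        return self.tokens_used >= self.max_tokens or self.tool_calls >= self.max_tool_calls

    def record(self, output: str) -> None:
        self.tool_calls += 1
        self.tokens_used += len(output) // 4  # assumed chars-per-token ratio


def run_with_budget(tool: Callable[..., str], budget: SketchBudget, *args: object) -> str:
    # Pre-call check: over budget means a message is returned, not an exception.
    try:
        if budget.exceeded():
            return "Budget reached; answer with the partial results gathered so far."
    except Exception:
        logger.warning("Budget pre-check failed; running tool anyway", exc_info=True)
    result = tool(*args)
    # Post-call tracking: a tracking bug must not discard the tool result.
    try:
        budget.record(result)
    except Exception:
        logger.warning("Budget tracking failed; result returned untracked", exc_info=True)
    return result
```

The two try/except blocks are deliberately separate from the tool call itself, so errors raised by the tool still propagate as they did before budgeting was added.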
Use self.__class__.__name__ for TokenBudgetManager initialization instead of self.get_agent_name() to avoid dynamic dispatch during base class __init__ before subclass initialization completes. Addresses code quality review comment on PR #332. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
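
The hazard this commit avoids can be reproduced in a few lines. All class names below are hypothetical; the sketch assumes a subclass whose get_agent_name() reads an attribute that is only set after super().__init__() returns.

```python
class SafeBase:
    def __init__(self) -> None:
        # Needs no subclass state, so it is safe during base construction.
        self.budget_label = self.__class__.__name__


class RiskyBase:
    def __init__(self) -> None:
        # Dispatches to the subclass override before the subclass has finished __init__.
        self.budget_label = self.get_agent_name()

    def get_agent_name(self) -> str:
        raise NotImplementedError


class SafeAgent(SafeBase):
    def __init__(self) -> None:
        super().__init__()
        self._name = "petstore"

    def get_agent_name(self) -> str:
        return self._name


class RiskyAgent(RiskyBase):
    def __init__(self) -> None:
        super().__init__()  # get_agent_name() runs here, before _name exists
        self._name = "petstore"

    def get_agent_name(self) -> str:
        return self._name
```

Constructing `SafeAgent()` succeeds with the label "SafeAgent", while `RiskyAgent()` raises AttributeError because `_name` has not been assigned yet.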
@subbaksh @suwhang-cisco - Need your eyes on this one. I repurposed the original PR
Address 4 findings from github-code-quality[bot] on PR #1527:
- audit_logger.py:69 - unused global "_indexes_ensured"
- auth/jwt_context.py:148 - unused global "_discovery_cache"
- auth/jwt_context.py:149 - unused global "_discovery_cache_expiry"
- a2a_common/base_langgraph_agent.py:1979 - empty except clause

Root cause analysis:
Findings #1-#3 are CodeQL false positives on a textbook Python lazy-init pattern (read at the top of a function, write at the bottom, idempotent guard on a module-level scalar). The CodeQL "py/unused-global-variable" query is intra-procedural: it sees the write at the bottom of the function does not flow-reach the read at the top of the same function, and the cross-invocation read on a subsequent call is invisible to the analyser. The same module already uses mutable containers like "_userinfo_cache" for the same lazy-init pattern and those are NOT flagged, which is the canonical workaround.
Finding #4 is a genuine code-smell: silently swallowing exceptions during audit-callback registration hides operator-visible problems (missing audit_callback module, MongoDB driver init failures, etc.).

Fixes:
ai_platform_engineering/utils/audit_logger.py
- Replace the scalar "_indexes_ensured" global with a mutable container "_audit_state = {indexes_ensured: False}". Behaviour is identical (idempotent lazy-init guarded by a threading.Lock).
- Drop the now-unnecessary "global _indexes_ensured" declaration.

tests/test_audit_logger.py
- Update the autouse fixture to reset audit_logger._audit_state["indexes_ensured"] rather than the old scalar. This is the only external reference to the renamed symbol.

ai_platform_engineering/utils/auth/jwt_context.py
- Replace scalar "_discovery_cache" + "_discovery_cache_expiry" globals with a mutable container "_discovery_state = {doc: None, expiry: 0.0}". Same idempotent lazy-fetch pattern as before.
- Drop the "global _discovery_cache, _discovery_cache_expiry" declaration. The "_DISCOVERY_CACHE_TTL_SECONDS" constant is unchanged.

ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py
- Replace the empty "except Exception: pass" with "except Exception as exc: logger.warning(..., exc_info=exc)" so audit-callback registration failures surface in service logs. Behaviour is preserved: agent execution still continues without the audit sink, but the failure is no longer silent.
- Add an explanatory comment so future readers understand why this exception is intentionally non-fatal.

Verification:
- ruff check: no new lint errors on the 4 edited files (7 pre-existing E402/E501 errors on base_langgraph_agent.py are unrelated and present on the unmodified base)
- unit tests (PYTHONPATH=. uv run pytest): tests/test_audit_logger.py 5/5 pass; tests/test_audit_callback.py 3/3 pass
- smoke tests (module import + lazy-init exercise): audit_logger._ensure_indexes() flips _audit_state[indexes_ensured] to True and is idempotent on a second call. jwt_context._get_oidc_discovery() populates _discovery_state on first call and serves cache hits without HTTP traffic on second.

Cross-PR merge impact: NONE.
- audit_logger.py and auth/jwt_context.py are unique to PR #1527
- base_langgraph_agent.py is also touched by stale PRs #324 and #332 (last updated 2026-04-15) but in non-overlapping line ranges (their edits are around lines 1138-1151 and 2017-2238; ours is at 1969-1990). Any future rebase of #324/#332 will not conflict.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Add opt-in token budget enforcement to all LangGraph agents via BaseLangGraphAgent._wrap_mcp_tools.
- token_budget.py: TokenBudgetManager tracks estimated token consumption across tool calls, raises graceful exceptions when limits are exceeded
- budget_aware_tool.py: BudgetAwareTool standalone wrapper for custom tool pipelines
- base_langgraph_agent.py: Budget checks wired into all three tool wrapper paths (safe_coroutine, safe_run, safe_arun); reset per query in stream()

How it works
When ENABLE_TOKEN_BUDGET=true, each MCP tool call is checked against token and call-count limits before execution. If a limit is exceeded, the tool returns a partial-results message instead of executing, so no exceptions propagate to LangGraph/A2A.

All budget operations are defensively wrapped so bugs in budget tracking never crash tool execution.

Configuration (env vars)
| Variable | Default |
|---|---|
| ENABLE_TOKEN_BUDGET | false |
| AGENT_MAX_TOKENS | 20000 |
| AGENT_MAX_TOOL_CALLS | 8 |

What was removed
The original mas-agent-base agent directory was deleted: its base agent/executor duplicated BaseLangGraphAgent / BaseLangGraphAgentExecutor. Only the novel token budget utilities were kept and integrated.

Test plan
- With ENABLE_TOKEN_BUDGET=true and AGENT_MAX_TOOL_CALLS=3, confirm the agent stops after 3 tool calls with a partial-results message