
feat(utils): add token budget enforcement for all LangGraph agents #332

Open
adickinson72 wants to merge 7 commits into main from agent-base-consolidation


@adickinson72
Collaborator

@adickinson72 adickinson72 commented Oct 2, 2025

Summary

Add opt-in token budget enforcement to all LangGraph agents via BaseLangGraphAgent._wrap_mcp_tools.

  • token_budget.py: TokenBudgetManager tracks estimated token consumption across tool calls and raises graceful exceptions when limits are exceeded
  • budget_aware_tool.py: BudgetAwareTool, a standalone wrapper for custom tool pipelines
  • base_langgraph_agent.py: budget checks wired into all three tool wrapper paths (safe_coroutine, safe_run, safe_arun); budget reset per query in stream()

How it works

When ENABLE_TOKEN_BUDGET=true, each MCP tool call is checked against token and call-count limits before execution. If a limit is exceeded, the tool returns a partial-results message instead of executing — no exceptions propagate to LangGraph/A2A.

All budget operations are defensively wrapped so bugs in budget tracking never crash tool execution.
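
The pre-call check can be pictured with a small sketch. This is an illustration under assumptions, not the PR's code: `BudgetSketch`, `guarded_call`, the message text, and the 4-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BudgetSketch:
    """Hypothetical stand-in for a per-query budget tracker."""

    max_tokens: int = 20000
    max_tool_calls: int = 8
    tokens_used: int = 0
    calls_made: int = 0

    def exceeded(self) -> bool:
        # A limit is hit once either counter reaches its ceiling.
        return (
            self.tokens_used >= self.max_tokens
            or self.calls_made >= self.max_tool_calls
        )

    def record(self, output: str) -> None:
        # Count the call and add a rough estimate (4 chars per token is an assumption).
        self.calls_made += 1
        self.tokens_used += len(output) // 4


def guarded_call(budget: BudgetSketch, tool: Callable[[str], str], arg: str) -> str:
    # Check limits before executing; return a message instead of raising.
    if budget.exceeded():
        return "Budget exceeded: returning partial results."
    result = tool(arg)
    budget.record(result)
    return result
```

With `max_tool_calls=2`, the third call to `guarded_call` returns the message string and the tool itself is never invoked, so nothing propagates to the caller as an exception.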

Configuration (env vars)

| Variable | Default | Description |
| --- | --- | --- |
| ENABLE_TOKEN_BUDGET | false | Opt-in toggle |
| AGENT_MAX_TOKENS | 20000 | Max estimated tokens per query |
| AGENT_MAX_TOOL_CALLS | 8 | Max tool calls per query |
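
One way these variables could be read, shown as a hedged sketch. The function name `load_budget_settings` is hypothetical, and treating only the string "true" (case-insensitive) as enabled is an assumption about the parsing rule, which the PR description does not spell out.

```python
import os


def load_budget_settings(env=None):
    """Read the three budget variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        # Assumption: only "true" (any casing) turns the feature on.
        "enabled": env.get("ENABLE_TOKEN_BUDGET", "false").strip().lower() == "true",
        "max_tokens": int(env.get("AGENT_MAX_TOKENS", "20000")),
        "max_tool_calls": int(env.get("AGENT_MAX_TOOL_CALLS", "8")),
    }
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests without touching the process environment.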

What was removed

The original mas-agent-base agent directory was deleted — its base agent/executor duplicated BaseLangGraphAgent/BaseLangGraphAgentExecutor. Only the novel token budget utilities were kept and integrated.

Test plan

  • Verify all existing agents work unchanged (default: budget disabled)
  • Set ENABLE_TOKEN_BUDGET=true and AGENT_MAX_TOOL_CALLS=3, confirm agent stops after 3 tool calls with partial results message
  • Verify no exceptions propagate to A2A streams when budget is exceeded
  • Run existing test suite

@github-actions
Contributor

github-actions Bot commented Oct 2, 2025

📊 Test Coverage Report

Main Tests Coverage

| Metric | Coverage | Details |
| --- | --- | --- |
| Lines | 12.6% | 245/1940 lines |
| Branches | 0.0% | 0/0 branches |

RAG Tests Coverage

| Metric | Coverage | Details |
| --- | --- | --- |
| Lines | 59.7% | 462/774 lines |
| Branches | 35.7% | 70/196 branches |

📁 Coverage Artifacts

@github-actions
Contributor

Thank you for your contribution! This PR has been automatically marked as stale because it has no recent activity in the last 90 days. It will be closed in 7 days, if no further activity occurs. If this pull request is still relevant, please leave a comment to let us know, and the stale label will be automatically removed.

@sriaradhyula
Member

Integration Analysis: mas_agent_base ↔ Template Agent

Thanks for sharing this reference implementation @adickinson72. This comment proposes concrete integration options for aligning mas_agent_base with the existing template agent and BaseLangGraphAgent infrastructure.


Current State

The template agent (ai_platform_engineering/agents/template/) and all production agents (argocd, github, jira, etc.) currently rely on:

| Component | Location | Size |
| --- | --- | --- |
| BaseLangGraphAgent | utils/a2a_common/base_langgraph_agent.py | ~112 KB |
| BaseLangGraphAgentExecutor | utils/a2a_common/base_langgraph_agent_executor.py | ~20 KB |

The PetStoreAgent in the template extends BaseLangGraphAgent using an abstract-method override pattern:

```python
class PetStoreAgent(BaseLangGraphAgent):
    def get_agent_name(self) -> str: ...
    def get_system_instruction(self) -> str: ...
    def get_mcp_config(self, server_path: str) -> dict: ...
    def get_mcp_http_config(self) -> dict | None: ...
    def get_tool_working_message(self) -> str: ...
    def get_tool_processing_message(self) -> str: ...
```

mas_agent_base proposes a constructor-injection pattern instead:

```python
class PetstoreAgent(BaseAgent):
    def __init__(self):
        super().__init__(
            agent_name="petstore",
            system_instruction=SYSTEM_PROMPT,
            mcp_config=MCPConfig(server_name="petstore", server_path="...", required_env_vars=["PETSTORE_API_KEY"]),
        )
```

Feature Gap Analysis

Features present in BaseLangGraphAgent that are absent from mas_agent_base.BaseAgent:

| Feature | BaseLangGraphAgent | mas_agent_base.BaseAgent |
| --- | --- | --- |
| Persistent checkpointing (Redis / MongoDB / PostgreSQL) | ✅ via get_checkpointer() | ❌ InMemorySaver only |
| LangMem message summarization | ✅ auto-compression | ❌ not present |
| Context token management | ✅ provider-aware limits | ❌ not present |
| Custom CA bundle / TLS (CUSTOM_CA_BUNDLE, SSL_VERIFY) | ✅ _build_httpx_client_factory() | ❌ not present |
| Tool output chunking (>50 KB → temp file) | ✅ _chunk_large_output() | ❌ not present |
| Tool output truncation with refine-query guidance | ✅ _truncate_tool_output() | ❌ not present |
| ExceptionGroup recovery for Go MCP servers | ✅ handled | ❌ not present |
| Multi-server MCP config (pass full dict) | ✅ supported | ❌ single-server MCPConfig only |
| get_additional_tools() hook for non-MCP tools | ✅ supported | ❌ not present |
| Date injection in system prompt | ✅ auto-prepended | ❌ not present |
| Streaming artifact accumulation (partial chunks) | ✅ in executor | ❌ executor emits single-pass |

Features in mas_agent_base that are absent or weaker in BaseLangGraphAgent:

| Feature | BaseLangGraphAgent | mas_agent_base.BaseAgent |
| --- | --- | --- |
| Per-request tool-call budget (hard cap) | ❌ no per-call limit | ✅ TokenBudgetManager + BudgetAwareTool |
| Bedrock prompt caching (create_cache_point) | ❌ not present | ✅ built-in |
| Lazy initialization / context-manager lifecycle | ❌ eagerly initializes | ✅ initialize() / async with |
| Clean dataclass MCP config (MCPConfig) | ❌ abstract methods | ✅ structured, validatable |

Integration Options

Option A — Full Migration: Template Adopts mas_agent_base

Replace BaseLangGraphAgent / BaseLangGraphAgentExecutor with BaseAgent / BaseAgentExecutor from this PR as the template's base.

What changes in the template:

```python
# Before (agent.py)
from ai_platform_engineering.utils.a2a_common.base_langgraph_agent import BaseLangGraphAgent

class PetStoreAgent(BaseLangGraphAgent):
    def get_mcp_config(self, server_path: str) -> dict:
        return {"command": "uv", "args": [...], "env": {...}, "transport": "stdio"}
    ...

# After (agent.py)
from ai_platform_engineering.agents.mas_agent_base import BaseAgent, MCPConfig

class PetStoreAgent(BaseAgent):
    def __init__(self):
        super().__init__(
            agent_name="petstore",
            system_instruction=SYSTEM_PROMPT,
            mcp_config=MCPConfig(
                server_name="petstore",
                server_path="path/to/mcp_petstore/__main__.py",
                required_env_vars=["PETSTORE_API_KEY"],
                transport="stdio",
            ),
        )
```

```python
# Before (agent_executor.py)
from ai_platform_engineering.utils.a2a_common.base_langgraph_agent_executor import BaseLangGraphAgentExecutor

class PetStoreAgentExecutor(BaseLangGraphAgentExecutor):
    def __init__(self):
        super().__init__(PetStoreAgent())

# After (agent_executor.py), nearly identical
from ai_platform_engineering.agents.mas_agent_base import BaseAgentExecutor

class PetStoreAgentExecutor(BaseAgentExecutor):
    def __init__(self):
        super().__init__(PetStoreAgent())
```

Trade-offs:

✅ Cleaner, more readable agent definitions
✅ MCPConfig is self-documenting and validatable
✅ Bedrock prompt caching for free
✅ Hard per-request tool-call budget
❌ Loses persistent checkpointing — sessions reset on pod restart
❌ Loses LangMem summarization — long conversations will hit context limits
❌ Loses TLS / CA bundle support — breaks enterprise proxies
❌ Loses tool output chunking — large API responses risk context overflow
❌ MCPConfig only handles single-server stdio or single HTTP endpoint

Verdict: Not recommended as a direct drop-in today. The feature regressions are too significant for production agents. Better suited as the base for new, simple agents that do not need persistent memory or complex MCP setups.


Option B — Selective Backport: Cherry-Pick Features Into Existing Base

Keep BaseLangGraphAgent as the production base, but pull specific innovations from mas_agent_base into it.

B1 — Adopt MCPConfig dataclass (replaces abstract get_mcp_config / get_mcp_http_config):

```python
# utils/mcp_config.py: extend with MCPConfig dataclass
from dataclasses import dataclass

@dataclass
class MCPConfig:
    server_name: str
    required_env_vars: list[str]
    transport: str = "stdio"
    server_path: str | None = None
    http_url: str | None = None
    http_headers: dict[str, str] | None = None

    def validate_env_vars(self) -> None: ...
    def get_client_config(self) -> dict: ...

# BaseLangGraphAgent: add optional hook
def get_mcp_config_object(self) -> MCPConfig | None:
    return None  # subclasses opt in; existing abstract methods still work
```

This is backward-compatible — existing agents keep abstract methods, new agents use MCPConfig.
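
The proposal leaves `validate_env_vars` as a stub. One possible body is sketched below, purely as an assumption about what "validatable" could mean here: `MCPConfigSketch` is a hypothetical, trimmed-down class, and raising `ValueError` with every missing name is an illustrative choice rather than the behavior of the reference implementation.

```python
import os
from dataclasses import dataclass, field


@dataclass
class MCPConfigSketch:
    """Hypothetical, reduced version of the proposed MCPConfig."""

    server_name: str
    required_env_vars: list[str] = field(default_factory=list)

    def validate_env_vars(self, env=None) -> None:
        # Collect every missing variable so the error reports all of them at once.
        env = os.environ if env is None else env
        missing = [name for name in self.required_env_vars if not env.get(name)]
        if missing:
            raise ValueError(
                f"{self.server_name}: missing env vars: {', '.join(missing)}"
            )
```

Failing at configuration time, before any MCP process is started, is the main benefit a structured config offers over ad hoc checks inside each `get_mcp_config` override.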

B2 — Adopt TokenBudgetManager + BudgetAwareTool (adds hard tool-call cap):

```python
# In BaseLangGraphAgent._setup_mcp_and_graph():
if self.enable_token_budget:
    tools = self._wrap_tools_with_budget(tools)
```

This is additive — controlled by AGENT_MAX_TOOL_CALLS env var, disabled by default.

B3 — Adopt prompt caching (Bedrock create_cache_point):

```python
# In BaseLangGraphAgent.__init__():
if enable_prompt_caching and hasattr(self.model, "create_cache_point"):
    self.cache_point = self.model.create_cache_point()
```

B4 — Adopt lazy initialization pattern (decouple MCP startup from object construction):

Currently, BaseLangGraphAgent sets up the graph synchronously at import time, while the actual MCP tool loading is deferred. Aligning with BaseAgent's explicit initialize() + async with pattern would improve testability and allow agents to start without live MCP servers available.
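
The explicit `initialize()` plus `async with` lifecycle could look roughly like the sketch below. `LazyAgentSketch`, `_load_tools`, and `close` are hypothetical names, and the tool loading is a placeholder rather than a real MCP connection.

```python
import asyncio


class LazyAgentSketch:
    """Hypothetical agent whose expensive setup is deferred to initialize()."""

    def __init__(self):
        # Construction does no I/O, so the object is cheap to create in tests.
        self.tools = None

    async def initialize(self):
        # Idempotent: repeated calls reuse the already-loaded tools.
        if self.tools is None:
            self.tools = await self._load_tools()

    async def _load_tools(self):
        # Placeholder for connecting to MCP servers and listing their tools.
        await asyncio.sleep(0)
        return ["example_tool"]

    async def close(self):
        self.tools = None

    async def __aenter__(self):
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()


async def demo():
    agent = LazyAgentSketch()
    before = agent.tools
    async with agent:
        during = agent.tools
    return before, during, agent.tools
```

Because the constructor performs no I/O, a test can build the agent and replace `_load_tools` with a stub before entering the context manager.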

Trade-offs:

✅ No regressions — all existing features preserved
✅ Each change is independently mergeable
✅ Aligns interfaces without rewriting ~150 KB of base class
❌ Does not achieve full consolidation — two base classes still exist
❌ More incremental, slower to clean up the overall architecture

Verdict: Recommended near-term path. B1 + B2 are the highest value changes and can be done in 2–3 focused PRs.


Option C — Parallel Base Classes: mas_agent_base as Lightweight Tier

Ship mas_agent_base alongside BaseLangGraphAgent as an explicitly lighter tier for agents that do not need persistent memory or advanced context management.

The template agent ships two variants:

agents/template/
  agent_petstore/               ← current (uses BaseLangGraphAgent, full-featured)
  agent_petstore_simple/        ← new (uses BaseAgent from mas_agent_base, lighter)

Document the trade-offs clearly so agent authors choose the right tier. mas_agent_base becomes the entry point for new agents; BaseLangGraphAgent is the upgrade path when you need persistence or context management.

Trade-offs:

✅ Both tiers coexist — no migrations required
✅ Simpler onboarding for new agent authors using the lightweight path
✅ mas_agent_base is immediately useful without full parity
❌ Two diverging base classes increases maintenance surface
❌ Feature gaps between tiers create confusion about when to upgrade
❌ Requires clear docs + governance to prevent fragmentation

Verdict: Viable if paired with a clear migration guide. Requires agreement on which features will eventually be unified and which will remain tier-specific.


Recommended Path

Short term (this PR + 2 follow-ups):

  1. Merge mas_agent_base as-is, scoped to agents/mas_agent_base/ (Option C foundation).
  2. Add MCPConfig dataclass to utils/mcp_config.py and make BaseLangGraphAgent accept it as an alternative to abstract methods (Option B1).
  3. Port TokenBudgetManager + BudgetAwareTool into BaseLangGraphAgent as opt-in (Option B2).

Medium term:

  1. Port Bedrock prompt caching into BaseLangGraphAgent (Option B3).
  2. Update the template agent to demonstrate mas_agent_base as the lightweight path (Option C).
  3. Add persistent checkpointing and TLS support to mas_agent_base.BaseAgent to close the gap with BaseLangGraphAgent.

Long term:

  1. When mas_agent_base reaches feature parity, deprecate BaseLangGraphAgent in favor of the cleaner constructor-injection pattern.

Open Questions

  1. Persistent memory: Is InMemorySaver-only acceptable for agents using mas_agent_base? Or should BaseAgent.__init__ accept an optional checkpointer parameter?
  2. Multi-server MCP: MCPConfig currently models a single server. Several production agents (e.g. GitHub) use multi-server configs. Should MCPConfig support a list of servers, or should BaseAgent accept list[MCPConfig]?
  3. Streaming executor: BaseAgentExecutor._handle_agent_event emits one artifact per completion event, while BaseLangGraphAgentExecutor accumulates chunks and streams progressively. Which behavior do downstream consumers (UI, supervisor) require?
  4. Module location: Should mas_agent_base live under agents/ (current PR) or under utils/a2a_common/ where the other base classes live?

@sriaradhyula sriaradhyula force-pushed the agent-base-consolidation branch from fdea07f to b86ca67 Compare April 14, 2026 23:00
Adam Dickinson and others added 2 commits April 14, 2026 18:01
Add MAS Agent Core shared base classes and utilities for building A2A agents,
including BaseAgent, BaseAgentExecutor, TokenBudgetManager, and MCP config.

Signed-off-by: Adam Dickinson <adickinson@demandbase.com>
…conventions

Rename mas_agent_base to mas-agent-base folder, add pyproject.toml, Makefile,
uv.lock, and Docker compose entries to match existing agent conventions.
Register in .github/agents.json for CI builds.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Assisted-by: Claude:claude-opus-4-6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sriaradhyula sriaradhyula force-pushed the agent-base-consolidation branch from b86ca67 to 1ebd4cb Compare April 14, 2026 23:01
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!


@eti-sre-cicd eti-sre-cicd added this to the 0.6.0 milestone Apr 14, 2026
Remove utils/ (logging, retry, temperature), mcp_config.py, and
response_format.py — all duplicate existing repo infrastructure.
Simplify base_agent.py (drop prompt caching, use plain dict for MCP config),
token_budget.py (drop tiktoken, use char heuristic), and budget_aware_tool.py.

11 files / ~1640 lines -> 5 files / 540 lines.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Assisted-by: Claude:claude-opus-4-6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
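
The commit above replaces tiktoken with a character heuristic but does not state the ratio. A minimal sketch of such an estimator follows; the function name `estimate_tokens` and the 4-characters-per-token default are assumptions (a common rule of thumb for English text), not values taken from this PR.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate a token count from character length, with no tokenizer dependency."""
    if not text:
        return 0
    # Round up so short non-empty outputs still count as at least one token.
    return math.ceil(len(text) / chars_per_token)
```

The trade-off is accuracy for simplicity: the estimate can drift for code, JSON, or non-English text, which is usually acceptable for a coarse budget ceiling.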
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

….json

This is a base library, not a production agent. Keep only the dev compose
entry under its own profile (not all-agents). Remove from agents.json
until ready for CI builds.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Assisted-by: Claude:claude-opus-4-6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

Move token_budget.py and budget_aware_tool.py to ai_platform_engineering/utils/
since these are the only novel additions — the rest (base_agent, base_executor)
duplicated existing BaseLangGraphAgent and BaseLangGraphAgentExecutor.

Remove the mas-agent-base agent directory, its docker-compose entry,
pyproject.toml, and uv.lock entirely.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Assisted-by: Claude:claude-opus-4-6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

@sriaradhyula sriaradhyula marked this pull request as ready for review April 15, 2026 01:14
@sriaradhyula sriaradhyula self-requested a review as a code owner April 15, 2026 01:14
Integrate TokenBudgetManager into the shared base agent so all
LangGraph agents get optional token budget enforcement. Controlled
by ENABLE_TOKEN_BUDGET=true (off by default, zero behavior change).

- Initialize TokenBudgetManager in __init__ when env var is set
- Inject budget checks (pre-call limit check + post-call tracking)
  into all three tool wrapper paths: safe_coroutine, safe_run, safe_arun
- Reset budget at start of each stream() query
- All budget operations are wrapped in try/except to never crash
  tool execution — budget failures log warnings and proceed

Env vars:
  ENABLE_TOKEN_BUDGET=true   — opt in
  AGENT_MAX_TOKENS=20000     — token ceiling (default)
  AGENT_MAX_TOOL_CALLS=8     — tool call ceiling (default)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
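
The "never crash tool execution" rule from the commit above can be sketched as follows. This is an illustration only: `safe_budget_op`, `wrap_sync`, `wrap_async`, `BrokenBudget`, and the tracker methods `is_exceeded` and `track` are hypothetical names, and only a sync and an async path are shown rather than the three wrapper paths in the base class.

```python
import asyncio
import logging

logger = logging.getLogger("budget_sketch")


def safe_budget_op(op, *args):
    # Run one budget operation; on failure, log a warning and report "no result".
    try:
        return op(*args)
    except Exception:
        logger.warning("budget operation failed; proceeding", exc_info=True)
        return None


def wrap_sync(tool, budget):
    def wrapper(*args, **kwargs):
        if safe_budget_op(budget.is_exceeded):      # pre-call limit check
            return "Budget exceeded: returning partial results."
        result = tool(*args, **kwargs)
        safe_budget_op(budget.track, result)        # post-call tracking
        return result
    return wrapper


def wrap_async(tool, budget):
    async def wrapper(*args, **kwargs):
        if safe_budget_op(budget.is_exceeded):
            return "Budget exceeded: returning partial results."
        result = await tool(*args, **kwargs)
        safe_budget_op(budget.track, result)
        return result
    return wrapper


class BrokenBudget:
    """Simulates a buggy tracker: every method raises."""

    def is_exceeded(self):
        raise RuntimeError("bug in budget tracking")

    def track(self, output):
        raise RuntimeError("bug in budget tracking")
```

Wrapping a tool with `BrokenBudget()` still returns the tool's normal result; the failures only show up as logged warnings.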
@sriaradhyula sriaradhyula changed the title from "Add mas-agent-base consolidation (example only)" to "feat(utils): add token budget enforcement for all LangGraph agents" Apr 15, 2026
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

Comment thread ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py Fixed
Use self.__class__.__name__ for TokenBudgetManager initialization
instead of self.get_agent_name() to avoid dynamic dispatch during
base class __init__ before subclass initialization completes.

Addresses code quality review comment on PR #332.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
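
The hazard this commit avoids can be reproduced in a few lines. The classes below are hypothetical and unrelated to the real base class; they only show why calling an overridable method from a base `__init__` is fragile, while `self.__class__.__name__` is always available.

```python
class SafeBase:
    def __init__(self):
        # Safe: the class name exists before any subclass state is set.
        self.label = self.__class__.__name__


class RiskyBase:
    def __init__(self):
        # Risky: dispatches to the subclass override before the
        # subclass __init__ has finished running.
        self.label = self.get_agent_name()


class SafeAgent(SafeBase):
    def __init__(self):
        super().__init__()
        self._name = "petstore"  # set only after the base __init__ returns

    def get_agent_name(self) -> str:
        return self._name


class RiskyAgent(RiskyBase):
    def __init__(self):
        super().__init__()
        self._name = "petstore"

    def get_agent_name(self) -> str:
        return self._name
```

Constructing `RiskyAgent()` raises `AttributeError` because `_name` is read before it is assigned, whereas `SafeAgent()` succeeds with `label` set to the class name.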
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

@sriaradhyula
Member

@subbaksh @suwhang-cisco - Need your eyes on this one. I repurposed the original PR.

sriaradhyula added a commit that referenced this pull request May 23, 2026
Address 4 findings from github-code-quality[bot] on PR #1527:

audit_logger.py:69 - unused global "_indexes_ensured"
auth/jwt_context.py:148 - unused global "_discovery_cache"
auth/jwt_context.py:149 - unused global "_discovery_cache_expiry"
a2a_common/base_langgraph_agent.py:1979 - empty except clause

Root cause analysis:

  Findings #1-#3 are CodeQL false positives on a textbook Python lazy-init
  pattern (read at the top of a function, write at the bottom, idempotent
  guard on a module-level scalar). The CodeQL "py/unused-global-variable"
  query is intra-procedural: it sees the write at the bottom of the
  function does not flow-reach the read at the top of the same function,
  and the cross-invocation read on a subsequent call is invisible to the
  analyser. The same module already uses mutable containers like
  "_userinfo_cache" for the same lazy-init pattern and those are NOT
  flagged, which is the canonical workaround.

  Finding #4 is a genuine code-smell: silently swallowing exceptions
  during audit-callback registration hides operator-visible problems
  (missing audit_callback module, MongoDB driver init failures, etc.).

Fixes:

  ai_platform_engineering/utils/audit_logger.py
    - Replace the scalar "_indexes_ensured" global with a mutable
      container "_audit_state = {indexes_ensured: False}". Behaviour is
      identical (idempotent lazy-init guarded by a threading.Lock).
    - Drop the now-unnecessary "global _indexes_ensured" declaration.

  tests/test_audit_logger.py
    - Update the autouse fixture to reset
      audit_logger._audit_state["indexes_ensured"] rather than the old
      scalar. This is the only external reference to the renamed symbol.

  ai_platform_engineering/utils/auth/jwt_context.py
    - Replace scalar "_discovery_cache" + "_discovery_cache_expiry"
      globals with a mutable container
      "_discovery_state = {doc: None, expiry: 0.0}". Same idempotent
      lazy-fetch pattern as before.
    - Drop the "global _discovery_cache, _discovery_cache_expiry"
      declaration. The "_DISCOVERY_CACHE_TTL_SECONDS" constant is
      unchanged.

  ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py
    - Replace the empty "except Exception: pass" with
      "except Exception as exc: logger.warning(..., exc_info=exc)" so
      audit-callback registration failures surface in service logs.
      Behaviour is preserved: agent execution still continues without
      the audit sink, but the failure is no longer silent.
    - Add an explanatory comment so future readers understand why this
      exception is intentionally non-fatal.
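
The non-fatal but visible handling described for finding #4 can be sketched like this. The function `register_audit_callback` and its arguments are hypothetical stand-ins; only the shape of the `except` clause mirrors what the commit message describes.

```python
import logging

logger = logging.getLogger("audit_sketch")


def register_audit_callback(register, callbacks):
    """Try to attach an audit sink; log and continue if registration fails."""
    try:
        callbacks.append(register())
        return True
    except Exception as exc:
        # Intentionally non-fatal: the agent still runs, just without the audit sink.
        logger.warning("audit callback registration failed: %s", exc, exc_info=exc)
        return False
```

Passing the exception as `exc_info` attaches the traceback to the log record, so operators see where registration failed without the agent stopping.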

Verification:

  ruff check:
    no new lint errors on the 4 edited files
    (7 pre-existing E402/E501 errors on base_langgraph_agent.py are
     unrelated and present on the unmodified base)

  unit tests (PYTHONPATH=. uv run pytest):
    tests/test_audit_logger.py   ........... 5/5 pass
    tests/test_audit_callback.py ........... 3/3 pass

  smoke tests (module import + lazy-init exercise):
    audit_logger._ensure_indexes() flips _audit_state[indexes_ensured]
      to True and is idempotent on a second call.
    jwt_context._get_oidc_discovery() populates _discovery_state on
      first call and serves cache hits without HTTP traffic on second.

Cross-PR merge impact: NONE.
  - audit_logger.py and auth/jwt_context.py are unique to PR #1527
  - base_langgraph_agent.py is also touched by stale PRs #324 and #332
    (last updated 2026-04-15) but in non-overlapping line ranges
    (their edits are around lines 1138-1151 and 2017-2238; ours is at
    1969-1990). Any future rebase of #324/#332 will not conflict.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
3 participants