Summary
Introduce three architectural improvements to the agent pipeline: concurrent tool execution, automatic conversation compaction, and a coordinator-based orchestration pattern.
These changes reduce end-to-end latency, prevent context window failures on complex tasks, and enable parallel research workflows — all critical for beta readiness.
Context
The current agent runs a strictly linear pipeline where every operation executes sequentially and conversation history grows unbounded. This works for simple tasks but creates three problems at scale:
- Serial bottleneck — Read-only operations (validation, documentation lookups, schema checks) that are independent of each other still wait in line. On multi-file form generation, this adds unnecessary latency.
- Unbounded context growth — Complex tasks with many files generate long conversation histories. There is no mechanism to summarize or compact history before it hits LLM context limits, causing hard failures.
- No parallel research — Independent research tasks (scanning the repo, checking documentation, validating existing layouts) run one after another even though they share no state and could execute simultaneously.
Production-grade agent systems address these with concurrency-aware tool batching, token-budget-driven compaction, and coordinator patterns that fan out independent work to parallel workers.
Goal
- Classify tool calls by safety (read-only vs write) and execute read-only operations concurrently
- Track token usage and automatically summarize conversation history before hitting context limits, with circuit breakers to avoid retry loops
- Support a coordinator node that dispatches independent research tasks in parallel, synthesizes findings, then delegates serial implementation
- Maintain full backward compatibility — coordinator mode is opt-in via feature flag
In scope
- Parallel tool execution — Concurrency classification for tools, batch partitioning, and concurrent execution of read-only tools via
asyncio.gather
- Auto-compaction service — Token counting, configurable threshold, conversation summarization, circuit breaker after repeated failures, and preservation of critical state through compaction boundaries
- Coordinator mode — A coordinator node that fans out parallel research workers, a synthesis step, and serial implementation dispatch using LangGraph's dynamic routing, gated behind a feature flag
Out of scope
- Changes to the user-facing API contract or SSE event format
- New tools or modifications to the MCP server
- Infrastructure or deployment changes
- Permission system or cost tracking (separate future work)
Acceptance criteria
Why this matters
The agent currently handles simple single-page forms well, but beta users will submit complex multi-page forms with data models, text resources, and validation rules. These tasks generate dozens of tool calls and long conversation histories. Without these improvements:
- Performance degrades linearly with task complexity because every tool call waits for the previous one, even when they are independent
- Complex tasks fail unpredictably when conversation history exceeds context limits with no recovery path
- Latency compounds because research that could run in 5 seconds across 3 parallel workers takes 15 seconds sequentially
These are not optimizations — they are architectural prerequisites for handling real-world form generation workloads reliably.
Relationship to other work
Contributes to Altinity Agents Performance & Scalability and supports beta readiness. Complements the existing MCP graceful degradation work (connection retry, degraded flag propagation) and the evaluation pipeline by ensuring the agent can handle longer, more complex workflows without hitting context limits.
Summary
Introduce three architectural improvements to the agent pipeline: concurrent tool execution, automatic conversation compaction, and a coordinator-based orchestration pattern.
These changes reduce end-to-end latency, prevent context window failures on complex tasks, and enable parallel research workflows — all critical for beta readiness.
Context
The current agent runs a strictly linear pipeline where every operation executes sequentially and conversation history grows unbounded. This works for simple tasks but creates three problems at scale:
Production-grade agent systems address these with concurrency-aware tool batching, token-budget-driven compaction, and coordinator patterns that fan out independent work to parallel workers.
Goal
In scope
asyncio.gatherOut of scope
Acceptance criteria
Why this matters
The agent currently handles simple single-page forms well, but beta users will submit complex multi-page forms with data models, text resources, and validation rules. These tasks generate dozens of tool calls and long conversation histories. Without these improvements:
These are not optimizations — they are architectural prerequisites for handling real-world form generation workloads reliably.
Relationship to other work
Contributes to Altinity Agents Performance & Scalability and supports beta readiness. Complements the existing MCP graceful degradation work (connection retry, degraded flag propagation) and the evaluation pipeline by ensuring the agent can handle longer, more complex workflows without hitting context limits.