Implement Parallel Tool Execution, Auto-Compaction, and Coordinator Architecture

## Summary

Introduce three architectural improvements to the agent pipeline: concurrent tool execution, automatic conversation compaction, and a coordinator-based orchestration pattern.

These changes reduce end-to-end latency, prevent context window failures on complex tasks, and enable parallel research workflows — all critical for beta readiness.

## Context

The current agent runs a strictly linear pipeline where every operation executes sequentially and conversation history grows unbounded. This works for simple tasks but creates three problems at scale:

1. **Serial bottleneck** — Read-only operations (validation, documentation lookups, schema checks) that are independent of each other still wait in line. On multi-file form generation, this adds unnecessary latency.
2. **Unbounded context growth** — Complex tasks with many files generate long conversation histories. There is no mechanism to summarize or compact history before it hits LLM context limits, causing hard failures.
3. **No parallel research** — Independent research tasks (scanning the repo, checking documentation, validating existing layouts) run one after another even though they share no state and could execute simultaneously.

Production-grade agent systems address these with concurrency-aware tool batching, token-budget-driven compaction, and coordinator patterns that fan out independent work to parallel workers.

## Goal

- Classify tool calls by safety (read-only vs write) and execute read-only operations concurrently
- Track token usage and automatically summarize conversation history before hitting context limits, with circuit breakers to avoid retry loops
- Support a coordinator node that dispatches independent research tasks in parallel, synthesizes findings, then delegates serial implementation
- Maintain full backward compatibility — coordinator mode is opt-in via feature flag

## In scope

- **Parallel tool execution** — Concurrency classification for tools, batch partitioning, and concurrent execution of read-only tools via `asyncio.gather`
- **Auto-compaction service** — Token counting, configurable threshold, conversation summarization, circuit breaker after repeated failures, and preservation of critical state through compaction boundaries
- **Coordinator mode** — A coordinator node that fans out parallel research workers, a synthesis step, and serial implementation dispatch using LangGraph's dynamic routing, gated behind a feature flag

## Out of scope

- Changes to the user-facing API contract or SSE event format
- New tools or modifications to the MCP server
- Infrastructure or deployment changes
- Permission system or cost tracking (separate future work)

## Acceptance criteria

- [ ] Read-only tool calls execute concurrently; write operations always run serially
- [ ] Conversations exceeding the token threshold are automatically compacted without user intervention
- [ ] Compaction failures trigger a circuit breaker and do not crash the workflow
- [ ] Coordinator mode completes multi-file research faster than serial execution
- [ ] Coordinator mode is feature-flagged and does not affect the default pipeline
- [ ] All existing tests pass without modification
- [ ] Langfuse traces capture parallel execution spans and compaction events

## Why this matters

The agent currently handles simple single-page forms well, but beta users will submit complex multi-page forms with data models, text resources, and validation rules. These tasks generate dozens of tool calls and long conversation histories. Without these improvements:

- **Performance degrades linearly** with task complexity because every tool call waits for the previous one, even when they are independent
- **Complex tasks fail unpredictably** when conversation history exceeds context limits with no recovery path
- **Latency compounds** because research that could run in 5 seconds across 3 parallel workers takes 15 seconds sequentially

These are not optimizations — they are architectural prerequisites for handling real-world form generation workloads reliably.

## Relationship to other work

Contributes to **Altinity Agents Performance & Scalability** and supports beta readiness. Complements the existing MCP graceful degradation work (connection retry, degraded flag propagation) and the evaluation pipeline by ensuring the agent can handle longer, more complex workflows without hitting context limits.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Parallel Tool Execution, Auto-Compaction, and Coordinator Architecture #172

Summary

Context

Goal

In scope

Out of scope

Acceptance criteria

Why this matters

Relationship to other work

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Implement Parallel Tool Execution, Auto-Compaction, and Coordinator Architecture #172

Description

Summary

Context

Goal

In scope

Out of scope

Acceptance criteria

Why this matters

Relationship to other work

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions