| layout | default |
|---|---|
| title | Nanocoder - Chapter 2: Architecture & Agent Loop |
| nav_order | 2 |
| has_children | false |
| parent | Nanocoder - AI Coding Agent Deep Dive |
Welcome to Chapter 2: Architecture & Agent Loop. In this part of Nanocoder Tutorial: Building and Understanding AI Coding Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Understanding the core architecture that powers every AI coding agent—from message orchestration to the agentic execution loop.
The agent loop is the beating heart of every AI coding agent. This chapter dissects how nanocoder's architecture works internally: how user messages flow through the system, how the LLM decides when to use tools versus respond with text, and how results are fed back to create a coherent multi-turn interaction.
flowchart TB
subgraph UI["Terminal UI Layer"]
Input[User Input]
Output[Response Stream]
Approval[Approval Prompts]
end
subgraph Core["Agent Core"]
Loop[Agent Loop]
History[Message History]
SystemPrompt[System Prompt Builder]
end
subgraph Providers["Provider Layer"]
Router[Provider Router]
Ollama[Ollama]
OpenAI[OpenAI-Compatible]
Local[Local Server]
end
subgraph Tools["Tool Layer"]
Registry[Tool Registry]
ReadFile[read_file]
WriteFile[write_file]
BashExec[bash]
Search[search]
end
Input --> Loop
Loop --> SystemPrompt
SystemPrompt --> History
History --> Router
Router --> Ollama
Router --> OpenAI
Router --> Local
Ollama --> Loop
OpenAI --> Loop
Local --> Loop
Loop --> Registry
Registry --> ReadFile
Registry --> WriteFile
Registry --> BashExec
Registry --> Search
ReadFile --> Loop
Search --> Loop
WriteFile --> Approval
BashExec --> Approval
Approval --> Loop
Loop --> Output
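The Tool Layer in the diagram can be sketched as a small registry that maps tool names to handlers and records which tools need approval. This is a minimal illustration rather than nanocoder's actual implementation: `ToolSchema`, `RegisteredTool`, and `InMemoryToolRegistry` are hypothetical names, and the schema shape assumes OpenAI-style function definitions.

```typescript
// Hypothetical types for illustration; nanocoder's real definitions may differ.
interface ToolSchema {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema describing the arguments
  };
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface RegisteredTool {
  schema: ToolSchema;
  handler: ToolHandler;
  requiresApproval: boolean;
}

class InMemoryToolRegistry {
  private tools = new Map<string, RegisteredTool>();

  register(tool: RegisteredTool): void {
    this.tools.set(tool.schema.function.name, tool);
  }

  // Schemas are what gets sent to the LLM alongside the messages.
  getSchemas(): ToolSchema[] {
    return Array.from(this.tools.values()).map((t) => t.schema);
  }

  // Unknown tools default to requiring approval (a cautious assumption).
  requiresApproval(name: string): boolean {
    return this.tools.get(name)?.requiresApproval ?? true;
  }

  async execute(name: string, args: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.handler(args);
  }
}

const registry = new InMemoryToolRegistry();
registry.register({
  schema: {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the working directory",
      parameters: {
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      },
    },
  },
  handler: async (args) => `contents of ${String(args.path)}`, // stub, no real I/O
  requiresApproval: false,
});
```

Keeping the approval flag next to the schema means the loop can decide whether to prompt the user without knowing anything about individual tools.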
The agent loop is a while-loop that continues until the LLM produces a final text response with no tool calls. Here's the core pattern:
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

interface ToolCall {
  id: string;
  type: "function";
  function: {
    name: string;
    arguments: string; // JSON string
  };
}

async function agentLoop(
  userMessage: string,
  history: Message[],
  provider: LLMProvider,
  tools: ToolRegistry
): Promise<string> {
  // Add user message to history
  history.push({ role: "user", content: userMessage });

  while (true) {
    // Send full history to LLM. This version is non-streaming so the complete
    // response is available at once; streaming is covered later in the chapter.
    const response = await provider.chat({
      messages: history,
      tools: tools.getSchemas(),
      stream: false,
    });

    // Add assistant response to history (content can be empty on tool-only turns)
    history.push({
      role: "assistant",
      content: response.content ?? "",
      tool_calls: response.toolCalls,
    });

    // If no tool calls, we're done
    if (!response.toolCalls || response.toolCalls.length === 0) {
      return response.content ?? "";
    }

    // Execute each tool call
    for (const toolCall of response.toolCalls) {
      let content: string;
      try {
        const result = await tools.execute(
          toolCall.function.name,
          JSON.parse(toolCall.function.arguments)
        );
        content = JSON.stringify(result);
      } catch (error) {
        // Malformed JSON arguments or a failing tool must not crash the loop:
        // report the failure so every tool call still receives a result
        const reason = error instanceof Error ? error.message : String(error);
        content = `Error executing ${toolCall.function.name}: ${reason}`;
      }

      // Add tool result to history
      history.push({ role: "tool", content, tool_call_id: toolCall.id });
    }
    // Loop continues: the LLM sees the tool results on the next iteration
  }
}

flowchart TD
A[Receive User Message] --> B[Append to History]
B --> C[Send History + Tools to LLM]
C --> D{Response Type?}
D -->|Text Only| E[Return Response to User]
D -->|Tool Calls| F[Parse Tool Calls]
F --> G[Execute Tools]
G --> H{Approval Required?}
H -->|Yes| I[Prompt User]
I -->|Approved| J[Execute & Capture Result]
I -->|Denied| K[Return Denial Message]
H -->|No| J
J --> L[Append Tool Results to History]
K --> L
L --> C
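The approval branch in the flowchart does not appear in the loop code above, so here is one way it could look. This is a sketch under stated assumptions: `APPROVAL_REQUIRED`, `runWithApproval`, and the injected `askUser` and `run` callbacks are hypothetical names, not nanocoder's API.

```typescript
// Hypothetical policy: tools that must be confirmed by the user before running.
const APPROVAL_REQUIRED = new Set(["write_file", "bash"]);

type ApprovalDecision = "approved" | "denied";

interface GateResult {
  executed: boolean;
  content: string; // appended to history as the tool message
}

// `askUser` and `run` are injected so the gate can be tested without a terminal.
async function runWithApproval(
  toolName: string,
  args: Record<string, unknown>,
  askUser: (toolName: string, args: Record<string, unknown>) => Promise<ApprovalDecision>,
  run: (toolName: string, args: Record<string, unknown>) => Promise<unknown>
): Promise<GateResult> {
  if (APPROVAL_REQUIRED.has(toolName)) {
    const decision = await askUser(toolName, args);
    if (decision === "denied") {
      // A denial is still a tool result: the LLM reads it and can change course.
      return {
        executed: false,
        content: `User denied permission to run ${toolName}.`,
      };
    }
  }
  const result = await run(toolName, args);
  return { executed: true, content: JSON.stringify(result ?? null) };
}
```

Both outcomes produce a message for the history, which matches the diagram: approved and denied paths converge on "Append Tool Results to History" before the next LLM call.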
The agent loop exploits a key property of modern LLMs: tool use is a native capability. When you provide tool schemas alongside your messages, the LLM can choose to:
- Respond with text — when it has enough information
- Request tool calls — when it needs to read files, execute commands, or gather more context
- Chain multiple tools — reading a file, then modifying it, then running tests
The loop naturally handles multi-step tasks because each tool result becomes part of the conversation history, giving the LLM the context it needs to decide the next action.
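To make the multi-step behavior concrete, the following sketch shows what the history might contain after one chained task. The message contents are made up for illustration, and `findOrphanToolResults` is a hypothetical helper that checks the invariant the loop relies on: every tool result answers an earlier tool call.

```typescript
// Shapes mirror the Message and ToolCall interfaces used in this chapter.
interface TraceToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface TraceMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: TraceToolCall[];
  tool_call_id?: string;
}

const call = (id: string, name: string, args: object): TraceToolCall => ({
  id,
  type: "function",
  function: { name, arguments: JSON.stringify(args) },
});

// Illustrative history: read a file, modify it, then run the tests.
const trace: TraceMessage[] = [
  { role: "user", content: "Fix the failing test in math.test.ts" },
  { role: "assistant", content: "Reading the source first.",
    tool_calls: [call("call_1", "read_file", { path: "math.ts" })] },
  { role: "tool", tool_call_id: "call_1", content: '"export const add = (a, b) => a - b;"' },
  { role: "assistant", content: "The operator is wrong; fixing it.",
    tool_calls: [call("call_2", "write_file", { path: "math.ts", content: "export const add = (a, b) => a + b;" })] },
  { role: "tool", tool_call_id: "call_2", content: '"ok"' },
  { role: "assistant", content: "Running the tests.",
    tool_calls: [call("call_3", "bash", { command: "npm test" })] },
  { role: "tool", tool_call_id: "call_3", content: '"1 passed"' },
  { role: "assistant", content: "The test passes now: add() was subtracting." },
];

// Returns the ids of tool results that do not answer an earlier tool call.
function findOrphanToolResults(messages: TraceMessage[]): string[] {
  const seen = new Set<string>();
  const orphans: string[] = [];
  for (const m of messages) {
    for (const c of m.tool_calls ?? []) seen.add(c.id);
    if (m.role === "tool" && (!m.tool_call_id || !seen.has(m.tool_call_id))) {
      orphans.push(m.tool_call_id ?? "(missing id)");
    }
  }
  return orphans;
}
```

The final assistant message carries no `tool_calls`, which is the condition that ends the loop.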
The system prompt is the hidden instruction set that shapes agent behavior. Nanocoder builds it dynamically based on the current project and configuration:
function buildSystemPrompt(config: AgentConfig): string {
const sections: string[] = [];
// Core identity
sections.push(`You are a coding assistant running in the user's terminal.
You have access to tools for reading files, writing files, and executing commands.
Always explain what you're about to do before using tools.`);
// Working directory context
sections.push(`Current working directory: ${process.cwd()}
Project type: ${detectProjectType()}
Package manager: ${detectPackageManager()}`);
// Tool usage guidelines
sections.push(`## Tool Usage Guidelines
- Use read_file to examine code before making changes
- Always show file changes to the user before writing
- Use bash for running tests, builds, and other commands
- Never execute destructive commands without explicit approval`);
// Custom instructions from config
if (config.systemPrompt) {
sections.push(`## Project-Specific Instructions\n${config.systemPrompt}`);
}
return sections.join("\n\n");
}

import { existsSync } from "node:fs";

function detectProjectType(): string {
  // Checked in insertion order, so the first matching file wins in mixed-language repos
  const indicators: Record<string, string> = {
    "package.json": "Node.js/TypeScript",
    "pyproject.toml": "Python",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
    "pom.xml": "Java (Maven)",
    "build.gradle": "Java (Gradle)",
    "Gemfile": "Ruby",
  };
  // Paths are resolved relative to the current working directory
  for (const [file, type] of Object.entries(indicators)) {
    if (existsSync(file)) return type;
  }
  return "Unknown";
}

Conversation history is the LLM's working memory. Managing it well is critical for agent performance:
class MessageHistory {
  private messages: Message[] = [];
  private maxTokens: number;

  constructor(maxTokens: number = 100000) {
    this.maxTokens = maxTokens;
  }

  push(message: Message): void {
    this.messages.push(message);
    this.trim();
  }

  // Keep system messages and drop the oldest other messages until under budget
  private trim(): void {
    if (this.estimateTokens(this.messages) <= this.maxTokens) return;

    // Strategy: keep system prompt + last N messages
    const systemMessages = this.messages.filter((m) => m.role === "system");
    const nonSystemMessages = this.messages.filter((m) => m.role !== "system");

    // Re-estimate against the candidate list on every pass; measuring
    // this.messages here would never shrink and would trim down to two messages
    while (
      this.estimateTokens([...systemMessages, ...nonSystemMessages]) >
        this.maxTokens &&
      nonSystemMessages.length > 2
    ) {
      nonSystemMessages.shift();
    }

    // Do not start the kept window with tool results whose assistant
    // tool_calls message was dropped; providers typically reject those
    while (nonSystemMessages[0]?.role === "tool") {
      nonSystemMessages.shift();
    }

    this.messages = [...systemMessages, ...nonSystemMessages];
  }

  private estimateTokens(messages: Message[]): number {
    // Rough estimate: 1 token ≈ 4 characters (tool call arguments are not counted)
    return messages.reduce(
      (sum, m) => sum + Math.ceil((m.content ?? "").length / 4),
      0
    );
  }

  getMessages(): Message[] {
    return [...this.messages];
  }
}

AI coding agents stream responses token-by-token for a responsive user experience:
async function streamResponse(
provider: LLMProvider,
messages: Message[],
tools: ToolSchema[]
): Promise<StreamedResponse> {
const stream = await provider.chat({
messages,
tools,
stream: true,
});
let content = "";
let toolCalls: ToolCall[] = [];
for await (const chunk of stream) {
if (chunk.type === "content_delta") {
content += chunk.text;
process.stdout.write(chunk.text); // Real-time output
}
if (chunk.type === "tool_call_delta") {
// Accumulate tool call data
const existing = toolCalls.find((tc) => tc.id === chunk.id);
if (existing) {
existing.function.arguments += chunk.argumentsDelta;
} else {
toolCalls.push({
id: chunk.id,
type: "function",
function: {
name: chunk.functionName,
arguments: chunk.argumentsDelta,
},
});
}
}
}
return { content, toolCalls };
}

Robust error handling prevents the agent from crashing mid-task:
async function executeToolWithRetry(
  tools: ToolRegistry,
  toolCall: ToolCall,
  maxRetries: number = 2
): Promise<ToolResult> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const result = await tools.execute(
        toolCall.function.name,
        JSON.parse(toolCall.function.arguments)
      );
      return { success: true, output: result };
    } catch (error) {
      // Remember the failure and retry until the attempts run out
      lastError = error;
    }
  }
  // All attempts failed: return the error as a result so the caller can
  // feed it back to the LLM for self-correction
  const reason =
    lastError instanceof Error ? lastError.message : String(lastError);
  return {
    success: false,
    output: `Error executing ${toolCall.function.name}: ${reason}`,
  };
}

The error result is added to the conversation history, allowing the LLM to:
- Understand what went wrong
- Try a different approach
- Ask the user for help
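As a rough sketch of that hand-off, a failed result can be serialized into the same kind of `tool` message as a successful one. `toToolMessage` and the `ok`/`error`/`hint` fields are assumptions made for this example; the chapter does not specify the exact payload.

```typescript
interface ToolResult {
  success: boolean;
  output: unknown;
}

interface ToolMessage {
  role: "tool";
  content: string;
  tool_call_id: string;
}

// Hypothetical helper: turn a ToolResult into the message the LLM will read.
function toToolMessage(toolCallId: string, result: ToolResult): ToolMessage {
  const body = result.success
    ? { ok: true, output: result.output }
    : {
        ok: false,
        error: String(result.output),
        // A short hint nudges the model toward recovery instead of repetition.
        hint: "Adjust the arguments or try a different tool.",
      };
  return {
    role: "tool",
    content: JSON.stringify(body),
    tool_call_id: toolCallId,
  };
}

const failed = toToolMessage("call_7", {
  success: false,
  output: "Error executing read_file: ENOENT: no such file or directory, open 'src/indx.ts'",
});
console.log(failed.content);
```

Because the failure keeps its `tool_call_id`, the model can tie the error to the exact call it made, for instance noticing the misspelled path and retrying with a corrected one.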
Different AI coding agents implement the loop differently:
| Aspect | Nanocoder | Aider | Claude Code | OpenHands |
|---|---|---|---|---|
| Loop Type | Single-threaded | Single-threaded | Multi-threaded | Process-based |
| History | In-memory | Git-backed | In-memory + disk | Database |
| Tool Protocol | OpenAI function calling | Custom edit format | Anthropic tool use | Custom action space |
| Streaming | Token-level | Line-level | Token-level | Chunk-level |
| Error Recovery | Retry + feedback | Git rollback | Retry + feedback | Checkpoint restore |
The agent loop is deceptively simple—a while-loop that sends messages to an LLM and executes tool calls until the LLM responds with text only. The sophistication lies in the details: how history is managed, how tools are defined, how errors are handled, and how the system prompt shapes behavior.
- The agent loop is a `while(true)` that breaks when the LLM produces no tool calls
- Tool results become part of conversation history, enabling multi-step reasoning
- System prompts are built dynamically based on project context
- Message history must be actively managed to stay within token limits
- Streaming provides responsive UX while accumulating tool call data
- Error handling feeds failures back to the LLM for self-correction
In Chapter 3: Tool System Internals, we'll explore how tools are defined, registered, and executed—the interface between LLM intelligence and real-world actions.
Built with insights from the Nanocoder project.
- Define the runtime boundary for `Chapter 2: Architecture & Agent Loop`.
- Separate control-plane decisions from data-plane execution.
- Capture input contracts, transformation points, and output contracts.
- Trace state transitions across request lifecycle stages.
- Identify extension hooks and policy interception points.
- Map ownership boundaries for team and automation workflows.
- Specify rollback and recovery paths for unsafe changes.
- Track observability signals for correctness, latency, and cost.
| Decision Area | Low-Risk Path | High-Control Path | Tradeoff |
|---|---|---|---|
| Runtime mode | managed defaults | explicit policy config | speed vs control |
| State handling | local ephemeral | durable persisted state | simplicity vs auditability |
| Tool integration | direct API use | mediated adapter layer | velocity vs governance |
| Rollout method | manual change | staged + canary rollout | effort vs safety |
| Incident response | best effort logs | runbooks + SLO alerts | cost vs reliability |
| Failure Mode | Early Signal | Root Cause Pattern | Countermeasure |
|---|---|---|---|
| stale context | inconsistent outputs | missing refresh window | enforce context TTL and refresh hooks |
| policy drift | unexpected execution | ad hoc overrides | centralize policy profiles |
| auth mismatch | 401/403 bursts | credential sprawl | rotation schedule + scope minimization |
| schema breakage | parser/validation errors | unmanaged upstream changes | contract tests per release |
| retry storms | queue congestion | no backoff controls | jittered backoff + circuit breakers |
| silent regressions | quality drop without alerts | weak baseline metrics | eval harness with thresholds |
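The "jittered backoff + circuit breakers" countermeasure in the table can be sketched as follows. This is an illustrative implementation of full-jitter exponential backoff and a simple failure-count breaker; the defaults (200 ms base, 10 s cap, 3 failures, 30 s cooldown) are assumptions, not values taken from nanocoder.

```typescript
// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs: number = 200,
  capMs: number = 10000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Stops calling a failing dependency until a cooldown has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private threshold: number;
  private cooldownMs: number;
  private now: () => number;

  constructor(
    threshold: number = 3,
    cooldownMs: number = 30000,
    now: () => number = () => Date.now()
  ) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Half-open: allow one trial call; a single failure re-opens the circuit.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

Jitter spreads retries from many clients over time so they do not arrive in synchronized waves, while the breaker caps how much load a persistently failing provider or tool receives.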
- Establish a reproducible baseline environment.
- Capture chapter-specific success criteria before changes.
- Implement minimal viable path with explicit interfaces.
- Add observability before expanding feature scope.
- Run deterministic tests for happy-path behavior.
- Inject failure scenarios for negative-path validation.
- Compare output quality against baseline snapshots.
- Promote through staged environments with rollback gates.
- Record operational lessons in release notes.
- chapter-level assumptions are explicit and testable
- API/tool boundaries are documented with input/output examples
- failure handling includes retry, timeout, and fallback policy
- security controls include auth scopes and secret rotation plans
- observability includes logs, metrics, traces, and alert thresholds
- deployment guidance includes canary and rollback paths
- docs include links to upstream sources and related tracks
- post-release verification confirms expected behavior under load
- Nanocoder Repository
- Nanocoder Releases
- Nanocoder Documentation Directory
- Nanocoder MCP Configuration Guide
- Nano Collective Website
- Build a minimal end-to-end implementation for `Chapter 2: Architecture & Agent Loop`.
- Add instrumentation and measure baseline latency and error rate.
- Introduce one controlled failure and confirm graceful recovery.
- Add policy constraints and verify they are enforced consistently.
- Run a staged rollout and document rollback decision criteria.
- Which execution boundary matters most for this chapter and why?
- What signal detects regressions earliest in your environment?
- What tradeoff did you make between delivery speed and governance?
- How would you recover from the highest-impact failure mode?
- What must be automated before scaling to team-wide adoption?
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: incoming request volume spikes after release
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: introduce adaptive concurrency limits and queue bounds
- verification target: latency p95 and p99 stay within defined SLO windows
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: tool dependency latency increases under concurrency
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: enable staged retries with jitter and circuit breaker fallback
- verification target: error budget burn rate remains below escalation threshold
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: schema updates introduce incompatible payloads
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: pin schema versions and add compatibility shims
- verification target: throughput remains stable under target concurrency
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: environment parity drifts between staging and production
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: restore environment parity via immutable config promotion
- verification target: retry volume stays bounded without feedback loops
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: access policy changes reduce successful execution rates
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: re-scope credentials and rotate leaked or stale keys
- verification target: data integrity checks pass across write/read cycles
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Nanocoder Tutorial: Building and Understanding AI Coding Agents
- trigger condition: background jobs accumulate and exceed processing windows
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: activate degradation mode to preserve core user paths
- verification target: audit logs capture all control-plane mutations
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
Most teams struggle here because the hard part is not writing more code but drawing clear boundaries around `messages`, `Loop`, and `content` so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 2: Architecture & Agent Loop as an operating subsystem inside Nanocoder Tutorial: Building and Understanding AI Coding Agents, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around `tools`, `push`, and `chunk` as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 2: Architecture & Agent Loop usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for `messages`.
- Input normalization: shape incoming data so `Loop` receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through `content`.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Use the following upstream sources to verify implementation details while reading this chapter:
- Nanocoder Repository. Why it matters: authoritative reference on `Nanocoder Repository` (github.com).
- Nanocoder Releases. Why it matters: authoritative reference on `Nanocoder Releases` (github.com).
- Nanocoder Documentation Directory. Why it matters: authoritative reference on `Nanocoder Documentation Directory` (github.com).
- Nanocoder MCP Configuration Guide. Why it matters: authoritative reference on `Nanocoder MCP Configuration Guide` (github.com).
- Nano Collective Website. Why it matters: authoritative reference on `Nano Collective Website` (nanocollective.org).
Suggested trace strategy:
- search upstream code for `messages` and `Loop` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production