perf: investigate & reduce agent reply latency (umbrella) #63

**Problem**

The agent is slow to reply. We need a systematic latency investigation, followed by fixes.

**Suspected contributors (from triage):**
- Every user message runs through the full pipeline serially ("goes through XX and stop"); independent steps should run asynchronously or concurrently.
- Reasoning runs after every response, adding latency on simple turns.
- Use of `-p` (print/headless) mode may add overhead; evaluate alternatives.
- Messages DB + growing context slow things down.
- Memory file reads are slow; consider refreshing or removing old data.
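
To illustrate the first bullet, here is a minimal sketch of overlapping two per-turn loads that do not depend on each other. It is not the agent's actual code: `load_history`, `read_memory`, and `prepare_turn` are hypothetical names, and the sleeps stand in for the messages DB query and the memory file read.

```python
import asyncio


async def load_history(conversation_id: str) -> list[str]:
    # Hypothetical stand-in for the messages DB query.
    await asyncio.sleep(0.05)
    return [f"{conversation_id}: earlier message"]


async def read_memory(user_id: str) -> str:
    # Hypothetical stand-in for the memory file read.
    await asyncio.sleep(0.05)
    return f"notes for {user_id}"


async def prepare_turn(conversation_id: str, user_id: str) -> dict:
    # The two loads are independent, so start both and wait for both,
    # instead of awaiting one and then the other.
    history, memory = await asyncio.gather(
        load_history(conversation_id),
        read_memory(user_id),
    )
    return {"history": history, "memory": memory}


if __name__ == "__main__":
    print(asyncio.run(prepare_turn("conv-1", "user-1")))
```

With `asyncio.gather` the wait is roughly the slower of the two loads rather than their sum. This only helps for steps that are truly independent; anything that needs the model's output still has to run afterwards.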

**Expected**

- Profile the per-turn path end to end; identify the dominant cost.
- Land concrete fixes (async execution, conditional reasoning, transport change, DB/context/memory optimizations).
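
As a starting point for the profiling item, a per-stage timer along these lines could be wrapped around each step of a turn. This is a sketch under assumptions: `TurnProfile`, `handle_turn`, and the stage names are hypothetical, and the sleeps are placeholders for the real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TurnProfile:
    """Accumulates wall-clock time per named stage of a single turn."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.totals[name] += time.perf_counter() - start

    def dominant(self) -> str:
        # Name of the stage with the largest accumulated time.
        return max(self.totals, key=lambda name: self.totals[name])


def handle_turn(profile: TurnProfile) -> None:
    # Placeholder stages; the real pipeline steps would go here.
    with profile.stage("db_read"):
        time.sleep(0.01)
    with profile.stage("model_call"):
        time.sleep(0.05)
    with profile.stage("reasoning"):
        time.sleep(0.02)


if __name__ == "__main__":
    profile = TurnProfile()
    handle_turn(profile)
    for name, seconds in sorted(profile.totals.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {seconds * 1000:.1f} ms")
    print("dominant stage:", profile.dominant())
```

Logging these totals for a sample of real turns would show whether the model call, the post-response reasoning, or the DB and memory reads dominate before any fix is chosen.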

Related: #54 (context compaction), #55 (memory RAG), #62 (effort level), #64 (SQL cache).

**From triage dump:** "The model is slow, understand why" + the whole "What slows down" section.

Labels: enhancement
