perf: investigate & reduce agent reply latency (umbrella) #63

Description

@Rustam-Z

Problem

The agent is slow to reply. We need a systematic latency investigation, followed by fixes.

Suspected contributors (from triage):

  • Every user message runs through the full pipeline serially ("goes through XX and stop"); the pipeline needs more async/concurrent execution.
  • Reasoning runs after every response, adding latency on simple turns.
  • The agent uses -p (print/headless) mode; evaluate alternatives.
  • The messages DB and the growing context slow each turn down.
  • Memory file reads are slow; consider refreshing or removing old data.
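
To make the first two suspects concrete, here is a minimal sketch of what "more async" and "conditional reasoning" could look like. It is not taken from the agent's codebase: `load_memory`, `load_history`, `needs_reasoning`, and `handle_turn` are hypothetical names, the sleeps stand in for real I/O, and the reasoning heuristic is only a placeholder to be replaced once profiling data exists.

```python
import asyncio


async def load_memory(user_id):
    # Hypothetical stand-in for reading the memory files.
    await asyncio.sleep(0.05)
    return f"memory:{user_id}"


async def load_history(user_id):
    # Hypothetical stand-in for querying the messages DB.
    await asyncio.sleep(0.05)
    return f"history:{user_id}"


def needs_reasoning(message):
    # Placeholder heuristic (an assumption, not the agent's real rule):
    # only longer or question-like turns get a reasoning pass.
    return len(message.split()) > 12 or "?" in message


async def handle_turn(user_id, message):
    # The two reads do not depend on each other, so they run concurrently
    # instead of one after the other.
    memory, history = await asyncio.gather(
        load_memory(user_id),
        load_history(user_id),
    )
    return {
        "memory": memory,
        "history": history,
        "reason": needs_reasoning(message),
    }
```

With this shape, `asyncio.run(handle_turn("u1", "hi"))` waits for roughly the slower of the two reads rather than their sum, and a short greeting skips the reasoning pass. Whether the real pipeline stages are actually independent still has to be confirmed by the investigation.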

Expected

  • Profile the per-turn path end to end and identify the dominant cost.
  • Deliver concrete fixes (async, conditional reasoning, transport change, DB/context/memory optimizations).
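
As a starting point for the profiling step, a minimal sketch of a per-turn stage timer is shown below. `TurnProfiler` and the stage names in the usage note are hypothetical and not part of the existing code; the idea is just to wrap each pipeline stage and see which one dominates wall-clock time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TurnProfiler:
    """Accumulates wall-clock time per named stage of one agent turn."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the enclosed block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (stage, seconds) pairs sorted slowest first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)

    def dominant(self):
        """Return the (stage, seconds) pair with the largest total.

        Raises ValueError if no stage has been recorded yet.
        """
        return max(self.totals.items(), key=lambda kv: kv[1])
```

Usage would look like `with profiler.stage("memory_read"): ...` around each step (for example memory read, DB query, model call, reasoning), then logging `profiler.report()` at the end of the turn. Stage names here are illustrative only.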

Related: #54 (context compaction), #55 (memory RAG), #62 (effort level), #64 (SQL cache).

From triage dump: "The model is slow, understand why" + the whole "What slows down" section.

Labels: enhancement

Metadata

Assignees: No one assigned
Status: Backlog
Milestone: No milestone