
Feature: Round-2 Gap Closure — Interrupts, Large Tool-Result Storage, Parallel-Tool Path Overlap, Error Classifier, Shared Subagent Budget, Title Auto-Gen, Dialectic User Model, Credential Pool, Reasoning-Block Handling, Safe Skill Installs #1480

@MervinPraison

Description


Overview

Follow‑up to #1471 (merged as #1472). After shipping skill mutation, nudges, and the improvements extraction fix, a second end‑to‑end walk — from first‑run onboarding → agent runtime → multi‑agent orchestration → self‑improvement persistence — surfaced 11 additional gaps against a best‑in‑class self‑improving agent. All are additive, safe‑by‑default, and behind feature flags.

NOTE: This issue has been audited and corrected based on findings in #1490. Several gaps identified below already exist in the codebase.

Implementable by another agent from this issue alone. Each phase is independently mergeable.


Background — Why a round 2

PR #1472 closed 3 of the 8 acceptance criteria in #1471 (skill CRUD protocol + tool, nudge mechanism, LearnConfig.improvements auto‑extraction). A full re‑walk of the agent lifecycle confirmed the remaining architectural gaps are outside the scope of that PR and warrant their own tracked ticket so they can be phased in without bloating one PR.

Evidence — confirmed gaps as of commit d25e852c on main (Corrected per #1490)

All greps executed against praisonaiagents + praisonai packages (excluding venv/ and tests/).

| # | Capability | Grep / probe | Result | Status |
|---|------------|--------------|--------|--------|
| ❌ G1 | Interactive first‑run wizard (`praisonai init`) | `ls praisonai/cli/commands/*.py` | EXISTS: `setup.py` (173 L) provides wizard functionality | Won't Fix |
| G2 | Interrupt / cancel mid‑run | `grep "set_interrupt\|is_interrupted\|interrupt_requested" praisonaiagents praisonai` | 0 hits | Keep |
| G3 | Auto‑persist large tool results → reference by ID | `grep "maybe_persist_tool_result\|tool_result_storage" praisonaiagents praisonai` | 0 hits | Keep (Narrow Scope) |
| G4 | Parallel tool path‑overlap detection | `grep "_paths_overlap\|overlap_detect" praisonaiagents praisonai` | 0 hits; `call_executor.py` runs tools concurrently with no write‑conflict guard | Keep |
| G5 | Multi‑category error classifier | `llm/llm.py:628` — `_is_rate_limit_error` exists | Only rate‑limit detected; no context_limit / auth / transient / permanent categories | Keep (Narrow Scope) |
| ❌ G6 | Surrogate / non‑ASCII sanitisation of outgoing messages | `grep "sanitize_surrogate\|surrogatepass\|_strip_non_ascii"` | 0 hits | Won't Fix |
| ❌ G7 | Shared iteration budget across parent + subagents | `grep -r "IterationBudget\|iteration_budget" praisonaiagents`; `autonomy.py:131` `max_iterations: int = 20` is per‑agent | Each subagent gets its own budget — total work across a tree can exceed the user's intended cap | Won't Fix |
| G8 | Session title auto‑generation from first exchange | `grep "generate_title\|auto_title"` | `session/hierarchy.py` has a `title` field but no generator | Keep |
| G9 | Reasoning‑block (`<think>…</think>` / provider thinking blocks) handling | `grep "scratchpad\|thinking_blocks\|<think>"` | 0 hits — reasoning models' internal tokens leak or break caching | Keep (Narrow Scope) |
| ❌ G10 | Dialectic user model (structured, evolving) | `LearnConfig.persona=True` stores flat strings in `PersonaStore` | EXISTS: `PersonaStore` already supports structured categorization with `add_preference()`, `add_profile()` methods | Won't Fix |
| ❌ G11 | Credential pool / multi‑API‑key rotation | `grep "credential_pool\|CredentialPool\|key_rotation\|OPENAI_API_KEY_POOL"` | 0 hits; only provider‑level failover in `llm/failover.py` | Won't Fix |
| G12 | Skill supply‑chain scan (OSV / content hash on install) | `grep "osv.dev\|OSV"` | 0 hits; `SkillMutatorProtocol` writes files but no OSV check on installed scripts/references | Keep (Gated) |

Audit Results (From #1490)

The following gaps have been reassessed and corrected:

❌ Won't Fix Items (Already Exist or Low Value)

  • G1 (init wizard): praisonai setup command already provides 173-line wizard with provider selection, API key management, non-interactive mode
  • G6 (surrogate sanitization): No real-world user reports; most users hit litellm which handles unicode properly
  • G7 (shared iteration budget): Low real-world impact; users can set explicit limits on delegator agents
  • G10 (dialectic user model): PersonaStore already supports structured categorization via add_preference(category) and add_profile(aspect) methods
  • G11 (credential pool): litellm already provides provider-level failover; niche feature that doubles key management complexity

✅ Keep Items (With Reduced Scope)

  • G2 (interrupt): Critical safety feature - reduce scope to ~50 LOC by implementing existing protocol stub
  • G3 (large tool results): Keep but integrate with existing truncation infrastructure (~150 LOC in wrapper only)
  • G4 (path overlap): Keep as-is (~100 LOC) - genuine data-loss prevention
  • G5 (error classifier): Reduce scope to context-overflow detection only (~30 LOC)
  • G8 (session title): Keep as-is (~100 LOC) - real UX improvement
  • G9 (thinking blocks): Reduce scope to render-path stripping (~15 LOC)
  • G12 (OSV scan): Keep but gate on Skills Hub Phase 6 completion

Revised LOC Estimate

| Original Estimate | Revised Estimate | Reduction |
|-------------------|------------------|-----------|
| ~2200 LOC | ~595 LOC | −73% |
| Core SDK: +1200 LOC | Core SDK: ≤150 LOC | Protocol‑first preserved |

What PraisonAI already has (DRY — reuse, don't duplicate)

  • praisonaiagents/llm/failover.py — LLMFailover + _is_rate_limit_error → extend with context-overflow classifier
  • praisonaiagents/llm/protocols.py — LLMRateLimiterProtocol → reuse for rate tracking
  • praisonaiagents/tools/call_executor.py — create_tool_call_executor(parallel=True) → extend with overlap detection
  • praisonaiagents/agent/protocols.py:406 — interrupt() protocol stub exists → implement
  • praisonaiagents/session/hierarchy.py — session title field present → add generator
  • praisonaiagents/context/{store,aggregator,instrumentation}.py — existing truncation infrastructure → extend
  • praisonai/cli/commands/setup.py — existing 173-line setup wizard → sufficient for G1
  • praisonaiagents/memory/learn/stores.py:208 — PersonaStore with categorization → sufficient for G10

Architecture Analysis

Current agent turn (simplified)

Agent.start(prompt)
  └─► execution_mixin._run_loop()
        └─► for i in range(max_iterations):
              ├─► llm.get_response(messages, tools)
              ├─► call_executor.execute(tool_calls)   ◄── no overlap check
              ├─► append tool results to messages     ◄── full result inline
              └─► if no more tool_calls: break
        └─► _trigger_after_agent_hook()
              ├─► _process_auto_memory
              ├─► _process_auto_learning
              └─► _maybe_emit_nudge                   (merged in #1472)

Target agent turn (this issue)

Agent.start(prompt)
  ├─► if not session.title: schedule title_gen         [G8]
  │
  └─► execution_mixin._run_loop()
        └─► while not max_iterations_reached:
              ├─► if interrupt_requested: break        [G2]
              ├─► llm.get_response(...)
              │     └─► on_error → check context_overflow [G5]
              │           → trigger compressor → retry
              ├─► call_executor.execute(
              │        tool_calls,
              │        overlap_guard=True              [G4]
              │     )
              ├─► for r in tool_results:
              │     if len(r) > threshold:
              │        ref = tool_result_store.put(r)  [G3]
              │        replace inline content with ref
              │        strip <think>...</think>        [G9]
              └─► if no more tool_calls: break

  └─► post‑turn:
        ├─► _process_auto_learning 
        ├─► _maybe_emit_nudge
        └─► trajectory.record_turn

Revised Implementation Plan

G2 — Interrupt (50 LOC)

Wire existing agent/protocols.py:406 interrupt stub:

  • Add Agent.interrupt() → sets Event flag
  • Check flag once per iteration in _run_loop
  • Propagate to subagents via delegator
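The wiring above can be sketched as follows. `InterruptMixin`, the `subagents` list, and the loop body are hypothetical stand-ins for the real `Agent` / `_run_loop` internals; only the `interrupt()` method name comes from the existing protocol stub. The key idea is a `threading.Event` polled once per iteration, so cancellation is cooperative and costs a single `is_set()` check:

```python
import threading

class InterruptMixin:
    """Sketch of cooperative interruption: a flag the run loop polls each iteration."""

    def __init__(self):
        self._interrupt_event = threading.Event()
        self.subagents = []  # hypothetical: delegated agents to propagate the flag to

    def interrupt(self):
        """Request cancellation; takes effect at the next iteration boundary."""
        self._interrupt_event.set()
        for sub in self.subagents:
            sub.interrupt()  # propagate down the delegation tree

    @property
    def interrupt_requested(self) -> bool:
        return self._interrupt_event.is_set()

    def _run_loop(self, max_iterations: int = 20):
        results = []
        for i in range(max_iterations):
            if self.interrupt_requested:
                break  # partial results are still returned; post-turn hooks still fire
            results.append(f"step-{i}")  # placeholder for llm call + tool execution
        return results
```

Because `threading.Event` is safe to set from any thread and `is_set()` is lock-free on CPython, the same flag works under both `threading` and `asyncio` callers.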

G3 — Large Tool Result Storage (150 LOC, wrapper only)

Extend existing truncation:

  • Optional retrieval escape hatch when tool output truncated
  • Store full content to session file, expose load_truncated(ref) tool
  • Reuse context/store.py and context/aggregator.py
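A minimal sketch of the wrapper, assuming a flat file store per session (the real implementation would reuse `context/store.py`). The names `maybe_persist_tool_result` and `load_truncated` come from the greps and acceptance criteria in this issue; the storage layout and 500-char preview size are illustrative:

```python
import json
import uuid
from pathlib import Path

TOOL_RESULT_THRESHOLD = 8 * 1024  # bytes; 8 KB per the acceptance criteria

def maybe_persist_tool_result(result: str, store_dir: Path,
                              threshold: int = TOOL_RESULT_THRESHOLD) -> str:
    """Replace an oversized tool result with a JSON reference containing a preview."""
    if len(result.encode("utf-8")) <= threshold:
        return result  # small results stay inline, unchanged
    ref_id = uuid.uuid4().hex
    store_dir.mkdir(parents=True, exist_ok=True)
    (store_dir / f"{ref_id}.txt").write_text(result, encoding="utf-8")
    return json.dumps({
        "truncated": True,
        "ref": ref_id,
        "preview": result[:500],
        "hint": "call load_truncated(ref) for full content",
    })

def load_truncated(ref: str, store_dir: Path) -> str:
    """Tool exposed to the agent: retrieve the full stored content by reference."""
    return (store_dir / f"{ref}.txt").read_text(encoding="utf-8")
```

The escape hatch keeps the happy path untouched: only results past the threshold pay the extra file write, and the model sees a structured reference instead of a silent truncation.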

G4 — Path Overlap Guard (100 LOC)

Add to call_executor.py:

  • Detect concurrent writes to same path
  • Fall back to sequential execution on conflict
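A sketch of the guard, assuming tool calls carry a `path` and a `mode` field (that representation, and the helper names, are hypothetical — the real executor would extract paths from its own call schema). Paths are normalized with `os.path.realpath` so `a/b` and `a/./b` count as the same file:

```python
import os
from collections import Counter

def _normalize(path: str) -> str:
    # realpath collapses symlinks, "." and "..", so aliased paths compare equal
    return os.path.realpath(path)

def detect_write_overlap(tool_calls) -> bool:
    """True if two parallel tool calls would write the same normalized path."""
    writes = [_normalize(c["path"]) for c in tool_calls if c.get("mode") == "write"]
    return any(count > 1 for count in Counter(writes).values())

def choose_execution_mode(tool_calls, parallel: bool = True) -> str:
    """Fall back to sequential execution only when a write-write conflict exists."""
    if parallel and detect_write_overlap(tool_calls):
        return "sequential"
    return "parallel" if parallel else "sequential"
```

Read-read and read-write pairs are left parallel here; tightening the policy to serialize read-write overlaps too is a one-line change to the `mode` filter.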

G5 — Error Classifier (30 LOC)

Context-overflow detection only:

  • is_context_overflow(err) -> bool helper
  • Wire to existing context/compressor.py → retry
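A sketch of the classifier and the retry wiring. The marker strings are an illustrative, non-exhaustive sample of provider error messages, and `get_response_with_compression` is a hypothetical wrapper standing in for the real `llm.py` call site and `context/compressor.py` hook:

```python
# Substring heuristics seen in provider error messages; illustrative, not exhaustive.
_CONTEXT_OVERFLOW_MARKERS = (
    "context length",
    "context_length_exceeded",
    "maximum context",
    "too many tokens",
    "prompt is too long",
)

def is_context_overflow(err: Exception) -> bool:
    """Classify an LLM error as a context-window overflow (vs. auth/transient/etc.)."""
    msg = str(err).lower()
    return any(marker in msg for marker in _CONTEXT_OVERFLOW_MARKERS)

def get_response_with_compression(call, compress, messages):
    """Hypothetical wrapper: on overflow, compress the messages and retry once."""
    try:
        return call(messages)
    except Exception as err:
        if not is_context_overflow(err):
            raise  # other errors pass through unchanged
        return call(compress(messages))  # single retry; a second overflow propagates
```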

G8 — Session Title Auto-gen (100 LOC)

Add to existing session infrastructure:

  • generate_title() from first user+assistant exchange
  • Lazy, failure-silent
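A sketch of the generator, assuming the LLM call is injected (the prompt wording and 40-char fallback length are illustrative choices, not from the codebase). Any failure, including an unavailable model, silently falls back to the truncated user message per the acceptance criteria:

```python
def generate_title(first_user_msg: str, first_assistant_msg: str, llm_call=None) -> str:
    """Derive a 3-6 word session title; fall back to a truncated user message."""
    try:
        if llm_call is not None:
            title = llm_call(
                "Summarize this exchange as a 3-6 word title:\n"
                f"User: {first_user_msg}\nAssistant: {first_assistant_msg}"
            ).strip().strip('"')
            if title:
                return title
    except Exception:
        pass  # failure-silent by design: never block the turn on title generation
    fallback = " ".join(first_user_msg.split())
    return fallback[:40] + ("…" if len(fallback) > 40 else "")
```

The caller would invoke this lazily, only when `session.title` is empty, ideally off the hot path (e.g. after the first assistant response is already streamed).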

G9 — Thinking Block Handling (15 LOC)

Render-path stripping only:

  • Strip <think>...</think> in output formatter
  • No new module needed
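The stripping itself is a single compiled regex in the output formatter; the raw text (with thinking content intact) stays available upstream for observability. A minimal sketch:

```python
import re

# DOTALL so multi-line thinking blocks are matched; trailing whitespace is
# swallowed so stripped blocks don't leave blank gaps in the rendered output.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove <think>…</think> spans from the user-visible render path only."""
    return _THINK_RE.sub("", text)
```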

G12 — OSV Scan (150 LOC, gated)

Only after Skills Hub Phase 6:

  • OSV.dev integration for skill security scanning
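A sketch of the scan against OSV.dev's real `POST /v1/query` endpoint, with the transport injected so offline environments degrade gracefully. The `requires_approval` severity check reads `database_specific.severity`, which many OSV records populate but which is not guaranteed by the schema — treat that field choice as an assumption:

```python
OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def _default_post(url, json):
    import httpx  # optional dependency, imported lazily
    return httpx.post(url, json=json, timeout=5.0)

def scan_package(name: str, version: str, ecosystem: str = "PyPI", post=None):
    """Query OSV.dev for known vulnerabilities; return [] on any network failure."""
    post = post or _default_post
    try:
        resp = post(OSV_QUERY_URL, json={
            "package": {"name": name, "ecosystem": ecosystem},
            "version": version,
        })
        return resp.json().get("vulns", [])
    except Exception:
        return []  # graceful offline fallback: skip the scan, do not block install

def requires_approval(vulns) -> bool:
    """Prompt the user before install when any finding is marked CRITICAL."""
    return any(
        (v.get("database_specific") or {}).get("severity") == "CRITICAL"
        for v in vulns
    )
```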

Acceptance Criteria (Revised)

G2 — interrupt

  • agent.interrupt() stops mid‑run within ≤1 iteration
  • Partial response returned; hooks still fire
  • Works under asyncio and threading

G3 — tool result storage

  • Tool result > 8KB → replaced with JSON ref containing preview
  • Agent can call load_truncated(ref) for full content
  • Integrates with existing context/aggregator.py truncation

G4 — path overlap guard

  • Concurrent file writes → sequential fallback automatically
  • No false positives on different files
  • Performance impact only when overlap detected

G5 — context overflow classifier

  • Context limit errors → auto-compress → retry once
  • Other errors pass through unchanged
  • Reuses existing compressor infrastructure

G8 — session title

  • First user+assistant exchange → auto-generates 3-6 word title
  • Failure-silent fallback to truncated user message
  • Lazy - only when session.title is empty

G9 — thinking blocks

  • <think>...</think> content stripped from user-visible output
  • Thinking content preserved for observability if needed
  • Minimal render-path change

G12 — OSV scan

  • Gated behind Skills Hub Phase 6 completion
  • CRITICAL vulns → user approval prompt
  • Graceful offline fallback

Technical Considerations

Dependencies

  • New required: none
  • New optional: none (reuse existing httpx, tiktoken)

Performance

  • Import time: < 25ms budget (current: 24.1ms after #1479)
  • Hot-path: interrupt check = 1 Event.is_set() per iteration (nanoseconds)
  • All other features only active when triggered

Backward Compatibility

  • All changes are opt-in or behind existing feature flags
  • Existing APIs unchanged
  • PersonaStore and setup command preserved as-is

Files to Create / Modify (Revised)

New files (minimal)

| File | Purpose | LOC |
|------|---------|-----|
| praisonaiagents/agent/interrupt.py | Interrupt implementation | ~30 |
| praisonai/tools/result_escape.py | Large result storage (wrapper) | ~80 |
| praisonaiagents/session/title.py | Title generator | ~60 |
| tests/unit/agent/test_interrupt.py | TDD | ~50 |
| tests/unit/session/test_title.py | TDD | ~40 |

Modified files

| File | Change | LOC |
|------|--------|-----|
| praisonaiagents/agent/agent.py | Wire interrupt protocol | +10 |
| praisonaiagents/tools/call_executor.py | Path overlap guard | +40 |
| praisonaiagents/llm/llm.py | Context overflow detection | +20 |
| praisonaiagents/session/hierarchy.py | Auto-title hook | +15 |

Total: ~345 LOC (vs original 2200 LOC estimate)



Labels: bug, claude, documentation, performance, security
