
Feature: Round-2 Gap Closure — Interrupts, Large Tool-Result Storage, Parallel-Tool Path Overlap, Error Classifier, Shared Subagent Budget, Title Auto-Gen, Dialectic User Model, Credential Pool, Reasoning-Block Handling, Safe Skill Installs #1480

@MervinPraison

Description


Overview

Follow‑up to #1471 (merged as #1472). After shipping skill mutation, nudges, and the improvements extraction fix, a second end‑to‑end walk — from first‑run onboarding → agent runtime → multi‑agent orchestration → self‑improvement persistence — surfaced 11 additional gaps against a best‑in‑class self‑improving agent. All are additive, safe‑by‑default, and behind feature flags.

NOTE: This issue has been audited and corrected based on findings in #1490. Several gaps identified below already exist in the codebase.

Implementable by another agent from this issue alone. Each phase is independently mergeable.


Background — Why a round 2

PR #1472 closed 3 of the 8 acceptance criteria in #1471 (skill CRUD protocol + tool, nudge mechanism, LearnConfig.improvements auto‑extraction). A full re‑walk of the agent lifecycle confirmed the remaining architectural gaps are outside the scope of that PR and warrant their own tracked ticket so they can be phased in without bloating one PR.

Evidence — confirmed gaps as of commit d25e852c on main (Corrected per #1490)

All greps executed against praisonaiagents + praisonai packages (excluding venv/ and tests/).

| # | Capability | Grep / probe | Result | Status |
|---|------------|--------------|--------|--------|
| ❌ G1 | Interactive first‑run wizard (`praisonai init`) | `ls praisonai/cli/commands/*.py` | EXISTS: `setup.py` (173 L) provides wizard functionality | Won't Fix |
| G2 | Interrupt / cancel mid‑run | `grep "set_interrupt\|is_interrupted\|interrupt_requested" praisonaiagents praisonai` | 0 hits | Keep |
| G3 | Auto‑persist large tool results → reference by ID | `grep "maybe_persist_tool_result\|tool_result_storage" praisonaiagents praisonai` | 0 hits | Keep (Narrow Scope) |
| G4 | Parallel tool path‑overlap detection | `grep "_paths_overlap\|overlap_detect" praisonaiagents praisonai` | 0 hits; `call_executor.py` runs tools concurrently with no write‑conflict guard | Keep |
| G5 | Multi‑category error classifier | `llm/llm.py:628` — `_is_rate_limit_error` exists | Only rate‑limit detected; no context_limit / auth / transient / permanent categories | Keep (Narrow Scope) |
| ❌ G6 | Surrogate / non‑ASCII sanitisation of outgoing messages | `grep "sanitize_surrogate\|surrogatepass\|_strip_non_ascii"` | 0 hits | Won't Fix |
| ❌ G7 | Shared iteration budget across parent + subagents | `grep -r "IterationBudget\|iteration_budget" praisonaiagents`; `autonomy.py:131` `max_iterations: int = 20` is per‑agent | Each subagent gets its own budget — total work across a tree can exceed the user's intended cap | Won't Fix |
| G8 | Session title auto‑generation from first exchange | `grep "generate_title\|auto_title"` | `session/hierarchy.py` has a `title` field but no generator | Keep |
| G9 | Reasoning‑block (`<think>…</think>` / provider thinking blocks) handling | `grep "scratchpad\|thinking_blocks\|<think>"` | 0 hits — reasoning models' internal tokens leak or break caching | Keep (Narrow Scope) |
| ❌ G10 | Dialectic user model (structured, evolving) | `LearnConfig.persona=True` stores flat strings in `PersonaStore` | EXISTS: `PersonaStore` already supports structured categorization with `add_preference()`, `add_profile()` methods | Won't Fix |
| ❌ G11 | Credential pool / multi‑API‑key rotation | `grep "credential_pool\|CredentialPool\|key_rotation\|OPENAI_API_KEY_POOL"` | 0 hits; only provider‑level failover in `llm/failover.py` | Won't Fix |
| G12 | Skill supply‑chain scan (OSV / content hash on install) | `grep "osv.dev\|OSV"` | 0 hits; `SkillMutatorProtocol` writes files but no OSV check on installed scripts/references | Keep (Gated) |

Audit Results (From #1490)

The following gaps have been reassessed and corrected:

❌ Won't Fix Items (Already Exist or Low Value)

  • G1 (init wizard): praisonai setup command already provides 173-line wizard with provider selection, API key management, non-interactive mode
  • G6 (surrogate sanitization): No real-world user reports; most users hit litellm which handles unicode properly
  • G7 (shared iteration budget): Low real-world impact; users can set explicit limits on delegator agents
  • G10 (dialectic user model): PersonaStore already supports structured categorization via add_preference(category) and add_profile(aspect) methods
  • G11 (credential pool): litellm already provides provider-level failover; niche feature that doubles key management complexity

✅ Keep Items (With Reduced Scope)

  • G2 (interrupt): Critical safety feature - reduce scope to ~50 LOC by implementing existing protocol stub
  • G3 (large tool results): Keep but integrate with existing truncation infrastructure (~150 LOC in wrapper only)
  • G4 (path overlap): Keep as-is (~100 LOC) - genuine data-loss prevention
  • G5 (error classifier): Reduce scope to context-overflow detection only (~30 LOC)
  • G8 (session title): Keep as-is (~100 LOC) - real UX improvement
  • G9 (thinking blocks): Reduce scope to render-path stripping (~15 LOC)
  • G12 (OSV scan): Keep but gate on Skills Hub Phase 6 completion

Revised LOC Estimate

| Original Estimate | Revised Estimate | Reduction |
|-------------------|------------------|-----------|
| ~2200 LOC | ~595 LOC | −73% |
| Core SDK: +1200 LOC | Core SDK: ≤150 LOC | Protocol‑first preserved |

What PraisonAI already has (DRY — reuse, don't duplicate)

  • praisonaiagents/llm/failover.py — LLMFailover + _is_rate_limit_error → extend with context-overflow classifier
  • praisonaiagents/llm/protocols.py — LLMRateLimiterProtocol → reuse for rate tracking
  • praisonaiagents/tools/call_executor.py — create_tool_call_executor(parallel=True) → extend with overlap detection
  • praisonaiagents/agent/protocols.py:406 — interrupt() protocol stub exists → implement
  • praisonaiagents/session/hierarchy.py — session title field present → add generator
  • praisonaiagents/context/{store,aggregator,instrumentation}.py — existing truncation infrastructure → extend
  • praisonai/cli/commands/setup.py — existing 173-line setup wizard → sufficient for G1
  • praisonaiagents/memory/learn/stores.py:208 — PersonaStore with categorization → sufficient for G10

Architecture Analysis

Current agent turn (simplified)

Agent.start(prompt)
  └─► execution_mixin._run_loop()
        └─► for i in range(max_iterations):
              ├─► llm.get_response(messages, tools)
              ├─► call_executor.execute(tool_calls)   ◄── no overlap check
              ├─► append tool results to messages     ◄── full result inline
              └─► if no more tool_calls: break
        └─► _trigger_after_agent_hook()
              ├─► _process_auto_memory
              ├─► _process_auto_learning
              └─► _maybe_emit_nudge                   (merged in #1472)

Target agent turn (this issue)

Agent.start(prompt)
  ├─► if not session.title: schedule title_gen         [G8]
  │
  └─► execution_mixin._run_loop()
        └─► while not max_iterations_reached:
              ├─► if interrupt_requested: break        [G2]
              ├─► llm.get_response(...)
              │     └─► on_error → check context_overflow [G5]
              │           → trigger compressor → retry
              ├─► call_executor.execute(
              │        tool_calls,
              │        overlap_guard=True              [G4]
              │     )
              ├─► for r in tool_results:
              │     if len(r) > threshold:
              │        ref = tool_result_store.put(r)  [G3]
              │        replace inline content with ref
              │        strip <think>...</think>        [G9]
              └─► if no more tool_calls: break

  └─► post‑turn:
        ├─► _process_auto_learning 
        ├─► _maybe_emit_nudge
        └─► trajectory.record_turn

Revised Implementation Plan

G2 — Interrupt (50 LOC)

Wire existing agent/protocols.py:406 interrupt stub:

  • Add Agent.interrupt() → sets Event flag
  • Check flag once per iteration in _run_loop
  • Propagate to subagents via delegator
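The wiring above can be sketched as follows. `InterruptMixin`, the `subagents` list, and the loop body are hypothetical stand-ins for the real `Agent` / `_run_loop` internals; only the `interrupt()` method name comes from the existing protocol stub. The key idea is a `threading.Event` polled once per iteration, so cancellation is cooperative and costs a single `is_set()` check:

```python
import threading

class InterruptMixin:
    """Sketch of cooperative interruption: a flag the run loop polls each iteration."""

    def __init__(self):
        self._interrupt_event = threading.Event()
        self.subagents = []  # hypothetical: delegated agents to propagate the flag to

    def interrupt(self):
        """Request cancellation; takes effect at the next iteration boundary."""
        self._interrupt_event.set()
        for sub in self.subagents:
            sub.interrupt()  # propagate down the delegation tree

    @property
    def interrupt_requested(self) -> bool:
        return self._interrupt_event.is_set()

    def _run_loop(self, max_iterations: int = 20):
        results = []
        for i in range(max_iterations):
            if self.interrupt_requested:
                break  # partial results are still returned; post-turn hooks still fire
            results.append(f"step-{i}")  # placeholder for llm call + tool execution
        return results
```

Because `threading.Event` is safe to set from any thread and `is_set()` is lock-free on CPython, the same flag works under both `threading` and `asyncio` callers.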

G3 — Large Tool Result Storage (150 LOC, wrapper only)

Extend existing truncation:

  • Optional retrieval escape hatch when tool output truncated
  • Store full content to session file, expose load_truncated(ref) tool
  • Reuse context/store.py and context/aggregator.py
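A minimal sketch of the wrapper, assuming a flat file store per session (the real implementation would reuse `context/store.py`). The names `maybe_persist_tool_result` and `load_truncated` come from the greps and acceptance criteria in this issue; the storage layout and 500-char preview size are illustrative:

```python
import json
import uuid
from pathlib import Path

TOOL_RESULT_THRESHOLD = 8 * 1024  # bytes; 8 KB per the acceptance criteria

def maybe_persist_tool_result(result: str, store_dir: Path,
                              threshold: int = TOOL_RESULT_THRESHOLD) -> str:
    """Replace an oversized tool result with a JSON reference containing a preview."""
    if len(result.encode("utf-8")) <= threshold:
        return result  # small results stay inline, unchanged
    ref_id = uuid.uuid4().hex
    store_dir.mkdir(parents=True, exist_ok=True)
    (store_dir / f"{ref_id}.txt").write_text(result, encoding="utf-8")
    return json.dumps({
        "truncated": True,
        "ref": ref_id,
        "preview": result[:500],
        "hint": "call load_truncated(ref) for full content",
    })

def load_truncated(ref: str, store_dir: Path) -> str:
    """Tool exposed to the agent: retrieve the full stored content by reference."""
    return (store_dir / f"{ref}.txt").read_text(encoding="utf-8")
```

The escape hatch keeps the happy path untouched: only results past the threshold pay the extra file write, and the model sees a structured reference instead of a silent truncation.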

G4 — Path Overlap Guard (100 LOC)

Add to call_executor.py:

  • Detect concurrent writes to same path
  • Fall back to sequential execution on conflict
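A sketch of the guard, assuming tool calls carry a `path` and a `mode` field (that representation, and the helper names, are hypothetical — the real executor would extract paths from its own call schema). Paths are normalized with `os.path.realpath` so `a/b` and `a/./b` count as the same file:

```python
import os
from collections import Counter

def _normalize(path: str) -> str:
    # realpath collapses symlinks, "." and "..", so aliased paths compare equal
    return os.path.realpath(path)

def detect_write_overlap(tool_calls) -> bool:
    """True if two parallel tool calls would write the same normalized path."""
    writes = [_normalize(c["path"]) for c in tool_calls if c.get("mode") == "write"]
    return any(count > 1 for count in Counter(writes).values())

def choose_execution_mode(tool_calls, parallel: bool = True) -> str:
    """Fall back to sequential execution only when a write-write conflict exists."""
    if parallel and detect_write_overlap(tool_calls):
        return "sequential"
    return "parallel" if parallel else "sequential"
```

Read-read and read-write pairs are left parallel here; tightening the policy to serialize read-write overlaps too is a one-line change to the `mode` filter.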

G5 — Error Classifier (30 LOC)

Context-overflow detection only:

  • is_context_overflow(err) -> bool helper
  • Wire to existing context/compressor.py → retry
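A sketch of the classifier and the retry wiring. The marker strings are an illustrative, non-exhaustive sample of provider error messages, and `get_response_with_compression` is a hypothetical wrapper standing in for the real `llm.py` call site and `context/compressor.py` hook:

```python
# Substring heuristics seen in provider error messages; illustrative, not exhaustive.
_CONTEXT_OVERFLOW_MARKERS = (
    "context length",
    "context_length_exceeded",
    "maximum context",
    "too many tokens",
    "prompt is too long",
)

def is_context_overflow(err: Exception) -> bool:
    """Classify an LLM error as a context-window overflow (vs. auth/transient/etc.)."""
    msg = str(err).lower()
    return any(marker in msg for marker in _CONTEXT_OVERFLOW_MARKERS)

def get_response_with_compression(call, compress, messages):
    """Hypothetical wrapper: on overflow, compress the messages and retry once."""
    try:
        return call(messages)
    except Exception as err:
        if not is_context_overflow(err):
            raise  # other errors pass through unchanged
        return call(compress(messages))  # single retry; a second overflow propagates
```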

G8 — Session Title Auto-gen (100 LOC)

Add to existing session infrastructure:

  • generate_title() from first user+assistant exchange
  • Lazy, failure-silent
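A sketch of the generator, assuming the LLM call is injected (the prompt wording and 40-char fallback length are illustrative choices, not from the codebase). Any failure, including an unavailable model, silently falls back to the truncated user message per the acceptance criteria:

```python
def generate_title(first_user_msg: str, first_assistant_msg: str, llm_call=None) -> str:
    """Derive a 3-6 word session title; fall back to a truncated user message."""
    try:
        if llm_call is not None:
            title = llm_call(
                "Summarize this exchange as a 3-6 word title:\n"
                f"User: {first_user_msg}\nAssistant: {first_assistant_msg}"
            ).strip().strip('"')
            if title:
                return title
    except Exception:
        pass  # failure-silent by design: never block the turn on title generation
    fallback = " ".join(first_user_msg.split())
    return fallback[:40] + ("…" if len(fallback) > 40 else "")
```

The caller would invoke this lazily, only when `session.title` is empty, ideally off the hot path (e.g. after the first assistant response is already streamed).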

G9 — Thinking Block Handling (15 LOC)

Render-path stripping only:

  • Strip <think>...</think> in output formatter
  • No new module needed
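The stripping itself is a single compiled regex in the output formatter; the raw text (with thinking content intact) stays available upstream for observability. A minimal sketch:

```python
import re

# DOTALL so multi-line thinking blocks are matched; trailing whitespace is
# swallowed so stripped blocks don't leave blank gaps in the rendered output.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove <think>…</think> spans from the user-visible render path only."""
    return _THINK_RE.sub("", text)
```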

G12 — OSV Scan (150 LOC, gated)

Only after Skills Hub Phase 6:

  • OSV.dev integration for skill security scanning
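A sketch of the scan against OSV.dev's real `POST /v1/query` endpoint, with the transport injected so offline environments degrade gracefully. The `requires_approval` severity check reads `database_specific.severity`, which many OSV records populate but which is not guaranteed by the schema — treat that field choice as an assumption:

```python
OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def _default_post(url, json):
    import httpx  # optional dependency, imported lazily
    return httpx.post(url, json=json, timeout=5.0)

def scan_package(name: str, version: str, ecosystem: str = "PyPI", post=None):
    """Query OSV.dev for known vulnerabilities; return [] on any network failure."""
    post = post or _default_post
    try:
        resp = post(OSV_QUERY_URL, json={
            "package": {"name": name, "ecosystem": ecosystem},
            "version": version,
        })
        return resp.json().get("vulns", [])
    except Exception:
        return []  # graceful offline fallback: skip the scan, do not block install

def requires_approval(vulns) -> bool:
    """Prompt the user before install when any finding is marked CRITICAL."""
    return any(
        (v.get("database_specific") or {}).get("severity") == "CRITICAL"
        for v in vulns
    )
```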

Acceptance Criteria (Revised)

G2 — interrupt

  • agent.interrupt() stops mid‑run within ≤1 iteration
  • Partial response returned; hooks still fire
  • Works under asyncio and threading

G3 — tool result storage

  • Tool result > 8KB → replaced with JSON ref containing preview
  • Agent can call load_truncated(ref) for full content
  • Integrates with existing context/aggregator.py truncation

G4 — path overlap guard

  • Concurrent file writes → sequential fallback automatically
  • No false positives on different files
  • Performance impact only when overlap detected

G5 — context overflow classifier

  • Context limit errors → auto-compress → retry once
  • Other errors pass through unchanged
  • Reuses existing compressor infrastructure

G8 — session title

  • First user+assistant exchange → auto-generates 3-6 word title
  • Failure-silent fallback to truncated user message
  • Lazy - only when session.title is empty

G9 — thinking blocks

  • <think>...</think> content stripped from user-visible output
  • Thinking content preserved for observability if needed
  • Minimal render-path change

G12 — OSV scan

  • Gated behind Skills Hub Phase 6 completion
  • CRITICAL vulns → user approval prompt
  • Graceful offline fallback

Technical Considerations

Dependencies

  • New required: none
  • New optional: none (reuse existing httpx, tiktoken)

Performance

  • Import time: < 25ms budget (current: 24.1ms after #1479)
  • Hot-path: interrupt check = 1 Event.is_set() per iteration (nanoseconds)
  • All other features only active when triggered

Backward Compatibility

  • All changes are opt-in or behind existing feature flags
  • Existing APIs unchanged
  • PersonaStore and setup command preserved as-is

Files to Create / Modify (Revised)

New files (minimal)

| File | Purpose | LOC |
|------|---------|-----|
| praisonaiagents/agent/interrupt.py | Interrupt implementation | ~30 |
| praisonai/tools/result_escape.py | Large result storage (wrapper) | ~80 |
| praisonaiagents/session/title.py | Title generator | ~60 |
| tests/unit/agent/test_interrupt.py | TDD | ~50 |
| tests/unit/session/test_title.py | TDD | ~40 |

Modified files

| File | Change | LOC |
|------|--------|-----|
| praisonaiagents/agent/agent.py | Wire interrupt protocol | +10 |
| praisonaiagents/tools/call_executor.py | Path overlap guard | +40 |
| praisonaiagents/llm/llm.py | Context overflow detection | +20 |
| praisonaiagents/session/hierarchy.py | Auto-title hook | +15 |

Total: ~345 LOC (vs original 2200 LOC estimate)



Labels: bug, claude, documentation, performance, security
