| 1 | +# Baseline Audit — specsmith |
| 2 | + |
| 3 | +> Generated: 2026-04-20 (Phase 0 — AG2 Realignment) |
| 4 | + |
| 5 | +## 1. Architecture Map |
| 6 | + |
| 7 | +### Entrypoints |
| 8 | + |
| 9 | +| Entrypoint | Module | Description | |
| 10 | +|---|---|---| |
| 11 | +| `specsmith` CLI | `cli.py` → Click `_AutoUpdateGroup` | 50+ commands. Auto-checks spec_version and PyPI updates on invocation. | |
| 12 | +| `specsmith run` | `agent/runner.py` → `AgentRunner` REPL | Agent loop: system prompt → provider → tool dispatch → hooks. Supports `/help`, `/tools`, `/model`, `/status`, `/save`, `/clear`. | |
| 13 | +| `specsmith gui` | `gui/app.py` → `launch()` | PySide6 (Qt6) desktop app. `GUIAgentRunner(AgentRunner)` overrides print/provider/tool methods to emit Qt signals. `AgentWorker(QThread)` runs off UI thread. | |
| 14 | +| VS Code extension | `extension.ts` → `activate()` | Activation event: `onStartupFinished`. 14 TypeScript source files. 30+ contributed commands. | |
| 15 | + |
| 16 | +### Service Boundaries |
| 17 | + |
| 18 | +``` |
| 19 | +CLI Layer (cli.py) |
| 20 | +├── scaffolder.py — Jinja2 template render → project files |
| 21 | +├── auditor.py — health checks (file existence, REQ↔TEST, ledger) |
| 22 | +├── exporter.py — compliance reports, REQ coverage matrix |
| 23 | +├── importer.py — detect language/build/test → generate overlay |
| 24 | +├── config.py — Pydantic model for scaffold.yml (33 project types) |
| 25 | +├── differ.py — governance file drift detection |
| 26 | +├── doctor.py — environment diagnostic |
| 27 | +├── phase.py — project lifecycle phase management |
| 28 | +├── compressor.py — LEDGER.md archival |
| 29 | +├── ledger.py — CryptoAuditChain (SHA-256 append-only) |
| 30 | +├── retrieval.py — keyword scoring index (term-frequency, not BM25) |
| 31 | +├── profiles.py — execution profiles |
| 32 | +├── credit_analyzer.py — LLM credit spend analysis |
| 33 | +└── credits.py — rate limit profiles |
| 34 | + |
| 35 | +Agent Layer (agent/) |
| 36 | +├── runner.py — REPL loop, tool execution, streaming, session state |
| 37 | +├── core.py — Message, Tool, CompletionResponse, ModelTier, BaseProvider |
| 38 | +├── tools.py — 20 tool handlers (all use _run_specsmith → subprocess) |
| 39 | +├── hooks.py — HookRegistry: Pre/PostTool, SessionStart, SessionEnd, H13 |
| 40 | +├── skills.py — SKILL.md loader with domain priority |
| 41 | +├── optimizer.py — TokenEstimator, ResponseCache, ContextManager, ModelRouter, ToolFilter |
| 42 | +└── providers/ |
| 43 | + ├── anthropic.py — Claude (SDK: anthropic>=0.56) |
| 44 | + ├── openai.py — GPT (SDK: openai>=1.0, also used for Mistral via base_url) |
| 45 | + ├── gemini.py — Gemini (SDK: google-genai>=1.0, fallback google-generativeai) |
| 46 | + ├── ollama.py — Ollama v0.3+ (stdlib urllib, /api/chat, tool calling, streaming) |
| 47 | + └── mistral.py — Mistral via openai SDK pointed at api.mistral.ai |
| 48 | + |
| 49 | +Epistemic Layer (epistemic/ + specsmith/epistemic/) |
| 50 | +├── belief.py — BeliefArtifact dataclass |
| 51 | +├── stress_tester.py — 8 adversarial challenges, Logic Knot detection |
| 52 | +├── failure_graph.py — FailureModeGraph, equilibrium_check, Mermaid render |
| 53 | +├── recovery.py — RecoveryOperator, bounded proposals |
| 54 | +├── certainty.py — CertaintyEngine, weakest-link propagation |
| 55 | +├── session.py — AEESession facade |
| 56 | +└── trace.py — TraceVault SHA-256 append-only chain |
| 57 | + |
| 58 | +GUI Layer (gui/) |
| 59 | +├── app.py — QApplication bootstrap, dark AEE theme |
| 60 | +├── main_window.py — QTabWidget, status bar, menu bar |
| 61 | +├── session_tab.py — per-tab: chat + input + meter + tool panel + provider bar |
| 62 | +├── worker.py — GUIAgentRunner + AgentWorker(QThread) |
| 63 | +└── widgets/ — chat_view, input_bar, provider_bar, token_meter, tool_panel, update_checker |
| 64 | +``` |
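
The map above describes `ledger.py` and `trace.py` as SHA-256 append-only chains. As a minimal sketch of that general technique (not the project's actual implementation), each entry can commit to the hash of the entry before it, so that editing any earlier entry invalidates verification. The `HashChain` class, the `GENESIS` sentinel, and the entry field names are all hypothetical.

```python
import hashlib
import json

# Hypothetical sentinel used as the "previous hash" of the first entry.
GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{prev_hash}:{body}".encode("utf-8")).hexdigest()


class HashChain:
    """Illustrative append-only chain; names are assumptions, not specsmith APIs."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, payload: dict) -> str:
        """Add an entry that commits to the hash of the current tail."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        self.entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash from the start; any edit breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            expected = _digest(prev_hash, entry["payload"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Serializing the payload with sorted keys keeps the digest stable regardless of dictionary insertion order, which matters if entries are re-read from disk before verification.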
| 65 | + |
| 66 | +### VS Code Plugin Structure |
| 67 | + |
| 68 | +``` |
| 69 | +specsmith-vscode/src/ |
| 70 | +├── extension.ts — activate(): tree views, commands, startup checks |
| 71 | +├── bridge.ts — SpecsmithBridge: child process (specsmith run --json-events), JSONL protocol |
| 72 | +├── SessionPanel.ts — webview: agent chat, auto-approve, model/provider switching |
| 73 | +├── GovernancePanel.ts — webview: 6-tab settings (General, Models, Execution, Tools, Agents, Help) |
| 74 | +├── SettingsPanel.ts — webview: global extension settings |
| 75 | +├── HelpPanel.ts — webview: help/docs |
| 76 | +├── OllamaManager.ts — Ollama model management (list, pull, delete, GPU detection) |
| 77 | +├── ModelRegistry.ts — fetch available models per provider |
| 78 | +├── ApiKeyManager.ts — secret storage for LLM API keys |
| 79 | +├── VenvManager.ts — Python venv detection/management |
| 80 | +├── ProjectTree.ts — sidebar tree: project folders + file operations |
| 81 | +├── EpistemicBar.ts — status bar: epistemic health indicator |
| 82 | +├── BugReporter.ts — interactive bug report filing |
| 83 | +└── types.ts — SpecsmithEvent, SessionConfig, SessionStatus types |
| 84 | +``` |
| 85 | + |
| 86 | +**Bridge protocol:** `SpecsmithBridge` spawns `specsmith run --json-events` as a child process. Communication is stdin (user messages, one per line) / stdout (JSONL events: `ready`, `llm_chunk`, `tool_started`, `tool_finished`, `tokens`, `turn_done`, `error`, `system`). Turn timeout: 5 minutes. |
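
To make the stdout side of the bridge protocol concrete, here is a hedged sketch of a JSONL consumer. The event names come from the list above; the `type`, `text`, and `message` field names are assumptions, since the audit does not document the payload schema, and `parse_events` and `collect_turn` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator

# Event names as listed in the bridge protocol description.
KNOWN_EVENTS = {
    "ready", "llm_chunk", "tool_started", "tool_finished",
    "tokens", "turn_done", "error", "system",
}


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line; malformed input becomes an error event."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between events
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            yield {"type": "error", "message": f"unparseable line: {line!r}"}
            continue
        # "type" as the discriminator field is an assumption.
        if not isinstance(event, dict) or event.get("type") not in KNOWN_EVENTS:
            yield {"type": "error", "message": f"unknown event: {line!r}"}
            continue
        yield event


def collect_turn(lines: Iterable[str]) -> str:
    """Concatenate llm_chunk text until turn_done (the "text" field is assumed)."""
    chunks: list[str] = []
    for event in parse_events(lines):
        if event["type"] == "llm_chunk":
            chunks.append(event.get("text", ""))
        elif event["type"] == "turn_done":
            break
    return "".join(chunks)
```

Converting malformed lines into `error` events rather than raising keeps a single bad line from ending the session, which is one plausible way to handle a child process that may interleave stray output with protocol events.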
| 87 | + |
| 88 | +**Activation:** `onStartupFinished`. On activate: apply venv path, create tree views, register 30+ commands, startup checks (privacy notice, fetch models, update check, venv check, auto-open governance panel). |
| 89 | + |
| 90 | +**No integration tests exist** for the VS Code extension. |
| 91 | + |
| 92 | +### Model/Backend Assumptions per Provider |
| 93 | + |
| 94 | +- **Anthropic:** SDK `anthropic>=0.56`. Streaming via SDK. Tool calling native. |
| 95 | +- **OpenAI:** SDK `openai>=1.0`. Also serves Mistral (base_url override). Tool calling native. |
| 96 | +- **Gemini:** SDK `google-genai>=1.0` (preferred) or `google-generativeai` (fallback). Auto-detects. |
| 97 | +- **Ollama:** Stdlib only (`urllib.request`). `/api/chat` for all completions. Tool calling requires Ollama v0.3+. `num_ctx` is read from `SPECSMITH_OLLAMA_NUM_CTX` (default 4096). `keep_alive=-1` prevents model unload. The think parameter is supported for reasoning models. |
| 98 | +- **Mistral:** Uses OpenAI SDK pointed at `api.mistral.ai`. |
| 99 | + |
| 100 | +All providers are optional extras — specsmith core has zero LLM SDK dependencies. |
| 101 | + |
| 102 | +## 2. Verification Results (2026-04-20) |
| 103 | + |
| 104 | +### pytest (226 collected) |
| 105 | + |
| 106 | +- **Passed:** 208 |
| 107 | +- **Failed:** 18 |
| 108 | +- **Skipped:** 0 |
| 109 | + |
| 110 | +**Failing tests (17 sandbox/lifecycle, 1 scaffolder):** |
| 111 | + |
| 112 | +| Test | Category | |
| 113 | +|---|---| |
| 114 | +| `test_sandbox_import::test_full_import_workflow` | sandbox import | |
| 115 | +| `test_sandbox_import::test_import_force_overwrites` | sandbox import | |
| 116 | +| `test_sandbox_import::test_import_idempotent_restart` | sandbox import | |
| 117 | +| `test_sandbox_import::test_import_preserves_existing_project_docs` | sandbox import | |
| 118 | +| `test_sandbox_import::test_import_force_overwrites_existing_docs` | sandbox import | |
| 119 | +| `test_sandbox_lifecycle_import::test_import_sets_inception_phase` | lifecycle import | |
| 120 | +| `test_sandbox_lifecycle_import::test_import_creates_governance_files` | lifecycle import | |
| 121 | +| `test_sandbox_lifecycle_import::test_import_then_phase_operations` | lifecycle import | |
| 122 | +| `test_sandbox_lifecycle_import::test_import_audit_includes_phase_readiness` | lifecycle import | |
| 123 | +| `test_sandbox_lifecycle_new::test_full_lifecycle_phases` | lifecycle new | |
| 124 | +| `test_sandbox_lifecycle_new::test_phase_gating_without_force` | lifecycle new | |
| 125 | +| `test_sandbox_lifecycle_new::test_governance_files_present` | lifecycle new | |
| 126 | +| `test_sandbox_lifecycle_upgrade::test_upgrade_migrates_workflow_to_session_protocol` | lifecycle upgrade | |
| 127 | +| `test_sandbox_lifecycle_upgrade::test_upgrade_preserves_workflow_content` | lifecycle upgrade | |
| 128 | +| `test_sandbox_lifecycle_upgrade::test_upgrade_then_audit_runs` | lifecycle upgrade | |
| 129 | +| `test_sandbox_lifecycle_upgrade::test_upgrade_idempotent` | lifecycle upgrade | |
| 130 | +| `test_sandbox_new::test_full_scaffold_workflow` | sandbox new | |
| 131 | +| `test_scaffolder::test_creates_expected_files` | scaffolder | |
| 132 | + |
| 133 | +**Root cause:** Likely governance template drift — scaffolder output changed but sandbox test expectations weren't updated. |
| 134 | + |
| 135 | +**Platform issue:** pytest cleanup crashes with `WinError 448` (untrusted mount point in temp dir). Does not affect test results. |
| 136 | + |
| 137 | +### ruff (lint) |
| 138 | + |
| 139 | +All checks passed. Zero issues. |
| 140 | + |
| 141 | +### mypy (typecheck) |
| 142 | + |
| 143 | +Success: 0 errors across 72 source files. One note: unused `keyring.*` override in pyproject.toml. |
| 144 | + |
| 145 | +## 3. Untested Modules |
| 146 | + |
| 147 | +**Critical (agent layer — zero test coverage):** |
| 148 | +- `agent/runner.py` — REPL loop, tool execution, streaming, session state, meta-commands |
| 149 | +- `agent/tools.py` — 20 tool handlers (all route through `_run_specsmith` subprocess wrapper) |
| 150 | +- `agent/hooks.py` — HookRegistry, trigger dispatch, H13 check |
| 151 | +- `agent/skills.py` — SKILL.md loading, domain priority |
| 152 | +- `agent/providers/anthropic.py` — Claude provider |
| 153 | +- `agent/providers/openai.py` — GPT/Mistral provider |
| 154 | +- `agent/providers/gemini.py` — Gemini provider |
| 155 | +- `agent/providers/ollama.py` — Ollama provider (tool calling, streaming, think parameter) |
| 156 | +- `commands/__init__.py` — empty stub, no slash commands implemented |
| 157 | + |
| 158 | +**Secondary (supporting modules):** |
| 159 | +- `architect.py`, `auth.py`, `credit_analyzer.py`, `credits.py`, `doctor.py` |
| 160 | +- `ledger.py`, `ollama_cmds.py`, `patent.py`, `phase.py`, `plugins.py` |
| 161 | +- `profiles.py`, `releaser.py`, `retrieval.py`, `session.py` |
| 162 | + |
| 163 | +**Excluded from mypy strict:** |
| 164 | +- `gui/` (requires PySide6) |
| 165 | +- `ollama_cmds`, `languages`, `phase`, `cli`, `importer`, `agent.providers.gemini`, `agent.runner`, `profiles`, `toolrules`, `tool_installer` |
| 166 | + |
| 167 | +**VS Code plugin:** Zero integration tests. No test runner configured. |
| 168 | + |
| 169 | +## 4. Known Breakpoints |
| 170 | + |
| 171 | +1. **18 test failures (17 sandbox/lifecycle, 1 scaffolder)** — governance template expectations are likely stale. Severity: medium (blocks CI green). |
| 172 | +2. **Tool handlers use raw subprocess** — `_run_specsmith()` in `tools.py` shells out to `python -m specsmith <args>`. No structured error handling, no cross-platform abstraction, no typed results. |
| 173 | +3. **`commands/__init__.py` is empty** — slash commands documented in AGENTS.md and ARCHITECTURE.md are not implemented. |
| 174 | +4. **No agent/runner tests** — the entire REPL loop, tool dispatch, streaming, and session state management is untested. |
| 175 | +5. **No provider tests** — all 5 LLM providers have zero unit tests. |
| 176 | +6. **No VS Code extension tests** — plugin activation, bridge protocol, panel rendering are all untested. |
| 177 | +7. **Retrieval uses term-frequency** — not BM25 as documented in requirements. |
| 178 | +8. **pytest WinError 448** — temp directory cleanup fails on Windows. Cosmetic but noisy. |
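
Breakpoint 2 notes that the subprocess wrapper returns no typed results and has no structured error handling. As one possible shape for a fix (a sketch only; `ToolResult` and `run_tool` are hypothetical names, not existing specsmith code), the call can be wrapped so that every outcome, including a timeout or a missing executable, comes back as the same dataclass.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResult:
    """Typed outcome of one subprocess call (hypothetical structure)."""
    ok: bool
    exit_code: int
    stdout: str
    stderr: str


def run_tool(argv: list[str], timeout: float = 60.0) -> ToolResult:
    """Run argv without a shell and map every failure mode to a ToolResult."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, check=False
        )
    except subprocess.TimeoutExpired as exc:
        return ToolResult(False, -1, "", f"timed out after {exc.timeout}s")
    except OSError as exc:
        # Covers a missing executable, permission errors, and similar.
        return ToolResult(False, -1, "", str(exc))
    return ToolResult(proc.returncode == 0, proc.returncode, proc.stdout, proc.stderr)
```

A handler could then call `run_tool([sys.executable, "-m", "specsmith", *args])`, which reuses the interpreter that is already running instead of relying on whichever `python` is first on `PATH`.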
| 179 | + |
| 180 | +## 5. Gap Summary (ranked by severity) |
| 181 | + |
| 182 | +1. **No agent layer tests** — runner, tools, hooks, skills, providers all untested → high risk for AG2 integration |
| 183 | +2. **18 failing tests (17 sandbox/lifecycle, 1 scaffolder)** — CI is red → blocks safe development |
| 184 | +3. **Empty commands/** — REPL meta-commands not wired → blocks slash command surface |
| 185 | +4. **Tool handlers = raw subprocess** — no typed operations → AG2 tools must replace this |
| 186 | +5. **No VS Code extension tests** — plugin correctness is assumed, not proven |
| 187 | +6. **No AG2 integration** — the entire agent orchestration layer is missing |
| 188 | +7. **No eval harness** — cannot measure agent quality |
| 189 | +8. **No instinct/memory** — no cross-session learning |
| 190 | +9. **No feature flags** — no way to gate capabilities |
| 191 | +10. **No server daemon** — no WebSocket path for IDE integration |