@@ -310,9 +310,83 @@ Begin Phase 1: `operations.py` first (it blocks tool handler refactoring), then
310310- Both repos: develop and main in sync, zero failing workflows
311311
312312### Open TODOs (Phase 1 — next)
313- - [ ] Implement `src/specsmith/operations.py` — typed ProjectOperations class
314- - [ ] Refactor tool handlers in `agent/tools.py` to use ProjectOperations
313+ - [x] Implement `src/specsmith/operations.py` — typed ProjectOperations class → replaced by AG2 tool surface
314+ - [x] Refactor tool handlers in `agent/tools.py` → AG2 agents/tools/ replaces this
315315- [ ] Populate `src/specsmith/commands/` with priority slash commands
316316- [ ] Implement `src/specsmith/instinct.py` — instinct persistence
317317- [ ] Implement `src/specsmith/eval/` — EDD harness with pass@k
318318- [ ] Merge specsmith-vscode develop → main, tag v0.3.14 stable
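
The ledger does not say how the EDD harness will compute pass@k. One common choice is the unbiased estimator `1 - C(n - c, k) / C(n, k)`, where `n` samples are drawn per task and `c` of them pass. The sketch below assumes that estimator; the function name `pass_at_k` and its argument names are illustrative and not taken from specsmith.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples generated for a task
    c: number of those samples that passed
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every k-subset contains a pass.
    if n - c < k:
        return 1.0
    # Otherwise: 1 minus the fraction of k-subsets made up only of failures.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For `k = 1` this reduces to the plain pass rate `c / n`, which makes it easy to sanity-check against existing test counts.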
319+
320+ ---
321+
322+ ## Session 2026-04-20 — AG2 Realignment: Phases 0–3
323+
324+ **Status:** Complete
325+ **Branch:** develop
326+ **AG2 version:** 0.12.0
327+ **Ollama model:** qwen2.5:14b
328+
329+ ### What changed
330+
331+ **AG2 Realignment — new architecture direction:**
332+ Replaced the previous incremental roadmap with an AG2-based agent shell over Ollama.
333+ Four-layer architecture: Product Surface → Agent Layer (AG2) → Model Runtime (Ollama) → Verification Layer.
334+ Three agent roles: Planner (read-only inspection + planning), Builder (code/doc changes), Verifier (tests + accept/reject).
335+
336+ **Phase 0 — Baseline Audit (docs/baseline-audit.md):**
337+ - Architecture map: 4 entrypoints (CLI, REPL, GUI, VS Code), service boundaries, provider assumptions
338+ - Test inventory: 208 pass / 18 fail (stale pip install), ruff clean, mypy clean
339+ - Gap analysis: 10 ranked gaps (no agent tests, empty commands/, raw subprocess tools)
340+ - AGENTS.md: updated with AG2 four-layer architecture, 12 project rules
341+
342+ **Phase 1 — System Proof (docs/system-proof.md):**
343+ - Root cause of 18 failures: stale v0.3.1 pip install vs v0.3.10 source → reinstall fixed all
344+ - tests/conftest.py: WinError 448 pytest cleanup crash fix for Windows
345+ - tests/test_agent.py: 23 new tests — tool registry (5), tool handlers (5), system prompt (3), AgentRunner init (2), SessionState (2), meta-commands (2), Ollama integration (4)
346+ - 249 tests passing, lint clean, mypy clean
347+
348+ **Phase 2 — AG2 Agent Shell (src/specsmith/agents/):**
349+ - agents/config.py: AgentConfig from scaffold.yml, AG2 LLMConfig dict generation
350+ - agents/tools/filesystem.py: read_file, write_file, patch_file, list_tree, search_content (pathlib)
351+ - agents/tools/shell.py: run_project_command (structured exit code + output)
352+ - agents/tools/git.py: git_status, git_diff, git_changed_files, git_branch_info
353+ - agents/tools/tests.py: run_unit_tests, summarize_failures
354+ - agents/roles.py: Planner/Builder/Verifier via AG2 ConversableAgent + Ollama
355+ - agents/cli.py: specsmith agent run/plan/status/verify commands
356+ - cli.py: agent command group wired into main CLI
357+ - pyproject.toml: ag2[ollama] optional dependency added
358+ - Tested live: Planner calls tools via Ollama, full Plan→Build→Verify pipeline works
359+
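The Phase 2 entry describes `run_project_command` only as returning a "structured exit code + output". The following is a minimal sketch of what such a tool could look like, not the code in `agents/tools/shell.py`. The `CommandResult` dataclass, its field names, the `timeout` parameter, and the `-1` / `127` sentinel exit codes are all assumptions made for illustration.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CommandResult:
    """Hypothetical result shape; the real tool's fields may differ."""

    exit_code: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        return self.exit_code == 0


def run_project_command(
    args: list[str], cwd: str | Path = ".", timeout: float = 60.0
) -> CommandResult:
    """Run a command without a shell and return its exit code and output."""
    try:
        proc = subprocess.run(
            args,
            cwd=str(cwd),
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,            # decode output to str
            timeout=timeout,
            check=False,          # report failures via exit_code, not exceptions
        )
    except subprocess.TimeoutExpired:
        # -1 is an arbitrary sentinel chosen for this sketch.
        return CommandResult(-1, "", f"timed out after {timeout}s")
    except FileNotFoundError as exc:
        # 127 mirrors the shell's "command not found" convention.
        return CommandResult(127, "", str(exc))
    return CommandResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    result = run_project_command([sys.executable, "-c", "print('hello')"])
    print(result.exit_code, result.stdout.strip())
```

Returning a value object instead of raising keeps the tool easy for an agent to reason about: a failing test run is ordinary data for the Verifier rather than an exception in the tool layer.
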
360+ **Phase 3 — Self-Improvement Loop:**
361+ - agents/workflows/improve.py: run_improvement() — inspect → plan → edit → test → report
362+ - agents/reports.py: ChangeReport dataclass, save/list at .specsmith/agent-reports/
363+ - agents/cli.py: specsmith agent improve <task>, specsmith agent reports
364+
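The Phase 3 entry mentions a `ChangeReport` dataclass with save/list support under `.specsmith/agent-reports/`, without giving its fields. As a rough sketch of that pattern only: the field names, the JSON file format, the timestamp-based file names, and the helper names `save_report` / `list_reports` below are assumptions, not the contents of `agents/reports.py`.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

# Directory named in the ledger; everything else here is illustrative.
REPORT_DIR = Path(".specsmith") / "agent-reports"


def _timestamp() -> str:
    # Sortable UTC timestamp with microseconds, used as the file name.
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")


@dataclass
class ChangeReport:
    """Hypothetical report fields; the real dataclass may differ."""

    task: str
    summary: str
    files_changed: list[str] = field(default_factory=list)
    tests_passed: bool = False
    created_at: str = field(default_factory=_timestamp)


def save_report(report: ChangeReport, root: str | Path = ".") -> Path:
    """Write the report as JSON under <root>/.specsmith/agent-reports/."""
    out_dir = Path(root) / REPORT_DIR
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{report.created_at}.json"
    path.write_text(json.dumps(asdict(report), indent=2), encoding="utf-8")
    return path


def list_reports(root: str | Path = ".") -> list[ChangeReport]:
    """Load all saved reports, oldest first (file names sort by timestamp)."""
    out_dir = Path(root) / REPORT_DIR
    if not out_dir.is_dir():
        return []
    return [
        ChangeReport(**json.loads(p.read_text(encoding="utf-8")))
        for p in sorted(out_dir.glob("*.json"))
    ]
```

Plain JSON files keep each report reviewable in a diff, which fits the ledger's advice to review agent changes before committing.
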
365+ ### Verification
366+ - 249 tests passing (226 existing + 23 new agent tests)
367+ - ruff check: clean across all agents/ code
368+ - mypy: 0 errors in 72 source files
369+ - AG2 + Ollama live tested: plan, run, and full pipeline all execute successfully
370+ - Ollama tool calling proven with qwen2.5:14b (text completion + tool calling + provider protocol)
371+
372+ ### Files changed (11 new + 5 modified)
373+ - New: src/specsmith/agents/__init__.py, config.py, roles.py, cli.py, reports.py
374+ - New: src/specsmith/agents/tools/__init__.py, filesystem.py, shell.py, git.py, tests.py
375+ - New: src/specsmith/agents/workflows/__init__.py, improve.py
376+ - New: tests/conftest.py, tests/test_agent.py
377+ - New: docs/baseline-audit.md, docs/system-proof.md
378+ - Modified: AGENTS.md, pyproject.toml, src/specsmith/cli.py
379+
380+ ### Open TODOs
381+ - [ ] Phase 4.1: Feature flags (REQ-FLG-001–003)
382+ - [ ] Phase 4.2: Instinct/learning system (REQ-LRN-001–007)
383+ - [ ] Phase 4.3: Eval harness (REQ-EDD-001–008)
384+ - [ ] Phase 4.4: Agent memory persistence (REQ-MEM-001–004)
385+ - [ ] Phase 4.5: Multi-agent coordination via AG2 GroupChat
386+ - [ ] Phase 4.6: Server daemon (specsmith serve)
387+ - [ ] Phase 4.7: Theia IDE (specsmith-ide repo)
388+ - [ ] Populate src/specsmith/commands/ with slash commands
389+ - [ ] Merge specsmith-vscode develop → main, tag v0.3.14 stable
390+
391+ ### Next step
392+ specsmith can now improve itself via `specsmith agent improve <task>`. Use it for Phase 4 tasks (feature flags, instinct, eval harness). Review changes before committing.