Commit 310859a

tbitcs and oz-agent committed
docs: LEDGER.md session entry — AG2 realignment Phases 0-3 complete
- Phase 0: baseline-audit.md, AGENTS.md AG2 update
- Phase 1: 249 tests green, conftest.py WinError fix, 23 new agent tests
- Phase 2: AG2 agent shell (config, tools, roles, CLI, pyproject.toml)
- Phase 3: self-improvement workflow + change reports
- AG2 v0.12.0 + Ollama qwen2.5:14b proven end-to-end

Co-Authored-By: Oz <oz-agent@warp.dev>
1 parent 558a34a commit 310859a

1 file changed

Lines changed: 76 additions & 2 deletions

LEDGER.md

@@ -310,9 +310,83 @@ Begin Phase 1: `operations.py` first (it blocks tool handler refactoring), then
 - Both repos: develop and main in sync, zero failing workflows
 
 ### Open TODOs (Phase 1 — next)
-- [ ] Implement `src/specsmith/operations.py` — typed ProjectOperations class
-- [ ] Refactor tool handlers in `agent/tools.py` to use ProjectOperations
+- [x] Implement `src/specsmith/operations.py` — typed ProjectOperations class → replaced by AG2 tool surface
+- [x] Refactor tool handlers in `agent/tools.py` → AG2 agents/tools/ replaces this
 - [ ] Populate `src/specsmith/commands/` with priority slash commands
 - [ ] Implement `src/specsmith/instinct.py` — instinct persistence
 - [ ] Implement `src/specsmith/eval/` — EDD harness with pass@k
 - [ ] Merge specsmith-vscode develop → main, tag v0.3.14 stable
+
+---
+
+## Session 2026-04-20 — AG2 Realignment: Phases 0–3
+
+**Status:** Complete
+**Branch:** develop
+**AG2 version:** 0.12.0
+**Ollama model:** qwen2.5:14b
+
+### What changed
+
+**AG2 Realignment — new architecture direction:**
+Replaced the previous incremental roadmap with an AG2-based agent shell over Ollama.
+Four-layer architecture: Product Surface → Agent Layer (AG2) → Model Runtime (Ollama) → Verification Layer.
+Three agent roles: Planner (read-only inspection + planning), Builder (code/doc changes), Verifier (tests + accept/reject).
+
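The three roles described above differ mainly in which tools each one may call. A minimal sketch of that separation follows. The `Role` dataclass, the `ROLES` table, and the `can_call` helper are hypothetical names for illustration and are not taken from `agents/roles.py`; the grouping of tool names into read, write, and verify sets is likewise an assumption based on the role descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role descriptor: a name plus the tools it may call."""

    name: str
    allowed_tools: frozenset


# Tool names come from the Phase 2 tool list; this grouping is an assumption.
READ_ONLY = frozenset(
    {"read_file", "list_tree", "search_content", "git_status", "git_diff"}
)
WRITE = frozenset({"write_file", "patch_file"})
VERIFY = frozenset({"run_unit_tests", "summarize_failures"})

ROLES = {
    # Planner: read-only inspection, so no write or test tools.
    "planner": Role("planner", READ_ONLY),
    # Builder: may read and modify files.
    "builder": Role("builder", READ_ONLY | WRITE),
    # Verifier: may read and run tests, but not edit.
    "verifier": Role("verifier", READ_ONLY | VERIFY),
}


def can_call(role_name: str, tool_name: str) -> bool:
    """Return True if the named role is allowed to call the named tool."""
    return tool_name in ROLES[role_name].allowed_tools
```

Keeping the permission table as plain data makes the read-only guarantee for the Planner easy to assert in a unit test, independent of any agent framework.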
+**Phase 0 — Baseline Audit (docs/baseline-audit.md):**
+- Architecture map: 4 entrypoints (CLI, REPL, GUI, VS Code), service boundaries, provider assumptions
+- Test inventory: 208 pass / 18 fail (stale pip install), ruff clean, mypy clean
+- Gap analysis: 10 ranked gaps (no agent tests, empty commands/, raw subprocess tools)
+- AGENTS.md: updated with AG2 four-layer architecture, 12 project rules
+
+**Phase 1 — System Proof (docs/system-proof.md):**
+- Root cause of 18 failures: stale v0.3.1 pip install vs v0.3.10 source → reinstall fixed all
+- tests/conftest.py: WinError 448 pytest cleanup crash fix for Windows
+- tests/test_agent.py: 23 new tests — tool registry (5), tool handlers (5), system prompt (3), AgentRunner init (2), SessionState (2), meta-commands (2), Ollama integration (4)
+- 249 tests passing, lint clean, mypy clean
+
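The 18 failures in Phase 1 came from an installed package that was older than the source tree. A check along the following lines could flag that mismatch before a test run. This is only a sketch: `parse_version`, `is_stale`, and `installed_version` are hypothetical helpers, and the parser assumes plain dotted numeric versions such as `0.3.10` (no pre-release suffixes). Only `importlib.metadata.version` and `PackageNotFoundError` are real standard-library names.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '0.3.10' into (0, 3, 10)."""
    return tuple(int(part) for part in text.split("."))


def is_stale(installed: str, source: str) -> bool:
    """Return True if the installed version is older than the source version."""
    return parse_version(installed) < parse_version(source)


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here, because as strings `"0.3.10"` sorts before `"0.3.9"`.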
+**Phase 2 — AG2 Agent Shell (src/specsmith/agents/):**
+- agents/config.py: AgentConfig from scaffold.yml, AG2 LLMConfig dict generation
+- agents/tools/filesystem.py: read_file, write_file, patch_file, list_tree, search_content (pathlib)
+- agents/tools/shell.py: run_project_command (structured exit code + output)
+- agents/tools/git.py: git_status, git_diff, git_changed_files, git_branch_info
+- agents/tools/tests.py: run_unit_tests, summarize_failures
+- agents/roles.py: Planner/Builder/Verifier via AG2 ConversableAgent + Ollama
+- agents/cli.py: specsmith agent run/plan/status/verify commands
+- cli.py: agent command group wired into main CLI
+- pyproject.toml: ag2[ollama] optional dependency added
+- Tested live: Planner calls tools via Ollama, full Plan→Build→Verify pipeline works
+
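The Phase 2 list describes `run_project_command` as returning a structured exit code plus output. One way such a tool might look is sketched below with the standard-library `subprocess` module. The `CommandResult` dataclass, the function signature, the timeout default, and the sentinel exit codes are all assumptions for illustration, not the contents of `agents/tools/shell.py`.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommandResult:
    """Hypothetical structured result: exit code plus captured output."""

    exit_code: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        return self.exit_code == 0


def run_project_command(
    args: List[str], cwd: Optional[str] = None, timeout: float = 60.0
) -> CommandResult:
    """Run a command without a shell and return a structured result."""
    try:
        proc = subprocess.run(
            args,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # report failures via exit_code instead of raising
        )
    except subprocess.TimeoutExpired:
        # -1 is an assumed sentinel for "did not finish in time".
        return CommandResult(-1, "", f"timed out after {timeout}s")
    except FileNotFoundError as exc:
        # 127 mirrors the usual shell code for "command not found" (assumption).
        return CommandResult(127, "", str(exc))
    return CommandResult(proc.returncode, proc.stdout, proc.stderr)


result = run_project_command([sys.executable, "-c", "print('ok')"])
print(result.exit_code, result.stdout.strip())  # prints: 0 ok
```

Returning a value object instead of raising lets an agent read the exit code and stderr as ordinary tool output and decide what to do next.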
+**Phase 3 — Self-Improvement Loop:**
+- agents/workflows/improve.py: run_improvement() — inspect → plan → edit → test → report
+- agents/reports.py: ChangeReport dataclass, save/list at .specsmith/agent-reports/
+- agents/cli.py: specsmith agent improve <task>, specsmith agent reports
+
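The Phase 3 list mentions a `ChangeReport` dataclass with save and list operations under `.specsmith/agent-reports/`. A minimal sketch of that idea is shown below. The field names, the JSON file format, the timestamp-based file name, and the `list_reports` helper are assumptions; only the class name and the directory come from the ledger.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import List

# Directory named in the ledger, relative to the project root.
REPORT_DIR = Path(".specsmith") / "agent-reports"


def _timestamp() -> str:
    """UTC timestamp with microseconds, used as a file name (assumption)."""
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")


@dataclass
class ChangeReport:
    """Hypothetical report fields; the real dataclass may differ."""

    task: str
    files_changed: List[str] = field(default_factory=list)
    tests_passed: bool = False
    created_at: str = field(default_factory=_timestamp)

    def save(self, root: Path) -> Path:
        """Write the report as JSON under root/.specsmith/agent-reports/."""
        directory = root / REPORT_DIR
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"{self.created_at}.json"
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")
        return path


def list_reports(root: Path) -> List[ChangeReport]:
    """Load all saved reports, oldest first (file names sort by timestamp)."""
    directory = root / REPORT_DIR
    if not directory.is_dir():
        return []
    return [
        ChangeReport(**json.loads(p.read_text(encoding="utf-8")))
        for p in sorted(directory.glob("*.json"))
    ]


with tempfile.TemporaryDirectory() as tmp:
    ChangeReport(task="demo", files_changed=["a.py"], tests_passed=True).save(Path(tmp))
    print([r.task for r in list_reports(Path(tmp))])  # prints: ['demo']
```

Storing one JSON file per report keeps each run reviewable with ordinary tools, which fits the ledger's advice to review changes before committing.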
+### Verification
+- 249 tests passing (226 existing + 23 new agent tests)
+- ruff check: clean across all agents/ code
+- mypy: 0 errors in 72 source files
+- AG2 + Ollama live tested: plan, run, and full pipeline all execute successfully
+- Ollama tool calling proven with qwen2.5:14b (text completion + tool calling + provider protocol)
+
+### Files changed (11 new + 5 modified)
+- New: src/specsmith/agents/__init__.py, config.py, roles.py, cli.py, reports.py
+- New: src/specsmith/agents/tools/__init__.py, filesystem.py, shell.py, git.py, tests.py
+- New: src/specsmith/agents/workflows/__init__.py, improve.py
+- New: tests/conftest.py, tests/test_agent.py
+- New: docs/baseline-audit.md, docs/system-proof.md
+- Modified: AGENTS.md, pyproject.toml, src/specsmith/cli.py
+
+### Open TODOs
+- [ ] Phase 4.1: Feature flags (REQ-FLG-001–003)
+- [ ] Phase 4.2: Instinct/learning system (REQ-LRN-001–007)
+- [ ] Phase 4.3: Eval harness (REQ-EDD-001–008)
+- [ ] Phase 4.4: Agent memory persistence (REQ-MEM-001–004)
+- [ ] Phase 4.5: Multi-agent coordination via AG2 GroupChat
+- [ ] Phase 4.6: Server daemon (specsmith serve)
+- [ ] Phase 4.7: Theia IDE (specsmith-ide repo)
+- [ ] Populate src/specsmith/commands/ with slash commands
+- [ ] Merge specsmith-vscode develop → main, tag v0.3.14 stable
+
+### Next step
+specsmith can now improve itself via `specsmith agent improve <task>`. Use it for Phase 4 tasks (feature flags, instinct, eval harness). Review changes before committing.
