Skip to content

Commit 26fec19

Browse files
tbitcsoz-agent
andcommitted
feat: token and credit optimization engine (10 strategies, 50 tests)
src/specsmith/agent/optimizer.py — new module implementing: 1. Response caching — SHA-256 hash cache (30-70% savings) 2. Prompt caching — Anthropic cache_control (50-90% savings) 3. Context trimming — sliding window, always preserves system msg 4. Model routing — keyword heuristic FAST/BALANCED/POWERFUL tier 5. Output length ctrl — max_tokens awareness (3-8x impact vs input) 6. Tool filtering — top-N relevant tools only (cuts 55K-134K overhead) 7. Token estimation — pre-flight cost from character ratios 8. Duplicate detection — identical messages served from cache instantly 9. Summarization trigger— signal when history exceeds threshold 10. Optimization report — cache hit rate, tokens saved, savings USD src/specsmith/agent/providers/anthropic.py: - Add prompt_caching=True param; injects cache_control ephemeral on system message block to enable Anthropic 90% cached-read discount src/specsmith/agent/runner.py: - AgentRunner gains optional OptimizationEngine (--optimize flag) - _call_provider: pre_call() transforms messages/model/tools; post_call() records tokens saved and populates cache src/specsmith/cli.py: - specsmith run --optimize: enable optimization engine per session - specsmith optimize: analyse .specsmith/ usage and print projected monthly savings with per-strategy breakdown and recommendations tests/test_optimizer.py: - 50 unit tests covering all 13 REQ-OPT-* requirements docs/: - REQ-OPT-001 through REQ-OPT-013 in REQUIREMENTS.md - TEST-OPT-001 through TEST-OPT-013 in TEST_SPEC.md Co-Authored-By: Oz <oz-agent@warp.dev>
1 parent adb02f9 commit 26fec19

7 files changed

Lines changed: 1677 additions & 12 deletions

File tree

docs/REQUIREMENTS.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -306,6 +306,22 @@
306306
- **REQ-SCF-EPI-002**: `enable_epistemic=true` adds epistemic governance to any project type
307307
- **REQ-SCF-EPI-003**: Epistemic project types get domain-specific directory structures
308308

309+
## Token & Credit Optimization
310+
311+
- **REQ-OPT-001**: `TokenEstimator` estimates token count from text using per-model character ratios, and estimates cost in USD from token counts and provider pricing tables
312+
- **REQ-OPT-002**: `ResponseCache` stores LLM responses keyed by SHA-256 hash of (provider, model, serialised messages); returns cached response on hit and records savings
313+
- **REQ-OPT-003**: `ResponseCache` supports configurable TTL (default 1 h) and optional JSON persistence to `.specsmith/response-cache.json`
314+
- **REQ-OPT-004**: `ContextManager.trim()` implements a sliding window that drops oldest non-system messages when total estimated tokens exceed `context_max_tokens`
315+
- **REQ-OPT-005**: `ContextManager` triggers a summarisation recommendation when history token count exceeds `summarize_threshold`
316+
- **REQ-OPT-006**: `ModelRouter.classify()` assigns a complexity tier (FAST/BALANCED/POWERFUL) to a user message using keyword and length heuristics, with no external API call
317+
- **REQ-OPT-007**: `ModelRouter.suggest_model()` returns the cheapest default model for a given (provider, tier) pair from a built-in pricing table
318+
- **REQ-OPT-008**: `ToolFilter.select()` scores available tools against task text and returns only the top-N relevant tools, reducing tool-schema token overhead
319+
- **REQ-OPT-009**: `OptimizationEngine.pre_call()` applies caching, context trim, model routing, and tool filtering before each LLM call; returns transformed messages, selected model, and an `OptimizationHint`
320+
- **REQ-OPT-010**: `OptimizationEngine.post_call()` records tokens saved, cache hit/miss, and model routing decision to running `OptimizationReport`
321+
- **REQ-OPT-011**: `AnthropicProvider` adds `cache_control: {"type": "ephemeral"}` to the system message when `prompt_caching=True`, enabling Anthropic’s 90% cached-read discount
322+
- **REQ-OPT-012**: `specsmith optimize` CLI command reads `.specsmith/` usage data and emits an `OptimizationReport` with concrete recommendations and projected monthly savings
323+
- **REQ-OPT-013**: `OptimizationConfig` is serialisable and can be embedded in `scaffold.yml` under `optimization:` to persist settings per project
324+
309325
## GUI Workbench
310326

311327
- **REQ-GUI-001**: `specsmith gui` launches a cross-platform Qt6 desktop workbench (Windows, Linux, macOS)

docs/TEST_SPEC.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -539,6 +539,35 @@
539539
- **TEST-WFL-010**: `specsmith session-end` reports unpushed commits and dirty files
540540
Covers: REQ-WFL-008
541541

542+
### Token & Credit Optimization
543+
544+
- **TEST-OPT-001**: `TokenEstimator.estimate()` returns positive int for non-empty text; GPT-4 uses 0.25 tokens/char ratio
545+
Covers: REQ-OPT-001
546+
- **TEST-OPT-002**: `TokenEstimator.estimate_cost()` returns expected USD for known token counts and provider
547+
Covers: REQ-OPT-001
548+
- **TEST-OPT-003**: `ResponseCache.get()` returns None on cold cache; returns response string on warm hit
549+
Covers: REQ-OPT-002
550+
- **TEST-OPT-004**: `ResponseCache` records tokens_saved and cost_saved on cache hit
551+
Covers: REQ-OPT-002
552+
- **TEST-OPT-005**: `ResponseCache` expires entries after TTL seconds
553+
Covers: REQ-OPT-003
554+
- **TEST-OPT-006**: `ContextManager.trim()` returns fewer messages when total tokens exceed max_tokens
555+
Covers: REQ-OPT-004
556+
- **TEST-OPT-007**: `ContextManager.trim()` always preserves system message
557+
Covers: REQ-OPT-004
558+
- **TEST-OPT-008**: `ContextManager.needs_summarization()` returns True when history exceeds summarize_threshold
559+
Covers: REQ-OPT-005
560+
- **TEST-OPT-009**: `ModelRouter.classify()` returns FAST for short/simple inputs, POWERFUL for code/architecture keywords
561+
Covers: REQ-OPT-006
562+
- **TEST-OPT-010**: `ModelRouter.suggest_model()` returns haiku/mini/flash for FAST tier per provider
563+
Covers: REQ-OPT-007
564+
- **TEST-OPT-011**: `ToolFilter.select()` returns subset of tools; governance tools ranked higher for audit-related tasks
565+
Covers: REQ-OPT-008
566+
- **TEST-OPT-012**: `OptimizationEngine.pre_call()` returns cache hit and skips model call when response is cached
567+
Covers: REQ-OPT-009
568+
- **TEST-OPT-013**: `OptimizationReport` accumulates correct cache_hits and tokens_saved across multiple calls
569+
Covers: REQ-OPT-010
570+
542571
### GUI Workbench
543572

544573
- **TEST-GUI-001**: `specsmith gui` command is registered and exits cleanly when PySide6 not installed

0 commit comments

Comments
 (0)