Target Workflow: test-coverage-improver
Source report: See latest token-usage-report issue
Estimated cost per run: N/A (Copilot usage — not billed by token)
Total tokens per run: ~3.9M input / ~39M effective tokens (successful run)
Cache hit rate (within-session): 49% — but 0% cross-run (ambient context never cached)
LLM turns/requests: 66 requests per successful run
Model: claude-sonnet-4.6
Run frequency: 2× daily (8am + 8pm UTC)
Observed Run Data (last 7 days)
| Run | Status | Effective Tokens | Duration | Requests |
|---|---|---|---|---|
| 25976952947 | ✅ success | 39.4M | 9.7m | 66 |
| 25976313126 | ❌ failure | 0 (skipped early) | 1.5m | n/a |
| 25974930350 | ❌ failure | 166.7M | 27.2m | ~2 turns |
Episode total: ~206M effective tokens across 3 episodes.
Current Configuration
| Setting | Value |
|---|---|
| Tools loaded | github: [repos, pull_requests] + 14 bash commands |
| Network groups | github only |
| Pre-agent steps | ✅ Yes: build, coverage, summary extraction |
| Prompt size | ~30,679 input tokens (ambient context) |
| Ambient context cached | 0 tokens (0% cross-run cache hit) |
| Bash tools | npm run build, npm run test, npm run test:coverage, npm run lint, cat:src/*.ts, cat:tests/**, etc. |
Root Cause: Zero Cross-Run Prefix Caching
The prompt template injects three dynamic blocks at render time:
${{ steps.coverage-summary.outputs.COVERAGE_JSON }} # full JSON, changes every run
${{ steps.coverage-md.outputs.COVERAGE_MD }} # COVERAGE_SUMMARY.md, changes with PRs
${{ steps.low-coverage.outputs.LOW_COVERAGE }} # filtered list, also dynamic
Because these blocks appear inside the static prompt, the entire 30K-token system prompt becomes unique on every run. Claude's prefix caching requires identical prefixes — any change busts the cache. Result: every one of the 66 LLM requests in a run processes the full 30K ambient context as fresh input.
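The effect can be illustrated with a minimal sketch. It assumes the cache key is a hash of everything before a cache boundary, and that the boundary can sit at the end of the static content once the dynamic blocks are moved. The `cache_key` helper, the prompt snippets, and the coverage values are hypothetical stand-ins, not the actual Copilot or Claude implementation.

```python
import hashlib


def cache_key(prefix: str) -> str:
    """Hypothetical cache key: a hash of every character before the cache boundary."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


# Placeholder text standing in for the real static prompt sections.
STATIC_INTRO = "You improve test coverage.\n"
STATIC_GUIDELINES = "Guidelines, phases, examples...\n"


def current_prefix(coverage: str) -> str:
    # Dynamic data sits between static blocks, so it is part of the hashed prefix.
    return STATIC_INTRO + coverage + STATIC_GUIDELINES


def restructured_prefix(coverage: str) -> str:
    # Dynamic data comes after the boundary, so only static text is hashed.
    return STATIC_INTRO + STATIC_GUIDELINES


# Made-up coverage payloads for two consecutive runs.
run_a, run_b = '{"lines": 71.2}\n', '{"lines": 71.9}\n'

current_hit = cache_key(current_prefix(run_a)) == cache_key(current_prefix(run_b))
restructured_hit = cache_key(restructured_prefix(run_a)) == cache_key(restructured_prefix(run_b))

print(f"current layout reuses cache across runs: {current_hit}")
print(f"restructured layout reuses cache across runs: {restructured_hit}")
```

Under these assumptions the current layout never produces a matching key, while the restructured layout does on every run.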
Recommendations
1. Move all dynamic ${{ }} injections to the END of the prompt
Estimated savings: ~20K cache-eligible tokens × 66 requests × 2 runs/day = ~2.6M tokens/day shifted to cheap cache reads (0.1× multiplier vs 1× full price)
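The estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes every request in a run hits the cache (it ignores the initial cache write) and that price-weighted tokens are simply tokens times the multiplier. The constant names are illustrative.

```python
CACHEABLE_TOKENS = 20_000      # static prefix size, from the estimate above
REQUESTS_PER_RUN = 66          # observed in the successful run
RUNS_PER_DAY = 2               # current schedule
CACHE_READ_MULTIPLIER = 0.1    # cache read price relative to fresh input (1.0)

# Tokens that would be served from cache instead of processed as fresh input.
shifted_per_day = CACHEABLE_TOKENS * REQUESTS_PER_RUN * RUNS_PER_DAY

# The same tokens expressed in price-weighted terms.
weighted_after = round(shifted_per_day * CACHE_READ_MULTIPLIER)
weighted_saving = shifted_per_day - weighted_after

print(f"tokens/day moved to cache reads: {shifted_per_day:,}")
print(f"price-weighted tokens/day after caching: {weighted_after:,}")
print(f"price-weighted tokens/day saved: {weighted_saving:,}")
```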
Current layout (simplified):
[static context: 8K tokens]
[dynamic: COVERAGE_JSON block: ~5K tokens] ← cache buster
[static: guidelines, phases, examples: 15K tokens]
[dynamic: LOW_COVERAGE, COVERAGE_MD]
Restructured layout:
[ALL static content first: ~25K tokens] ← this prefix caches across runs
---
## Current Coverage Data (this run)
${{ steps.coverage-md.outputs.COVERAGE_MD }}
${{ steps.low-coverage.outputs.LOW_COVERAGE }}
Specific change to .github/workflows/test-coverage-improver.md: move the ## Current Coverage Status section (lines 258–280) to the very end of the document body, after all static guidelines and examples.
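If the move is scripted rather than done by hand, something like the following sketch could work. `move_section_to_end` is a hypothetical helper, not part of any existing tooling. It assumes the section ends at the next heading of the same or a higher level, and it does not handle headings that appear inside fenced code blocks or front matter.

```python
import re


def move_section_to_end(markdown: str, heading: str) -> str:
    """Move the section that starts at `heading` to the end of the document."""
    lines = markdown.splitlines()
    try:
        start = lines.index(heading)
    except ValueError:
        return markdown  # heading not found: leave the document untouched

    # Heading level is the number of leading '#' characters.
    level = len(heading) - len(heading.lstrip("#"))

    # The section ends at the next heading of the same or a higher level.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        match = re.match(r"(#+)\s", lines[i])
        if match and len(match.group(1)) <= level:
            end = i
            break

    section = lines[start:end]
    rest = lines[:start] + lines[end:]
    return "\n".join(rest + section) + "\n"


# Tiny made-up document to show the reordering.
doc = "# Prompt\nintro\n## Current Coverage Status\ndata\n## Guidelines\nrules\n"
print(move_section_to_end(doc, "## Current Coverage Status"))
```

Review the result before committing, since the line range quoted above may have shifted.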
2. Remove the full COVERAGE_JSON block from the prompt
Estimated savings: ~5K tokens × 66 requests = ~330K tokens/run
The full coverage-summary.json is injected as a json code block, but the pre-step already extracts the actionable information as LOW_COVERAGE (files below 80%). The raw JSON adds noise and token cost without additional signal.
Change: Remove this block entirely from the prompt:
### Coverage JSON (full)
```json
${{ steps.coverage-summary.outputs.COVERAGE_JSON }}
```
Keep COVERAGE_SUMMARY.md (human-readable) and LOW_COVERAGE (prioritized list). The agent can cat coverage/coverage-summary.json via the bash tool if it needs more detail.
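For reference, the filtered list can be derived from the summary file in a few lines. The workflow's actual pre-step is not shown in this report, so this is only a sketch. It assumes the Istanbul json-summary shape (one entry per file plus a "total" entry) and that the 80% threshold applies to line coverage. The sample paths and percentages are made up.

```python
import json


def low_coverage(summary_json: str, threshold: float = 80.0) -> list[tuple[str, float]]:
    """Return (file, line coverage %) pairs below `threshold`, lowest first."""
    summary = json.loads(summary_json)
    files = []
    for path, metrics in summary.items():
        if path == "total":
            continue  # aggregate entry, not a file
        pct = metrics["lines"]["pct"]
        # Skip entries whose percentage is not numeric.
        if isinstance(pct, (int, float)) and pct < threshold:
            files.append((path, pct))
    return sorted(files, key=lambda item: item[1])


# Made-up sample; only the fields used by this sketch are included.
sample = json.dumps({
    "total": {"lines": {"pct": 74.0}},
    "src/parser.ts": {"lines": {"pct": 91.3}},
    "src/cli.ts": {"lines": {"pct": 42.5}},
    "src/config.ts": {"lines": {"pct": 67.0}},
})

for path, pct in low_coverage(sample):
    print(f"{path}: {pct}%")
```

A list like this is a small fraction of the size of the raw JSON, which is the point of dropping the full block.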
3. Remove redundant bash tools already covered by pre-steps
Estimated savings: Prevents the agent from re-running expensive build/test commands, reducing session length by an estimated 20–30%
The pre-steps already run npm ci, npm run build, and npm run test:coverage. However, the tools: bash: section still allows the agent to re-run them:
# CURRENT: allows redundant re-runs
tools:
  bash:
    - "npm run build" # ← already done in pre-steps
    - "npm run test" # ← redundant (test:coverage covers this)
    - "npm run test:coverage" # ← already done in pre-steps
    - "npm run lint"
    - ...
Proposed change:
tools:
  bash:
    - "npm run test" # only for writing new tests iteratively
    - "npm run lint"
    - "cat:src/*.test.ts"
    - "cat:src/*.ts"
    - "cat:tests/**"
    - "cat:coverage/coverage-summary.json"
    - "cat:jest.config.js"
    - "cat:jest.config.ts"
    - "ls:src"
    - "ls:tests"
    - "ls:coverage"
    - "head:*"
    - "tail:*"
Remove npm run build (agent isn't modifying the build system) and npm run test:coverage (pre-steps already ran this; agent should use npm run test for fast iteration after writing new tests).
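The pruning rule can be expressed as a simple filter. This is a sketch only: it covers just the four npm entries named in this report (the rest of the current allowlist is elided), and `prune_allowlist` is a hypothetical helper.

```python
# Commands the pre-steps already run, as listed in this report.
PRE_STEP_COMMANDS = {"npm ci", "npm run build", "npm run test:coverage"}

# The npm entries named in the current allowlist; other entries are elided in the report.
current_allowlist = [
    "npm run build",
    "npm run test",
    "npm run test:coverage",
    "npm run lint",
]


def prune_allowlist(allowlist: list[str], already_run: set[str]) -> list[str]:
    """Drop commands that the pre-steps have already executed, keeping order."""
    return [command for command in allowlist if command not in already_run]


print(prune_allowlist(current_allowlist, PRE_STEP_COMMANDS))
```

The two surviving entries match the npm commands kept in the proposed list.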
4. Reduce run frequency from 2× to 1× daily
Estimated savings: 50% reduction in total runs → halves all other token costs
The workflow runs at cron: '0 8,20 * * *' (twice daily). Coverage improvements are incremental and PRs require human review — there's no benefit from checking twice daily.
Change:
on:
  schedule:
    - cron: '0 8 * * *' # once daily at 8am UTC
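To double-check the effect of the schedule change, the run count can be derived from the cron expression. `runs_per_day` is a hypothetical helper that deliberately handles only plain numbers and comma lists in the minute and hour fields. It is a sketch, not a general cron parser.

```python
def runs_per_day(cron: str) -> int:
    """Count daily runs for a five-field cron expression (limited subset)."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles schedules that run every day")
    for field in (minute, hour):
        if any(ch in field for ch in "*/-"):
            raise ValueError("this sketch only handles numbers and comma lists")
    # One run for each (minute, hour) combination.
    return len(set(minute.split(","))) * len(set(hour.split(",")))


current = runs_per_day("0 8,20 * * *")
proposed = runs_per_day("0 8 * * *")
print(f"current: {current} runs/day, {current * 7} runs/week")
print(f"proposed: {proposed} runs/day, {proposed * 7} runs/week")
```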
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Ambient cache hit rate | 0% | ~85%+ | Major |
| Effective tokens/successful run | ~39M | ~15M | −62% |
| Effective tokens/week | ~546M | ~105M | −81% |
| LLM requests/run | 66 | 50–55 (est.) | −17% to −24% |
| Session duration | 9.7m | 7–8m (est.) | −18% to −28% |
| Prompt input tokens | 30,679 | ~25,000 | −19% |
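The token figures in the table follow from the per-run estimates and the run schedule. The sketch below reproduces them, assuming every scheduled run succeeds and costs the same as the observed successful run. The ~15M projection is taken from the table rather than derived here.

```python
def pct_change(current: float, projected: float) -> int:
    """Signed percentage change, rounded to the nearest whole percent."""
    return round((projected - current) / current * 100)


TOKENS_PER_RUN_NOW = 39_000_000     # observed successful run (rounded)
TOKENS_PER_RUN_AFTER = 15_000_000   # projection from the table
DAYS_PER_WEEK = 7

weekly_now = TOKENS_PER_RUN_NOW * 2 * DAYS_PER_WEEK      # two runs per day
weekly_after = TOKENS_PER_RUN_AFTER * 1 * DAYS_PER_WEEK  # one run per day

print(f"per run: {pct_change(TOKENS_PER_RUN_NOW, TOKENS_PER_RUN_AFTER)}%")
print(f"per week: {weekly_now:,} -> {weekly_after:,} ({pct_change(weekly_now, weekly_after)}%)")
print(f"prompt size: {pct_change(30_679, 25_000)}%")
```

Most of the weekly reduction comes from combining the per-run saving with the halved schedule.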
Implementation Checklist
- [ ] Move the ## Current Coverage Status section to the end of the prompt (fixes the cross-run cache miss)
- [ ] Remove the ### Coverage JSON (full) block from the prompt (reduces the prompt by ~5K tokens)
- [ ] Remove npm run build and npm run test:coverage from the tools: bash: list
- [ ] Change the cron schedule from 0 8,20 * * * to 0 8 * * * (once daily)
- [ ] Recompile the workflow: gh aw compile .github/workflows/test-coverage-improver.md

Generated by Daily Copilot Token Optimization Advisor · ● 9.1M