
⚡ Copilot Token Optimization 2026-05-17 — test-coverage-improver #3293

@github-actions

Target Workflow: test-coverage-improver

Source report: See latest token-usage-report issue
Estimated cost per run: N/A (Copilot usage — not billed by token)
Total tokens per run: ~3.9M input / ~39M effective tokens (successful run)
Cache hit rate (within-session): 49% — but 0% cross-run (ambient context never cached)
LLM turns/requests: 66 requests per successful run
Model: claude-sonnet-4.6
Run frequency: 2× daily (8am + 8pm UTC)

Observed Run Data (last 7 days)

| Run | Status | Effective tokens | Duration | Requests |
|---|---|---|---|---|
| 25976952947 | ✅ success | 39.4M | 9.7m | 66 |
| 25976313126 | ❌ failure | 0 (skipped early) | 1.5m | |
| 25974930350 | ❌ failure | 166.7M | 27.2m | ~2 turns |

Episode total: ~206M effective tokens across 3 episodes.

Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | github: [repos, pull_requests] + 14 bash commands |
| Network groups | github only |
| Pre-agent steps | ✅ Yes: build, coverage, summary extraction |
| Prompt size | ~30,679 input tokens (ambient context) |
| Ambient context cached | 0 tokens (0% cross-run cache hit) |
| Bash tools | npm run build, npm run test, npm run test:coverage, npm run lint, cat:src/*.ts, cat:tests/**, etc. |

Root Cause: Zero Cross-Run Prefix Caching

The prompt template injects three dynamic blocks at render time:

${{ steps.coverage-summary.outputs.COVERAGE_JSON }}   # full JSON, changes every run
${{ steps.coverage-md.outputs.COVERAGE_MD }}           # COVERAGE_SUMMARY.md, changes with PRs
${{ steps.low-coverage.outputs.LOW_COVERAGE }}         # filtered list, also dynamic

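Because these blocks are interleaved with the static prompt text, every run renders a different ~30K-token system prompt. Prefix caching can only reuse a prefix that is identical up to the cache point, so everything from the first changed token onward is a miss. With COVERAGE_JSON injected roughly 8K tokens in, the ~15K tokens of static guidelines that follow it can never be reused across runs. Result: the measured cross-run cache hit on ambient context is 0 tokens, and the 66 LLM requests in each run get no benefit from the previous run's prompt.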
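To make the mechanism concrete, the toy model below renders two runs of the prompt under both layouts and counts how many leading tokens they share. It is a sketch, not a model of any provider's cache: each list element stands in for roughly 1K tokens, the static sizes and the ~5K COVERAGE_JSON size come from the simplified layout under Recommendation 1, the sizes for COVERAGE_MD and LOW_COVERAGE are assumed, and render and shared_prefix_len are hypothetical helper names.

```python
# Toy model: one list element stands in for roughly 1K prompt tokens.
STATIC_BLOCKS = {
    "intro": ["intro"] * 8,        # ~8K static context
    "guidelines": ["guide"] * 15,  # ~15K static guidelines, phases, examples
}

# Sizes of the dynamic blocks. Only COVERAGE_JSON's ~5K is from the report;
# the other two are assumptions for illustration.
DYNAMIC_SIZES = {"coverage_json": 5, "coverage_md": 2, "low_coverage": 1}

CURRENT_LAYOUT = ["intro", "coverage_json", "guidelines", "low_coverage", "coverage_md"]
RESTRUCTURED_LAYOUT = ["intro", "guidelines", "coverage_md", "low_coverage"]


def render(layout, run_id):
    """Expand block names into a flat token list for one run."""
    tokens = []
    for block in layout:
        if block in STATIC_BLOCKS:
            tokens.extend(STATIC_BLOCKS[block])
        else:
            # Dynamic content differs on every run, so tag it with the run id.
            tokens.extend([f"{block}@{run_id}"] * DYNAMIC_SIZES[block])
    return tokens


def shared_prefix_len(a, b):
    """Count the leading tokens that two rendered prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


current = shared_prefix_len(render(CURRENT_LAYOUT, 1), render(CURRENT_LAYOUT, 2))
restructured = shared_prefix_len(
    render(RESTRUCTURED_LAYOUT, 1), render(RESTRUCTURED_LAYOUT, 2)
)
print(current, restructured)
```

In this model the current layout shares 8 units across runs and the restructured layout shares 23. A real cache may reuse less than the full shared prefix (the report measured 0 cached tokens for the current layout), so these numbers are best read as an upper bound.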

Recommendations

1. Move all dynamic ${{ }} injections to the END of the prompt

Estimated savings: ~20K cache-eligible tokens × 66 requests × 2 runs/day = ~2.6M tokens/day shifted to cheap cache reads (0.1× multiplier vs 1× full price)
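The estimate above is a straight multiplication, reproduced here as a small sketch. The function name cache_shift is hypothetical, and the calculation assumes every one of the 66 requests in both daily runs reads the prefix from cache, so it is an upper bound and not a measured result.

```python
def cache_shift(cacheable_tokens, requests_per_run, runs_per_day, read_multiplier=0.1):
    """Return (tokens/day moved to cache reads, their full-price equivalent)."""
    shifted = cacheable_tokens * requests_per_run * runs_per_day
    return shifted, shifted * read_multiplier


# Inputs from the report: ~20K cache-eligible tokens, 66 requests, 2 runs/day.
shifted, equivalent = cache_shift(20_000, 66, 2)
print(f"{shifted:,} tokens/day shifted, weighted as {equivalent:,.0f} at 0.1x")
```

The shifted volume is 2,640,000 tokens/day, which the report rounds to ~2.6M.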

Current layout (simplified):

[static context: 8K tokens]
[dynamic: COVERAGE_JSON block: ~5K tokens]   ← cache buster
[static: guidelines, phases, examples: 15K tokens]
[dynamic: LOW_COVERAGE, COVERAGE_MD]

Restructured layout:

[ALL static content first: ~25K tokens]      ← this prefix caches across runs
---
## Current Coverage Data (this run)
${{ steps.coverage-md.outputs.COVERAGE_MD }}
${{ steps.low-coverage.outputs.LOW_COVERAGE }}

Specific change to .github/workflows/test-coverage-improver.md: move the ## Current Coverage Status section (lines 258–280) to the very end of the document body, after all static guidelines and examples.

2. Remove the full COVERAGE_JSON block from the prompt

Estimated savings: ~5K tokens × 66 requests = ~330K tokens/run

The full coverage-summary.json is injected as a json code block, but the pre-step already extracts the actionable information as LOW_COVERAGE (files below 80%). The raw JSON adds noise and token cost without additional signal.
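For reference, the kind of filtering that produces LOW_COVERAGE can be sketched as follows. The actual pre-step is not shown in this report, so this is only an assumption about its shape: it presumes the file uses the Istanbul json-summary layout (a "total" entry plus one entry per file, each with a lines.pct field) and filters on line coverage. The file names in the sample are hypothetical.

```python
import json

THRESHOLD = 80.0  # the report's cutoff for "low coverage"


def low_coverage(summary, threshold=THRESHOLD, metric="lines"):
    """Return (path, pct) pairs below the threshold, lowest coverage first."""
    rows = []
    for path, metrics in summary.items():
        if path == "total":
            continue  # aggregate entry, not a file
        pct = metrics.get(metric, {}).get("pct")
        # Skip entries whose percentage is missing or not numeric.
        if isinstance(pct, (int, float)) and pct < threshold:
            rows.append((path, pct))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical sample in the json-summary shape; real data would be loaded
# from coverage/coverage-summary.json instead.
sample = json.loads("""
{
  "total":        {"lines": {"total": 300, "covered": 214, "skipped": 0, "pct": 71.33}},
  "src/cli.ts":   {"lines": {"total": 100, "covered": 45,  "skipped": 0, "pct": 45}},
  "src/utils.ts": {"lines": {"total": 100, "covered": 97,  "skipped": 0, "pct": 97}},
  "src/rules.ts": {"lines": {"total": 100, "covered": 72,  "skipped": 0, "pct": 72}}
}
""")

for path, pct in low_coverage(sample):
    print(f"{path}: {pct}%")
```

A list like this is a few lines long regardless of repository size, which is why it can replace the full JSON in the prompt without losing the prioritization signal.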

Change: Remove this block entirely from the prompt:

### Coverage JSON (full)

```json
${{ steps.coverage-summary.outputs.COVERAGE_JSON }}
```

Keep COVERAGE_SUMMARY.md (human-readable) and LOW_COVERAGE (prioritized list). The agent can still run cat coverage/coverage-summary.json through the bash tool if it needs more detail.

3. Remove redundant bash tools already covered by pre-steps

Estimated savings: Prevents the agent from re-running expensive build/test commands, reducing session length by an estimated 20–30%

The pre-steps already run npm ci, npm run build, and npm run test:coverage. However, the tools: bash: section still allows the agent to re-run them:

```yaml
# CURRENT: allows redundant re-runs
tools:
  bash:
    - "npm run build"         # ← already done in pre-steps
    - "npm run test"          # ← overlaps with test:coverage
    - "npm run test:coverage" # ← already done in pre-steps
    - "npm run lint"
    - ...
```

Proposed change:

tools:
  bash:
    - "npm run test"          # only for writing new tests iteratively
    - "npm run lint"
    - "cat:src/*.test.ts"
    - "cat:src/*.ts"
    - "cat:tests/**"
    - "cat:coverage/coverage-summary.json"
    - "cat:jest.config.js"
    - "cat:jest.config.ts"
    - "ls:src"
    - "ls:tests"
    - "ls:coverage"
    - "head:*"
    - "tail:*"

Remove npm run build (agent isn't modifying the build system) and npm run test:coverage (pre-steps already ran this; agent should use npm run test for fast iteration after writing new tests).

4. Reduce run frequency from 2× to 1× daily

Estimated savings: 50% reduction in total runs → halves all other token costs

The workflow runs at cron: '0 8,20 * * *' (twice daily). Coverage improvements are incremental and PRs require human review, so a second run 12 hours later is unlikely to find new work before the first run's PR has been reviewed.

Change:

on:
  schedule:
    - cron: '0 8 * * *'   # once daily at 8am UTC

Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Ambient cache hit rate | 0% | ~85%+ | Major |
| Effective tokens/successful run | ~39M | ~15M | −62% |
| Effective tokens/week | ~546M | ~105M | −81% |
| LLM requests/run | 66 | 50–55 (est.) | −17% to −24% |
| Session duration | 9.7m | 7–8m (est.) | −18% to −28% |
| Prompt input tokens | 30,679 | ~25,000 | −19% |
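The percentages in the table can be reproduced with the short sketch below. The helper names are hypothetical, and the weekly figures assume every scheduled run succeeds and costs the same as the one successful run observed, which the failure rows in the run data show is not guaranteed.

```python
def pct_change(current, projected):
    """Relative change from current to projected, in percent."""
    return (projected - current) / current * 100


def weekly_tokens(tokens_per_run, runs_per_day):
    """Effective tokens per week, assuming every scheduled run succeeds."""
    return tokens_per_run * runs_per_day * 7


current_week = weekly_tokens(39_000_000, 2)    # 2 runs/day today
projected_week = weekly_tokens(15_000_000, 1)  # 1 run/day after the cron change

print(f"per run:  {pct_change(39_000_000, 15_000_000):.0f}%")
print(
    f"per week: {current_week // 1_000_000}M -> {projected_week // 1_000_000}M "
    f"({pct_change(current_week, projected_week):.0f}%)"
)
print(f"prompt:   {pct_change(30_679, 25_000):.0f}%")
```

The weekly reduction is larger than the per-run reduction because it combines the smaller per-run cost with the halved run frequency.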

Implementation Checklist

  • Move ## Current Coverage Status section to end of prompt (fixes cross-run cache miss)
  • Remove ### Coverage JSON (full) block from prompt (reduces prompt ~5K tokens)
  • Remove npm run build and npm run test:coverage from tools: bash: list
  • Change cron from 0 8,20 * * * to 0 8 * * * (once daily)
  • Recompile: gh aw compile .github/workflows/test-coverage-improver.md
  • Trigger manual run and compare token usage to baseline
  • Verify CI passes on updated workflow

Generated by Daily Copilot Token Optimization Advisor · ● 9.1M
