Commit b987135

docs(plugin): clarify cache segment semantics as last-call-only token display
Document why the status-bar cache segment renders raw tokens (♻N/M) instead of a percentage, explaining the last-call semantics of context_window.current_usage from Claude Code stdin.

status-bar-model.md:
- Update the example status line and segment table to show the new ♻2k/3.5k format instead of the deprecated Cache:XX%
- Add a new "Cache Segment Semantics (Last-Call Only)" subsection explaining numerator/denominator, format rules, fallback behavior, and a contributor caution against reintroducing percentage rendering
- Refresh all 5 Mode Examples (PLAN/ACT/EVAL/AUTO/Ready) to match the current v5.3.0 output
- Add an explicit note on the Ready state that the cache segment is hidden when no API calls have been made yet (documented fallback, not a bug)

codingbuddy-hud.py:
- Expand format_compact_tokens docstring with explicit output rules, the 1000-1049 rounding note, and the error fallback contract (returns "0" when input is not coercible to int; never raises)
- Expand format_cache_segment docstring with the last-call rationale, fallback conditions, and a contributor caution mirrored in the docs

This PR is docs-only: no behavior changes. Regression tests from PR #1359 continue to guard against Cache:XX% reintroduction.

Closes #1357
1 parent 8b32125 commit b987135

2 files changed

Lines changed: 86 additions & 24 deletions

packages/claude-code-plugin/docs/status-bar-model.md

Lines changed: 39 additions & 10 deletions
@@ -20,20 +20,47 @@ Displays session metrics computed from stdin data and HUD state.
 **Format:**
 
 ```
-◕‿◕ CB v5.1.1 | PLAN 🟢 | 12m | ~$0.42 | Cache:53% | Ctx:45%
+◕‿◕ CB v5.3.0 | PLAN 🟢 | 12m | ~$0.42 | ♻800/1.5k | Ctx:45%
 ```
 
 | Segment | Source | Description |
 | --------------- | ------------------------------- | ------------------------------------------ |
 | `◕‿◕` | Constant | Buddy face |
-| `CB v5.1.1` | `hud_state.version` | Plugin version |
+| `CB v5.3.0` | `hud_state.version` | Plugin version |
 | `PLAN` | `hud_state.currentMode` | Current workflow mode (PLAN/ACT/EVAL/AUTO) |
 | `🟢` | Computed from `ctx_pct` | Health indicator (see below) |
 | `12m` | `hud_state.sessionStartTimestamp`| Session duration |
 | `~$0.42` | Computed from stdin token usage | Estimated session cost |
-| `Cache:53%` | Computed from stdin token usage | Cache hit rate |
+| `♻800/1.5k` | Computed from stdin token usage | Cache tokens (last API call, see below) |
 | `Ctx:45%` | `stdin.context_window.used_percentage` | Context window usage |
 
+The segment is omitted entirely when cache usage data is absent — do not assume it is always present.
+
+### Cache Segment Semantics (Last-Call Only)
+
+The cache segment renders raw token counts from **the most recent API call**, not cumulative session cache efficiency. This is a deliberate design choice driven by how Claude Code exposes telemetry (see #1355, #1356).
+
+**Format:** `♻{cache_read_input_tokens}/{total_input_tokens}` — e.g. `♻2k/3.5k`
+
+- **Numerator:** `stdin.context_window.current_usage.cache_read_input_tokens`
+- **Denominator:** `input_tokens + cache_creation_input_tokens + cache_read_input_tokens`
+- **Compact format:** values < 1000 render as integers (`532`), values ≥ 1000 render as `Nk` with trailing `.0` trimmed for whole thousands (`1k`, `1.5k`, `128k`)
+
+**Why not a percentage?**
+
+An earlier design rendered this as `Cache:XX%`. The calculation was mathematically correct but semantically misleading: users frequently saw values like `Cache:100%` and reasonably assumed their entire session was fully cached, even though the number only described the last request. Claude Code's status-line stdin explicitly documents `context_window.current_usage` as **last-call** token counts, not cumulative session totals. Raw token display removes the ambiguity.
+
+**Fallback behavior:**
+
+- `context_window` missing → segment omitted entirely
+- `current_usage` missing or null → segment omitted entirely
+- All three token counts are 0 → segment omitted entirely (no meaningful ratio to show)
+- Any of the three values is `None` → coerced to 0 via `or 0` defensive pattern
+
+**Rendering reference:** see `format_cache_segment()` and `format_compact_tokens()` in `hooks/codingbuddy-hud.py`.
+
+**Contributor caution:** do **not** reintroduce a percentage-based rendering (e.g. `Cache:XX%`). Regression tests in `tests/test_hud.py::TestFormatCacheSegment::test_regression_no_percent_in_output` and `TestFormatStatusLineCacheSegment::test_status_line_no_longer_contains_cache_percent` explicitly guard against this.
+
 ### Health Indicator
 
 | Emoji | Condition | Meaning |
@@ -124,7 +151,7 @@ Claude Code passes session data as JSON to the statusLine script's stdin:
 }
 ```
 
-**Used for:** model identification, cost estimation, cache rate, context percentage.
+**Used for:** model identification, cost estimation, last-call cache tokens, context percentage.
 
 ### Tier 2: HUD State File
 
@@ -163,36 +190,38 @@ If any exception occurs during rendering, the script outputs a minimal fallback:
 ### PLAN Mode (No Agent)
 
 ```
-◕‿◕ CB v5.1.1 | PLAN 🟢 | 5m | ~$0.12 | Cache:40% | Ctx:22%
+◕‿◕ CB v5.3.0 | PLAN 🟢 | 5m | ~$0.12 | ♻1.2k/3k | Ctx:22%
 ```
 
 ### ACT Mode (With Agent)
 
 ```
-◕‿◕ CB v5.1.1 | ACT 🟢 | 18m | ~$1.05 | Cache:62% | Ctx:48%
+◕‿◕ CB v5.3.0 | ACT 🟢 | 18m | ~$1.05 | ♻8k/13k | Ctx:48%
 🤖 frontend-developer
 ```
 
 ### EVAL Mode (High Context)
 
 ```
-◕‿◕ CB v5.1.1 | EVAL 🟡 | 45m | ~$3.20 | Cache:71% | Ctx:73%
+◕‿◕ CB v5.3.0 | EVAL 🟡 | 45m | ~$3.20 | ♻24k/34k | Ctx:73%
 🤖 security-specialist
 ```
 
 ### AUTO Mode (Critical Context)
 
 ```
-◕‿◕ CB v5.1.1 | AUTO 🔴 | 1h12m | ~$8.50 | Cache:55% | Ctx:91%
+◕‿◕ CB v5.3.0 | AUTO 🔴 | 1h12m | ~$8.50 | ♻95k/172k | Ctx:91%
 🤖 code-quality-specialist
 ```
 
-### No Mode Set (Initial State)
+### No Mode Set (Initial State — Cache Segment Hidden)
 
 ```
-◕‿◕ CB v5.1.1 | Ready 🟢 | 0m | ~$0.00 | Cache:0% | Ctx:0%
+◕‿◕ CB v5.3.0 | Ready 🟢 | 0m | ~$0.00 | Ctx:0%
 ```
 
+> The cache segment is **omitted** in the initial state because no API calls have been made yet — there is no `current_usage` data to render. This is the documented fallback behavior, not a bug.
+
 ## Multiline Behavior
 
 The statusLine output supports exactly **one or two lines**:
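
The "Why not a percentage?" rationale in the documentation diff above can be illustrated with a small sketch. It compares the cache-read share of the last call against the share over a whole session. The three-call history and the helper names (`total_input`, `cache_read_share`) are hypothetical and exist only for this illustration; only the three token field names come from the documented `current_usage` payload.

```python
from typing import Dict, List

Usage = Dict[str, int]


def total_input(usage: Usage) -> int:
    """Denominator from the docs: input + cache creation + cache read."""
    return (
        usage["input_tokens"]
        + usage["cache_creation_input_tokens"]
        + usage["cache_read_input_tokens"]
    )


def cache_read_share(usage: Usage) -> float:
    """Fraction of input tokens served from cache for one usage record."""
    total = total_input(usage)
    return usage["cache_read_input_tokens"] / total if total else 0.0


# Hypothetical session: a cold start, a partially cached call, a fully cached call.
calls: List[Usage] = [
    {"input_tokens": 3000, "cache_creation_input_tokens": 2000, "cache_read_input_tokens": 0},
    {"input_tokens": 1000, "cache_creation_input_tokens": 500, "cache_read_input_tokens": 1500},
    {"input_tokens": 0, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2000},
]

# What a last-call percentage would have shown.
last_call_share = cache_read_share(calls[-1])

# What a reader of "Cache:100%" tends to assume they are seeing.
session_read = sum(c["cache_read_input_tokens"] for c in calls)
session_total = sum(total_input(c) for c in calls)
session_share = session_read / session_total

print(f"last call: {last_call_share:.0%}")  # last call: 100%
print(f"session:   {session_share:.0%}")    # session:   35%
```

In this made-up history the last call is fully cached while only about a third of the session's input tokens came from cache, which is the gap a `Cache:100%` label would hide and a raw `♻2k/2k` display would not suggest.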

packages/claude-code-plugin/hooks/codingbuddy-hud.py

Lines changed: 47 additions & 14 deletions
@@ -132,10 +132,24 @@ def estimate_cost(model_id: str, context_window: dict) -> float:
 
 
 def format_compact_tokens(n: int) -> str:
-    """Format token count compactly for status-bar display.
+    """Format a token count compactly for status-bar display.
 
-    - < 1000 → raw integer (e.g. `532`)
-    - >= 1000 → `Nk` with one decimal trimmed of trailing `.0` (e.g. `1.5k`, `128k`)
+    Used by the cache segment to keep the status line narrow. See the
+    "Cache Segment Semantics" section of docs/status-bar-model.md for
+    the surrounding context.
+
+    Output rules:
+      - value < 1000 → raw integer (e.g. `532`)
+      - value is whole k → `Nk` with trailing `.0` trimmed (e.g. `1k`, `128k`)
+      - value > 1000 non-whole → `N.Nk` with one decimal (e.g. `1.5k`, `3.5k`)
+
+    Note: values in (1000, 1050) may round to `1.0k` via `.1f` formatting.
+    This is accepted display behavior — the goal is compactness, not
+    lossless round-trip encoding.
+
+    Error fallback: returns `"0"` when `n` is not coercible to int (e.g.
+    `None`, non-numeric string). The function never raises — it is called
+    from the hot status-line path and must degrade gracefully.
     """
     try:
         value = int(n)
@@ -153,17 +167,36 @@ def format_compact_tokens(n: int) -> str:
 def format_cache_segment(context_window: dict) -> str:
     """Render the cache segment as raw tokens from the latest API call.
 
-    IMPORTANT: `context_window.current_usage` from Claude Code stdin reflects
-    **only the most recent API call**, not cumulative session cache usage.
-    This helper therefore renders raw token counts (numerator/denominator)
-    rather than a percentage, which users tend to misread as session-wide
-    cache efficiency (#1355, #1356).
-
-    Numerator = `cache_read_input_tokens`
-    Denominator = `input_tokens + cache_creation_input_tokens + cache_read_input_tokens`
-
-    Returns an empty string when usage data is missing so the caller can
-    omit the segment entirely from the status line.
+    IMPORTANT — last-call semantics:
+        `context_window.current_usage` from Claude Code stdin reflects
+        **only the most recent API call**, not cumulative session cache
+        usage. An earlier design rendered this as `Cache:XX%`, which
+        users frequently misread as session-wide cache efficiency
+        (`Cache:100%` → "my whole session is cached", false). Raw token
+        display removes the ambiguity — see #1355, #1356 and the
+        "Cache Segment Semantics" section of docs/status-bar-model.md.
+
+    Format:
+        ``♻{cache_read}/{total}`` (e.g. ``♻2k/3.5k``)
+
+        - Numerator = ``cache_read_input_tokens``
+        - Denominator = ``input_tokens + cache_creation_input_tokens
+          + cache_read_input_tokens``
+        - Both values are passed through ``format_compact_tokens``
+          for narrow status-bar rendering.
+
+    Fallback:
+        Returns an empty string (hide the segment entirely) when:
+        - ``context_window`` is falsy
+        - ``current_usage`` is missing or null
+        - all three token counts sum to 0
+
+    Callers append the return value conditionally so the status
+    line still renders cleanly without a broken slot.
+
+    Contributor caution:
+        Do not reintroduce a percentage-based rendering here. Regression
+        tests in tests/test_hud.py explicitly guard against it.
     """
     usage = context_window.get("current_usage") if context_window else None
     if not usage:
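
Because the diff only shows the first lines of each function body, the following is a minimal sketch of how the two docstring contracts above could be satisfied. It is not the plugin's actual implementation. The names `compact_tokens` and `cache_segment` are hypothetical stand-ins for `format_compact_tokens` and `format_cache_segment`, and trimming via `str.endswith(".0")` is an assumption about how the "trailing `.0` trimmed" rule might be realized.

```python
from typing import Any, Optional


def compact_tokens(n: Any) -> str:
    """Sketch of the documented output rules (assumed, not the real code)."""
    try:
        value = int(n)
    except (TypeError, ValueError, OverflowError):
        return "0"  # documented error fallback: degrade, never raise
    if value < 1000:
        return str(value)
    text = f"{value / 1000:.1f}"  # one decimal, e.g. "1.5" or "128.0"
    if text.endswith(".0"):
        text = text[:-2]  # whole thousands render without the ".0"
    return f"{text}k"


def cache_segment(context_window: Optional[dict]) -> str:
    """Sketch of the last-call cache segment with the documented fallbacks."""
    usage = context_window.get("current_usage") if context_window else None
    if not usage:
        return ""  # context_window falsy, or current_usage missing/null
    # `or 0` coerces None values to 0, as the docs describe.
    read = usage.get("cache_read_input_tokens") or 0
    created = usage.get("cache_creation_input_tokens") or 0
    fresh = usage.get("input_tokens") or 0
    total = fresh + created + read
    if total == 0:
        return ""  # nothing meaningful to show
    return f"♻{compact_tokens(read)}/{compact_tokens(total)}"


example = {
    "current_usage": {
        "input_tokens": 500,
        "cache_creation_input_tokens": 1000,
        "cache_read_input_tokens": 2000,
    }
}
print(cache_segment(example))  # ♻2k/3.5k
```

In this sketch a value such as 1020 formats to `1.0` and is then trimmed to `1k`; whether the real function trims in that case or emits `1.0k`, as its docstring note suggests it may, cannot be determined from the diff shown here.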
