docs(plugin): clarify cache segment semantics as last-call-only token display

JeremyDev87 · JeremyDev87 · commit b9871350d302 · 2026-04-05T20:30:11.000+09:00
Document why the status-bar cache segment renders raw tokens (♻N/M) instead of a percentage, explaining the last-call semantics of context_window.current_usage from Claude Code stdin. status-bar-model.md: - Update the example status line and segment table to show the new ♻2k/3.5k format instead of the deprecated Cache:XX% - Add a new "Cache Segment Semantics (Last-Call Only)" subsection explaining numerator/denominator, format rules, fallback behavior, and a contributor caution against reintroducing percentage rendering - Refresh all 5 Mode Examples (PLAN/ACT/EVAL/AUTO/Ready) to match the current v5.3.0 output - Add an explicit note on the Ready state that the cache segment is hidden when no API calls have been made yet (documented fallback, not a bug) codingbuddy-hud.py: - Expand format_compact_tokens docstring with explicit output rules, the 1000-1049 rounding note, and the error fallback contract (returns "0" when input is not coercible to int — never raises) - Expand format_cache_segment docstring with the last-call rationale, fallback conditions, and a contributor caution mirrored in the docs This PR is docs-only — no behavior changes. Regression tests from PR #1359 continue to guard against Cache:XX% reintroduction. Closes #1357
diff --git a/packages/claude-code-plugin/docs/status-bar-model.md b/packages/claude-code-plugin/docs/status-bar-model.md
@@ -20,20 +20,47 @@ Displays session metrics computed from stdin data and HUD state.
 **Format:**
 
 ```
-◕‿◕ CB v5.1.1 | PLAN 🟢 | 12m | ~$0.42 | Cache:53% | Ctx:45%
+◕‿◕ CB v5.3.0 | PLAN 🟢 | 12m | ~$0.42 | ♻800/1.5k | Ctx:45%
 ```
 
 | Segment         | Source                          | Description                                |
 | --------------- | ------------------------------- | ------------------------------------------ |
 | `◕‿◕`          | Constant                        | Buddy face                                 |
-| `CB v5.1.1`    | `hud_state.version`             | Plugin version                             |
+| `CB v5.3.0`    | `hud_state.version`             | Plugin version                             |
 | `PLAN`          | `hud_state.currentMode`         | Current workflow mode (PLAN/ACT/EVAL/AUTO) |
 | `🟢`           | Computed from `ctx_pct`          | Health indicator (see below)               |
 | `12m`           | `hud_state.sessionStartTimestamp`| Session duration                           |
 | `~$0.42`       | Computed from stdin token usage  | Estimated session cost                     |
-| `Cache:53%`    | Computed from stdin token usage  | Cache hit rate                             |
+| `♻800/1.5k`    | Computed from stdin token usage  | Cache tokens (last API call, see below)    |
 | `Ctx:45%`      | `stdin.context_window.used_percentage` | Context window usage               |
 
+The segment is omitted entirely when cache usage data is absent — do not assume it is always present.
+
+### Cache Segment Semantics (Last-Call Only)
+
+The cache segment renders raw token counts from **the most recent API call**, not cumulative session cache efficiency. This is a deliberate design choice driven by how Claude Code exposes telemetry (see #1355, #1356).
+
+**Format:** `♻{cache_read_input_tokens}/{total_input_tokens}` — e.g. `♻2k/3.5k`
+
+- **Numerator:** `stdin.context_window.current_usage.cache_read_input_tokens`
+- **Denominator:** `input_tokens + cache_creation_input_tokens + cache_read_input_tokens`
+- **Compact format:** values < 1000 render as integers (`532`), values ≥ 1000 render as `Nk` with trailing `.0` trimmed for whole thousands (`1k`, `1.5k`, `128k`)
+
+**Why not a percentage?**
+
+An earlier design rendered this as `Cache:XX%`. The calculation was mathematically correct but semantically misleading: users frequently saw values like `Cache:100%` and reasonably assumed their entire session was fully cached, even though the number only described the last request. Claude Code's status-line stdin explicitly documents `context_window.current_usage` as **last-call** token counts, not cumulative session totals. Raw token display removes the ambiguity.
+
+**Fallback behavior:**
+
+- `context_window` missing → segment omitted entirely
+- `current_usage` missing or null → segment omitted entirely
+- All three token counts are 0 → segment omitted entirely (no meaningful ratio to show)
+- Any of the three values is `None` → coerced to 0 via `or 0` defensive pattern
+
+**Rendering reference:** see `format_cache_segment()` and `format_compact_tokens()` in `hooks/codingbuddy-hud.py`.
+
+**Contributor caution:** do **not** reintroduce a percentage-based rendering (e.g. `Cache:XX%`). Regression tests in `tests/test_hud.py::TestFormatCacheSegment::test_regression_no_percent_in_output` and `TestFormatStatusLineCacheSegment::test_status_line_no_longer_contains_cache_percent` explicitly guard against this.
+
 ### Health Indicator
 
 | Emoji | Condition       | Meaning               |
@@ -124,7 +151,7 @@ Claude Code passes session data as JSON to the statusLine script's stdin:
 }
 ```
 
-**Used for:** model identification, cost estimation, cache rate, context percentage.
+**Used for:** model identification, cost estimation, last-call cache tokens, context percentage.
 
 ### Tier 2: HUD State File
 
@@ -163,36 +190,38 @@ If any exception occurs during rendering, the script outputs a minimal fallback:
 ### PLAN Mode (No Agent)
 
 ```
-◕‿◕ CB v5.1.1 | PLAN 🟢 | 5m | ~$0.12 | Cache:40% | Ctx:22%
+◕‿◕ CB v5.3.0 | PLAN 🟢 | 5m | ~$0.12 | ♻1.2k/3k | Ctx:22%
 ```
 
 ### ACT Mode (With Agent)
 
 ```
-◕‿◕ CB v5.1.1 | ACT 🟢 | 18m | ~$1.05 | Cache:62% | Ctx:48%
+◕‿◕ CB v5.3.0 | ACT 🟢 | 18m | ~$1.05 | ♻8k/13k | Ctx:48%
 🤖 frontend-developer
 ```
 
 ### EVAL Mode (High Context)
 
 ```
-◕‿◕ CB v5.1.1 | EVAL 🟡 | 45m | ~$3.20 | Cache:71% | Ctx:73%
+◕‿◕ CB v5.3.0 | EVAL 🟡 | 45m | ~$3.20 | ♻24k/34k | Ctx:73%
 🤖 security-specialist
 ```
 
 ### AUTO Mode (Critical Context)
 
 ```
-◕‿◕ CB v5.1.1 | AUTO 🔴 | 1h12m | ~$8.50 | Cache:55% | Ctx:91%
+◕‿◕ CB v5.3.0 | AUTO 🔴 | 1h12m | ~$8.50 | ♻95k/172k | Ctx:91%
 🤖 code-quality-specialist
 ```
 
-### No Mode Set (Initial State)
+### No Mode Set (Initial State — Cache Segment Hidden)
 
 ```
-◕‿◕ CB v5.1.1 | Ready 🟢 | 0m | ~$0.00 | Cache:0% | Ctx:0%
+◕‿◕ CB v5.3.0 | Ready 🟢 | 0m | ~$0.00 | Ctx:0%
 ```
 
+> The cache segment is **omitted** in the initial state because no API calls have been made yet — there is no `current_usage` data to render. This is the documented fallback behavior, not a bug.
+
 ## Multiline Behavior
 
 The statusLine output supports exactly **one or two lines**:
diff --git a/packages/claude-code-plugin/hooks/codingbuddy-hud.py b/packages/claude-code-plugin/hooks/codingbuddy-hud.py
@@ -132,10 +132,24 @@ def estimate_cost(model_id: str, context_window: dict) -> float:
 
 
 def format_compact_tokens(n: int) -> str:
-    """Format token count compactly for status-bar display.
+    """Format a token count compactly for status-bar display.
 
-    - < 1000 → raw integer (e.g. `532`)
-    - >= 1000 → `Nk` with one decimal trimmed of trailing `.0` (e.g. `1.5k`, `128k`)
+    Used by the cache segment to keep the status line narrow. See the
+    "Cache Segment Semantics" section of docs/status-bar-model.md for
+    the surrounding context.
+
+    Output rules:
+    - value < 1000          → raw integer (e.g. `532`)
+    - value is whole k      → `Nk` with trailing `.0` trimmed (e.g. `1k`, `128k`)
+    - value > 1000 non-whole → `N.Nk` with one decimal (e.g. `1.5k`, `3.5k`)
+
+    Note: values in (1000, 1050) may round to `1.0k` via `.1f` formatting.
+    This is accepted display behavior — the goal is compactness, not
+    lossless round-trip encoding.
+
+    Error fallback: returns `"0"` when `n` is not coercible to int (e.g.
+    `None`, non-numeric string). The function never raises — it is called
+    from the hot status-line path and must degrade gracefully.
     """
     try:
         value = int(n)
@@ -153,17 +167,36 @@ def format_compact_tokens(n: int) -> str:
 def format_cache_segment(context_window: dict) -> str:
     """Render the cache segment as raw tokens from the latest API call.
 
-    IMPORTANT: `context_window.current_usage` from Claude Code stdin reflects
-    **only the most recent API call**, not cumulative session cache usage.
-    This helper therefore renders raw token counts (numerator/denominator)
-    rather than a percentage, which users tend to misread as session-wide
-    cache efficiency (#1355, #1356).
-
-    Numerator   = `cache_read_input_tokens`
-    Denominator = `input_tokens + cache_creation_input_tokens + cache_read_input_tokens`
-
-    Returns an empty string when usage data is missing so the caller can
-    omit the segment entirely from the status line.
+    IMPORTANT — last-call semantics:
+        `context_window.current_usage` from Claude Code stdin reflects
+        **only the most recent API call**, not cumulative session cache
+        usage. An earlier design rendered this as `Cache:XX%`, which
+        users frequently misread as session-wide cache efficiency
+        (`Cache:100%` → "my whole session is cached", false). Raw token
+        display removes the ambiguity — see #1355, #1356 and the
+        "Cache Segment Semantics" section of docs/status-bar-model.md.
+
+    Format:
+        ``♻{cache_read}/{total}``  (e.g. ``♻2k/3.5k``)
+
+        - Numerator   = ``cache_read_input_tokens``
+        - Denominator = ``input_tokens + cache_creation_input_tokens
+                        + cache_read_input_tokens``
+        - Both values are passed through ``format_compact_tokens``
+          for narrow status-bar rendering.
+
+    Fallback:
+        Returns an empty string (hide the segment entirely) when:
+        - ``context_window`` is falsy
+        - ``current_usage`` is missing or null
+        - all three token counts sum to 0
+
+        Callers append the return value conditionally so the status
+        line still renders cleanly without a broken slot.
+
+    Contributor caution:
+        Do not reintroduce a percentage-based rendering here. Regression
+        tests in tests/test_hud.py explicitly guard against it.
     """
     usage = context_window.get("current_usage") if context_window else None
     if not usage: