fix(global-discover): bucket codex by originator + read 128KB for CC cwd #1488

Open
0xDevNinja wants to merge 1 commit into garrytan:main from 0xDevNinja:fix/1315-codex-originator-and-cc-truncation

Summary

Two patches from #1315 in one diff.

  1. Codex session bucketing. scanCodex now normalizes payload.originator into { desktop, exec, claude_code, other } and surfaces the breakdown at tools.codex.originators and per-repo codex_originators. Existing codex totals stay (additive — no consumer break).
  2. CC undercount. extractCwdFromJsonl reads 128KB instead of 8KB. Recent Claude Code / CCR JSONL files often open with a queue-operation event 30-50KB long that has no cwd — the old 8KB read truncated the line, JSON.parse failed, and the whole project dir was silently dropped. Same buffer size scanCodex already uses.

Fixes #1315.

Why

/retro global narrated "codex was the primary execution tool, 414 sessions across 7 repos" when codex actually drove dev for one repo's middle phase. The other ~309 codex_exec entries were CC firing codex as cross-model review subagent. A single bucket can't tell those apart.

For the CC count: @Akagilnc traced ~450 missing files in one repo's 31d window to the 8KB cap (issue thread). First-line queue-operation events are 30-50KB on recent CC versions; the parser never reached the later events that carry cwd.
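The truncation failure can be reproduced in isolation. The sketch below is not the PR's `extractCwdFromJsonl`; `extractCwdSketch` is a hypothetical stand-in, and the event field names other than `cwd` (`type`, `content`) are assumptions made for the demo. It reads a bounded prefix of the file, skips any line that fails to parse, and returns the first `cwd` it finds.

```typescript
import { closeSync, mkdtempSync, openSync, readSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// 128KB, the read size named in this PR.
const READ_BYTES = 128 * 1024;

// Hypothetical helper: read up to maxBytes and return the first string `cwd` found.
function extractCwdSketch(filePath: string, maxBytes: number = READ_BYTES): string | null {
  const fd = openSync(filePath, "r");
  try {
    const buf = Buffer.alloc(maxBytes);
    const bytesRead = readSync(fd, buf, 0, maxBytes, 0);
    const text = buf.subarray(0, bytesRead).toString("utf8");
    for (const line of text.split("\n")) {
      if (!line.trim()) continue;
      try {
        const event = JSON.parse(line);
        if (event && typeof event.cwd === "string") return event.cwd;
      } catch {
        // A line cut off by the read cap is not valid JSON; skip it.
      }
    }
    return null;
  } finally {
    closeSync(fd);
  }
}

// Demo file: a ~40KB first event without cwd, then a small event with cwd.
const demoDir = mkdtempSync(join(tmpdir(), "cc-cwd-"));
const demoFile = join(demoDir, "session.jsonl");
const bigFirstLine = JSON.stringify({ type: "queue-operation", content: "x".repeat(40 * 1024) });
const laterLine = JSON.stringify({ type: "user", cwd: "/tmp/example-repo" });
writeFileSync(demoFile, bigFirstLine + "\n" + laterLine + "\n");

const withOldCap = extractCwdSketch(demoFile, 8 * 1024); // first line truncated, nothing parses
const withNewCap = extractCwdSketch(demoFile);           // both lines fit in the 128KB prefix
```

With the 8KB cap the only text read is a fragment of the first line, so the result is `null`; with the 128KB cap the later event is reachable and its `cwd` is returned.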

Shape

// tools.codex now:
{
  "total_sessions": 414,
  "repos": 7,
  "originators": { "desktop": 92, "exec": 309, "claude_code": 13, "other": 0 }
}

// per repo:
{
  "name": "ak-ai-vela",
  "sessions": { "claude_code": 12, "codex": 98, "gemini": 0 },
  "codex_originators": { "desktop": 1, "exec": 97, "claude_code": 0, "other": 0 }
}

The inline summary format shows Codex:98 (desktop=1, exec=97, cc=0) per repo, plus a top-line Codex originators: ... rollup.
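As an illustration of the per-repo inline format, here is a minimal sketch. `CodexOriginatorCounts` and `formatCodexInline` are hypothetical names, not identifiers from the diff. The example string above does not show the `other` bucket, so this sketch leaves it out as well; whether the real formatter prints it is not stated here.

```typescript
// Assumed type, mirroring the four buckets in the JSON shape above.
interface CodexOriginatorCounts {
  desktop: number;
  exec: number;
  claude_code: number;
  other: number;
}

// Hypothetical formatter reproducing the per-repo inline string.
function formatCodexInline(total: number, o: CodexOriginatorCounts): string {
  return `Codex:${total} (desktop=${o.desktop}, exec=${o.exec}, cc=${o.claude_code})`;
}

const inlineExample = formatCodexInline(98, { desktop: 1, exec: 97, claude_code: 0, other: 0 });
```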

Originator normalization (case-insensitive, matches values observed in ~/.codex/sessions/):

  • "Codex Desktop" / "codex_desktop" → desktop
  • "codex_exec" / "codex exec" → exec
  • "Claude Code" / "claude_code" → claude_code
  • anything else → other (not dropped — future originators land here visibly until we map them)
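The mapping above could be sketched as follows. `normalizeOriginatorSketch` is a hypothetical name, and collapsing spaces and underscores into a single underscore before matching is an assumption of this sketch; the actual implementation in the diff may match the strings differently.

```typescript
type OriginatorBucket = "desktop" | "exec" | "claude_code" | "other";

// Hypothetical normalizer: case-insensitive, treats spaces and underscores alike.
function normalizeOriginatorSketch(raw: unknown): OriginatorBucket {
  if (typeof raw !== "string") return "other"; // missing or non-string originator
  const key = raw.trim().toLowerCase().replace(/[\s_]+/g, "_");
  switch (key) {
    case "codex_desktop":
      return "desktop";
    case "codex_exec":
      return "exec";
    case "claude_code":
      return "claude_code";
    default:
      return "other"; // unknown values stay visible instead of being dropped
  }
}
```

Returning `other` for unrecognized or missing values keeps the bucket totals equal to the session total, which is the property the parity tests rely on.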

Also adds a CLAUDE_PROJECTS_DIR env override on scanClaudeCode so the CC regression test can stage a fake project dir; mirrors the existing CODEX_SESSIONS_DIR knob.
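A minimal sketch of how such an override might be resolved. `resolveScanDir` is a hypothetical helper, and the `~/.claude/projects` default is an assumption (only `~/.codex/sessions/` is named in this description).

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: prefer the env override, else fall back to a path under $HOME.
function resolveScanDir(
  envVar: string,
  defaultSegments: string[],
  env: Record<string, string | undefined> = process.env,
): string {
  const override = env[envVar];
  return override && override.length > 0 ? override : join(homedir(), ...defaultSegments);
}

// A test can point the scanner at a staged directory without touching the real one.
const stagedDir = resolveScanDir("CLAUDE_PROJECTS_DIR", [".claude", "projects"], {
  CLAUDE_PROJECTS_DIR: "/tmp/fake-projects",
});
const fallbackDir = resolveScanDir("CLAUDE_PROJECTS_DIR", [".claude", "projects"], {});
```

Passing `env` as a parameter keeps the helper testable without mutating `process.env`.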

Out of scope

Problem 3 from the issue (annotate /retro global that "sessions = tool invocations / file count", not interactive dev) is a narrative-side change in the retro skill template. Reasonable to file separately.

Tests

7 new in test/global-discover.test.ts:

  • 'Codex Desktop' originator → desktop bucket.
  • 'codex_exec' → exec bucket.
  • 'Claude Code' → claude_code bucket.
  • Unknown originator string → other (verifies nothing is dropped).
  • Per-repo codex_originators sums to per-repo sessions.codex.
  • tools.codex.originators shape + total parity with tools.codex.total_sessions.
  • CC JSONL whose first line is a >30KB queue-operation event still resolves cwd from a later event.

bun test test/global-discover.test.ts
# 26 pass, 0 fail (19 existing + 7 new)
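The parity checks in the list above amount to a simple invariant: for each repo, the four originator buckets sum to `sessions.codex`. The sketch below is not taken from `test/global-discover.test.ts`; `RepoEntrySketch` and `findParityViolations` are hypothetical names, and the sample values are the ones from the Shape section.

```typescript
interface OriginatorCounts {
  desktop: number;
  exec: number;
  claude_code: number;
  other: number;
}

// Assumed subset of the per-repo shape shown in the Shape section.
interface RepoEntrySketch {
  name: string;
  sessions: { codex: number };
  codex_originators: OriginatorCounts;
}

function sumBuckets(o: OriginatorCounts): number {
  return o.desktop + o.exec + o.claude_code + o.other;
}

// Returns the names of repos whose buckets do not add up to sessions.codex.
function findParityViolations(repos: RepoEntrySketch[]): string[] {
  return repos
    .filter((r) => sumBuckets(r.codex_originators) !== r.sessions.codex)
    .map((r) => r.name);
}

const sampleRepo: RepoEntrySketch = {
  name: "ak-ai-vela",
  sessions: { codex: 98 },
  codex_originators: { desktop: 1, exec: 97, claude_code: 0, other: 0 },
};
const violations = findParityViolations([sampleRepo]);
```

For the sample repo the buckets sum to 98, matching `sessions.codex`, so `violations` is empty.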

Two problems from issue garrytan#1315, one diff.

Problem 1: `scanCodex` counted every rollout file as a `codex` session,
conflating Codex Desktop (interactive codex dev) with codex_exec
(cron / subagent) and Claude Code (CC driving codex via MCP). `/retro
global` then narrated "codex was the primary tool, 414 sessions" when
codex actually drove dev for one repo's middle phase and the other
~309 entries were CC firing codex as cross-model review.

`payload.originator` is now normalized into a 4-bucket
`codex_originators: { desktop, exec, claude_code, other }`. Surfaced
under `tools.codex.originators` and per-repo `codex_originators`.
Additive — existing `codex` totals stay, no consumer break.

Problem 2: `extractCwdFromJsonl` read 8KB then parsed. Recent
Claude Code / CCR JSONL files often open with a `queue-operation`
event 30-50KB long that carries no `cwd`. The 8KB read truncated
that line, JSON.parse failed, the fallback returned null, and the
whole project dir got skipped. Akagilnc measured ~450 CC files
vanishing this way in one repo's 31d window. Bumped to 128KB — same
buffer size `scanCodex` already uses.

Also exposes `CLAUDE_PROJECTS_DIR` env override (parallel to existing
`CODEX_SESSIONS_DIR`) so the regression test can plant a fake project
dir with a >30KB first line.

Out of scope: the suggested annotation in `/retro global` output that
"sessions" means "tool invocations / file count" (problem 3 in the
issue). Reasonable separate PR.

Fixes garrytan#1315

Development

Successfully merging this pull request may close these issues.

gstack-global-discover: session counts conflate originator types and undercount CC by ~5x