
feat(report): session outlier detection and per-model one-shot rates #81

Open

lfl1337 wants to merge 8 commits into getagentseal:main from lfl1337:feat/outlier-and-model-efficiency

Conversation


@lfl1337 lfl1337 commented Apr 18, 2026

Summary

Closes additional points from #12 (power-user proposals: outlier detection + per-model efficiency).

  • Top Sessions panel now shows dominant activity and highlights outliers (sessions costing >2x their project average) with a red cost color
  • One-Shot Rate by Model panel - per-model breakdown of single-turn completion rate, color-coded (green >=90%, orange 70-89%, red <70%)
  • Both surfaces are also available in `--format json` as `outlierSessions[]` and `modelOneShotRates[]`
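The color thresholds described above can be sketched as a small pure helper. This is illustrative only; the function name and return type are hypothetical, not taken from the PR:

```typescript
// Hypothetical helper mirroring the thresholds in the summary:
// green for >= 90%, orange for 70-89%, red for < 70%.
type RateColor = "green" | "orange" | "red";

function oneShotRateColor(ratePercent: number): RateColor {
  if (ratePercent >= 90) return "green";
  if (ratePercent >= 70) return "orange";
  return "red";
}

console.log(oneShotRateColor(92)); // green
console.log(oneShotRateColor(75)); // orange
console.log(oneShotRateColor(50)); // red
```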

Architecture

Pure computation split into a new `src/analytics.ts` module - no React/Ink imports, directly unit-testable with plain `SessionSummary` fixtures:

  • `computeOutlierSessions(projects)` - top 5 by cost with outlier flag
  • `computeModelOneShotRates(projects)` - per-model one-shot rate from turn-level data
  • `dominantActivity(session)` - label of the highest-cost category
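A minimal sketch of how the outlier flag could fall out of the `>2x project average` rule, assuming a simplified session/project shape. The real `SessionSummary` type in the PR almost certainly carries more fields; `OutlierSession` and the field names here are assumptions for illustration:

```typescript
// Simplified shapes; the real SessionSummary in src/analytics.ts likely differs.
interface SessionSummary {
  id: string;
  costUsd: number;
}

interface Project {
  name: string;
  sessions: SessionSummary[];
}

interface OutlierSession extends SessionSummary {
  project: string;
  isOutlier: boolean; // costs more than 2x its project's average
}

function computeOutlierSessions(projects: Project[], topN = 5): OutlierSession[] {
  const flagged: OutlierSession[] = [];
  for (const project of projects) {
    if (project.sessions.length === 0) continue;
    const avg =
      project.sessions.reduce((sum, s) => sum + s.costUsd, 0) /
      project.sessions.length;
    for (const s of project.sessions) {
      flagged.push({ ...s, project: project.name, isOutlier: s.costUsd > 2 * avg });
    }
  }
  // Top N sessions by cost across all projects, as in the Top Sessions panel.
  return flagged.sort((a, b) => b.costUsd - a.costUsd).slice(0, topN);
}
```

Note the outlier threshold is relative to each session's own project, so a cheap session in a cheap project can still be flagged.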

Design decisions

  • Merged into the existing `Top Sessions` panel rather than a separate panel (the initial version had both, which showed the same data twice)
  • Outliers signaled via red cost color with a one-line legend at the bottom of the panel, not a cryptic flag character
  • Per-turn model attribution uses `turn.assistantCalls[0].model` (primary call) - consistent with how `ActivityBreakdown` credits cost to the turn category without splitting per model

Depends on

If PR #77 lands first, I will rebase - its `getShortModelName` fix makes the per-model one-shot rates more accurate (no more `gpt-4o-mini` showing up as `GPT-4o`).

Test plan

  • `npm test` green (241/241, +11 new tests covering both functions)
  • `tsc --noEmit` clean
  • JSON smoke: `outlierSessions` and `modelOneShotRates` arrays both present with correct shape
  • TUI smoke: outlier highlighted in red at 36x project average, model rates colored by threshold


lfl1337 commented Apr 18, 2026

[screenshot of the new report panels]

@github-actions

firstlook

| Signal | Detail |
| --- | --- |
| Account Created | 5y 1mo ago |
| Repos | 8 public |
| Profile | None provided ⚠️ |
| History | 3 merged, 0 rejected elsewhere |
| Merge quality | 1 unique merger, 1 repo with 100+ stars |
| Activity | 2/12 months active ⚠️ |
| Followers | 1 ⚠️ |
| Signed | No ⚠️ |

⚠️ Dormant Reactivation -- Active 2/12 months despite 5y account

Review Suggested (score: 39/100) -- Limited history or new account. Review with extra care.

Details
  • Self-merged: 3 | Externally merged: 5
  • Contributed to 1 repo with 100+ stars
  • This repo: 5 merged, 2 closed


@Qodo-Free-For-OSS

Hi, computeModelOneShotRates() attributes edit/one-shot counts to only turn.assistantCalls[0].model, so any turn containing multiple assistant calls (potentially across different models) will miscount one-shot rates per model.

Severity: action required | Category: correctness

How to fix: Attribute turns across models

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

computeModelOneShotRates() counts editTurns/oneShotTurns for a turn under only the first assistant call’s model. A single turn may contain multiple assistantCalls (and retries are computed across calls), so this misattributes one-shot performance to the wrong model.

Issue Context

A ClassifiedTurn can contain multiple assistantCalls due to groupIntoTurns(). retries is computed across calls, and modelBreakdown is aggregated over all calls, so multiple models in one turn is possible.

Fix Focus Areas

  • src/analytics.ts[66-98]
  • src/parser.ts[122-165]
  • src/classifier.ts[120-163]

Implementation notes

Decide a consistent attribution strategy, e.g.:

  • Attribute to all models involved in the turn (increment each model’s counters when turn.hasEdits), or
  • Attribute to the last model used in the turn, or
  • Attribute to the model responsible for the edit tool call(s) if identifiable.

Add a unit test where a single turn has assistantCalls with two different models and verify the chosen attribution behavior.
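The first strategy listed above (credit every model that appears in the turn) could be sketched roughly as follows. The shapes are assumed minimal stand-ins; the real `ClassifiedTurn` lives in src/classifier.ts:

```typescript
// Assumed minimal shapes; the real types in src/classifier.ts carry more fields.
interface AssistantCall {
  model: string;
}

interface ClassifiedTurn {
  assistantCalls: AssistantCall[];
  hasEdits: boolean;
}

// Distinct models involved in a single turn.
function modelsInTurn(turn: ClassifiedTurn): Set<string> {
  return new Set(turn.assistantCalls.map((c) => c.model));
}

// "Attribute to all models": every model that appears in an edit turn
// gets that turn counted against it.
function countEditTurnsPerModel(turns: ClassifiedTurn[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const turn of turns) {
    if (!turn.hasEdits) continue;
    for (const model of modelsInTurn(turn)) {
      counts.set(model, (counts.get(model) ?? 0) + 1);
    }
  }
  return counts;
}
```

The trade-off: this inflates totals (one turn can count toward several models), whereas last-call attribution keeps per-model counts disjoint.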


Found by Qodo code review

…n turn

The first call in a multi-step turn is often a tool-request; the final call is the one that produces the edit. Using `assistantCalls[last]` correctly attributes the outcome to the model that generated it.

lfl1337 commented Apr 28, 2026

Addressed in 44f5f6f. Changed attribution from `assistantCalls[0]` to `assistantCalls[last]`.

Note on the broader concern: in Claude Code sessions, all API calls within a single turn use the same model (no mid-turn model switching). Other providers (Cursor, Codex, Copilot) produce exactly one assistantCall per turn. So the miscount scenario does not occur in practice with current data. Using the last call is nonetheless more semantically accurate -- it is the call that produced the final edit output, not the first tool-request call.
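The before/after of the fix amounts to switching which call's model is read. A hedged sketch with assumed minimal shapes (the real types and function names in the PR may differ):

```typescript
// Assumed minimal shapes for illustration.
interface AssistantCall {
  model: string;
}

interface ClassifiedTurn {
  assistantCalls: AssistantCall[];
}

// Before: the first call, which in a multi-step turn is often just the
// initial tool request.
function attributedModelBefore(turn: ClassifiedTurn): string {
  return turn.assistantCalls[0].model;
}

// After (44f5f6f): the last call, i.e. the one that produced the final
// edit output.
function attributedModelAfter(turn: ClassifiedTurn): string {
  return turn.assistantCalls[turn.assistantCalls.length - 1].model;
}
```

For single-call turns (as the author notes is the case for Cursor, Codex, and Copilot data) the two are identical; they only diverge on multi-call turns.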

