ci(pricing): update LiteLLM snapshot automation #1186

Merged: ryoppippi merged 2 commits into main from codex/update-litellm-pricing on May 29, 2026

@ryoppippi ryoppippi commented May 29, 2026

Updates the locked LiteLLM pricing input used by Nix builds.

Also changes the automated pricing update workflow to run every 12 hours and skip opening PRs when the LiteLLM pricing JSON is unchanged after the flake update.

Testing:

  • nix develop --command cargo test --manifest-path rust/Cargo.toml -p ccusage pricing::tests::
  • git diff --check
  • Verified the LiteLLM pricing JSON comparison path with the current locked rev

Summary by CodeRabbit

  • Chores
    • Updated automated LiteLLM pricing synchronization workflow to run every 12 hours instead of weekly and improved the update process with additional validation checks to prevent unnecessary changes and ensure data integrity.

Update the locked LiteLLM flake input from e59e34bed3670a6894d43129c2af16af28057d03 to f27df8d516802ce4c1b32973992154fe83b851cf so Nix builds embed the latest upstream pricing data.

Verified with the focused ccusage pricing test suite against the refreshed embedded pricing snapshot.

coderabbitai Bot commented May 29, 2026

Walkthrough

The workflow is updated to check for LiteLLM pricing changes more frequently. The schedule changes from weekly to every 12 hours, and the update job now compares pricing JSON before and after the flake update to avoid unnecessary PR creation when pricing data is unchanged.

Changes

LiteLLM Pricing Update Workflow

| Layer / File(s) | Summary |
| --- | --- |
| Pricing comparison flow and schedule frequency (`.github/workflows/update-pricing.yaml`) | The workflow schedule increases to every 12 hours. The update job replaces a single flake update step with a before/after pricing JSON comparison: fetch current pricing, run `nix flake update litellm`, fetch updated pricing, compare JSON files, and conditionally skip PR creation and revert `flake.lock` when pricing is unchanged. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • ryoppippi/ccusage#1099: The main PR's update-pricing.yaml workflow changes build directly on the same Nix input switch from litellm-pricing to litellm, extending the existing nix flake update litellm flow with before/after JSON comparison and lockfile handling.
  • ryoppippi/ccusage#1132: Both PRs modify the same .github/workflows/update-pricing.yaml LiteLLM pricing update flow—one removes fallback JSON tracking and the other adds fetch-before/fetch-after comparison to skip PRs when pricing JSON is unchanged.

Poem

🐰 A rabbit hops through workflows bright,
Pricing checks every twelve hours and night,
Fetch, compare, no waste, no false alarm,
Only real changes trigger the swarm. ✨

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'ci(pricing): update LiteLLM snapshot automation' directly relates to the main workflow changes shown in the raw summary: the scheduled trigger modification and the expanded LiteLLM pricing update job logic. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@ryoppippi (Owner, Author) commented

@coderabbitai review
@cubic-dev-ai review

cloudflare-workers-and-pages Bot commented May 29, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | ccusage-guide | 0add427 | Commit Preview URL, Branch Preview URL | May 29 2026, 11:15 AM |

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

cubic-dev-ai Bot commented May 29, 2026

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai Bot commented May 29, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR details and diff
gh pr view 1186 --repo ryoppippi/ccusage --json title,body,files,additions,deletions,baseRefName,headRefName

echo "---DIFF---"
gh pr diff 1186 --repo ryoppippi/ccusage

Length of output: 1481



@ryoppippi The changes look good. This is a clean, minimal Nix lock file bump for the litellm flake input:

| Field | Before | After |
| --- | --- | --- |
| rev | e59e34b | f27df8d |
| lastModified | 1779231723 | 1780038305 |
| narHash | sha256-0MG6... | sha256-os5z... |

Everything else (owner, repo, type) remains unchanged, confirming this is a straightforward upstream HEAD bump for BerriAI/litellm. The narHash update correctly reflects the new content digest, and the description accurately describes the change. No concerns.
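
The `rev` field in the table above is what a workflow would read to learn which upstream commit is currently locked. A minimal sketch of that lookup, assuming `flake.lock` is pretty-printed with one key per line (as Nix writes it) and that the input is named `litellm`; the `locked_rev` helper is a hypothetical name and is not taken from the PR. Where `jq` is available, `jq -r '.nodes.litellm.locked.rev' flake.lock` is likely the more robust choice.

```shell
#!/bin/sh
# Hypothetical helper: print the locked rev of one flake input.
# Assumes flake.lock is pretty-printed (one key per line), as Nix emits it.
locked_rev() {
  input_name=$1
  lock_file=$2
  awk -v name="\"${input_name}\":" '
    # Enter the node whose key matches and opens an object.
    $1 == name && $2 == "{" { in_node = 1; next }
    # Print the first "rev" value found after entering that node.
    in_node && $1 == "\"rev\":" {
      gsub(/[",]/, "", $2)
      print $2
      exit
    }
  ' "$lock_file"
}
```

Calling `locked_rev litellm flake.lock` before and after `nix flake update litellm` would give the two revisions whose pricing files are compared.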

[approve]

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

pkg-pr-new Bot commented May 29, 2026

ccusage

npx https://pkg.pr.new/ccusage@1186

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1186

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1186

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1186

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1186

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1186

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1186

commit: d0d2491

@github-actions (Contributor) commented

ccusage performance comparison

PR SHA: d0d2491bb4d4
Base SHA: a78f8de1e848

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | a78f8de1e848 | 608.9ms | 591.6ms | 35.9ms | 3 |
| PR pkg.pr.new | d0d2491 | 641.9ms | 716.2ms | 35.5ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: a78f8de1e848; PR package: d0d2491. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 543.6ms | 552.8ms | 0.98x | 323.33 MiB | 283.33 MiB | 0.88x | 1.85 GiB/s | 1.82 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 384.9ms | 384.2ms | 1.00x | 79.45 MiB | 66.33 MiB | 0.83x | 2.62 GiB/s | 2.62 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 564.4ms | 1.78 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 533.6ms | 1.89 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 383.7ms | 2.62 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 351.7ms | 2.86 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 32.4ms | 4.4ms | 7.30x | - | 2.83 MiB | - | 0.05 MiB/s | 0.35 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 32.6ms | 4.4ms | 7.46x | - | 2.83 MiB | - | 0.05 MiB/s | 0.35 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 32.6ms | 4.2ms | 7.81x | - | 2.83 MiB | - | 0.03 MiB/s | 0.21 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 32.1ms | 4.1ms | 7.78x | 43.61 MiB | 2.83 MiB | 0.06x | 0.03 MiB/s | 0.21 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 549.4ms | 534.1ms | 1.03x | - | 307.70 MiB | - | 1.83 GiB/s | 1.89 GiB/s |
| `codex --offline --json` | 1.01 GiB | 370.8ms | 344.8ms | 1.08x | - | 79.83 MiB | - | 2.72 GiB/s | 2.92 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 14.22 KiB | 14.22 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
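
The ratio columns in the tables above are not defined in the report, but the large-fixture rows are consistent with "PR vs base" being the base median divided by the PR median (so values above 1.00x mean the PR is faster) and "PR/base RSS" being PR peak RSS divided by base peak RSS. A minimal sketch of that arithmetic, inferred from the printed values rather than from the benchmark script; `speedup` and `rss_ratio` are hypothetical names.

```shell
#!/bin/sh
# Inferred convention: "PR vs base" = base median / PR median.
speedup() {
  awk -v base="$1" -v pr="$2" 'BEGIN { printf "%.2fx\n", base / pr }'
}

# Inferred convention: "PR/base RSS" = PR peak RSS / base peak RSS.
rss_ratio() {
  awk -v base="$1" -v pr="$2" 'BEGIN { printf "%.2fx\n", pr / base }'
}

# Values from the cached bunx "claude" row (medians in ms, RSS in MiB).
speedup 543.6 552.8      # prints 0.98x
rss_ratio 323.33 283.33  # prints 0.88x
```

Rows for the small committed fixtures may differ from this calculation in the last digits, presumably because the displayed medians are already rounded to 0.1ms.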

Run the LiteLLM pricing update workflow every twelve hours instead of weekly so pricing changes are discovered sooner.

Compare model_prices_and_context_window.json before and after the flake update, and restore flake.lock when the pricing JSON is unchanged so automation does not open lock-only PRs.

@github-actions (Contributor) commented

ccusage performance comparison

PR SHA: d0d2491bb4d4
Base SHA: a78f8de1e848

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | a78f8de1e848 | 495.8ms | 602.6ms | 35.0ms | 3 |
| PR pkg.pr.new | d0d2491 | 652.5ms | 767.0ms | 35.4ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: a78f8de1e848; PR package: d0d2491. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 564.6ms | 557.1ms | 1.01x | 323.95 MiB | 295.70 MiB | 0.91x | 1.78 GiB/s | 1.81 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 379.9ms | 384.1ms | 0.99x | 78.33 MiB | 71.70 MiB | 0.92x | 2.65 GiB/s | 2.62 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 549.3ms | 1.83 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 519.8ms | 1.94 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 378.9ms | 2.66 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 363.7ms | 2.77 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 31.1ms | 31.9ms | 0.97x | - | - | - | 0.05 MiB/s | 0.05 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 32.9ms | 32.8ms | 1.00x | - | 43.61 MiB | - | 0.05 MiB/s | 0.05 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 32.4ms | 32.2ms | 1.01x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 32.1ms | 32.4ms | 0.99x | 43.48 MiB | - | - | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 571.3ms | 566.2ms | 1.01x | 298.83 MiB | 291.58 MiB | 0.98x | 1.76 GiB/s | 1.78 GiB/s |
| `codex --offline --json` | 1.01 GiB | 380.6ms | 394.1ms | 0.97x | 81.33 MiB | 76.70 MiB | 0.94x | 2.64 GiB/s | 2.55 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 14.22 KiB | 14.22 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi ryoppippi changed the title from "chore(pricing): update LiteLLM snapshot" to "ci(pricing): update LiteLLM snapshot automation" May 29, 2026

@ryoppippi (Owner, Author) commented

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai Bot commented May 29, 2026

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai Bot commented May 29, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh pr view 1186 --repo ryoppippi/ccusage --json title,body,files,additions,deletions,baseRefName,headRefName,commits,state

echo "---DIFF---"
gh pr diff 1186 --repo ryoppippi/ccusage

Length of output: 5554



@ryoppippi The PR looks good. Here's a full review of both files:


.github/workflows/update-pricing.yaml

Schedule change (0 3 * * 1 → 0 */12 * * *): Correctly moves from weekly to twice-daily runs, catching upstream pricing changes sooner.

New logic flow:

  1. Capture before – fetches model_prices_and_context_window.json from the current locked rev before updating.
  2. nix flake update litellm – bumps the flake input.
  3. Capture after & compare – fetches the same file from the new rev; if identical, runs git checkout -- flake.lock to discard the lock bump and skip the PR. Clean use of cmp --silent.
  4. Validate – gates nix flake check behind git diff --quiet, so it only runs when there's a real pricing change. The exit 0 path means the subsequent "Create pull request" step will still run but will find no staged changes — standard create-pull-request actions handle that gracefully.
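
The compare and validate steps above can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual script: the `pricing_changed` and `finish_pricing_update` helper names are hypothetical, while `cmp`, `git checkout -- flake.lock`, and `nix flake check` are the commands the review mentions.

```shell
#!/bin/sh
# Succeeds when the two snapshots differ. `cmp -s` is the portable short
# form of `cmp --silent`; a missing file also counts as "changed" here.
pricing_changed() {
  ! cmp -s "$1" "$2"
}

# Hypothetical final step, meant to run after both snapshots are captured
# and `nix flake update litellm` has rewritten flake.lock.
finish_pricing_update() {
  before=$1
  after=$2
  if pricing_changed "$before" "$after"; then
    # Real pricing change: validate the flake before a PR is opened.
    nix flake check
  else
    # Lock-only bump: restore flake.lock so no PR is created.
    git checkout -- flake.lock
    echo "LiteLLM pricing unchanged; discarded flake.lock update"
  fi
}
```

Keeping the comparison in its own function makes the "unchanged" decision easy to exercise locally with two sample JSON files, without running Nix at all.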

One minor note: the curl calls in both capture steps don't pin a --retry flag, so a transient network hiccup could cause a false "no change" (if the first fetch fails, the before-file may not exist and cmp would error). Not a blocker, but something to consider hardening later.

flake.lock

Straightforward upstream HEAD bump for BerriAI/litellm:

| Field | Before | After |
| --- | --- | --- |
| rev | e59e34bed3670a6894d43129c2af16af28057d03 | f27df8d516802ce4c1b32973992154fe83b851cf |
| lastModified | 1779231723 | 1780038305 |
| narHash | sha256-0MG6NXwV… | sha256-os5zGH6k… |

All structural fields (owner, repo, type, flake: false) are unchanged. ✅


Overall: the implementation is correct and the intent is well-executed. No blocking concerns.

[approve]

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
.github/workflows/update-pricing.yaml (1)

18-23: ⚡ Quick win

Add retry/backoff to the pricing fetch.

Now that this runs every 12h, a transient 5xx or network blip from raw.githubusercontent.com will fail curl --fail and abort the whole scheduled run. Add retries here and in the after-update fetch (Lines 30-32).

♻️ Proposed resilience tweak
           curl --fail --location --silent --show-error \
+            --retry 3 --retry-delay 2 --retry-connrefused \
             "https://raw.githubusercontent.com/BerriAI/litellm/${rev}/model_prices_and_context_window.json" \
             --output /tmp/litellm-pricing-before.json

Apply the same --retry flags to the fetch at Lines 30-32.
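
One way to keep the two fetches in sync is to route both through a single helper, so the retry flags are written once. A minimal sketch, assuming the flags from the proposed diff above; the `pricing_url` and `fetch_pricing` names are hypothetical and do not come from the workflow.

```shell
#!/bin/sh
# Build the raw.githubusercontent.com URL for a given LiteLLM revision.
pricing_url() {
  printf 'https://raw.githubusercontent.com/BerriAI/litellm/%s/model_prices_and_context_window.json\n' "$1"
}

# Download the pricing JSON for revision $1 into file $2.
# --retry makes curl retry errors it treats as transient (for example
# timeouts and HTTP 5xx responses); --retry-connrefused also retries
# refused connections.
fetch_pricing() {
  curl --fail --location --silent --show-error \
    --retry 3 --retry-delay 2 --retry-connrefused \
    "$(pricing_url "$1")" \
    --output "$2"
}
```

With this in place, the before and after captures would differ only in the revision and output path passed to `fetch_pricing`.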

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-pricing.yaml around lines 18 - 23, The current curl
invocations used to "Capture current LiteLLM pricing JSON" (the fetch that
writes /tmp/litellm-pricing-before.json) and the subsequent after-update fetch
are brittle on transient network/5xx errors; update both curl commands to
include retry/backoff flags (for example add --retry 5 --retry-delay 5
--retry-all-errors --retry-max-time 60 and a reasonable --connect-timeout like
10) while keeping the existing --fail/--location/--silent/--show-error and
--output behavior so the workflow will retry transient failures instead of
aborting immediately.

📥 Commits

Reviewing files that changed from the base of the PR and between a78f8de and 0add427.

⛔ Files ignored due to path filters (1)
  • flake.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • .github/workflows/update-pricing.yaml

@ryoppippi ryoppippi merged commit 058dd39 into main May 29, 2026
29 checks passed
@ryoppippi ryoppippi deleted the codex/update-litellm-pricing branch May 29, 2026 11:20

@github-actions (Contributor) commented

ccusage performance comparison

PR SHA: 0add427abe10
Base SHA: a78f8de1e848

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | a78f8de1e848 | 701.1ms | 508.0ms | 31.3ms | 3 |
| PR pkg.pr.new | 0add427 | 789.4ms | 580.8ms | 31.1ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: a78f8de1e848; PR package: 0add427. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 524.3ms | 517.5ms | 1.01x | 280.33 MiB | 272.70 MiB | 0.97x | 1.92 GiB/s | 1.95 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 369.5ms | 370.0ms | 1.00x | 79.83 MiB | 79.45 MiB | 1.00x | 2.72 GiB/s | 2.72 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 524.0ms | 1.92 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 493.0ms | 2.04 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 363.6ms | 2.77 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 341.7ms | 2.95 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 28.6ms | 3.8ms | 7.56x | 43.61 MiB | 2.83 MiB | 0.06x | 0.05 MiB/s | 0.41 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 28.4ms | 3.8ms | 7.40x | 43.61 MiB | 2.83 MiB | 0.06x | 0.05 MiB/s | 0.40 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 28.5ms | 3.6ms | 7.85x | 43.48 MiB | 2.83 MiB | 0.07x | 0.03 MiB/s | 0.24 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 28.7ms | 3.6ms | 7.94x | 43.48 MiB | 2.83 MiB | 0.07x | 0.03 MiB/s | 0.24 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 518.5ms | 505.2ms | 1.03x | 288.95 MiB | 295.45 MiB | 1.02x | 1.94 GiB/s | 1.99 GiB/s |
| `codex --offline --json` | 1.01 GiB | 366.0ms | 337.7ms | 1.08x | 83.45 MiB | 79.45 MiB | 0.95x | 2.75 GiB/s | 2.98 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 14.22 KiB | 14.21 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions (Contributor) commented

ccusage performance comparison

PR SHA: 0add427abe10
Base SHA: a78f8de1e848

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | a78f8de1e848 | 744.7ms | 810.3ms | 33.4ms | 3 |
| PR pkg.pr.new | 0add427 | 1.005s | 732.4ms | 33.5ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: a78f8de1e848; PR package: 0add427. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 533.6ms | 549.0ms | 0.97x | 304.83 MiB | 319.58 MiB | 1.05x | 1.89 GiB/s | 1.83 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 378.4ms | 379.7ms | 1.00x | 76.70 MiB | 78.83 MiB | 1.03x | 2.66 GiB/s | 2.65 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 557.4ms | 1.81 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 511.2ms | 1.97 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 375.5ms | 2.68 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 349.9ms | 2.88 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 31.3ms | 31.4ms | 1.00x | - | - | - | 0.05 MiB/s | 0.05 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 31.4ms | 31.5ms | 1.00x | 43.48 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 31.2ms | 31.1ms | 1.00x | 43.61 MiB | 43.61 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 31.1ms | 31.1ms | 1.00x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 555.3ms | 543.9ms | 1.02x | 310.08 MiB | 300.58 MiB | 0.97x | 1.81 GiB/s | 1.85 GiB/s |
| `codex --offline --json` | 1.01 GiB | 396.2ms | 374.9ms | 1.06x | - | 82.45 MiB | - | 2.54 GiB/s | 2.69 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 14.22 KiB | 14.21 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
