
feat: include usage in each message #1379

Merged: leonardmq merged 9 commits into main from leonard/kil-606-fix-usage-cost-summing-in-chat-multi-turn-conversations on May 8, 2026

Conversation

leonardmq (Collaborator) commented May 7, 2026

What does this PR do?

Adds usage to each message for multiturn task runs, plus a cumulative usage total across the run.

Adds new usage info blocks into the TaskRun to support the multiturn case:

  • trace[x].usage -> the usage info for the LLM call that produced this message (in multiturn, not every message is produced by an LLM; some may be manually injected into the prior_trace and carry no usage)
  • cumulative_usage -> the total usage across all turns included in the trace; equivalent to summing message.usage over task_run.trace, as sketched below
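Conceptually (a hypothetical standalone sketch; the real aggregation is the Usage.from_trace helper added in this PR), cumulative_usage is a field-wise sum over the messages that carry usage:

```python
def cumulative_usage(trace: list[dict]) -> dict:
    """Field-wise sum of per-message usage; messages without usage
    (e.g. user/tool messages injected into prior_trace) are skipped."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0,
              "cost": None, "cached_tokens": 0}
    for message in trace:
        usage = message.get("usage")
        if not usage:
            continue
        for key in ("input_tokens", "output_tokens", "total_tokens", "cached_tokens"):
            totals[key] += usage.get(key) or 0
        # cost stays None unless at least one message reports a cost
        if usage.get("cost") is not None:
            totals["cost"] = (totals["cost"] or 0.0) + usage["cost"]
    return totals
```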

Produces a TaskRun (with trace) like this:

{
  "id": "8221839524_[uuid-redacted]",
  "task_run": {
    "v": 1,
    "id": "8221839524_[uuid-redacted]",
    "path": null,
    "created_at": "2026-05-07T21:27:55.900923+08:00",
    "created_by": "xxx",
    "input": "xxx",
    "input_source": {
      "type": "human",
      "properties": {
        "created_by": "xxx"
      },
      "run_config": null
    },
    "output": {
      "rating": null,
      "model_type": "task_output"
    },
    "repair_instructions": null,
    "repaired_output": null,
    "intermediate_outputs": {
      "reasoning": "xxx"
    },
    "tags": [],
    "usage": {
      "input_tokens": 12834,
      "output_tokens": 449,
      "total_tokens": 13283,
      "cost": null,
      "cached_tokens": 11586,
      "total_llm_latency_ms": 17008
    },
    "cumulative_usage": {
      "input_tokens": 84140,
      "output_tokens": 2583,
      "total_tokens": 86723,
      "cost": null,
      "cached_tokens": 61552
    },
    "trace": [
      {
        "content": "xxx",
        "role": "system"
      },
      {
        "content": "xxx",
        "role": "user"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "latency_ms": 11733,
        "usage": {
          "input_tokens": 5429,
          "output_tokens": 263,
          "total_tokens": 5692,
          "cost": null,
          "cached_tokens": 0
        }
      },
      {
        "content": "xxx",
        "role": "user"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "latency_ms": 4251,
        "usage": {
          "input_tokens": 5612,
          "output_tokens": 90,
          "total_tokens": 5702,
          "cost": null,
          "cached_tokens": 5428
        }
      },
      {
        "content": "xxx",
        "role": "user"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "latency_ms": 7653,
        "usage": {
          "input_tokens": 5674,
          "output_tokens": 120,
          "total_tokens": 5794,
          "cost": null,
          "cached_tokens": 5611
        }
      },
      {
        "content": "xxx",
        "role": "user"
      },
      {
        "role": "assistant",
        "content": null,
        "reasoning_content": "xxx",
        "tool_calls": [
          {
            "id": "tool-call-001-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          }
        ],
        "latency_ms": 2726,
        "usage": {
          "input_tokens": 5756,
          "output_tokens": 77,
          "total_tokens": 5833,
          "cost": null,
          "cached_tokens": 5673
        }
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-001-redacted"
      },
      {
        "role": "assistant",
        "content": null,
        "reasoning_content": "xxx",
        "tool_calls": [
          {
            "id": "tool-call-002-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          }
        ],
        "latency_ms": 5750,
        "usage": {
          "input_tokens": 8573,
          "output_tokens": 125,
          "total_tokens": 8698,
          "cost": null,
          "cached_tokens": 5755
        }
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-002-redacted"
      },
      {
        "role": "assistant",
        "content": null,
        "reasoning_content": "xxx",
        "tool_calls": [
          {
            "id": "tool-call-003-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          }
        ],
        "latency_ms": 2284,
        "usage": {
          "input_tokens": 8941,
          "output_tokens": 50,
          "total_tokens": 8991,
          "cost": null,
          "cached_tokens": 8572
        }
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-003-redacted"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "latency_ms": 8166,
        "usage": {
          "input_tokens": 9746,
          "output_tokens": 348,
          "total_tokens": 10094,
          "cost": null,
          "cached_tokens": 8940
        }
      },
      {
        "content": "xxx",
        "role": "user"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "tool_calls": [
          {
            "id": "tool-call-004-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          },
          {
            "id": "tool-call-005-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          }
        ],
        "latency_ms": 30962,
        "usage": {
          "input_tokens": 9988,
          "output_tokens": 830,
          "total_tokens": 10818,
          "cost": null,
          "cached_tokens": 0
        }
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-004-redacted"
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-005-redacted"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "tool_calls": [
          {
            "id": "tool-call-006-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          },
          {
            "id": "tool-call-007-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          },
          {
            "id": "tool-call-008-redacted",
            "function": {
              "arguments": "xxx",
              "name": "some_tool"
            },
            "type": "function"
          }
        ],
        "latency_ms": 5243,
        "usage": {
          "input_tokens": 11587,
          "output_tokens": 231,
          "total_tokens": 11818,
          "cost": null,
          "cached_tokens": 9987
        }
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-006-redacted"
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-007-redacted"
      },
      {
        "content": "xxx",
        "role": "tool",
        "tool_call_id": "tool-call-008-redacted"
      },
      {
        "role": "assistant",
        "content": "xxx",
        "reasoning_content": "xxx",
        "latency_ms": 17008,
        "usage": {
          "input_tokens": 12834,
          "output_tokens": 449,
          "total_tokens": 13283,
          "cost": null,
          "cached_tokens": 11586
        }
      }
    ],
    "parent_task_run_id": null,
    "model_type": "task_run"
  }
}

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

leonardmq added 3 commits May 7, 2026 16:44
…g (Phase 1)

Phase 1 of multiturn turn-level usage tracking. Pure refactor and field
additions; no behavior change yet.

- Move `Usage` from `task_run.py` into new `libs/core/kiln_ai/datamodel/usage.py`,
  re-exported from `task_run` for backward compatibility (sketched after this list).
- Add `Usage.from_trace` static helper for summing per-message usage.
- Add `cumulative_usage` field to `TaskRun` (defaults to `None`).
- Add `usage` field to `ChatCompletionAssistantMessageParamWrapper` and
  include it in `KILN_ONLY_MESSAGE_FIELDS` so it is stripped before
  sending messages to providers.
- New unit tests for `Usage.from_trace`, datamodel round-trip, and
  per-message usage sanitization.
- Spec docs and Phase 1 phase plan checked in under
  `specs/projects/multiturn_turn_usage/`.
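
A minimal sketch of the backward-compatible move-and-re-export from the first commit above (module path from the bullets; file contents assumed):

```python
# libs/core/kiln_ai/datamodel/task_run.py (sketch): the Usage class moved
# to usage.py, but re-exporting it here keeps old imports working.
from kiln_ai.datamodel.usage import Usage  # noqa: F401

__all__ = ["Usage", "TaskRun"]  # TaskRun et al. remain defined below as before
```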
Wire per-message usage capture through the non-streaming LiteLLM adapter
so each assistant trace message carries its own Usage, and compute
TaskRun.cumulative_usage in BaseAdapter.generate_run by summing turn
usages on top of any seeded TaskRun usage.

- LiteLlmAdapter: thread usage through _run_model_turn, _run,
  litellm_message_to_trace_message, all_messages_to_trace, and
  ModelTurnResult so per-turn Usage is attached to assistant messages.
- BaseAdapter.generate_run: aggregate per-message usage into
  cumulative_usage, respecting fresh vs seeded TaskRun behavior.
- Tests: Usage.from_trace, message sanitization, non-streaming
  round-trip, fresh vs seeded TaskRun cumulative usage.
- api_schema.d.ts: regenerated to expose cumulative_usage.
- Phase 2 plan added; implementation_plan.md checkbox marked complete.
Mirror Phase 2's non-streaming usage tracking in the streaming
orchestrator (AdapterStream) so streaming runs persist the same
per-message usage data as non-streaming runs.

- AdapterStream gains _message_usage: dict[int, Usage] state, populated
  alongside _message_latency in _stream_model_turn after each LLM call.
- __aiter__ passes _message_usage through to all_messages_to_trace at
  finalization, so each assistant message in RunOutput.trace carries
  per-call usage. cumulative_usage on TaskRun is then populated
  automatically by Usage.from_trace in BaseAdapter.generate_run.
- StreamingCompletion now forces stream_options={"include_usage": True}
  so LiteLLM's final assembled ModelResponse includes token counts and
  cost; caller-provided stream_options are merged without clobbering (see
  the sketch after this commit message).
- Tests cover single-call, tool-call loop, tool-call interruption,
  empty-usage cases, and stream_options merging behavior.

Marks Phase 3 complete in implementation_plan.md and flips
phase_plans/phase_3.md status from draft to complete.
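
A sketch of the non-clobbering merge described in the StreamingCompletion bullet above (the function name is hypothetical; the behavior is as the commit message states):

```python
def merged_stream_options(caller_options: dict | None) -> dict:
    # Start from whatever the caller passed, then force include_usage so
    # LiteLLM's final assembled ModelResponse carries token counts and cost.
    options = dict(caller_options or {})
    options["include_usage"] = True
    return options
```

Copying the caller's dict first means any other caller-supplied stream options survive; only include_usage is overridden.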
coderabbitai (Bot) commented May 7, 2026

Walkthrough

This PR implements per-message usage tracking: it splits Usage into MessageUsage plus a Usage subclass that adds latency, threads per-message usage through the non-streaming and streaming adapters (forcing LiteLLM's streaming include_usage), attaches per-message usage to trace messages, and computes TaskRun.cumulative_usage by summing per-message usage across the full trace. Schemas, sanitization, tests, and docs are updated accordingly.

Changes

Multiturn Turn-Level Usage Tracking

Layer / File(s) Summary
Data Model & Schema
libs/core/kiln_ai/datamodel/usage.py, libs/core/kiln_ai/datamodel/task_run.py, libs/core/kiln_ai/datamodel/__init__.py, app/web_ui/src/lib/api_schema.d.ts
Introduce MessageUsage (tokens/cost only) and make Usage a MessageUsage subclass with total_llm_latency_ms; add MessageUsage.from_trace and __add__ semantics; add `TaskRun.cumulative_usage: MessageUsage | None` (defaulting to None).
OpenAI Wrapper & Sanitization
libs/core/kiln_ai/utils/open_ai_types.py, app/web_ui/src/lib/api_schema.d.ts
Add ChatCompletionAssistantMessageParamWrapper.usage: Optional[MessageUsage] and include "usage" in KILN_ONLY_MESSAGE_FIELDS so sanitize_messages_for_provider() strips it before provider calls; generated schema updated (see the sketch after this table).
Non-Streaming Adapter Plumbing
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
ModelTurnResult adds `message_usage: dict[int, MessageUsage]`; per-turn usage is threaded through _run_model_turn, _run, litellm_message_to_trace_message, and all_messages_to_trace so each assistant trace message carries its own usage.
Streaming Wrapper & AdapterStream
libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py, libs/core/kiln_ai/adapters/model_adapters/adapter_stream.py
StreamingCompletion forces stream_options.include_usage=True (merged with caller options); AdapterStream adds _message_usage, records per-call call_usage and stores per-index MessageUsage, and passes message_usage to trace conversion at finalization.
Run Finalization / Cumulative Usage
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
BaseAdapter.generate_run sets TaskRun.cumulative_usage = MessageUsage.from_trace(trace) to sum per-message usage across the full trace (including seeded prior messages) while preserving TaskRun.usage as the run’s new-turn accumulator.
Tests — Datamodel & Aggregation
libs/core/kiln_ai/datamodel/test_usage.py, libs/core/kiln_ai/datamodel/test_example_models.py
Comprehensive tests for MessageUsage.__add__/Usage.__add__, MessageUsage.from_trace behavior (None/empty/mixed traces, dict or model inputs), serialization shape (no latency on per-message/cumulative usage), and backward-compatible loading of legacy latency keys.
Tests — Adapter & Streaming
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py, libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py, libs/core/kiln_ai/adapters/litellm_utils/test_litellm_streaming.py, libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py, libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py
New/extended tests assert per-message MessageUsage capture for single responses and tool-call loops, streaming include_usage enforcement and merging behavior, return_on_tool_call interrupt handling, and generate_run cumulative_usage computation for fresh and seeded traces (including a paid end-to-end integration).
Tests — OpenAI Types & Sanitization
libs/core/kiln_ai/utils/test_open_ai_types.py
Tests updated to assert usage present in Kiln annotations, included in KILN_ONLY_MESSAGE_FIELDS, and stripped by sanitize_messages_for_provider; wrapper instantiation preserves usage.
Docs / Specs / Plans
specs/projects/multiturn_turn_usage/*
Architecture, functional spec, implementation plan, and phase plans updated to document MessageUsage/Usage split, adapter plumbing for non-streaming and streaming, streaming include_usage requirement, sanitization, test coverage, and migration/backcompat expectations.
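
For context, the sanitization row above boils down to stripping Kiln-only keys from each message before a provider call. A minimal sketch (only "usage" is confirmed as a new entry by this PR; the other field names are illustrative):

```python
# Illustrative constant; this PR confirms only that "usage" was added to it.
KILN_ONLY_MESSAGE_FIELDS = {"usage", "latency_ms", "reasoning_content"}

def sanitize_messages_for_provider(messages: list[dict]) -> list[dict]:
    # Drop Kiln-internal annotations so providers receive only standard
    # OpenAI-style message fields.
    return [
        {key: value for key, value in message.items()
         if key not in KILN_ONLY_MESSAGE_FIELDS}
        for message in messages
    ]
```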

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Kiln-AI/Kiln#308: Modifies usage-related schema and TaskRun fields; related to schema/data-model changes.
  • Kiln-AI/Kiln#1107: Related streaming/adapter include-usage and trace wiring changes.
  • Kiln-AI/Kiln#1340: Similar per-message usage/latency tracking updates across adapters and wrappers.

Suggested reviewers

  • chiang-daniel
  • scosman

Poem

🐰 I count the tokens, one by one,
Each call's tiny usage neatly spun,
Streams and runs I stitch and sum,
Traces sing when tallies come,
Hop — the totals are home, well done!

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 52.83%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4)

  • Title check: the title 'feat: include usage in each message' directly reflects the main objective: adding per-message usage tracking and cumulative usage to multiturn task runs.
  • Linked Issues check: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: skipped because no linked issues were found for this pull request.
  • Description check: the description clearly explains the changes, includes an example of the resulting TaskRun structure, and links to related functionality; however, the required PR template sections are not fully populated.

gemini-code-assist (Bot) left a comment

Code Review

This pull request implements turn-level usage tracking for multiturn conversations by capturing per-LLM-call token usage and cost on assistant messages. It introduces a cumulative_usage field to TaskRun representing the sum of usage across the entire trace. Key changes include moving the Usage model to its own module, updating adapters to track per-message usage, and adding a from_trace helper. Feedback suggests that Usage.from_trace should also aggregate latency_ms from the trace into total_llm_latency_ms to ensure consistency with the TaskRun.usage field.

(Comment thread on libs/core/kiln_ai/datamodel/usage.py)
github-actions (Bot) commented May 7, 2026

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

  • libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/adapter_stream.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (100%)
  • libs/core/kiln_ai/datamodel/__init__.py (100%)
  • libs/core/kiln_ai/datamodel/task_run.py (100%)
  • libs/core/kiln_ai/datamodel/usage.py (98.2%): missing line 6
  • libs/core/kiln_ai/utils/open_ai_types.py (100%)

Summary

  • Total: 86 lines
  • Missing: 1 line
  • Coverage: 98%

Line-by-line

libs/core/kiln_ai/datamodel/usage.py

Lines 2-10

   2 
   3 from pydantic import BaseModel, Field
   4 
   5 if TYPE_CHECKING:
!  6     from kiln_ai.utils.open_ai_types import ChatCompletionMessageParam
   7 
   8 
   9 def _add_optional_int(a: int | None, b: int | None) -> int | None:
  10     if a is None and b is None:


coderabbitai (Bot) left a comment

🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py (1)

569-746: 💤 Low value

New TestAdapterStreamPerMessageUsage tests look correct and well-structured.

Coverage spans the four key streaming scenarios: single-call usage capture, per-turn isolation across tool-call loops, usage preservation under return_on_tool_call interruption, and robustness against empty Usage() returns. The use of pytest.approx at line 674 for the multi-call cost sum is the right approach for float equality.

One consistency note: line 598 guards call_args.args[2] with if len(call_args.args) >= 3 else None, while lines 664, 712, and 742 access .args[2] directly. All four should use the same pattern; if the implementation ever moves to keyword-arg passing for message_usage, the guarded form gives a clearer failure than IndexError.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py` around
lines 569 - 746, Tests access
mock_adapter.all_messages_to_trace.call_args.args[2] inconsistently; make them
all use the guarded extraction pattern to avoid IndexError if the implementation
switches to keyword args. Update the occurrences in
TestAdapterStreamPerMessageUsage (tests
test_per_message_usage_distinct_per_tool_call_loop,
test_per_message_usage_on_tool_call_interruption,
test_per_message_usage_handles_empty_usage) to mirror the earlier pattern
(inspect call_args = mock_adapter.all_messages_to_trace.call_args and set
message_usage_arg = call_args.args[2] if len(call_args.args) >= 3 else None),
then assert message_usage_arg is not None before using it.
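
For clarity, the guarded extraction pattern the comment asks for looks like this inside each test (mock and variable names taken from the prompt above):

```python
call_args = mock_adapter.all_messages_to_trace.call_args
message_usage_arg = call_args.args[2] if len(call_args.args) >= 3 else None
# A clear assertion failure here beats an IndexError if the implementation
# ever switches message_usage to a keyword argument.
assert message_usage_arg is not None
```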

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b9f21be2-252f-4b70-bfa6-e816137c4d5e

📥 Commits

Reviewing files that changed from the base of the PR and between bef7fdd and c99b946.

📒 Files selected for processing (22)
  • app/web_ui/src/lib/api_schema.d.ts
  • libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py
  • libs/core/kiln_ai/adapters/litellm_utils/test_litellm_streaming.py
  • libs/core/kiln_ai/adapters/model_adapters/adapter_stream.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/datamodel/task_run.py
  • libs/core/kiln_ai/datamodel/test_example_models.py
  • libs/core/kiln_ai/datamodel/test_usage.py
  • libs/core/kiln_ai/datamodel/usage.py
  • libs/core/kiln_ai/utils/open_ai_types.py
  • libs/core/kiln_ai/utils/test_open_ai_types.py
  • specs/projects/multiturn_turn_usage/architecture.md
  • specs/projects/multiturn_turn_usage/functional_spec.md
  • specs/projects/multiturn_turn_usage/implementation_plan.md
  • specs/projects/multiturn_turn_usage/phase_plans/phase_1.md
  • specs/projects/multiturn_turn_usage/phase_plans/phase_2.md
  • specs/projects/multiturn_turn_usage/phase_plans/phase_3.md
  • specs/projects/multiturn_turn_usage/project_overview.md

leonardmq added 3 commits May 7, 2026 20:13
…s (Phase 4)

Introduces MessageUsage with the five aggregatable fields and reshapes Usage
as a subclass that adds total_llm_latency_ms. Per-message usage and
TaskRun.cumulative_usage are re-typed to MessageUsage so the aggregated
latency field is dropped from sums where it has no meaning.
usage_from_response now returns MessageUsage. Tests cover the new add
semantics and loading legacy JSON that still carries total_llm_latency_ms.
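
A sketch of the split this commit describes; the field names come from the example trace and the walkthrough above, while the exact validators and helper signatures are assumed:

```python
from __future__ import annotations

from pydantic import BaseModel


class MessageUsage(BaseModel):
    # The five aggregatable fields mentioned in the commit message.
    input_tokens: int | None = None
    output_tokens: int | None = None
    total_tokens: int | None = None
    cost: float | None = None
    cached_tokens: int | None = None

    def __add__(self, other: "MessageUsage") -> "MessageUsage":
        # Field-wise sum; two Nones stay None so absent data is not
        # silently reported as zero.
        def add(a, b):
            if a is None and b is None:
                return None
            return (a or 0) + (b or 0)

        return MessageUsage(
            input_tokens=add(self.input_tokens, other.input_tokens),
            output_tokens=add(self.output_tokens, other.output_tokens),
            total_tokens=add(self.total_tokens, other.total_tokens),
            cost=add(self.cost, other.cost),
            cached_tokens=add(self.cached_tokens, other.cached_tokens),
        )


class Usage(MessageUsage):
    # Latency only makes sense per LLM call or per run, not in sums,
    # so it lives on the subclass and is excluded from aggregation.
    total_llm_latency_ms: float | None = None
```

Because total_llm_latency_ms lives only on Usage, summing MessageUsage values (e.g. in a from_trace-style helper) can never produce a meaningless aggregated latency.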
coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py`:
- Around line 559-595: Remove the brittle conversation-shape thresholds by
deleting the two assertions that check pending_rounds_total >= 4 and
plain_user_messages >= 1; keep the existing per-message sanity checks (the
assert on len(chain) >= 1 and assert not chain[-1].is_toolcall_pending) and let
the later trace/chain assertions validate resume/path behavior using full_chain
and per_message_chain_lens instead of enforcing global counts (i.e., remove the
blocks referencing pending_rounds_total and plain_user_messages).
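
Concretely, the suggested edit deletes the global-shape thresholds and keeps the per-message sanity checks (identifiers taken from the prompt above):

```python
# Delete the brittle conversation-shape assertions:
#   assert pending_rounds_total >= 4
#   assert plain_user_messages >= 1

# Keep the per-message sanity checks:
assert len(chain) >= 1
assert not chain[-1].is_toolcall_pending
```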

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f3cf787d-f784-44fd-ac11-f2e890f4ad19

📥 Commits

Reviewing files that changed from the base of the PR and between be23547 and b0e2b43.

📒 Files selected for processing (1)
  • libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py

(Comment thread on libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py, outdated)
tawnymanticore (Collaborator) left a comment

did a quick diff against my WIP branch which does the same thing, all looks solid. claude said yours is better than mine XD

leonardmq merged commit d026d1b into main on May 8, 2026; 15 checks passed.
leonardmq deleted the leonard/kil-606-fix-usage-cost-summing-in-chat-multi-turn-conversations branch on May 8, 2026 at 16:17.