feat: include usage in each message #1379
Conversation
…g (Phase 1)

Phase 1 of multiturn turn-level usage tracking. Pure refactor and field additions; no behavior change yet.

- Move `Usage` from `task_run.py` into new `libs/core/kiln_ai/datamodel/usage.py`, re-exported from `task_run` for backward compatibility.
- Add `Usage.from_trace` static helper for summing per-message usage (see the sketch below).
- Add `cumulative_usage` field to `TaskRun` (defaults to `None`).
- Add `usage` field to `ChatCompletionAssistantMessageParamWrapper` and include it in `KILN_ONLY_MESSAGE_FIELDS` so it is stripped before sending messages to providers.
- New unit tests for `Usage.from_trace`, datamodel round-trip, and per-message usage sanitization.
- Spec docs and Phase 1 phase plan checked in under `specs/projects/multiturn_turn_usage/`.
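A minimal sketch of what that summing helper might look like, assuming plain-dict trace messages and the usage field names visible in this PR's example trace (the repo's actual signature may differ; `_add_optional_int` does appear in the coverage excerpt further down):

```python
from pydantic import BaseModel


def _add_optional_int(a: int | None, b: int | None) -> int | None:
    # Both sides None means "no data" and stays None; otherwise None counts as 0.
    if a is None and b is None:
        return None
    return (a or 0) + (b or 0)


class Usage(BaseModel):
    input_tokens: int | None = None
    output_tokens: int | None = None
    total_tokens: int | None = None
    cost: float | None = None
    cached_tokens: int | None = None

    @staticmethod
    def from_trace(trace: list[dict]) -> "Usage":
        # Sum per-message usage; messages without usage (user/tool turns, or
        # messages hand-injected into a prior_trace) contribute nothing.
        total = Usage()
        for message in trace:
            usage = message.get("usage") or {}
            total.input_tokens = _add_optional_int(total.input_tokens, usage.get("input_tokens"))
            total.output_tokens = _add_optional_int(total.output_tokens, usage.get("output_tokens"))
            total.total_tokens = _add_optional_int(total.total_tokens, usage.get("total_tokens"))
            total.cached_tokens = _add_optional_int(total.cached_tokens, usage.get("cached_tokens"))
            if usage.get("cost") is not None:
                total.cost = (total.cost or 0.0) + usage["cost"]
        return total
```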
Wire per-message usage capture through the non-streaming LiteLLM adapter so each assistant trace message carries its own `Usage`, and compute `TaskRun.cumulative_usage` in `BaseAdapter.generate_run` by summing turn usages on top of any seeded `TaskRun` usage (a sketch of that aggregation follows this list).

- `LiteLlmAdapter`: thread usage through `_run_model_turn`, `_run`, `litellm_message_to_trace_message`, `all_messages_to_trace`, and `ModelTurnResult` so per-turn `Usage` is attached to assistant messages.
- `BaseAdapter.generate_run`: aggregate per-message usage into `cumulative_usage`, respecting fresh vs seeded `TaskRun` behavior.
- Tests: `Usage.from_trace`, message sanitization, non-streaming round-trip, fresh vs seeded `TaskRun` cumulative usage.
- `api_schema.d.ts`: regenerated to expose `cumulative_usage`.
- Phase 2 plan added; `implementation_plan.md` checkbox marked complete.
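Roughly how the fresh-vs-seeded aggregation could work, building on the `Usage` sketch above (`compute_cumulative_usage` is a hypothetical name, not the repo's):

```python
def compute_cumulative_usage(seeded: Usage | None, trace: list[dict]) -> Usage:
    # Fresh TaskRun: cumulative usage is just the sum over this run's trace.
    # Seeded TaskRun (e.g. a resumed conversation): stack the new turn usage
    # on top of whatever usage the run was seeded with.
    turns = Usage.from_trace(trace)
    if seeded is None:
        return turns
    return Usage(
        input_tokens=_add_optional_int(seeded.input_tokens, turns.input_tokens),
        output_tokens=_add_optional_int(seeded.output_tokens, turns.output_tokens),
        total_tokens=_add_optional_int(seeded.total_tokens, turns.total_tokens),
        cached_tokens=_add_optional_int(seeded.cached_tokens, turns.cached_tokens),
        cost=None
        if seeded.cost is None and turns.cost is None
        else (seeded.cost or 0.0) + (turns.cost or 0.0),
    )
```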
Mirror Phase 2's non-streaming usage tracking in the streaming
orchestrator (AdapterStream) so streaming runs persist the same
per-message usage data as non-streaming runs.
- AdapterStream gains _message_usage: dict[int, Usage] state, populated
alongside _message_latency in _stream_model_turn after each LLM call.
- __aiter__ passes _message_usage through to all_messages_to_trace at
finalization, so each assistant message in RunOutput.trace carries
per-call usage. cumulative_usage on TaskRun is then populated
automatically by Usage.from_trace in BaseAdapter.generate_run.
- StreamingCompletion now forces stream_options={"include_usage": True}
so LiteLLM's final assembled ModelResponse includes token counts and
cost; caller-provided stream_options are merged without clobbering (sketched below).
- Tests cover single-call, tool-call loop, tool-call interruption,
empty-usage cases, and stream_options merging behavior.
Marks Phase 3 complete in implementation_plan.md and flips
phase_plans/phase_3.md status from draft to complete.
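A sketch of the stream_options merge described in that commit note. The function name and call site are assumptions; the behavior (preserve caller keys, force `include_usage`) is what the note states:

```python
def merged_stream_options(caller_options: dict | None) -> dict:
    # Keep whatever the caller passed, but force include_usage so LiteLLM's
    # final assembled ModelResponse carries token counts and cost.
    options = dict(caller_options or {})
    options["include_usage"] = True
    return options


# Caller keys survive; only include_usage is overridden (illustrative key):
assert merged_stream_options({"some_caller_key": 1}) == {
    "some_caller_key": 1,
    "include_usage": True,
}
```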
Walkthrough

This PR implements per-message `MessageUsage`, splits `Usage` into `MessageUsage` plus a `Usage` subclass that adds latency, threads per-message usage through the non-streaming and streaming adapters (forcing LiteLLM streaming `include_usage`), attaches per-message usage to trace messages, and computes `TaskRun.cumulative_usage` by summing per-message usage across full traces; schemas, sanitization, tests, and docs are updated.

Changes: Multiturn Turn-Level Usage Tracking
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements turn-level usage tracking for multiturn conversations by capturing per-LLM-call token usage and cost on assistant messages. It introduces a `cumulative_usage` field on `TaskRun` representing the sum of usage across the entire trace. Key changes include moving the `Usage` model to its own module, updating adapters to track per-message usage, and adding a `from_trace` helper. Feedback suggests that `Usage.from_trace` should also aggregate `latency_ms` from the trace into `total_llm_latency_ms` to ensure consistency with the `TaskRun.usage` field.
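If that suggestion were adopted, the latency roll-up could be as small as the sketch below; it mirrors the per-message `latency_ms` field visible in the example trace further down, and follows the same "no data stays None" convention as the token fields:

```python
def total_llm_latency_ms(trace: list[dict]) -> int | None:
    # Sum latency_ms over messages that report it; None if no message does.
    latencies = [m["latency_ms"] for m in trace if m.get("latency_ms") is not None]
    return sum(latencies) if latencies else None
```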
📊 Coverage Report

Overall Coverage: 92% | Diff: origin/main...HEAD

Line-by-line diff coverage, `libs/core/kiln_ai/datamodel/usage.py`, lines 2-10 (`!` marks an uncovered line):

```
  2
  3  from pydantic import BaseModel, Field
  4
  5  if TYPE_CHECKING:
! 6      from kiln_ai.utils.open_ai_types import ChatCompletionMessageParam
  7
  8
  9  def _add_optional_int(a: int | None, b: int | None) -> int | None:
 10      if a is None and b is None:
```
🧹 Nitpick comments (1)

libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py (1)

Lines 569-746: 💤 Low value. The new `TestAdapterStreamPerMessageUsage` tests look correct and well-structured. Coverage spans the four key streaming scenarios: single-call usage capture, per-turn isolation across tool-call loops, usage preservation under `return_on_tool_call` interruption, and robustness against empty `Usage()` returns. The use of `pytest.approx` at line 674 for the multi-call cost sum is the right approach for float equality.

One consistency note: line 598 guards `call_args.args[2]` with `if len(call_args.args) >= 3 else None`, while lines 664, 712, and 742 access `.args[2]` directly. All four should use the same pattern; if the implementation ever moves to keyword-arg passing for `message_usage`, the guarded form gives a clearer failure than `IndexError`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py` around lines 569 - 746, Tests access mock_adapter.all_messages_to_trace.call_args.args[2] inconsistently; make them all use the guarded extraction pattern to avoid IndexError if the implementation switches to keyword args. Update the occurrences in TestAdapterStreamPerMessageUsage (tests test_per_message_usage_distinct_per_tool_call_loop, test_per_message_usage_on_tool_call_interruption, test_per_message_usage_handles_empty_usage) to mirror the earlier pattern (inspect call_args = mock_adapter.all_messages_to_trace.call_args and set message_usage_arg = call_args.args[2] if len(call_args.args) >= 3 else None), then assert message_usage_arg is not None before using it.
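For reference, the guarded pattern the nitpick asks for, in a self-contained form (`mock_adapter` here is a stand-in for the fixture in `test_adapter_stream.py`, and the call arguments are placeholders):

```python
from unittest.mock import MagicMock

mock_adapter = MagicMock()
mock_adapter.all_messages_to_trace("messages", "latencies", {"0": "usage"})

# Guarded extraction: a clear assertion failure instead of an IndexError if
# the third positional argument is ever missing.
call_args = mock_adapter.all_messages_to_trace.call_args
message_usage_arg = call_args.args[2] if len(call_args.args) >= 3 else None
assert message_usage_arg is not None
```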
📒 Files selected for processing (22)
app/web_ui/src/lib/api_schema.d.ts
libs/core/kiln_ai/adapters/litellm_utils/litellm_streaming.py
libs/core/kiln_ai/adapters/litellm_utils/test_litellm_streaming.py
libs/core/kiln_ai/adapters/model_adapters/adapter_stream.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/test_adapter_stream.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/datamodel/task_run.py
libs/core/kiln_ai/datamodel/test_example_models.py
libs/core/kiln_ai/datamodel/test_usage.py
libs/core/kiln_ai/datamodel/usage.py
libs/core/kiln_ai/utils/open_ai_types.py
libs/core/kiln_ai/utils/test_open_ai_types.py
specs/projects/multiturn_turn_usage/architecture.md
specs/projects/multiturn_turn_usage/functional_spec.md
specs/projects/multiturn_turn_usage/implementation_plan.md
specs/projects/multiturn_turn_usage/phase_plans/phase_1.md
specs/projects/multiturn_turn_usage/phase_plans/phase_2.md
specs/projects/multiturn_turn_usage/phase_plans/phase_3.md
specs/projects/multiturn_turn_usage/project_overview.md
…s (Phase 4)

Introduces `MessageUsage` with the five aggregatable fields and reshapes `Usage` as a subclass that adds `total_llm_latency_ms`. Per-message usage and `TaskRun.cumulative_usage` are re-typed to `MessageUsage` so the aggregated latency field is dropped from sums where it has no meaning. `usage_from_response` now returns `MessageUsage`. Tests cover the new add semantics and loading legacy JSON that still carries `total_llm_latency_ms`. (A sketch of the split follows.)
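A sketch of that split. Only the field names and the subclass relationship come from the summary above; the Pydantic models and the None-preserving add semantics are assumptions:

```python
from pydantic import BaseModel


class MessageUsage(BaseModel):
    # The five aggregatable fields: summing these across trace messages is
    # well-defined, so per-message usage and cumulative_usage use this type.
    input_tokens: int | None = None
    output_tokens: int | None = None
    total_tokens: int | None = None
    cost: float | None = None
    cached_tokens: int | None = None

    def __add__(self, other: "MessageUsage") -> "MessageUsage":
        def add(a, b):
            # None + None stays None ("no data"); otherwise None counts as 0.
            return None if a is None and b is None else (a or 0) + (b or 0)

        return MessageUsage(
            input_tokens=add(self.input_tokens, other.input_tokens),
            output_tokens=add(self.output_tokens, other.output_tokens),
            total_tokens=add(self.total_tokens, other.total_tokens),
            cost=add(self.cost, other.cost),
            cached_tokens=add(self.cached_tokens, other.cached_tokens),
        )


class Usage(MessageUsage):
    # Latency is meaningful for a whole TaskRun but not as an elementwise sum
    # over messages, so it lives only on the subclass and drops out of sums.
    total_llm_latency_ms: int | None = None
```

Keeping `total_llm_latency_ms` off `MessageUsage` means a naive elementwise sum can never produce a misleading aggregated latency, which matches the example trace below: `cumulative_usage` carries no `total_llm_latency_ms` field.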
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py`:
- Around line 559-595: Remove the brittle conversation-shape thresholds by
deleting the two assertions that check pending_rounds_total >= 4 and
plain_user_messages >= 1; keep the existing per-message sanity checks (the
assert on len(chain) >= 1 and assert not chain[-1].is_toolcall_pending) and let
the later trace/chain assertions validate resume/path behavior using full_chain
and per_message_chain_lens instead of enforcing global counts (i.e., remove the
blocks referencing pending_rounds_total and plain_user_messages).
📒 Files selected for processing (1)
libs/core/kiln_ai/adapters/model_adapters/test_multiturn_usage_paid.py
…ix-usage-cost-summing-in-chat-multi-turn-conversations
tawnymanticore
left a comment
Did a quick diff against my WIP branch which does the same thing, all looks solid. Claude said yours is better than mine XD
What does this PR do?
Adds usage info to each message for multiturn task runs, plus a cumulative usage total.
Adds new usage info blocks into the `TaskRun` to support the multiturn case:

- `trace[x].usage` -> contains the `usage` info for the LLM call that produced this output (noting that in multiturn, not every message is necessarily produced by an LLM; it could be manually injected into the `prior_trace`)
- `cumulative_usage` -> the total usage info across all the turns included in the trace; this is the same as `sum(map(task_run.trace => (message) => message.usage))`
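A quick check of those sum semantics against the example below: the ten assistant messages report `input_tokens` of 5429 + 5612 + 5674 + 5756 + 8573 + 8941 + 9746 + 9988 + 11587 + 12834 = 84140 and `output_tokens` of 263 + 90 + 120 + 77 + 125 + 50 + 348 + 830 + 231 + 449 = 2583, matching `cumulative_usage` exactly (with 84140 + 2583 = 86723 `total_tokens`); `cost` stays `null` because no per-message cost was reported.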
Produces a `TaskRun` -> `trace` like this:

```json
{
  "id": "8221839524_[uuid-redacted]",
  "task_run": {
    "v": 1,
    "id": "8221839524_[uuid-redacted]",
    "path": null,
    "created_at": "2026-05-07T21:27:55.900923+08:00",
    "created_by": "xxx",
    "input": "xxx",
    "input_source": { "type": "human", "properties": { "created_by": "xxx" }, "run_config": null },
    "output": { "rating": null, "model_type": "task_output" },
    "repair_instructions": null,
    "repaired_output": null,
    "intermediate_outputs": { "reasoning": "xxx" },
    "tags": [],
    "usage": { "input_tokens": 12834, "output_tokens": 449, "total_tokens": 13283, "cost": null, "cached_tokens": 11586, "total_llm_latency_ms": 17008 },
    "cumulative_usage": { "input_tokens": 84140, "output_tokens": 2583, "total_tokens": 86723, "cost": null, "cached_tokens": 61552 },
    "trace": [
      { "content": "xxx", "role": "system" },
      { "content": "xxx", "role": "user" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "latency_ms": 11733, "usage": { "input_tokens": 5429, "output_tokens": 263, "total_tokens": 5692, "cost": null, "cached_tokens": 0 } },
      { "content": "xxx", "role": "user" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "latency_ms": 4251, "usage": { "input_tokens": 5612, "output_tokens": 90, "total_tokens": 5702, "cost": null, "cached_tokens": 5428 } },
      { "content": "xxx", "role": "user" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "latency_ms": 7653, "usage": { "input_tokens": 5674, "output_tokens": 120, "total_tokens": 5794, "cost": null, "cached_tokens": 5611 } },
      { "content": "xxx", "role": "user" },
      { "role": "assistant", "content": null, "reasoning_content": "xxx", "tool_calls": [ { "id": "tool-call-001-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" } ], "latency_ms": 2726, "usage": { "input_tokens": 5756, "output_tokens": 77, "total_tokens": 5833, "cost": null, "cached_tokens": 5673 } },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-001-redacted" },
      { "role": "assistant", "content": null, "reasoning_content": "xxx", "tool_calls": [ { "id": "tool-call-002-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" } ], "latency_ms": 5750, "usage": { "input_tokens": 8573, "output_tokens": 125, "total_tokens": 8698, "cost": null, "cached_tokens": 5755 } },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-002-redacted" },
      { "role": "assistant", "content": null, "reasoning_content": "xxx", "tool_calls": [ { "id": "tool-call-003-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" } ], "latency_ms": 2284, "usage": { "input_tokens": 8941, "output_tokens": 50, "total_tokens": 8991, "cost": null, "cached_tokens": 8572 } },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-003-redacted" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "latency_ms": 8166, "usage": { "input_tokens": 9746, "output_tokens": 348, "total_tokens": 10094, "cost": null, "cached_tokens": 8940 } },
      { "content": "xxx", "role": "user" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "tool_calls": [ { "id": "tool-call-004-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" }, { "id": "tool-call-005-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" } ], "latency_ms": 30962, "usage": { "input_tokens": 9988, "output_tokens": 830, "total_tokens": 10818, "cost": null, "cached_tokens": 0 } },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-004-redacted" },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-005-redacted" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "tool_calls": [ { "id": "tool-call-006-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" }, { "id": "tool-call-007-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" }, { "id": "tool-call-008-redacted", "function": { "arguments": "xxx", "name": "some_tool" }, "type": "function" } ], "latency_ms": 5243, "usage": { "input_tokens": 11587, "output_tokens": 231, "total_tokens": 11818, "cost": null, "cached_tokens": 9987 } },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-006-redacted" },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-007-redacted" },
      { "content": "xxx", "role": "tool", "tool_call_id": "tool-call-008-redacted" },
      { "role": "assistant", "content": "xxx", "reasoning_content": "xxx", "latency_ms": 17008, "usage": { "input_tokens": 12834, "output_tokens": 449, "total_tokens": 13283, "cost": null, "cached_tokens": 11586 } }
    ],
    "parent_task_run_id": null,
    "model_type": "task_run"
  }
}
```

Checklists