**docs/_advanced/models.md** — 3 additions, 2 deletions

````diff
@@ -225,10 +225,11 @@
 puts cost.input
 puts cost.output
 puts cost.cache_read
 puts cost.cache_write
+puts cost.thinking
 puts cost.total
 ```
 
-Costs use RubyLLM's normalized token buckets: standard input, output, cache read, and cache write. See [Tracking Token Usage]({% link _core_features/chat.md %}#tracking-token-usage) for the provider comparison table and what RubyLLM exposes consistently across providers.
+Costs use RubyLLM's normalized token buckets: standard input, billable output, cache read, cache write, and separately priced thinking when the model registry exposes a distinct reasoning-token price. See [Tracking Token Usage]({% link _core_features/chat.md %}#tracking-token-usage) for the provider comparison table and what RubyLLM exposes consistently across providers.
@@ … @@
 Most applications use the shorter helpers on messages, chats, and agents:
-If pricing is incomplete for tokens that were used, the affected cost and `cost.total` return `nil`.
+If pricing is incomplete for tokens that were used, the affected cost and `cost.total` return `nil`. Cost helpers cover token-priced conversation usage; provider-specific add-ons such as search-query charges remain available in the provider's raw usage payload.
 
 ## Connecting to Custom Endpoints & Using Unlisted Models
````
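The nil-on-missing-pricing rule described in this hunk can be sketched as follows. This is a minimal illustration of the documented behavior, not RubyLLM's implementation; the `Bucket` struct and `total_cost` helper are assumptions for the example.

```ruby
# Illustrative sketch (not RubyLLM internals): each bucket prices its
# tokens, an unused bucket costs nothing, and a used bucket with missing
# pricing makes both its own cost and the total nil.
Bucket = Struct.new(:tokens, :price_per_million) do
  def cost
    return 0.0 if tokens.to_i.zero?       # unused bucket costs nothing
    return nil if price_per_million.nil?  # used but unpriced => nil
    tokens * price_per_million / 1_000_000.0
  end
end

def total_cost(buckets)
  costs = buckets.map(&:cost)
  # One nil among used buckets poisons the total, as documented.
  costs.include?(nil) ? nil : costs.sum
end

buckets = [
  Bucket.new(1200, 3.0),  # input: priced
  Bucket.new(300, 15.0),  # output: priced
  Bucket.new(0, nil)      # cache read: unused, missing pricing is harmless
]
total_cost(buckets).round(4)       # => 0.0081
total_cost([Bucket.new(500, nil)]) # => nil (used but unpriced)
```

This mirrors why `cost.total` returns `nil` rather than silently undercounting when the registry lacks a price for tokens that were actually consumed.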
**docs/_advanced/upgrading.md** — 3 additions, 1 deletion

````diff
@@ -40,7 +40,7 @@ Use the new cache names in new code:
 ```ruby
 response.input_tokens        # Standard input tokens
-response.output_tokens       # Output tokens
+response.output_tokens       # Billable output tokens
 response.cache_read_tokens   # Tokens served from prompt cache
 response.cache_write_tokens  # Tokens written to prompt cache
@@ -79,6 +79,8 @@ agent.cost.total
 Cost helpers are available from 1.15 onward. They return `nil` for any cost bucket whose pricing is missing, and `cost.total` is also `nil` when a used bucket has incomplete pricing.
 
+`thinking_tokens` remains available from 1.10. From 1.15 onward, `output_tokens` is normalized as the billable output bucket. Do not add `thinking_tokens` to `output_tokens` yourself; RubyLLM includes thinking in output when the provider bills it as output, and exposes `cost.thinking` only for models with distinct reasoning-token pricing.
+
 See [Tracking Token Usage]({% link _core_features/chat.md %}#tracking-token-usage) for the provider comparison table and the exact normalized token semantics RubyLLM exposes.
@@ -660,9 +661,11 @@ This means the same RubyLLM code works across providers: `input_tokens` for stan
 `cache_read_tokens` and `cache_write_tokens` are available from v1.15+ and are also exposed as `response.tokens.cache_read` and `response.tokens.cache_write`. The older `cached_tokens` and `cache_creation_tokens` methods remain available for compatibility with v1.9.0+ code.
 
-Thinking token usage is available via `response.thinking_tokens` and `response.tokens.thinking` when providers report it. For providers that do not include thinking token counts, these values remain `nil`.
+Thinking token usage is available via `response.thinking_tokens` and `response.tokens.thinking` when providers report it. For most providers, thinking/reasoning tokens are a breakdown of output work, not an extra bucket to add yourself. RubyLLM keeps `output_tokens` as the billable output bucket: OpenAI-style providers that include reasoning in completion tokens stay as-is, while OpenAI-compatible providers that report reasoning outside completion tokens are normalized so `output_tokens` includes the billable generated total.
 
-Cost helpers are available from v1.15+. RubyLLM uses token usage from the provider and pricing from the model registry. If the registry is missing pricing for tokens that were used, the affected cost and `cost.total` return `nil` instead of pretending the cost was zero.
+When a model has distinct reasoning-token pricing, `response.cost.thinking` prices that bucket separately. Otherwise, thinking tokens are treated as part of `response.cost.output` and `response.cost.thinking` stays `nil`.
+
+Cost helpers are available from v1.15+. RubyLLM uses token usage from the provider and pricing from the model registry. If the registry is missing pricing for tokens that were used, the affected cost and `cost.total` return `nil` instead of pretending the cost was zero. These helpers cover token-priced conversation usage; provider-specific add-ons such as search-query charges are left to the provider's raw usage payload.
 
 Refer to the [Working with Models Guide]({% link _advanced/models.md %}) for details on accessing model-specific pricing.
````
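The output-token normalization this hunk describes can be sketched as below. The usage-hash shape follows the OpenAI-style `completion_tokens` / `completion_tokens_details.reasoning_tokens` fields, but the `reasoning_in_completion:` flag and the function itself are illustrative assumptions, not RubyLLM's actual parser.

```ruby
# Hedged sketch of normalizing output_tokens as the billable output bucket.
def normalized_output_tokens(usage, reasoning_in_completion:)
  completion = usage.fetch(:completion_tokens, 0)
  reasoning  = usage.dig(:completion_tokens_details, :reasoning_tokens).to_i
  if reasoning_in_completion
    completion            # reasoning already billed inside completion_tokens
  else
    completion + reasoning # fold separately reported reasoning into output
  end
end

usage = {
  completion_tokens: 900,
  completion_tokens_details: { reasoning_tokens: 600 }
}
normalized_output_tokens(usage, reasoning_in_completion: true)  # => 900
normalized_output_tokens(usage, reasoning_in_completion: false) # => 1500
```

Either way the caller never adds `thinking_tokens` to `output_tokens` manually, which is the invariant the upgrade note is protecting.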
**docs/_core_features/streaming.md** — 1 addition, 1 deletion

````diff
@@ -58,7 +58,7 @@ Key attributes of a `Chunk`:
 * `chunk.model_id`: The model generating the response (usually present).
 * `chunk.tool_calls`: A hash containing partial or complete tool call information if the model is invoking a [Tool]({% link _core_features/tools.md %}). The arguments might be streamed incrementally.
 * `chunk.input_tokens`: Standard input tokens for the request (often `nil` until the final chunk). From v1.15 onward, cache reads and writes are exposed separately as `chunk.cache_read_tokens` and `chunk.cache_write_tokens` when providers report them.
-* `chunk.output_tokens`: Cumulative output tokens *up to this chunk* (behavior varies by provider, often only accurate in the final chunk).
+* `chunk.output_tokens`: Cumulative billable output tokens *up to this chunk* (behavior varies by provider, often only accurate in the final chunk). From v1.15 onward, this includes thinking/reasoning tokens when the provider bills them as output.
 * `chunk.thinking`: Optional thinking output when providers stream it.
 
 > Do not rely on token counts being present or accurate in every chunk. They are typically finalized only in the last chunk or the final returned message.
````
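The warning above suggests a simple consumption pattern: because counts are cumulative and often only reliable at the end, take the last non-`nil` value rather than summing per-chunk counts. The chunk hashes below are illustrative stand-ins for RubyLLM chunk objects, not the library's API.

```ruby
# Sketch: collect streamed chunks, then read the final cumulative count.
chunks = [
  { content: "Hel", output_tokens: nil },
  { content: "lo",  output_tokens: nil },
  { content: "",    output_tokens: 128 } # final chunk carries the total
]

# Summing per-chunk values would double-count or miss nils; the last
# reported value is the cumulative billable output total.
final_output_tokens = chunks.map { |c| c[:output_tokens] }.compact.last
final_output_tokens # => 128
```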
**docs/_core_features/thinking.md** — 2 additions, 0 deletions

````diff
@@ -96,6 +96,8 @@ response.thinking&.text
 response.thinking_tokens
 ```
 
+`thinking_tokens` is usually a breakdown of generated output work. From v1.15 onward, RubyLLM normalizes `output_tokens` as the billable output bucket, so you should not add `thinking_tokens` to `output_tokens` for cost calculations. When a model has distinct reasoning-token pricing, the cost is exposed separately as `response.cost.thinking`.
+
 ### Upgrading Existing Installations
 
 For 1.10 upgrades, consider using the [upgrade guide]({% link _advanced/upgrading.md %}#upgrade-to-1-10) to run the generator.
````
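The distinct-reasoning-price rule added in thinking.md can be sketched as a small pricing helper. The method name and per-million-token pricing convention are assumptions for illustration, not RubyLLM's internals.

```ruby
# Sketch: thinking is priced separately only when the model registry has a
# distinct reasoning-token price; otherwise those tokens are already inside
# the billable output bucket and the thinking cost stays nil.
def thinking_cost(thinking_tokens, reasoning_price_per_million)
  return nil if reasoning_price_per_million.nil? # no distinct pricing
  thinking_tokens * reasoning_price_per_million / 1_000_000.0
end

thinking_cost(2_000, 60.0) # => 0.12 (distinct reasoning price)
thinking_cost(2_000, nil)  # => nil  (thinking billed as output)
```

Returning `nil` rather than `0.0` keeps the two cases distinguishable: `nil` means "priced elsewhere", not "free".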