Skip to content

Commit 77a1107

Browse files
committed
per-tier /think budgets, persistent session counters, Anthropic cleanup
1 parent 7a4ba3f commit 77a1107

9 files changed

Lines changed: 326 additions & 43 deletions

File tree

AGENTS.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -415,6 +415,10 @@ This prints:
415415
- [ ] Docs: Updated README if user-visible change?
416416
- [ ] UX: Error messages clear and actionable?
417417

418+
## `notes/` directory
419+
420+
Gitignored scratchpad for helper files the user asks to be created there — typically markdown (current proposal/plan files, side references during a refactor, etc.). Safe to read for context; nothing in `notes/` ships with the repo.
421+
418422
## Non-Negotiable Rules
419423

420424
- Use idiomatic Rust and repository naming conventions.

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,8 @@ All notable changes to Sofos are documented in this file.
3030

3131
### Fixed
3232

33+
- **Session token counters now persist across `--resume`.** Previously every counter (`total_input_tokens`, `total_output_tokens`, `total_cache_read_tokens`, `total_cache_creation_tokens`, `peak_single_turn_input_tokens`) reset to 0 on session reload — the cost line started from zero on resume and the gpt-5.4/5.5 cliff detector forgot whether the 272K threshold had already been crossed. All five counters are now saved as part of the session JSON and restored on load. Older session files (written before this release) default every counter to 0 via `#[serde(default)]` (matching prior behaviour). **Forward-compat note:** if a session file written by this release is later opened by an older sofos, the older binary silently drops the new fields on save; mixing versions against the same session file will lose the persisted counters until you settle on one version.
34+
- **Empty OpenAI reasoning shells are dropped instead of round-tripped.** When a reasoning output item arrives with `id` but no visible summary AND no `encrypted_content`, the wire shape `{type: "reasoning", id, summary: []}` carries no signal and may be rejected by some OpenAI models. Sofos now skips the block in that exact configuration; reasoning items with either a summary or encrypted CoT are preserved unchanged.
3335
- **Streaming Anthropic responses now round-trip server-side `compaction` content blocks.** The streaming path used to silently drop them, so on a streaming-enabled Anthropic session the next turn would re-send the pre-compaction history and Anthropic would re-compact (extra cost). The non-streaming path was already correct via serde; this brings streaming into parity.
3436
- **OpenAI reasoning items round-trip in the right order relative to their assistant message.** Reasoning items were being emitted in the input array *after* the message they preceded, breaking encrypted_content round-trip continuity on the server side. Now correctly placed before.
3537
- **Tool-cache breakpoint actually lands on Anthropic when OpenAI's web-search tool is registered.** The stamper used to no-op when `OpenAIWebSearch` was the last entry in the tool list, leaving Anthropic with no tool-defs cache breakpoint at all. Now finds the last *Anthropic-compatible* tool to stamp.

src/api/openai.rs

Lines changed: 124 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -157,43 +157,17 @@ impl OpenAIClient {
157157
}
158158
}
159159
"reasoning" => {
160-
// Pack the whole reasoning output item (id + every
161-
// summary entry + encrypted_content) into one block
162-
// so the next request can round-trip it as a single
163-
// `{type: "reasoning"}` input item. Splitting it
164-
// into one Summary block per entry would lose the
165-
// shared `id`/`encrypted_content` and force the
166-
// server to rederive the hidden chain-of-thought
167-
// on every tool round-trip.
168160
let summary_texts: Vec<String> = item
169161
.summary
170162
.into_iter()
171163
.filter(|s| s.summary_type == "summary_text" && !s.text.trim().is_empty())
172164
.map(|s| s.text)
173165
.collect();
174-
if let Some(rid) = item.id {
175-
// TODO: if `summary_texts` is empty AND
176-
// `item.encrypted_content` is None, we
177-
// round-trip an empty reasoning shell
178-
// (`{type: "reasoning", id, summary: []}`).
179-
// Theoretical wire-shape edge case — OpenAI
180-
// may reject it. Drop the block in that
181-
// configuration if it ever shows up in real
182-
// responses.
183-
content_blocks.push(ContentBlock::Reasoning {
184-
id: rid,
185-
summary: summary_texts,
186-
encrypted_content: item.encrypted_content,
187-
});
188-
} else {
189-
// No id means this isn't a real reasoning item
190-
// (e.g. an old payload predating the field) —
191-
// fall back to per-text Summary blocks so the
192-
// visible reasoning still surfaces.
193-
for text in summary_texts {
194-
content_blocks.push(ContentBlock::Summary { summary: text });
195-
}
196-
}
166+
content_blocks.extend(reasoning_item_to_blocks(
167+
item.id,
168+
summary_texts,
169+
item.encrypted_content,
170+
));
197171
}
198172
_ => {
199173
if std::env::var("SOFOS_DEBUG").is_ok() {
@@ -577,6 +551,46 @@ struct OpenAIInputTokensDetails {
577551
cached_tokens: Option<u32>,
578552
}
579553

554+
/// Convert a single OpenAI `reasoning` output item into the content
555+
/// blocks sofos stores in conversation history.
556+
///
557+
/// With an `id` present, the whole item (id + visible summary +
558+
/// encrypted CoT) packs into one [`ContentBlock::Reasoning`] so the
559+
/// next request can round-trip it as a single `{type: "reasoning"}`
560+
/// input — splitting into per-summary blocks would lose the shared
561+
/// `id`/`encrypted_content` and force the server to rederive the
562+
/// hidden chain-of-thought on every tool round-trip.
563+
///
564+
/// Two edge cases:
565+
/// 1. `id` present but neither summary nor encrypted_content — drop
566+
/// the block. The wire shape `{type: "reasoning", id, summary: []}`
567+
/// is rejected by some OpenAI models, and the block carries no
568+
/// signal worth round-tripping anyway.
569+
/// 2. No `id` (old payloads predating the field) — fall back to
570+
/// per-text [`ContentBlock::Summary`] blocks so the visible
571+
/// reasoning still surfaces.
572+
fn reasoning_item_to_blocks(
573+
id: Option<String>,
574+
summary_texts: Vec<String>,
575+
encrypted_content: Option<String>,
576+
) -> Vec<ContentBlock> {
577+
if let Some(rid) = id {
578+
if summary_texts.is_empty() && encrypted_content.is_none() {
579+
return Vec::new();
580+
}
581+
vec![ContentBlock::Reasoning {
582+
id: rid,
583+
summary: summary_texts,
584+
encrypted_content,
585+
}]
586+
} else {
587+
summary_texts
588+
.into_iter()
589+
.map(|text| ContentBlock::Summary { summary: text })
590+
.collect()
591+
}
592+
}
593+
580594
#[cfg(test)]
581595
mod tests {
582596
use super::*;
@@ -756,4 +770,83 @@ mod tests {
756770
let usage: OpenAIResponseUsage = serde_json::from_value(json).unwrap();
757771
assert!(usage.input_tokens_details.is_none());
758772
}
773+
774+
#[test]
775+
fn reasoning_item_drops_empty_shell_when_neither_summary_nor_encrypted() {
776+
// `{type: "reasoning", id, summary: []}` with no
777+
// encrypted_content carries no signal and some OpenAI models
778+
// reject the wire shape — drop instead of round-tripping.
779+
let blocks = reasoning_item_to_blocks(Some("rs_abc".to_string()), Vec::new(), None);
780+
assert!(
781+
blocks.is_empty(),
782+
"empty reasoning shell must be dropped, got {blocks:?}"
783+
);
784+
}
785+
786+
#[test]
787+
fn reasoning_item_keeps_block_when_encrypted_content_present() {
788+
// Encrypted CoT alone is enough signal to round-trip — the
789+
// server uses it to resume hidden reasoning even with no
790+
// visible summary.
791+
let blocks = reasoning_item_to_blocks(
792+
Some("rs_abc".to_string()),
793+
Vec::new(),
794+
Some("encrypted_blob".to_string()),
795+
);
796+
assert_eq!(blocks.len(), 1);
797+
assert!(matches!(
798+
&blocks[0],
799+
ContentBlock::Reasoning {
800+
summary,
801+
encrypted_content: Some(_),
802+
..
803+
} if summary.is_empty()
804+
));
805+
}
806+
807+
#[test]
808+
fn reasoning_item_keeps_block_when_summary_present() {
809+
let blocks = reasoning_item_to_blocks(
810+
Some("rs_abc".to_string()),
811+
vec!["thought".to_string()],
812+
None,
813+
);
814+
assert_eq!(blocks.len(), 1);
815+
assert!(matches!(
816+
&blocks[0],
817+
ContentBlock::Reasoning { summary, .. } if summary == &vec!["thought".to_string()]
818+
));
819+
}
820+
821+
#[test]
822+
fn reasoning_item_keeps_block_when_both_summary_and_encrypted_present() {
823+
// Common path — a reasoning model with `summary: "auto"` and
824+
// `include[reasoning.encrypted_content]` returns both. Both
825+
// must round-trip on the same block to preserve the link
826+
// between the visible summary and the hidden CoT.
827+
let blocks = reasoning_item_to_blocks(
828+
Some("rs_abc".to_string()),
829+
vec!["thought".to_string()],
830+
Some("encrypted_blob".to_string()),
831+
);
832+
assert_eq!(blocks.len(), 1);
833+
assert!(matches!(
834+
&blocks[0],
835+
ContentBlock::Reasoning {
836+
summary,
837+
encrypted_content: Some(_),
838+
..
839+
} if summary == &vec!["thought".to_string()]
840+
));
841+
}
842+
843+
#[test]
844+
fn reasoning_item_without_id_falls_back_to_summary_blocks() {
845+
// Old payloads predating the `id` field — the visible
846+
// reasoning still surfaces but loses its round-trip handle.
847+
let blocks = reasoning_item_to_blocks(None, vec!["a".to_string(), "b".to_string()], None);
848+
assert_eq!(blocks.len(), 2);
849+
assert!(matches!(blocks[0], ContentBlock::Summary { .. }));
850+
assert!(matches!(blocks[1], ContentBlock::Summary { .. }));
851+
}
759852
}

src/error.rs

Lines changed: 47 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -160,8 +160,16 @@ impl SofosError {
160160
"Set OPENAI_API_KEY environment variable or use --openai-api-key flag"
161161
.to_string(),
162162
)
163-
} else if msg.contains("thinking_budget") && msg.contains("max_tokens") {
164-
Some("Increase --max-tokens or decrease --thinking-budget".to_string())
163+
} else if msg.contains("max_tokens") && msg.contains("thinking-budget ceiling") {
164+
// Matches the validation message in `Repl::new`. The
165+
// suggestion no longer mentions `--thinking-budget`
166+
// because that flag is inert — the legacy budget is
167+
// picked per-effort tier in `request_builder`, not
168+
// from the flag value.
169+
Some(format!(
170+
"Increase --max-tokens above {} or set --reasoning-effort off",
171+
crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH
172+
))
165173
} else {
166174
None
167175
}
@@ -242,3 +250,40 @@ impl SofosError {
242250
}
243251

244252
pub type Result<T> = std::result::Result<T, SofosError>;
253+
254+
#[cfg(test)]
255+
mod tests {
256+
use super::*;
257+
258+
/// Regression: the validation message in `Repl::new` was rewritten
259+
/// (`thinking_budget >= max_tokens` → `max_tokens <= legacy thinking-
260+
/// budget ceiling`) when `/think` started picking budgets per-effort
261+
/// instead of from the inert flag. The classifier here must still
262+
/// recognise the new wording and surface a useful hint.
263+
#[test]
264+
fn config_hint_fires_on_new_max_tokens_validation_message() {
265+
// Mirror the exact format string used at the validation site so
266+
// a future rewording on either side breaks this test loudly.
267+
let err = SofosError::Config(format!(
268+
"max_tokens ({}) must exceed the legacy thinking-budget ceiling ({}). \
269+
Use a higher --max-tokens or set --reasoning-effort off.",
270+
crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH,
271+
crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH
272+
));
273+
let hint = err.hint().expect("hint must fire on the new message");
274+
assert!(
275+
hint.contains("Increase --max-tokens"),
276+
"hint should mention --max-tokens, got: {hint}"
277+
);
278+
assert!(
279+
hint.contains(&crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH.to_string()),
280+
"hint should embed the actual ceiling, got: {hint}"
281+
);
282+
// `--thinking-budget` is inert; the suggestion must not point
283+
// users at a flag that no longer does anything.
284+
assert!(
285+
!hint.contains("--thinking-budget"),
286+
"suggestion must not advise tweaking the inert --thinking-budget flag, got: {hint}"
287+
);
288+
}
289+
}

src/main.rs

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -115,10 +115,15 @@ fn main() -> Result<()> {
115115
crate::api::anthropic::effort_label(cli.reasoning_effort)
116116
));
117117
} else if cli.reasoning_effort.is_enabled() {
118+
// Display the per-effort tier budget actually sent
119+
// (`request_builder` no longer reads the inert
120+
// `--thinking-budget` flag) so the startup banner matches
121+
// what hits the API.
122+
let budget = crate::api::anthropic::legacy_thinking_budget(cli.reasoning_effort);
118123
startup_banner.push_str(&format!(
119124
"{} (budget: {} tokens)\n",
120125
"Extended thinking: enabled".bright_green(),
121-
cli.thinking_budget
126+
budget
122127
));
123128
}
124129

src/repl/mod.rs

Lines changed: 36 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,9 @@ use crate::api::{CreateMessageRequest, ImageSource, LlmClient, MessageContentBlo
1414
use crate::config::{ModelConfig, NORMAL_MODE_MESSAGE, SAFE_MODE_MESSAGE};
1515
use crate::error::{Result, SofosError};
1616
use crate::mcp::McpManager;
17-
use crate::session::{DisplayMessage, HistoryManager, SessionMetadata, SessionState};
17+
use crate::session::{
18+
DisplayMessage, HistoryManager, SessionMetadata, SessionState, SessionTokenCounters,
19+
};
1820
use crate::tools::ToolExecutor;
1921
use crate::tools::image::{ImageLoader, ImageReference, extract_image_references};
2022
use crate::ui::{UI, set_safe_mode_cursor_style};
@@ -251,7 +253,13 @@ impl Repl {
251253
format!("effort: {}", crate::api::anthropic::effort_label(effort))
252254
} else if matches!(self.client, Anthropic(_)) {
253255
if effort.is_enabled() {
254-
format!("thinking: {} tok", self.model_config.thinking_budget)
256+
// The legacy non-adaptive shape's `budget_tokens` is
257+
// picked from the effort tier in `request_builder`, not
258+
// from the (inert) `--thinking-budget` flag. Display the
259+
// value we actually send so the status line reflects
260+
// reality.
261+
let budget = crate::api::anthropic::legacy_thinking_budget(effort);
262+
format!("thinking: {} tok", budget)
255263
} else {
256264
"thinking: off".to_string()
257265
}
@@ -717,6 +725,13 @@ impl Repl {
717725
self.session_state.conversation.messages(),
718726
&self.session_state.display_messages,
719727
self.session_state.conversation.system_prompt(),
728+
SessionTokenCounters {
729+
total_input_tokens: self.session_state.total_input_tokens,
730+
total_output_tokens: self.session_state.total_output_tokens,
731+
total_cache_read_tokens: self.session_state.total_cache_read_tokens,
732+
total_cache_creation_tokens: self.session_state.total_cache_creation_tokens,
733+
peak_single_turn_input_tokens: self.session_state.peak_single_turn_input_tokens,
734+
},
720735
)?;
721736

722737
Ok(())
@@ -789,10 +804,15 @@ impl Repl {
789804
);
790805
} else if matches!(self.client, Anthropic(_)) {
791806
if effort.is_enabled() {
807+
// Display the per-effort tier budget actually sent
808+
// (`request_builder` no longer reads the inert
809+
// `--thinking-budget` flag) so the `/think` output
810+
// matches what hits the API.
811+
let budget = crate::api::anthropic::legacy_thinking_budget(effort);
792812
println!(
793813
"\n{} (budget: {} tokens)\n",
794814
"Extended thinking: enabled".bright_green(),
795-
self.model_config.thinking_budget
815+
budget
796816
);
797817
} else {
798818
println!("\n{}\n", "Extended thinking: disabled".bright_yellow());
@@ -871,6 +891,19 @@ impl Repl {
871891
.conversation
872892
.restore_messages(session.api_messages.clone());
873893
self.session_state.display_messages = session.display_messages.clone();
894+
// Restore every persisted token counter so the cost summary
895+
// stays accurate across the resume. Older session files written
896+
// before persistence was added default the whole
897+
// `token_counters` struct to all-zero via `#[serde(default)]`
898+
// on each field, matching the pre-persistence behaviour for
899+
// those old files.
900+
self.session_state.total_input_tokens = session.token_counters.total_input_tokens;
901+
self.session_state.total_output_tokens = session.token_counters.total_output_tokens;
902+
self.session_state.total_cache_read_tokens = session.token_counters.total_cache_read_tokens;
903+
self.session_state.total_cache_creation_tokens =
904+
session.token_counters.total_cache_creation_tokens;
905+
self.session_state.peak_single_turn_input_tokens =
906+
session.token_counters.peak_single_turn_input_tokens;
874907

875908
println!(
876909
"{} {} ({} messages)",

0 commit comments

Comments
 (0)