You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+5-1Lines changed: 5 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -23,10 +23,14 @@ All notable changes to Sofos are documented in this file.
23
23
-**Default reasoning effort is now `medium`** (was `high`). Verified roughly 3–5× cheaper hidden-reasoning bill on routine coding turns. Use `-e high` or `/think high` for hard tasks.
24
24
-**Reasoning summaries are suppressed on the OpenAI thinking-off path.** When `effort: off`, sofos sends `reasoning.effort = "minimal"` with no `summary` field, so the model returns no summary blocks at all (they bill as output tokens).
25
25
-**Model context windows corrected.** Claude Opus 4.7 / 4.6 and Sonnet 4.6 are 1,000,000 tokens (were 200K in the table); GPT-5.4 and GPT-5.5 are 1,050,000 tokens (were 400K). The drop-trim safety floor is now per-model API-aware (95% of the real window) instead of a flat 250K.
26
-
-**Anthropic beta header now opts into both `token-efficient-tools-2025-02-19` and `compact-2026-01-12`.**
26
+
-**Anthropic beta header is now picked per-request based on `ModelInfo::supports_server_compaction`.**`token-efficient-tools-2025-02-19` ships on every Anthropic request; `compact-2026-01-12` only ships against models that actually support it (Opus 4.7, Opus 4.6, Sonnet 4.6). Removes the implicit dependency on Anthropic's "ignore unknown beta tokens" policy — if validation ever tightens, only the right requests carry the token.
27
+
-**`/think low|medium|high` on legacy non-adaptive Anthropic models now maps to distinct `budget_tokens` values** (`Low=1024`, `Medium=5120`, `High=16384`) instead of all three collapsing to the `--thinking-budget` flag value. Affects Sonnet 4.5, Opus 4.5/4.6, Haiku 4.5; adaptive models (Opus 4.7+) and OpenAI are unchanged. The `--thinking-budget` CLI flag is now inert on every path; kept for backwards-compatibility and will be removed in a later release.
28
+
-**Startup validation now requires `--max-tokens > 16384` whenever reasoning effort is enabled**, regardless of the current model. Catches a configuration that would have silently 400'd the next request after a runtime `/model` swap to a non-adaptive Anthropic model. Default `--max-tokens 32768` already satisfies the new check.
29
+
-**Server-side compaction trigger clamped to Anthropic's documented 50K floor.** Defends against a hypothetical future small-window model entry whose `auto_compact_at` would otherwise drop below 50K and 400 the request.
27
30
28
31
### Fixed
29
32
33
+
-**Streaming Anthropic responses now round-trip server-side `compaction` content blocks.** The streaming path used to silently drop them, so on a streaming-enabled Anthropic session the next turn would re-send the pre-compaction history and Anthropic would re-compact (extra cost). The non-streaming path was already correct via serde; this brings streaming into parity.
30
34
-**OpenAI reasoning items round-trip in the right order relative to their assistant message.** Reasoning items were being emitted in the input array *after* the message they preceded, breaking encrypted_content round-trip continuity on the server side. Now correctly placed before.
31
35
-**Tool-cache breakpoint actually lands on Anthropic when OpenAI's web-search tool is registered.** The stamper used to no-op when `OpenAIWebSearch` was the last entry in the tool list, leaving Anthropic with no tool-defs cache breakpoint at all. Now finds the last *Anthropic-compatible* tool to stamp.
32
36
-**OpenAI `Reasoning` blocks no longer leak to Anthropic on provider switch.** A session that started on OpenAI accumulates `Reasoning` content blocks; switching to Anthropic mid-session would have sent those blocks to the Messages API, which doesn't recognise the type. The Anthropic sanitiser now drops them.
--thinking-budget <N> Token budget for older Claude models with manual budgets (default: 5120, must be < max-tokens). Ignored on Claude Opus 4.7+ and on OpenAI.
172
+
--thinking-budget <N> Vestigial. Currently inert on every path: legacy Claude uses a fixed per-tier budget (Low=1024, Medium=5120, High=16384), Claude Opus 4.7+ uses adaptive thinking, OpenAI uses `reasoning.effort`. Kept for backwards-compatibility; will be removed.
173
173
-v, --verbose Verbose logging
174
174
```
175
175
@@ -192,7 +192,7 @@ sofos -e off # Cheapest path; no reasoning summar
192
192
193
193
- **OpenAI (gpt-5 family)** — sends `reasoning.effort` matching the level (`minimal` for `off`, `low`/`medium`/`high` otherwise) and `summary: "auto"` when on, omitted when off.
194
194
- **Claude Opus 4.7** — adaptive thinking; the server picks the budget based on the prompt, and sofos sends `output_config.effort` matching the level (`off` collapses to `low`, the lowest the API accepts). `--thinking-budget` is ignored.
195
-
- **Older Claude (Sonnet 4.6, Opus 4.6, Haiku 4.5)** — `off` disables extended thinking; `low/medium/high` all enable it with the `--thinking-budget` token budget (default `5120`). The level is treated uniformly here pending per-tier budget mapping.
195
+
- **Older Claude (Sonnet 4.6, Opus 4.6, Haiku 4.5)** — `off` disables extended thinking; `low`, `medium`, and `high` each map to a distinct legacy `budget_tokens` value (`1024 / 5120 / 16384`) so the slider has a visible effect. `--thinking-budget` is ignored — the per-tier values are the source of truth.
0 commit comments