update docs

alexylon · alexylon · commit 8c8072d9fc71 · 2026-05-07T00:28:04.000+03:00
diff --git a/AGENTS.md b/AGENTS.md
@@ -430,7 +430,7 @@ Gitignored scratchpad for helper files the user asks to be created there — typ
 - Add or update important tests and keep them self-contained.
 - After each important change, but only when we are ready to commit, update if relevant:
     - `README.md`
-    - `CHANGELOG.md` under `[Unreleased]`
+    - `CHANGELOG.md` under `[Unreleased]`. Stick to the standard Keep-a-Changelog categories (`Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`) — do NOT add an `Internal` section. Pure refactors, dead-code removals, and test-only additions are not user-notable; leave them out. The changelog is for users, not contributors. Within each entry, describe the user-visible behaviour in plain English: no file paths, no bare `Type::method` shorthand, no Rust attribute syntax, no crate names. CLI flags, env vars, slash commands, and API wire formats are fine because the user encounters them directly.
 - Run:
     - `cargo fmt --all`
     - `cargo clippy --all-targets -- -D warnings`
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,7 +8,7 @@ All notable changes to Sofos are documented in this file.
 
 - **Anthropic server-side compaction** is now enabled on Claude Opus 4.7, Opus 4.6, and Sonnet 4.6. Sofos sends the `compact-2026-01-12` beta header and a `context_management.edits[type=compact_20260112]` block on every request to those models; when the request crosses the per-model auto-compact threshold (~250K tokens), the API itself summarises older turns and returns a `compaction` content block, dropping the pre-compaction messages server-side on subsequent requests. No extra round-trip — the compaction summary arrives in the same response as the user's reply.
 - **OpenAI encrypted-reasoning round-trip.** Requests that enable reasoning now include `include: ["reasoning.encrypted_content"]`. Sofos captures the opaque encrypted-CoT blob alongside the visible reasoning summary and round-trips both on the next call, so the model resumes its hidden chain-of-thought across tool calls instead of regenerating it. Cuts hidden-reasoning output tokens on multi-call agentic turns.
-- **Per-model `ModelInfo` registry** consolidates context-window, auto-compact threshold, adaptive-thinking flag, server-compaction flag, and pricing (including tiered-pricing rules) into one struct per model. Adding a new model is one struct literal in `src/api/model_info.rs`.
+- **Per-model registry** consolidating context-window, auto-compact threshold, adaptive-thinking flag, server-compaction flag, and pricing (including tiered-pricing rules) into a single entry per model. Adding a new model takes one registry entry.
 - **Tiered-pricing detection for GPT-5.4 and GPT-5.5.** Sofos tracks the largest single-turn input observed across the session. If any single prompt crosses the documented 272K threshold, the cost calculator switches to premium rates (2× input, 1.5× output) for the rest of the session — matching what OpenAI actually bills.
 - **1-hour cache TTL on stable prefixes.** System prompt, the last-listed tool definition, and the sticky message anchor now use Anthropic's `ttl: "1h"` ephemeral cache. The rolling breakpoint stays at 5 min because it moves every turn; paying the 2× write premium for a one-turn slot would burn cache writes for nothing.
 - **Middle truncation for tool outputs.** Large bash / search / file-read / diff / MCP outputs preserve both the head and the tail (separated by a `…N tokens truncated…` marker) instead of the head-only cut sofos previously applied. The diagnostic tail (last error line, ripgrep totals, exit messages) now survives truncation.
@@ -23,21 +23,31 @@ All notable changes to Sofos are documented in this file.
 - **Default reasoning effort is now `medium`** (was `high`). Verified roughly 3–5× cheaper hidden-reasoning bill on routine coding turns. Use `-e high` or `/think high` for hard tasks.
 - **Reasoning summaries are suppressed on the OpenAI thinking-off path.** When `effort: off`, sofos sends `reasoning.effort = "minimal"` with no `summary` field, so the model returns no summary blocks at all (they bill as output tokens).
 - **Model context windows corrected.** Claude Opus 4.7 / 4.6 and Sonnet 4.6 are 1,000,000 tokens (were 200K in the table); GPT-5.4 and GPT-5.5 are 1,050,000 tokens (were 400K). The drop-trim safety floor is now per-model API-aware (95% of the real window) instead of a flat 250K.
-- **Anthropic beta header is now picked per-request based on `ModelInfo::supports_server_compaction`.** `token-efficient-tools-2025-02-19` ships on every Anthropic request; `compact-2026-01-12` only ships against models that actually support it (Opus 4.7, Opus 4.6, Sonnet 4.6). Removes the implicit dependency on Anthropic's "ignore unknown beta tokens" policy — if validation ever tightens, only the right requests carry the token.
-- **`/think low|medium|high` on legacy non-adaptive Anthropic models now maps to distinct `budget_tokens` values** (`Low=1024`, `Medium=5120`, `High=16384`) instead of all three collapsing to the `--thinking-budget` flag value. Affects Sonnet 4.5, Opus 4.5/4.6, Haiku 4.5; adaptive models (Opus 4.7+) and OpenAI are unchanged. The `--thinking-budget` CLI flag is now inert on every path; kept for backwards-compatibility and will be removed in a later release.
+- **Anthropic beta header is now picked per-request based on whether the target model supports server-side compaction.** `token-efficient-tools-2025-02-19` ships on every Anthropic request; `compact-2026-01-12` only ships against models that actually support it (Opus 4.7, Opus 4.6, Sonnet 4.6). Removes the implicit dependency on Anthropic's "ignore unknown beta tokens" policy — if validation ever tightens, only the right requests carry the token.
+- **`/think low|medium|high` on legacy non-adaptive Anthropic models now maps to distinct `budget_tokens` values** (`Low=1024`, `Medium=5120`, `High=16384`) instead of all three collapsing to one fixed value. Affects Sonnet 4.5, Opus 4.5/4.6, Haiku 4.5; adaptive models (Opus 4.7+) and OpenAI are unchanged.
 - **Startup validation now requires `--max-tokens > 16384` whenever reasoning effort is enabled**, regardless of the current model. Catches a configuration that would have silently 400'd the next request after a runtime `/model` swap to a non-adaptive Anthropic model. Default `--max-tokens 32768` already satisfies the new check.
 - **Server-side compaction trigger clamped to Anthropic's documented 50K floor.** Defends against a hypothetical future small-window model entry whose `auto_compact_at` would otherwise drop below 50K and 400 the request.
 
+### Deprecated
+
+- **`--thinking-budget` CLI flag.** The flag has had no effect on any provider path since `/think` started picking budgets per-effort tier — legacy Anthropic uses a fixed per-tier budget, adaptive Anthropic (Opus 4.7+) uses `output_config.effort`, and OpenAI uses `reasoning.effort`. It's now hidden from `--help` and prints a one-line deprecation warning at startup when a non-default value is supplied. The flag still parses as a no-op so existing scripts don't break at parse time. Use `--reasoning-effort <off|low|medium|high>` to control thinking depth. Will be removed in a future release.
+
 ### Fixed
 
-- **Session token counters now persist across `--resume`.** Previously every counter (`total_input_tokens`, `total_output_tokens`, `total_cache_read_tokens`, `total_cache_creation_tokens`, `peak_single_turn_input_tokens`) reset to 0 on session reload — the cost line started from zero on resume and the gpt-5.4/5.5 cliff detector forgot whether the 272K threshold had already been crossed. All five counters are now saved as part of the session JSON and restored on load. Older session files (written before this release) default every counter to 0 via `#[serde(default)]` (matching prior behaviour). **Forward-compat note:** if a session file written by this release is later opened by an older sofos, the older binary silently drops the new fields on save; mixing versions against the same session file will lose the persisted counters until you settle on one version.
+- **Session token counters now persist across `--resume`.** Previously every counter (`total_input_tokens`, `total_output_tokens`, `total_cache_read_tokens`, `total_cache_creation_tokens`, `peak_single_turn_input_tokens`) reset to 0 on session reload — the cost line started from zero on resume and the gpt-5.4/5.5 cliff detector forgot whether the 272K threshold had already been crossed. All five counters are now saved as part of the session JSON and restored on load. Older session files (written before this release) default every counter to 0 on load (matching prior behaviour). **Forward-compat note:** if a session file written by this release is later opened by an older sofos, the older binary silently drops the new fields on save; mixing versions against the same session file will lose the persisted counters until you settle on one version.
 - **Empty OpenAI reasoning shells are dropped instead of round-tripped.** When a reasoning output item arrives with `id` but no visible summary AND no `encrypted_content`, the wire shape `{type: "reasoning", id, summary: []}` carries no signal and may be rejected by some OpenAI models. Sofos now skips the block in that exact configuration; reasoning items with either a summary or encrypted CoT are preserved unchanged.
-- **Streaming Anthropic responses now round-trip server-side `compaction` content blocks.** The streaming path used to silently drop them, so on a streaming-enabled Anthropic session the next turn would re-send the pre-compaction history and Anthropic would re-compact (extra cost). The non-streaming path was already correct via serde; this brings streaming into parity.
+- **Streaming Anthropic responses now round-trip server-side `compaction` content blocks.** The streaming path used to silently drop them, so on a streaming-enabled Anthropic session the next turn would re-send the pre-compaction history and Anthropic would re-compact (extra cost). The non-streaming path was already correct; this brings streaming into parity.
 - **OpenAI reasoning items round-trip in the right order relative to their assistant message.** Reasoning items were being emitted in the input array *after* the message they preceded, breaking encrypted_content round-trip continuity on the server side. Now correctly placed before.
 - **Tool-cache breakpoint actually lands on Anthropic when OpenAI's web-search tool is registered.** The stamper used to no-op when `OpenAIWebSearch` was the last entry in the tool list, leaving Anthropic with no tool-defs cache breakpoint at all. Now finds the last *Anthropic-compatible* tool to stamp.
 - **OpenAI `Reasoning` blocks no longer leak to Anthropic on provider switch.** A session that started on OpenAI accumulates `Reasoning` content blocks; switching to Anthropic mid-session would have sent those blocks to the Messages API, which doesn't recognise the type. The Anthropic sanitiser now drops them.
 - **`peak_single_turn_input_tokens` is updated for every iteration of multi-tool turns**, not just the first. Long tool chains crossing the GPT-5.5 272K cliff inside the loop now correctly switch the cost line to premium rates.
 - **Stale duplicate cache breakpoint on `read_file_tool` removed.** The tool definition carried an inline `cache_control` that, combined with the request-builder's last-tool stamp, could push the request to a 5th breakpoint (Anthropic limits to 4).
+- **Image files with a missing or non-UTF-8 extension** now produce a clear error that names the file path and lists the supported formats. Previously this case collapsed into a confusing `Unsupported image format: .` (empty extension between the colon and the period).
+- **MCP child processes are reaped** if pipe acquisition fails partway through startup. Practically unreachable in healthy operation, but the failure path now matches the success path's cleanup guarantee — no risk of a stray child lingering after a startup error.
+- **A corrupted prior-session file** now logs a warning at save time instead of silently resetting `created_at` to "now". The save itself still succeeds (so the current turn is never lost); only the `created_at` field falls back. The benign read-failure case (file doesn't exist on first save, transient permission hiccup) stays silent.
+- **Failures parsing an Anthropic streaming event** now log a debug entry with a short, UTF-8-safe preview of the offending JSON. Previously these were silently dropped, so a malformed chunk left no trace.
+- **Markdown link destinations now render as terminal hyperlinks** (OSC 8) on supporting terminals, and the surrounding bold / italic / heading style is correctly restored after strong, code, and link spans. Previously a `[label](url)` rendered the label without the URL, and bold inside a heading lost its weight after a `**emphasized**` span.
+- **Blockquote dim styling survives strong, code, and link spans inside the quote.** Previously a `> some **bold** text` line dropped back to normal-weight after the bold span instead of staying dim until the next paragraph.
 
 ## [0.2.7] - 2026-05-04