Skip to content

Commit 56ff1e0

Browse files
committed
Sanitise provider errors, fix OpenAI refusals, cap streaming buffer
1 parent 447b864 commit 56ff1e0

6 files changed

Lines changed: 214 additions & 4 deletions

File tree

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,16 @@ All notable changes to Sofos are documented in this file.
66

77
### Fixed
88

9+
- **Anthropic decode failures now name the provider and show what came back.** A non-streaming response that doesn't match the expected JSON shape used to surface as the generic "HTTP request failed: error decoding response body" with no provider context; sofos now reads the body as text first and includes a redacted preview in the error so a misconfigured proxy is obvious at a glance.
10+
- **Cache-cost numbers settle correctly on turns where server-side compaction lands late.** Anthropic emits the final `cache_read_input_tokens` and `cache_creation_input_tokens` only on the trailing `message_delta` event in those cases; sofos now refreshes both totals when they appear there, so the cost summary picks up the cache-creation premium instead of under-reporting.
11+
- **OpenAI refusals reach the conversation.** A `{type: "refusal", refusal: "..."}` block used to be dropped silently, surfacing as "Assistant returned an empty response"; sofos now lifts the refusal text into the visible response so both the user and the next turn see what the model said.
12+
- **OpenAI truncations without an `incomplete.reason` still trigger the token-limit warning.** When the provider sets `status: "incomplete"` but omits `incomplete_details.reason`, sofos now treats it as `max_tokens` so the existing "Response was cut off" warning fires instead of letting a half-formed tool call enter the conversation history.
13+
14+
### Security
15+
16+
- **API-key-shaped strings in provider error bodies are redacted before display.** Provider 401 responses sometimes echo a truncated key (`sk-ant-api03-…` or `Bearer …`); sofos now replaces every matching run with `sk-[redacted]` / `Bearer [redacted]` and caps the body at 4 KB, so a noisy proxy or moderation block can no longer flood the status line or leak key prefixes to transcripts and crash reports.
17+
- **The SSE re-assembly buffer is capped at 16 MB on both providers.** A server (or middlebox) that streams gigabytes without a newline used to grow the buffer until the 30-minute request timeout fired; sofos now aborts the stream cleanly with a clear error long before memory exhaustion becomes a real risk.
18+
919
- **Hitting the tool-iteration cap no longer corrupts the saved session.** The recovery turn used to ship with the tools list still attached, so the assistant could answer with a fresh `tool_use` block that was never executed; the next request 400'd and `--resume` was dead on arrival. Sofos now strips tools from that final summary request and removes any tool-related blocks it returns defensively, so the session resumes cleanly afterwards.
1020
- **`/clear` keeps safe mode visible to the model.** Clearing the history used to strip the "you are running in safe mode" preamble even when the executor was still in safe mode, so the model proposed writes and bash and was met with opaque denials. The preamble is re-added automatically when safe mode is still on.
1121
- **Switching reasoning effort mid-session refuses combinations the next request would reject.** On the Anthropic legacy thinking models, `/effort high` (or any enabled level) requires `--max-tokens` above 16 384; the gate that startup enforces now also fires from `/effort`, so the next turn doesn't 400 with no clear hint at the cause.

src/api/anthropic/client.rs

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,20 @@ impl AnthropicClient {
7272
)
7373
.await?;
7474

75-
let result = response.json::<CreateMessageResponse>().await?;
76-
Ok(result)
75+
// Read the body as text first so a JSON-shape mismatch surfaces
76+
// with the provider name and a snippet of what we actually got
77+
// — the bare `response.json::<…>().await?` path otherwise turns
78+
// every decode failure into "HTTP request failed: error
79+
// decoding response body" with no provider context to debug.
80+
let body = response.text().await.map_err(|e| {
81+
crate::error::SofosError::Api(format!("Failed to read Anthropic response body: {}", e))
82+
})?;
83+
serde_json::from_str::<CreateMessageResponse>(&body).map_err(|e| {
84+
crate::error::SofosError::Api(format!(
85+
"Failed to parse Anthropic response: {} (body preview: {})",
86+
e,
87+
utils::sanitize_provider_error_body(&body)
88+
))
89+
})
7790
}
7891
}

src/api/anthropic/stream.rs

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ use crate::api::anthropic::client::AnthropicClient;
77
use crate::api::anthropic::wire::{BETA_HEADER_NAME, anthropic_beta_for, prepare_request};
88
use crate::api::types::*;
99
use crate::api::utils;
10+
use crate::api::utils::MAX_SSE_BUFFER_BYTES;
1011
use crate::error::{Result, SofosError};
1112
use futures::stream::{Stream, StreamExt};
1213
use std::sync::Arc;
@@ -130,6 +131,13 @@ where
130131

131132
let chunk = chunk_result?;
132133
buffer.extend_from_slice(chunk.as_ref());
134+
if buffer.len() > MAX_SSE_BUFFER_BYTES {
135+
return Err(SofosError::Api(format!(
136+
"Anthropic SSE buffer exceeded {} MB without a line terminator; \
137+
likely a misbehaving server or middlebox",
138+
MAX_SSE_BUFFER_BYTES / (1024 * 1024)
139+
)));
140+
}
133141

134142
while let Some(pos) = buffer.iter().position(|b| *b == b'\n') {
135143
// The complete line is in-buffer, so codepoints aren't
@@ -371,6 +379,24 @@ where
371379
output_tokens = saturate_u32(
372380
u.get("output_tokens").and_then(|v| v.as_u64()).unwrap_or(0),
373381
);
382+
// Server-side compaction can settle the cache-
383+
// related usage numbers only on the trailing
384+
// `message_delta`, not the opening
385+
// `message_start`. Refresh both totals when
386+
// present so the cost line picks up the cache-
387+
// creation premium on turns where compaction
388+
// landed late.
389+
if let Some(read) =
390+
u.get("cache_read_input_tokens").and_then(|v| v.as_u64())
391+
{
392+
cache_read_input_tokens = Some(saturate_u32(read));
393+
}
394+
if let Some(create) = u
395+
.get("cache_creation_input_tokens")
396+
.and_then(|v| v.as_u64())
397+
{
398+
cache_creation_input_tokens = Some(saturate_u32(create));
399+
}
374400
}
375401
}
376402
"error" => {

src/api/openai/stream.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ use crate::api::openai::client::OPENAI_API_BASE;
1111
use crate::api::openai::wire::{OpenAIResponse, build_response, build_responses_body};
1212
use crate::api::types::*;
1313
use crate::api::utils;
14+
use crate::api::utils::MAX_SSE_BUFFER_BYTES;
1415
use crate::error::{Result, SofosError};
1516
use futures::stream::{Stream, StreamExt};
1617
use serde_json::json;
@@ -89,6 +90,13 @@ where
8990

9091
let chunk = chunk_result?;
9192
buffer.extend_from_slice(chunk.as_ref());
93+
if buffer.len() > MAX_SSE_BUFFER_BYTES {
94+
return Err(SofosError::Api(format!(
95+
"OpenAI SSE buffer exceeded {} MB without a line terminator; \
96+
likely a misbehaving server or middlebox",
97+
MAX_SSE_BUFFER_BYTES / (1024 * 1024)
98+
)));
99+
}
92100

93101
while let Some(pos) = buffer.iter().position(|b| *b == b'\n') {
94102
// Re-check the interrupt flag between lines so a single

src/api/openai/wire.rs

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -306,6 +306,13 @@ pub(super) struct OpenAIOutputContent {
306306
pub(super) content_type: String,
307307
#[serde(default)]
308308
pub(super) text: String,
309+
/// OpenAI emits a `refusal` block when the model declines the
310+
/// request. The decline text lives under `refusal` instead of
311+
/// `text`, so it has to be captured separately or it's silently
312+
/// dropped, which surfaces to the user as the very confusing
313+
/// "empty response" warning.
314+
#[serde(default)]
315+
pub(super) refusal: String,
309316
}
310317

311318
#[derive(Debug, Deserialize)]
@@ -432,6 +439,16 @@ pub(super) fn build_response(response_parsed: OpenAIResponse) -> Result<CreateMe
432439
for content in item.content {
433440
if content.content_type == "output_text" && !content.text.trim().is_empty() {
434441
content_blocks.push(ContentBlock::Text { text: content.text });
442+
} else if content.content_type == "refusal"
443+
&& !content.refusal.trim().is_empty()
444+
{
445+
// Surface refusals as plain text so the model
446+
// sees its own decline next turn (instead of
447+
// looking like it returned an empty turn) and
448+
// the user gets the actual refusal text.
449+
content_blocks.push(ContentBlock::Text {
450+
text: content.refusal,
451+
});
435452
}
436453
}
437454

@@ -517,6 +534,13 @@ pub(super) fn build_response(response_parsed: OpenAIResponse) -> Result<CreateMe
517534
Some("max_tokens".to_string())
518535
}
519536
(Some("incomplete"), Some(other)) => Some(other.to_string()),
537+
// OpenAI sometimes reports `status: "incomplete"` without
538+
// populating `incomplete_details.reason`. The truncation guard
539+
// in the response handler looks for `stop_reason ==
540+
// "max_tokens"`, so mapping the missing-reason case there
541+
// keeps the warning firing instead of letting a half-formed
542+
// tool call enter the conversation history.
543+
(Some("incomplete"), None) => Some("max_tokens".to_string()),
520544
// Anthropic always sets `stop_reason` on a normal stop. Map the
521545
// OpenAI `status: "completed"` to the same `"end_turn"` value
522546
// so downstream `if let Some(stop_reason) = ...` branches treat

src/api/utils.rs

Lines changed: 131 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -346,6 +346,84 @@ pub fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> usize {
346346
i
347347
}
348348

349+
/// Upper bound on the provider-error body interpolated into a user-facing
350+
/// `SofosError::Api`. A misconfigured proxy that returns a multi-MB HTML
351+
/// page, or a moderation block that echoes the whole request, otherwise
352+
/// floods stderr and the status line. Beyond this the body is truncated
353+
/// with a `[…N more bytes elided]` marker so the model still gets some
354+
/// signal but the UI stays readable.
355+
pub const MAX_PROVIDER_ERROR_BODY_BYTES: usize = 4 * 1024;
356+
357+
/// Upper bound on the SSE re-assembly buffer used by both the Anthropic
358+
/// and OpenAI streaming parsers. A server (or a middlebox) that streams
359+
/// gigabytes without a newline would otherwise grow `buffer` until the
360+
/// 30-minute request timeout fires, exhausting memory long before the
361+
/// timeout helps. 16 MB is far above any legitimate single SSE line we
362+
/// have seen in practice.
363+
pub const MAX_SSE_BUFFER_BYTES: usize = 16 * 1024 * 1024;
364+
365+
/// Best-effort redaction of API-key-shaped substrings inside a provider
366+
/// error body. Provider 401 responses sometimes echo the rejected key
367+
/// (truncated or otherwise), which would land verbatim in transcripts
368+
/// and crash reports. Scans for `sk-…` style prefixes and `Bearer …`
369+
/// pairs and rewrites each run as `<keyword>[redacted]`. Caller is
370+
/// expected to apply [`truncate_at_char_boundary`] separately if the
371+
/// body needs a length cap.
372+
pub fn redact_api_secrets(body: &str) -> String {
373+
fn is_key_byte(b: u8) -> bool {
374+
b.is_ascii_alphanumeric() || b == b'_' || b == b'-'
375+
}
376+
377+
let bytes = body.as_bytes();
378+
let mut out = String::with_capacity(body.len());
379+
let mut i = 0;
380+
while i < bytes.len() {
381+
if bytes[i..].starts_with(b"sk-") {
382+
let mut end = i + 3;
383+
while end < bytes.len() && is_key_byte(bytes[end]) {
384+
end += 1;
385+
}
386+
if end - i >= 11 {
387+
out.push_str("sk-[redacted]");
388+
i = end;
389+
continue;
390+
}
391+
}
392+
if bytes[i..].starts_with(b"Bearer ") || bytes[i..].starts_with(b"bearer ") {
393+
let prefix_len = 7;
394+
let mut end = i + prefix_len;
395+
while end < bytes.len() && is_key_byte(bytes[end]) {
396+
end += 1;
397+
}
398+
if end - i >= prefix_len + 8 {
399+
out.push_str(&body[i..i + prefix_len]);
400+
out.push_str("[redacted]");
401+
i = end;
402+
continue;
403+
}
404+
}
405+
// Non-ASCII bytes carry one full UTF-8 char; push it whole so
406+
// the surrounding text stays valid.
407+
let ch = body[i..].chars().next().unwrap_or('\u{FFFD}');
408+
out.push(ch);
409+
i += ch.len_utf8();
410+
}
411+
out
412+
}
413+
414+
/// Truncate `body` to [`MAX_PROVIDER_ERROR_BODY_BYTES`] and run
415+
/// [`redact_api_secrets`] over the result. Centralised so every error-
416+
/// to-message hop applies the same cleanup.
417+
pub fn sanitize_provider_error_body(body: &str) -> String {
418+
let cut = truncate_at_char_boundary(body, MAX_PROVIDER_ERROR_BODY_BYTES);
419+
let mut truncated = redact_api_secrets(&body[..cut]);
420+
if body.len() > cut {
421+
let extra = body.len() - cut;
422+
truncated.push_str(&format!(" […{} more bytes elided]", extra));
423+
}
424+
truncated
425+
}
426+
349427
/// Upper bound applied to the `Retry-After` value advertised by a 429
350428
/// response. 60 seconds is comfortably above the burst-limit windows
351429
/// the APIs we integrate with use in practice, and short enough that
@@ -468,7 +546,10 @@ fn api_call_error_to_sofos(service_name: &str, attempts: u32, e: ApiCallError) -
468546
ApiCallError::ServerError { status, body } | ApiCallError::ClientError { status, body } => {
469547
SofosError::Api(format!(
470548
"{} request failed with status {} after {} attempt(s): {}",
471-
service_name, status, attempts, body
549+
service_name,
550+
status,
551+
attempts,
552+
sanitize_provider_error_body(&body)
472553
))
473554
}
474555
ApiCallError::RateLimited { retry_after, body } => SofosError::Api(format!(
@@ -479,7 +560,7 @@ fn api_call_error_to_sofos(service_name: &str, attempts: u32, e: ApiCallError) -
479560
None => String::new(),
480561
},
481562
attempts,
482-
body
563+
sanitize_provider_error_body(&body)
483564
)),
484565
}
485566
}
@@ -604,6 +685,54 @@ pub(crate) mod sse_test_support {
604685
mod tests {
605686
use super::*;
606687

688+
#[test]
689+
fn redact_strips_sk_keys_and_bearer_tokens() {
690+
let body = "Invalid x-api-key: sk-ant-api03-AAAAaaaa1111BBBBbbbb22 returned error";
691+
let cleaned = redact_api_secrets(body);
692+
assert!(
693+
!cleaned.contains("sk-ant-api03"),
694+
"key prefix must be removed, got: {cleaned}"
695+
);
696+
assert!(cleaned.contains("sk-[redacted]"));
697+
assert!(cleaned.contains("returned error"));
698+
699+
let bearer = "Authorization: Bearer abcdefghijKLMN1234 expired";
700+
let cleaned = redact_api_secrets(bearer);
701+
assert!(
702+
cleaned.contains("Bearer [redacted]"),
703+
"bearer token must be redacted, got: {cleaned}"
704+
);
705+
assert!(cleaned.contains("expired"));
706+
707+
// Short fragments that LOOK like a key but lack enough chars
708+
// after `sk-` stay untouched (avoids redacting unrelated
709+
// `sk-` substrings).
710+
let small = "see sk-x for details";
711+
let unchanged = redact_api_secrets(small);
712+
assert_eq!(unchanged, small);
713+
}
714+
715+
#[test]
716+
fn sanitize_body_caps_long_payload_with_marker() {
717+
let payload = "Z".repeat(MAX_PROVIDER_ERROR_BODY_BYTES * 3);
718+
let out = sanitize_provider_error_body(&payload);
719+
assert!(out.len() < payload.len());
720+
assert!(out.contains("more bytes elided"));
721+
}
722+
723+
#[test]
724+
fn sanitize_body_redacts_and_truncates_together() {
725+
// Key followed by a long tail must lose the key AND keep the
726+
// elision marker for the trailing bytes.
727+
let payload = format!(
728+
"key=sk-ant-api03-AAAAaaaa1111BBBB tail{}",
729+
"Y".repeat(MAX_PROVIDER_ERROR_BODY_BYTES * 2)
730+
);
731+
let out = sanitize_provider_error_body(&payload);
732+
assert!(out.contains("sk-[redacted]"));
733+
assert!(out.contains("more bytes elided"));
734+
}
735+
607736
#[test]
608737
fn api_call_error_is_retryable_for_server_error_and_rate_limited() {
609738
let server = ApiCallError::ServerError {

0 commit comments

Comments
 (0)