Skip to content

Commit 447b864

Browse files
committed
Harden REPL state on tool-cap, /clear, /effort, resume, and atomic writes
1 parent 4768b11 commit 447b864

8 files changed

Lines changed: 263 additions & 11 deletions

File tree

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,16 @@ All notable changes to Sofos are documented in this file.
44

55
## [Unreleased]
66

7+
### Fixed
8+
9+
- **Hitting the tool-iteration cap no longer corrupts the saved session.** The recovery turn used to ship with the tools list still attached, so the assistant could answer with a fresh `tool_use` block that was never executed; the next request 400'd and `--resume` was dead on arrival. Sofos now strips tools from that final summary request and removes any tool-related blocks it returns defensively, so the session resumes cleanly afterwards.
10+
- **`/clear` keeps safe mode visible to the model.** Clearing the history used to strip the "you are running in safe mode" preamble even when the executor was still in safe mode, so the model proposed writes and bash and was met with opaque denials. The preamble is re-added automatically when safe mode is still on.
11+
- **Switching reasoning effort mid-session refuses combinations the next request would reject.** On the Anthropic legacy thinking models, `/effort high` (or any enabled level) requires `--max-tokens` above 16 384; the gate that startup enforces now also fires from `/effort`, so the next turn doesn't 400 with no clear hint at the cause.
12+
- **Resuming a session that was killed mid-tool-call no longer fails the first request.** A hard kill between writing the assistant's `tool_use` to disk and writing the matching `tool_result` used to leave the saved history in a shape the provider rejects on resume; sofos now strips the orphan tool-use block on load and replaces it with a short "[Tool call interrupted before execution]" placeholder.
13+
- **Slash commands now save their effect to the session.** Running `/compact`, `/clear`, `/safe`, or `/normal` and quitting via Ctrl+C without another prompt used to lose the change; the worker now persists the session after every slash command, not only after free-text turns.
14+
- **`write_file` / `edit_file` survive a power loss between the rename and writeback.** The atomic-write helper now `fsync`s the temp file before the rename and (on Unix) the parent directory after, so ext4/xfs in their default mount options can no longer leave the target with the new inode but zero data after a kernel crash.
15+
- **Moving a file across filesystems no longer fails outright.** `move_file` used to surface a raw "Invalid cross-device link" error when source and destination lived on different filesystems (macOS APFS volumes, Linux `/tmp` mounts, external drives). Sofos now detects the cross-device case and falls back to a copy + delete so the operation completes.
16+
717
### Security
818

919
- **MCP server children no longer inherit the sofos environment.** Stdio MCP servers used to start with every variable the parent shell exported, so `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `MORPH_API_KEY`, ssh agent sockets, and AWS credentials all reached the child. The launcher now clears the environment first and forwards a small allowlist (`PATH`, `HOME` / `USERPROFILE`, `TMPDIR` / `TEMP` / `TMP`, `LANG`, every `LC_*` locale variable, plus Windows essentials like `SYSTEMROOT` and `PATHEXT`); anything else needs to be listed under the server's `env` config field. Existing servers that already declare their required vars in config are unaffected.

src/repl/conversation/lifecycle.rs

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -213,6 +213,51 @@ impl ConversationHistory {
213213
)
214214
}
215215

216+
/// Drop tool-use blocks from the tail assistant message when no
217+
/// matching tool-result follows. Triggered by `restore_messages`
218+
/// to guard against a session file written between
219+
/// `add_assistant_with_blocks` and `add_tool_results` — a hard
220+
/// kill at that moment would otherwise leave the resumed
221+
/// conversation with an orphan that the provider 400s on the
222+
/// next request.
223+
pub(super) fn drop_tail_orphaned_tool_uses(&mut self) -> bool {
224+
let Some(last) = self.messages.last_mut() else {
225+
return false;
226+
};
227+
if last.role != "assistant" {
228+
return false;
229+
}
230+
let crate::api::MessageContent::Blocks { content } = &mut last.content else {
231+
return false;
232+
};
233+
let had_initiator = content.iter().any(|b| {
234+
matches!(
235+
b,
236+
crate::api::MessageContentBlock::ToolUse { .. }
237+
| crate::api::MessageContentBlock::ServerToolUse { .. }
238+
| crate::api::MessageContentBlock::WebSearchToolResult { .. }
239+
)
240+
});
241+
if !had_initiator {
242+
return false;
243+
}
244+
content.retain(|b| {
245+
!matches!(
246+
b,
247+
crate::api::MessageContentBlock::ToolUse { .. }
248+
| crate::api::MessageContentBlock::ServerToolUse { .. }
249+
| crate::api::MessageContentBlock::WebSearchToolResult { .. }
250+
)
251+
});
252+
if content.is_empty() {
253+
content.push(crate::api::MessageContentBlock::Text {
254+
text: "[Tool call interrupted before execution]".to_string(),
255+
cache_control: None,
256+
});
257+
}
258+
true
259+
}
260+
216261
/// Build a brief summary of messages about to be dropped (no LLM, just key facts).
217262
pub(super) fn build_drop_summary(messages: &[Message]) -> String {
218263
let mut tools_used = Vec::new();

src/repl/conversation/messages.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,11 @@ impl ConversationHistory {
9191
// any inherited anchor index is meaningless content-wise.
9292
self.invalidate_cache_anchor();
9393
self.messages = messages;
94+
// A session file written between `add_assistant_with_blocks`
95+
// and `add_tool_results` carries a tail assistant message with
96+
// unmatched tool_use blocks. Resuming that history would 400 on
97+
// the next request, so strip the orphan blocks now.
98+
self.drop_tail_orphaned_tool_uses();
9499
self.trim_if_needed();
95100
}
96101
}

src/repl/mod.rs

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -354,6 +354,15 @@ impl Repl {
354354
let new_session_id = self.history_manager.generate_unique_session_id();
355355
self.session_state.conversation.clear();
356356
self.session_state.clear(new_session_id);
357+
// The executor stays in safe mode across `/clear`, so the model
358+
// also needs to keep seeing the safe-mode preamble. Without
359+
// this it confidently proposes write/bash and gets opaque
360+
// denials from the tool layer.
361+
if self.safe_mode {
362+
self.session_state
363+
.conversation
364+
.add_user_message(SAFE_MODE_MESSAGE.to_string());
365+
}
357366
self.session_state
358367
.conversation
359368
.add_user_message("The session history has been cleared".to_string());
@@ -408,6 +417,27 @@ impl Repl {
408417
println!();
409418
return;
410419
}
420+
// Anthropic's legacy thinking model rejects requests where
421+
// `max_tokens` is below the configured budget ceiling. Startup
422+
// refuses that combination outright (see `new` above); the same
423+
// gate must fire on mid-session `/effort high` so the next
424+
// turn doesn't 400 with no clear hint at the cause.
425+
if matches!(self.client, Anthropic(_))
426+
&& !self.uses_adaptive_thinking()
427+
&& effort.is_enabled()
428+
&& self.model_config.max_tokens <= crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH
429+
{
430+
println!();
431+
UI::print_error(&format!(
432+
"Cannot enable extended thinking on the legacy budget — max_tokens \
433+
({}) must exceed {}. Relaunch with a higher --max-tokens or pick a \
434+
lower effort.",
435+
self.model_config.max_tokens,
436+
crate::api::anthropic::LEGACY_THINKING_BUDGET_HIGH
437+
));
438+
println!();
439+
return;
440+
}
411441
self.model_config.set_reasoning_effort(effort);
412442
self.print_reasoning_state();
413443
}

src/repl/response_handler.rs

Lines changed: 32 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -232,7 +232,7 @@ impl ResponseHandler {
232232
}
233233

234234
let (tool_results, user_cancelled) =
235-
self.execute_tools(&tool_uses, display_messages).await?;
235+
self.execute_tools(&tool_uses, display_messages).await;
236236

237237
if !tool_results.is_empty() {
238238
if std::env::var("SOFOS_DEBUG").is_ok() {
@@ -367,11 +367,19 @@ impl ResponseHandler {
367367
(text_output, tool_uses, had_reasoning)
368368
}
369369

370+
/// Run every tool in the assistant's `tool_use` batch and return
371+
/// a `ToolResult` for each one — including a synthesised note for
372+
/// any tool we skip after a cancellation. Returns `(results,
373+
/// user_cancelled)` directly, without a `Result`, so a caller
374+
/// cannot accidentally short-circuit with `?` and leave a
375+
/// half-batch on the wire: every `ToolUse` MUST be paired with a
376+
/// matching `ToolResult` on the immediate next user turn or the
377+
/// provider 400s, and the saved session loads dead.
370378
async fn execute_tools(
371379
&self,
372380
tool_uses: &[(String, String, serde_json::Value)],
373381
display_messages: &mut Vec<DisplayMessage>,
374-
) -> Result<(Vec<crate::api::MessageContentBlock>, bool)> {
382+
) -> (Vec<crate::api::MessageContentBlock>, bool) {
375383
let mut tool_results = Vec::new();
376384
let mut user_cancelled = false;
377385

@@ -514,7 +522,7 @@ impl ResponseHandler {
514522
}
515523
}
516524

517-
Ok((tool_results, user_cancelled))
525+
(tool_results, user_cancelled)
518526
}
519527

520528
async fn get_next_response(&mut self) -> Result<crate::api::CreateMessageResponse> {
@@ -624,7 +632,14 @@ impl ResponseHandler {
624632
// Let the assistant respond to the interruption. Use
625633
// `run_interruptible` so ESC during this final summary cancels
626634
// the HTTP call instead of blocking on the server.
627-
let request = self.build_request();
635+
//
636+
// The recovery request is built without tools. Sending the
637+
// tools array would let the assistant come back with another
638+
// `tool_use` block; persisting that block without the matching
639+
// `tool_result` puts the session in a shape the provider
640+
// rejects on the next request and breaks `--resume`.
641+
let mut request = self.build_request();
642+
request.tools = None;
628643
let client = self.client.clone();
629644
let response_result = self
630645
.run_interruptible(async move { client.create_message(request).await })
@@ -654,10 +669,23 @@ impl ResponseHandler {
654669
}
655670
}
656671

672+
// Even though the recovery request carried no tools,
673+
// strip any tool-related blocks defensively before
674+
// appending. A provider that ignores `tools = None`
675+
// (or returns a cached tool_use from a previous turn)
676+
// would otherwise leave an orphan in the saved session.
657677
let message_blocks: Vec<crate::api::MessageContentBlock> = response
658678
.content
659679
.iter()
660680
.map(crate::api::MessageContentBlock::from_content_block_for_api)
681+
.filter(|block| {
682+
!matches!(
683+
block,
684+
crate::api::MessageContentBlock::ToolUse { .. }
685+
| crate::api::MessageContentBlock::ServerToolUse { .. }
686+
| crate::api::MessageContentBlock::WebSearchToolResult { .. }
687+
)
688+
})
661689
.collect();
662690
if !message_blocks.is_empty() {
663691
self.conversation.add_assistant_with_blocks(message_blocks);

src/repl/tui/worker.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -151,6 +151,14 @@ fn run(
151151
}
152152
}
153153
}
154+
// Slash commands can mutate the conversation (`/compact`
155+
// rewrites the history, `/clear` resets it, `/safe` /
156+
// `/normal` toggle the mode preamble). Persist after the
157+
// command runs so a `/exit` or Ctrl+C before the next
158+
// prompt doesn't lose the change.
159+
if let Err(e) = repl.save_current_session() {
160+
UI::print_warning(&format!("Failed to save session: {}", e));
161+
}
154162
flush_captured_streams();
155163
let _ = ui_tx.send(UiEvent::Status(repl.status_snapshot()));
156164
let _ = ui_tx.send(UiEvent::WorkerIdle);

src/tools/executor.rs

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -241,12 +241,13 @@ fn move_between(
241241
return fs_tool.move_file(source, destination);
242242
}
243243
ensure_parent_dir(&dst.canonical)?;
244-
std::fs::rename(&src.canonical, &dst.canonical).map_err(|e| {
245-
SofosError::ToolExecution(format!(
246-
"Failed to move '{}' to '{}': {}",
247-
source, destination, e
248-
))
249-
})
244+
crate::tools::filesystem::rename_with_cross_device_fallback(&src.canonical, &dst.canonical)
245+
.map_err(|e| {
246+
SofosError::ToolExecution(format!(
247+
"Failed to move '{}' to '{}': {}",
248+
source, destination, e
249+
))
250+
})
250251
}
251252

252253
/// Copy-file counterpart to [`move_between`]. Same inside/outside matrix;

src/tools/filesystem.rs

Lines changed: 126 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,15 @@ fn write_atomic(path: &Path, content: &str) -> std::io::Result<()> {
5555
let _ = fs::remove_file(&tmp_path);
5656
return Err(e);
5757
}
58+
// Flush the file content to stable storage BEFORE the rename. On a
59+
// kernel crash between rename and writeback ext4/xfs default mount
60+
// options can otherwise leave the target file with the new inode
61+
// but zero data, replacing valid content with an empty file.
62+
if let Err(e) = tmp_file.sync_all() {
63+
drop(tmp_file);
64+
let _ = fs::remove_file(&tmp_path);
65+
return Err(e);
66+
}
5867
drop(tmp_file);
5968

6069
// Preserve the existing file's permission bits. Best-effort: if
@@ -74,9 +83,64 @@ fn write_atomic(path: &Path, content: &str) -> std::io::Result<()> {
7483
let _ = fs::remove_file(&tmp_path);
7584
return Err(e);
7685
}
86+
87+
// Flush the directory entry on Unix so the rename itself survives a
88+
// crash. Without this, ext4/xfs in `data=ordered` mode can lose
89+
// the directory update on power-loss even though the file content
90+
// already made it to disk. Best-effort: `sync_all` on a closed
91+
// directory handle isn't supported on every platform, and on
92+
// Windows the call would be inappropriate.
93+
#[cfg(unix)]
94+
if let Some(parent) = target.parent() {
95+
if let Ok(dir) = fs::File::open(parent) {
96+
let _ = dir.sync_all();
97+
}
98+
}
7799
Ok(())
78100
}
79101

102+
/// True when `e` describes a rename that crossed a filesystem
103+
/// boundary. Uses the stable `ErrorKind::CrossesDevices` mapping
104+
/// first; falls back to the platform-specific raw code so a future
105+
/// Rust that drops the kind mapping still works.
106+
fn is_cross_device_error(e: &std::io::Error) -> bool {
107+
if e.kind() == std::io::ErrorKind::CrossesDevices {
108+
return true;
109+
}
110+
#[cfg(unix)]
111+
{
112+
e.raw_os_error() == Some(libc::EXDEV)
113+
}
114+
#[cfg(windows)]
115+
{
116+
// ERROR_NOT_SAME_DEVICE
117+
e.raw_os_error() == Some(17)
118+
}
119+
#[cfg(not(any(unix, windows)))]
120+
{
121+
false
122+
}
123+
}
124+
125+
/// Move `src` to `dst`, falling back to copy + delete when the rename
126+
/// crosses a filesystem boundary (`EXDEV` on Unix,
127+
/// `ERROR_NOT_SAME_DEVICE` on Windows). The fallback isn't atomic —
128+
/// a crash between the copy and the delete leaves both files — but
129+
/// the alternative is a hard error for what is otherwise a common
130+
/// operation (move into `/tmp` on macOS APFS, move across mounts on
131+
/// Linux), so the tradeoff is worth it.
132+
pub fn rename_with_cross_device_fallback(src: &Path, dst: &Path) -> std::io::Result<()> {
133+
match fs::rename(src, dst) {
134+
Ok(()) => Ok(()),
135+
Err(e) if is_cross_device_error(&e) => {
136+
fs::copy(src, dst)?;
137+
fs::remove_file(src)?;
138+
Ok(())
139+
}
140+
Err(e) => Err(e),
141+
}
142+
}
143+
80144
/// Reserve a fresh `<target>.sofos.tmp.<hex>` sibling file using
81145
/// `O_EXCL`-style exclusive create, returning both the path and the
82146
/// open handle so the caller writes through the handle we just owned
@@ -431,7 +495,7 @@ impl FileSystemTool {
431495
}
432496
}
433497

434-
fs::rename(&source_path, &dest_path)?;
498+
rename_with_cross_device_fallback(&source_path, &dest_path)?;
435499
Ok(())
436500
}
437501

@@ -493,6 +557,67 @@ mod tests {
493557
assert!(fs_tool.validate_path("foo..bar/baz..qux.txt").is_ok());
494558
}
495559

560+
#[test]
561+
fn write_atomic_replaces_existing_content_durably() {
562+
// Smoke test: write_atomic must produce a readable file with the
563+
// exact bytes passed in, with no leftover `.sofos.tmp.*` siblings
564+
// in the directory after a successful write. `sync_all` runs
565+
// before the rename — failing to flush would leave the file
566+
// empty on a crash, which is the regression we're guarding
567+
// against.
568+
let (_temp, path) = test_support::workspace();
569+
let fs_tool = FileSystemTool::new(path.clone()).unwrap();
570+
fs_tool.write_file("doc.md", "first").unwrap();
571+
fs_tool.write_file("doc.md", "second pass").unwrap();
572+
let contents = fs_tool.read_file("doc.md").unwrap();
573+
assert_eq!(contents, "second pass");
574+
575+
// No tmp file should survive a successful write.
576+
let leftover: Vec<_> = std::fs::read_dir(&path)
577+
.unwrap()
578+
.filter_map(|e| e.ok())
579+
.filter(|e| e.file_name().to_string_lossy().contains(".sofos.tmp."))
580+
.collect();
581+
assert!(
582+
leftover.is_empty(),
583+
"no .sofos.tmp.* artefact should remain"
584+
);
585+
}
586+
587+
#[cfg(unix)]
588+
#[test]
589+
fn cross_device_move_falls_back_to_copy_remove() {
590+
// `fs::rename` returns `EXDEV` across filesystems. Modern macOS
591+
// (every `/tmp` is a separate APFS volume from the user's home)
592+
// and Linux containers (`/dev/shm` vs `/tmp`) both hit this in
593+
// practice. The fallback must succeed where rename can't.
594+
//
595+
// We can't reliably stage two devices in unit-tests, so we go
596+
// straight at `is_cross_device_error` + `rename_with_cross_device_fallback`
597+
// on a same-device move and verify it still works (the
598+
// fast-path) — the slow-path is exercised by hand on macOS
599+
// before each release.
600+
let (_temp, path) = test_support::workspace();
601+
let fs_tool = FileSystemTool::new(path).unwrap();
602+
fs_tool.write_file("src.txt", "hello").unwrap();
603+
fs_tool.move_file("src.txt", "dst.txt").unwrap();
604+
let contents = fs_tool.read_file("dst.txt").unwrap();
605+
assert_eq!(contents, "hello");
606+
}
607+
608+
#[test]
609+
fn cross_device_error_classifier_matches_kind() {
610+
// `ErrorKind::CrossesDevices` is the canonical signal from
611+
// 1.85+; any future toolchain that drops the mapping is caught
612+
// by the platform-specific raw-error fallback.
613+
let e = std::io::Error::from(std::io::ErrorKind::CrossesDevices);
614+
assert!(is_cross_device_error(&e));
615+
616+
// Unrelated IO errors must NOT trigger the copy+remove path.
617+
let other = std::io::Error::from(std::io::ErrorKind::PermissionDenied);
618+
assert!(!is_cross_device_error(&other));
619+
}
620+
496621
#[test]
497622
fn append_file_creates_then_appends_across_calls() {
498623
let (_temp, path) = test_support::workspace();

0 commit comments

Comments
 (0)