Commit 61ae00d

Add view_image tool so the model can open a local image or http(s) URL on demand, and drop the inline path/URL auto-detection from typed prompts
1 parent 0103b4c commit 61ae00d

15 files changed

Lines changed: 688 additions & 540 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -4,6 +4,14 @@ All notable changes to Sofos are documented in this file.

 ## [Unreleased]

+### Added
+
+- **New `view_image` tool lets the model open an image on demand.** Given a local image file path or an `http(s)://` URL, the tool attaches the image to the conversation so the model can describe it. For a folder of images, the model is told to call `list_directory` first and then `view_image` once per file. Supports JPEG, PNG, GIF, and WebP up to 20 MB per local file; URLs are passed through to the model provider, which fetches them on its side. External paths reuse the same Read-permission prompt as `read_file`, so granting access to a directory once covers both tools. Local images larger than 2048 pixels on the long side are downscaled proportionally before they reach the model so a 4K screenshot does not burn through the per-image token budget; smaller images are sent unchanged.
+
+### Changed
+
+- **Image references typed inline in a prompt are no longer auto-attached.** Previously, a path or URL with an image extension typed in a message would be detected, stripped from the text, and attached as an image content block. Two ergonomic problems followed: vague asks such as "look at the image in `assets/`" without a filename did nothing, and unrelated text that happened to end in `.png` could be misread as a path. Attaching an image now goes through the new `view_image` tool, which the model invokes after reading the prompt; clipboard paste (Ctrl-V) continues to attach images directly.
+
 ### Security

 - **Shell command and process substitution are now blocked in bash commands.** Constructs such as `echo $(rm bad)`, backtick substitution, and process substitution `<(cmd)` / `>(cmd)` previously slipped past the permission system because only the outer command name was checked. They are now refused before the command runs, with a clear message that names the marker. Single-quoted literals and arithmetic expansion `$((expr))` continue to work.
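
Note on the `view_image` entry above: the proportional downscale it describes comes down to a small piece of integer arithmetic. The following is a minimal sketch of that rule only, not the code from this commit. The function name `fit_long_side` and the rounding choice are assumptions made for illustration; the 2048-pixel limit is the figure quoted in the changelog.

```rust
/// Longest side, in pixels, an image may keep (figure from the changelog entry).
const MAX_LONG_SIDE: u32 = 2048;

/// Returns the dimensions to resize to, or `None` when the image already fits.
/// Hypothetical helper for illustration; not taken from the Sofos source.
fn fit_long_side(width: u32, height: u32) -> Option<(u32, u32)> {
    let long = width.max(height);
    if long <= MAX_LONG_SIDE {
        return None; // small images are sent unchanged
    }
    // Scale each side by MAX_LONG_SIDE / long with round-to-nearest,
    // widening to u64 so the multiplication cannot overflow.
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * MAX_LONG_SIDE as u64 + long as u64 / 2) / long as u64;
        scaled.max(1) as u32 // never collapse a side to zero pixels
    };
    Some((scale(width), scale(height)))
}
```

Under these assumptions, `fit_long_side(3840, 2160)` yields `Some((2048, 1152))` for a 4K screenshot, while `fit_long_side(1280, 720)` yields `None`.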

Cargo.lock

Lines changed: 32 additions & 0 deletions
Generated file; diff not rendered.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ rand = "0.10"
 base64 = "0.22"
 arboard = "3"
 png = "0.18"
+image = { version = "0.25", default-features = false, features = ["png", "jpeg", "gif", "webp"] }
 unicode-width = "0.2"

 # Utilities

README.md

Lines changed: 12 additions & 7 deletions
@@ -62,7 +62,7 @@ Sofos provides an AI assistant inside your terminal with controlled access to yo
 - create, move, copy, and delete files with permission checks;
 - run safe build and test commands;
 - fetch documentation and use provider-native web search;
-- review local, clipboard, or web images;
+- open local image files or remote image URLs through the `view_image` tool, and accept clipboard pastes directly;
 - update a visible task plan during multi-step work;
 - save and resume conversations;
 - connect to external tools through Model Context Protocol servers.
@@ -81,7 +81,7 @@ The assistant can act through tools, but it does not do so silently: tool calls
 - **Strong permission model** — independent Read, Write, and Bash grants for paths outside the workspace.
 - **Bash safety** — allowed, denied, and ask tiers, plus structural checks for parent traversal, redirection, and dangerous git operations.
 - **Safe mode** — read-only native tools for review-only sessions.
-- **Image vision** — local images, web images, and clipboard paste.
+- **Image vision** — `view_image` tool for local files and remote URLs, plus clipboard paste.
 - **MCP integration** — connect additional tool servers through stdio or streamable HTTP.
 - **Session persistence** — saved conversations, resume picker, restored safe mode, restored model where compatible, and persisted cost counters.
 - **Cost visibility** — token totals, cache hit reporting, and provider-specific price estimates.
@@ -214,21 +214,24 @@ sofos -p "Create a high-level summary of this crate" --safe-mode

 ### Image vision

-Include image paths or URLs directly in your message, or paste images from the clipboard.
+Ask about an image by referring to it in your message. The model calls the `view_image` tool to open the file or URL you mention.

 ```text
 What is wrong in ./screenshots/error.png?
-Describe "./docs/architecture diagram.webp".
+Describe ./docs/architecture-diagram.webp.
 Review https://example.com/chart.png
+What do you see in the images in ./assets/?
 ```

+For a folder, the model lists the directory first and then opens each image one by one.
+
 Clipboard paste:

 ```text
 Ctrl+V # Inserts a numbered marker such as ①.
 ```

-Supported formats: JPEG, PNG, GIF, and WebP. Local images are capped at 20 MB. Paths with spaces should be quoted. Images outside the workspace require Read permission.
+Supported formats: JPEG, PNG, GIF, and WebP. Local images are capped at 20 MB. Images larger than 2048 pixels on the long side are scaled down proportionally before being sent to the model, so a 4K screenshot does not balloon your token budget. Images outside the workspace require Read permission the first time, just like reading a file.

 ---
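
Note on the README text in this hunk: it says the tool accepts either a local file or an `http(s)://` URL and caps local files at 20 MB. A rough sketch of how such an argument might be routed is shown here. The type `ImageInput` and the functions `classify` and `check_size` are hypothetical names, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption; the commit may count it differently.

```rust
use std::path::PathBuf;

/// Per-file cap for local images. Assumes "20 MB" means 20 * 1024 * 1024 bytes.
const MAX_LOCAL_BYTES: u64 = 20 * 1024 * 1024;

/// Where a `view_image` argument would be routed. Hypothetical type.
#[derive(Debug, PartialEq)]
enum ImageInput {
    /// Handed to the model provider unchanged; never fetched locally.
    Url(String),
    /// Read from disk, subject to the size cap and Read permission.
    LocalPath(PathBuf),
}

/// Treats anything with an http:// or https:// scheme as a URL,
/// and everything else as a filesystem path.
fn classify(input: &str) -> ImageInput {
    let trimmed = input.trim();
    let lower = trimmed.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        ImageInput::Url(trimmed.to_string())
    } else {
        ImageInput::LocalPath(PathBuf::from(trimmed))
    }
}

/// Rejects raw file sizes above the cap with a message that names the cause.
fn check_size(len_bytes: u64) -> Result<(), String> {
    if len_bytes > MAX_LOCAL_BYTES {
        Err(format!(
            "image is {} bytes; the cap is {} bytes",
            len_bytes, MAX_LOCAL_BYTES
        ))
    } else {
        Ok(())
    }
}
```

In this sketch the size check applies only to the `LocalPath` branch, matching the README's statement that the cap is for local images.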

@@ -311,10 +314,11 @@ Provider mapping:
 | `delete_directory` | Delete a directory after confirmation. External paths require Write permission. |
 | `execute_bash` | Run approved shell commands through the bash permission system. |
 | `update_plan` | Show the current multi-step task plan with `pending`, `in_progress`, and `completed` statuses. |
+| `view_image` | Attach a local image file or an `http(s)://` URL to the conversation so the model can see it. |
 | `web_fetch` | Fetch a URL and return readable text. |
 | `web_search` | Provider-native web search. |

-Image vision is not a tool. Sofos detects supported image paths and URLs in user messages and converts them into image content blocks before sending the request.
+Clipboard pastes are not routed through a tool: pressing Ctrl-V in the prompt attaches the image directly to the message.

 ### Safe mode tools

@@ -325,6 +329,7 @@ Safe mode is enabled with `--safe-mode` or `/s`. It restricts the native tool se
 - `glob_files`;
 - `search_code` when ripgrep is installed;
 - `update_plan`;
+- `view_image`;
 - `web_fetch`;
 - `web_search`.

@@ -574,7 +579,7 @@ See [`RELEASE.md`](RELEASE.md) for the full process.
 | Path denied | Add a `Read`, `Write`, or `Bash` rule, or approve the interactive prompt. |
 | External edit denied | `edit_file` and `morph_edit_file` need both Read and Write for external files. |
 | Code search unavailable | Install `ripgrep` and ensure `rg` is on `PATH`. |
-| Image path with spaces fails | Quote the path: `"path/with spaces/image.png"`. |
+| Image not opening | Mention the image by path or URL in your message; the model will call `view_image`. For a folder, ask it to look in the folder and it will list and open each image. |
 | Terminal does not insert newline with Shift+Enter | Use Alt+Enter or Ctrl+Enter. |
 | Build problems | Run `rustup update`, then `cargo clean` and `cargo build`. |

STRUCTURE.md

Lines changed: 14 additions & 14 deletions
@@ -275,7 +275,7 @@ src/
 │ ├── codesearch.rs
 │ │ # Ripgrep-backed code search with ignore policy, file-type filters, and output limits.
 │ ├── image.rs
-│ │ # Local and web image detection, validation, encoding, and message-content conversion.
+│ │ # Image loader used by the `view_image` tool: format detection, 20 MB size cap, automatic resize to 2048 pixels on the long side, base64 encoding, and Read-permission integration.
 │ ├── morph_validate.rs
 │ │ # Safety checks that reject suspicious or truncated Morph Apply output before writing files.
 │ ├── plan.rs
@@ -449,7 +449,7 @@ It contains:
 - image size enforcement;
 - numbered marker handling used by the TUI input flow.

-It does not own general image path loading. Local and web image detection for user messages lives in `tools/image.rs`.
+It does not own image loading from disk. Local and remote image loading for the `view_image` tool lives in `tools/image.rs`.

 ---

@@ -995,23 +995,23 @@ Rules:

 ### 7.8 `tools/image.rs`

-`tools/image.rs` owns image detection and loading for user messages.
+`tools/image.rs` owns the image loader behind the `view_image` tool.

 It contains:

-- local image path parsing;
-- web image URL detection;
-- supported-format checks;
-- base64 encoding;
-- media-type assignment;
-- size enforcement;
-- integration with workspace and external Read permissions.
+- decode plus optional resize (long side fits within 2048 pixels) before the bytes reach the model;
+- byte-level format detection: PNG, JPEG, GIF, and WebP pass through unchanged when small enough; other decodable formats (e.g. BMP) are re-encoded as PNG;
+- base64 encoding and media-type assignment;
+- the 20 MB per-file size cap on the raw bytes;
+- canonical workspace resolution so inside/outside classification compares the same path shape on both sides;
+- integration with the shared Read-permission grant set, so a single "Allow Read access to /foo?" decision answered for `read_file` also covers `view_image`;
+- a URL passthrough that hands `http(s)://` inputs to the model provider unchanged.

 Rules:

-- Image paths in user text become image content blocks before provider requests.
-- Unsupported or oversized images should produce clear errors.
-- Failed web-image loading can be retried without discarding the user's text.
+- Local files outside the workspace go through the same interactive Read prompt as `read_file`.
+- Files that fail to decode or exceed the size cap produce errors that name the cause.
+- The loader never fetches remote URLs itself; the model provider does that on its side.

 ### 7.9 `tools/morph_validate.rs`
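
Note on "byte-level format detection" in the list above: the four pass-through formats can be told apart by their leading bytes. The sketch below shows one way to do that with the standard library only. It is illustrative: the function name `sniff_media_type` is hypothetical, and the commit adds the `image` crate, which may well handle detection differently.

```rust
/// Media type inferred from the leading bytes of a file, or `None` when the
/// bytes match none of the four pass-through formats.
/// Hypothetical helper for illustration; not taken from the Sofos source.
fn sniff_media_type(bytes: &[u8]) -> Option<&'static str> {
    // PNG: fixed 8-byte signature.
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    // JPEG: start-of-image marker followed by another marker byte.
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    // GIF: either of the two version headers.
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    // WebP: RIFF container whose form type (bytes 8..12) is "WEBP".
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        Some("image/webp")
    } else {
        None
    }
}
```

A `None` result here only means "not one of the four pass-through formats". Per the list above, other decodable inputs such as BMP would take the re-encode-as-PNG path instead of being rejected outright.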

@@ -1502,7 +1502,7 @@ Rules:
 | Permission settings and prompts | `tools/permissions/manager.rs` |
 | Permission rule parsing | `tools/permissions/pattern.rs` |
 | Code search | `tools/codesearch.rs` |
-| User-message image loading | `tools/image.rs` |
+| `view_image` tool image loading | `tools/image.rs` |
 | Morph output validation | `tools/morph_validate.rs` |
 | MCP configuration | `mcp/config.rs` |
 | MCP protocol shapes | `mcp/protocol.rs` |

src/mcp/manager.rs

Lines changed: 17 additions & 8 deletions
@@ -21,10 +21,22 @@ pub struct ToolResult {
     pub images: Vec<ImageData>,
 }

+/// Image attachment in a tool result. `Url` is fetched by the model
+/// provider; `Base64` is shipped inline.
 #[derive(Debug, Clone)]
-pub struct ImageData {
-    pub mime_type: String,
-    pub base64_data: String,
+pub enum ImageData {
+    Base64 { mime_type: String, data: String },
+    Url { url: String },
+}
+
+impl ImageData {
+    /// Bytes we'd ship inline; 0 for URLs since the provider fetches them.
+    pub fn outbound_size(&self) -> usize {
+        match self {
+            ImageData::Base64 { data, .. } => data.len(),
+            ImageData::Url { .. } => 0,
+        }
+    }
 }

 /// Manages multiple MCP server connections and their tools.
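
Note on the hunk above: turning `ImageData` into an enum means callers can no longer read `mime_type` or `base64_data` as fields and must match on the variant, or go through `outbound_size`. The standalone sketch below repeats the enum from the diff so that it compiles on its own and adds a hypothetical `total_outbound` helper, which is not in the commit, to show how a caller might budget a batch of attachments.

```rust
/// Copy of the enum from the diff, repeated so this sketch is self-contained.
#[derive(Debug, Clone)]
pub enum ImageData {
    Base64 { mime_type: String, data: String },
    Url { url: String },
}

impl ImageData {
    /// Bytes shipped inline; 0 for URLs since the provider fetches them.
    pub fn outbound_size(&self) -> usize {
        match self {
            ImageData::Base64 { data, .. } => data.len(),
            ImageData::Url { .. } => 0,
        }
    }
}

/// Hypothetical helper: total inline payload for a batch of attachments.
/// URL attachments contribute nothing because no bytes leave this process.
fn total_outbound(images: &[ImageData]) -> usize {
    images.iter().map(ImageData::outbound_size).sum()
}
```

With one `Base64` attachment holding four characters of data and one `Url` attachment, `total_outbound` returns 4.
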
@@ -286,12 +298,9 @@ fn format_tool_result(result: CallToolResult) -> ToolResult {
                 text_output.push('\n');
             }
             ToolContent::Image { data, mime_type } => {
-                let size_kb = (data.len() * 3 / 4) / 1024;
+                let size_kb = crate::tools::utils::base64_approx_decoded_kb(data.len());
                 text_output.push_str(&format!("[Image: {} ({} KB)]\n", mime_type, size_kb));
-                images.push(ImageData {
-                    mime_type,
-                    base64_data: data,
-                });
+                images.push(ImageData::Base64 { mime_type, data });
             }
             ToolContent::Resource {
                 uri,
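
Note on the hunk above: the body of `base64_approx_decoded_kb` is not part of the rendered diff. Assuming it wraps the same arithmetic as the removed inline expression, it would look roughly like the sketch below. Only the name and the `usize` argument come from the call site; the body is an assumption.

```rust
/// Approximate decoded size, in KB, of a base64 string `encoded_len` bytes long.
/// Assumed body: mirrors the removed `(data.len() * 3 / 4) / 1024` expression.
fn base64_approx_decoded_kb(encoded_len: usize) -> usize {
    // Base64 maps every 3 input bytes to 4 output characters, so the decoded
    // size is about 3/4 of the encoded length. Padding is ignored, which is
    // why the result is only approximate.
    (encoded_len * 3 / 4) / 1024
}
```

For example, 4096 encoded characters decode to about 3072 bytes, which this formula reports as 3 KB.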

src/repl/mod.rs

Lines changed: 0 additions & 9 deletions
@@ -19,7 +19,6 @@ use crate::error::{Result, SofosError};
 use crate::mcp::McpManager;
 use crate::session::{HistoryManager, SessionState};
 use crate::tools::ToolExecutor;
-use crate::tools::image::ImageLoader;
 use crate::ui::{UI, set_safe_mode_cursor_style};
 use colored::Colorize;
 use std::path::PathBuf;
@@ -61,7 +60,6 @@ pub struct Repl {
     pub(super) client: LlmClient,
     pub(super) tool_executor: ToolExecutor,
     pub(super) history_manager: HistoryManager,
-    pub(super) image_loader: ImageLoader,
     pub(super) ui: UI,
     pub(super) model_config: ModelConfig,
     pub(super) session_state: SessionState,
@@ -129,12 +127,6 @@ impl Repl {
         let has_code_search = tool_executor.has_code_search();

         let history_manager = HistoryManager::new(workspace.clone())?;
-        let mut image_loader = ImageLoader::new(workspace.clone())?;
-        image_loader.install_read_path_session(
-            std::io::stdin().is_terminal(),
-            tool_executor.read_path_session_allowed(),
-            tool_executor.read_path_session_denied(),
-        );

         // Load custom instructions
         let custom_instructions = history_manager.load_custom_instructions()?;
@@ -214,7 +206,6 @@ impl Repl {
             client,
             tool_executor,
             history_manager,
-            image_loader,
             ui,
             model_config,
             session_state,

src/repl/response_handler.rs

Lines changed: 12 additions & 4 deletions
@@ -444,11 +444,19 @@ impl ResponseHandler {
                 });

                 for image in output.images() {
+                    let source = match image {
+                        crate::mcp::manager::ImageData::Base64 { mime_type, data } => {
+                            crate::api::ImageSource::Base64 {
+                                media_type: mime_type.clone(),
+                                data: data.clone(),
+                            }
+                        }
+                        crate::mcp::manager::ImageData::Url { url } => {
+                            crate::api::ImageSource::Url { url: url.clone() }
+                        }
+                    };
                     tool_results.push(crate::api::MessageContentBlock::Image {
-                        source: crate::api::ImageSource::Base64 {
-                            media_type: image.mime_type.clone(),
-                            data: image.base64_data.clone(),
-                        },
+                        source,
                         cache_control: None,
                     });
                 }
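
Note on the hunk above: the inline `match` maps each `ImageData` variant to the `ImageSource` variant of the same shape. The same mapping could be expressed as a `From` impl, sketched below with stand-in types. Both enums here are simplified copies made for illustration; the real ones live in `crate::mcp::manager` and `crate::api`, and the commit itself does not add such an impl.

```rust
// Stand-in for `crate::mcp::manager::ImageData`, shaped as in the diff.
#[derive(Debug, Clone, PartialEq)]
enum ImageData {
    Base64 { mime_type: String, data: String },
    Url { url: String },
}

// Stand-in for `crate::api::ImageSource`; the variant and field names are
// the ones visible at the call site in the diff.
#[derive(Debug, PartialEq)]
enum ImageSource {
    Base64 { media_type: String, data: String },
    Url { url: String },
}

// Hypothetical conversion. It borrows the attachment and clones the strings,
// which is what the inline match in the commit does as well.
impl From<&ImageData> for ImageSource {
    fn from(image: &ImageData) -> Self {
        match image {
            ImageData::Base64 { mime_type, data } => ImageSource::Base64 {
                media_type: mime_type.clone(),
                data: data.clone(),
            },
            ImageData::Url { url } => ImageSource::Url { url: url.clone() },
        }
    }
}
```

With such an impl the loop body would reduce to `source: ImageSource::from(image)`. Whether that is preferable to the explicit match is a style decision for the project.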
