Commit 7e778d8

Merge pull request #572 from ragini-pandey/feature/image-attachments
feat: support image attachments on user messages
2 parents 618d81e + abe955c commit 7e778d8

24 files changed

Lines changed: 873 additions & 57 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@

 - Large **refactor and dead-code sweep**: unified tool-call ID generation, extracted a shared `useWizardForm` hook, consolidated session-override managers, added message-factory helpers, deduped config loaders / git exec / command dispatch / conversation-loop flush, extracted `StyledSelectInput` and `makeSimpleToolFormatter`, and deleted several dead modules and orphaned exports (including `fetch-local-models` orphaned by an earlier removal).

+- Added **image attachments on user messages**. Paste an image from the clipboard with **Ctrl+V** (via `osascript` on macOS, `wl-paste`/`xclip` on Linux, PowerShell on Windows), drag an image file into the terminal, or type a path — quoted, unquoted, and macOS backslash-escaped paths are all recognised, and `http(s)` URLs are left untouched. Attachments (PNG/JPEG/GIF/WebP, ≤10 MB) show above the input box and **Ctrl+X** removes the last one; references are stripped from the text and sent as image parts to vision-capable models. A missing clipboard tool now logs a debug breadcrumb instead of silently no-op'ing. Thanks to @ragini-pandey. Closes #572.
+
 - Updated the **Nanocoder Battlemap** competitive comparison and refreshed the README and docs.

 - Dependency updates: `ai` 6.0.174 -> 6.0.193, `@ai-sdk/openai`, `@ai-sdk/openai-compatible`, `@ai-sdk/anthropic`, `@agentclientprotocol/sdk` 0.22.1 -> 0.25.0, `undici`, `esbuild`, `@biomejs/biome` 2.5.0, `lint-staged` 17, `@types/node`, `@types/vscode`, and other transitive bumps tracked through the lockfile. Added `clipboardy ^5.3.1` for `/copy`.

docs/features/image-attachments.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+---
+title: "Image Attachments"
+description: "Attach screenshots and images to your messages so vision-capable models can see them"
+sidebar_order: 12
+---
+
+# Image Attachments
+
+Nanocoder can send images alongside your text so a vision-capable model can look at a screenshot, a diagram, or a design mockup. Attachments are gathered as you compose a message and sent with the next prompt you submit.
+
+## Attaching an Image
+
+There are three ways to attach an image:
+
+| Method | How |
+|--------|-----|
+| Clipboard paste | Copy an image, then press **Ctrl+V** in the input |
+| Drag and drop | Drag an image file into the terminal |
+| File path | Type or paste a path to an image file |
+
+For drag-and-drop and typed paths, the path can be **quoted, unquoted, or backslash-escaped**. macOS terminals drop a dragged screenshot in as an unquoted path with escaped spaces (e.g. `Screenshot\ 2026-06-21\ at\ 10.04.32.png`) — that form is recognised without you needing to add quotes. The image reference is stripped from your message text before it's sent, so the model receives the picture rather than a file path.
+
+Remote `http(s)://` URLs that end in an image extension are left as plain text — they are not fetched or treated as local files.
+
+## Managing Attachments
+
+Pending attachments are listed just above the input box:
+
+```
+[image #1: Screenshot 2026-06-21.png] [image #2: clipboard] · ctrl-x remove last
+```
+
+- **Ctrl+X** removes the most recently added attachment.
+- Attachments are cleared once the message is submitted.
+
+## Supported Formats
+
+PNG, JPEG, GIF, and WebP are accepted (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`). Each image must be **10 MB or smaller**; larger files are skipped.
+
+Whether the image is actually understood depends on the model — attach images only when your provider/model supports vision. If a model can't accept images, it will report or error on its own.
+
+## Clipboard Requirements by Platform
+
+Clipboard paste (**Ctrl+V**) shells out to a platform tool to read the image. If the tool isn't installed, the paste is a no-op and a one-line note is written to the debug log naming the missing command.
+
+| Platform | Required tool |
+|----------|---------------|
+| macOS | `osascript` (built in) |
+| Linux (Wayland) | `wl-paste` |
+| Linux (X11) | `xclip` |
+| Windows | PowerShell |
+
+On a minimal Linux container without `wl-paste` or `xclip` — common in dev containers and CI — clipboard paste won't work; attach by drag-and-drop or file path instead, or install one of the tools.
+
+## See Also
+
+- [Keyboard Shortcuts](keyboard-shortcuts.md) — full shortcut reference, including the image bindings.
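
The path rules described in the added `docs/features/image-attachments.md` (quoted, unquoted, or backslash-escaped paths; `http(s)` URLs left alone; five accepted extensions; 10 MB cap) could be sketched roughly as below. This is an illustration only: `normalizeImagePath`, `isWithinSizeLimit`, and both constants are hypothetical names, the commit's actual parser is not part of this excerpt, and the sketch assumes that "10 MB" means 10 × 1024 × 1024 bytes and that escaped spaces are the only escapes that need undoing.

```typescript
// Hypothetical sketch; none of these names come from the commit.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp'];
// Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

/** Return a cleaned local image path, or null if the input is not one. */
function normalizeImagePath(raw: string): string | null {
	let candidate = raw.trim();

	// Remote URLs stay as plain text and are never treated as local files.
	if (/^https?:\/\//i.test(candidate)) {
		return null;
	}

	// Quoted form: strip one pair of matching quotes.
	const quoted = /^(["'])(.*)\1$/.exec(candidate);
	if (quoted) {
		candidate = quoted[2] ?? '';
	} else {
		// Unquoted form: turn shell-style escaped spaces back into spaces.
		candidate = candidate.replace(/\\ /g, ' ');
	}

	// Accept the path only if it ends in a supported image extension.
	const lower = candidate.toLowerCase();
	return IMAGE_EXTENSIONS.some(ext => lower.endsWith(ext)) ? candidate : null;
}

/** Files over the limit would be skipped rather than attached. */
function isWithinSizeLimit(byteLength: number): boolean {
	return byteLength <= MAX_IMAGE_BYTES;
}
```

Only escaped spaces are unescaped here so that Windows-style paths containing backslashes are not mangled; the real implementation may handle more cases.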

docs/features/index.md

Lines changed: 5 additions & 0 deletions
@@ -41,6 +41,10 @@ Prefix any command with **`!`** to run it directly in your shell without leaving
 !npm test
 ```

+### Attaching Images
+
+Press **Ctrl+V** to paste an image from your clipboard, or drag an image file into the terminal, to send it to a vision-capable model. Pending attachments show above the input box; **Ctrl+X** removes the last one. See [Image Attachments](image-attachments.md) for supported formats and platform requirements.
+
 ### Keyboard Shortcuts

 These are the shortcuts you'll use constantly:
@@ -231,6 +235,7 @@ Extend Nanocoder's capabilities by connecting [MCP (Model Context Protocol) serv
 | [Session Management](session-management.md) | Automatic session saving and resumption |
 | [Task Management](task-management.md) | Tracking multi-step work |
 | [File Explorer](file-explorer.md) | Interactive file browser for context selection |
+| [Image Attachments](image-attachments.md) | Send screenshots and images to vision-capable models |
 | [VS Code Extension](vscode-extension.md) | Editor integration with live diff previews |
 | [ACP](acp.md) | Run as an Agent Client Protocol server for editors like Zed |
 | [Tune](tune.md) | Runtime model tuning for tool profiles, parameters, and compaction |

docs/features/keyboard-shortcuts.md

Lines changed: 9 additions & 0 deletions
@@ -50,6 +50,15 @@ This page covers the main chat input and common interactive views. Some speciali

 When typing `@` for file mentions or `/` for commands, Tab accepts the current suggestion. If there are multiple command matches, the first Tab shows the completion list and pressing Tab again accepts the first result.

+## Image Attachments
+
+| Action | Shortcut |
+|--------|----------|
+| Paste image from clipboard | Ctrl+V |
+| Remove last attached image | Ctrl+X |
+
+Ctrl+V pulls an image off the system clipboard and adds it as an attachment. You can also attach an image by typing, pasting, or dragging an image file path into the input — quoted, unquoted, and macOS backslash-escaped paths (e.g. `Screenshot\ 2026.png`) are all recognised. Attachments appear above the input box as `[image #1: …]`; Ctrl+X drops the most recently added one. See [Image Attachments](image-attachments.md) for the full feature, including supported formats and platform requirements.
+
 ## History & Navigation

 | Action | Shortcut |

source/acp/acp-agent.ts

Lines changed: 18 additions & 8 deletions
@@ -27,7 +27,7 @@ import {
 	getAvailableModes,
 	negotiateProtocolVersion,
 } from '@/acp/acp-capabilities';
-import {acpContentToUserText} from '@/acp/acp-content';
+import {acpContentToUserMessage} from '@/acp/acp-content';
 import {runAcpConversation} from '@/acp/acp-conversation';
 import {AcpSession} from '@/acp/acp-session';
 import type {AcpInitContext} from '@/acp/acp-types';
@@ -128,16 +128,26 @@ export class AcpAgent implements Agent {
 			);
 		}

-		const userText = await acpContentToUserText(params.prompt, {
-			conn: this.conn,
-			sessionId: params.sessionId,
-			canReadTextFile: this.clientCapabilities?.fs?.readTextFile ?? false,
-		});
+		const {text: userText, images} = await acpContentToUserMessage(
+			params.prompt,
+			{
+				conn: this.conn,
+				sessionId: params.sessionId,
+				canReadTextFile: this.clientCapabilities?.fs?.readTextFile ?? false,
+			},
+		);
 		logger.info(
-			`ACP prompt: session=${params.sessionId} text=${userText.slice(0, 100)}`,
+			`ACP prompt: session=${params.sessionId} text=${userText.slice(0, 100)} images=${images.length}`,
 		);

-		session.messages = [...session.messages, {role: 'user', content: userText}];
+		session.messages = [
+			...session.messages,
+			{
+				role: 'user',
+				content: userText,
+				...(images.length > 0 ? {images} : {}),
+			},
+		];

 		const config = getAppConfig();
 		const nonInteractiveAlwaysAllow = config.alwaysAllow ?? [];
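
The conditional spread in the `session.messages` change above keeps text-only messages free of an `images` key. A minimal standalone sketch of that pattern follows. The `ImageAttachment` and `UserMessage` shapes are assumptions inferred from the tests in this commit (the real type is imported from `@/types/core` and is not shown here), and `buildUserMessage` is a hypothetical helper that does not exist in the codebase.

```typescript
// Assumed shapes, inferred from the specs in this commit; the real
// ImageAttachment type is defined elsewhere and may differ.
interface ImageAttachment {
	data: string;
	mediaType: string;
	source?: string;
}

interface UserMessage {
	role: 'user';
	content: string;
	images?: ImageAttachment[];
}

// Hypothetical helper mirroring the inline object literal in the diff.
function buildUserMessage(
	text: string,
	images: ImageAttachment[],
): UserMessage {
	return {
		role: 'user',
		content: text,
		// The spread adds the `images` key only when there is something to send.
		...(images.length > 0 ? {images} : {}),
	};
}

const plain = buildUserMessage('hello', []);
const withImage = buildUserMessage('look', [
	{data: 'abc', mediaType: 'image/png', source: 'acp'},
]);

console.log('images' in plain); // false
console.log(withImage.images?.length); // 1
```

Omitting the key entirely, rather than storing an empty array, means existing code that checks for the presence of `images` does not need to distinguish "absent" from "empty".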

source/acp/acp-content.spec.ts

Lines changed: 26 additions & 4 deletions
@@ -3,7 +3,10 @@ import {tmpdir} from 'node:os';
 import {join} from 'node:path';
 import {pathToFileURL} from 'node:url';
 import test from 'ava';
-import {acpContentToUserText} from '@/acp/acp-content';
+import {
+	acpContentToUserMessage,
+	acpContentToUserText,
+} from '@/acp/acp-content';

 console.log('\nacp-content.spec.ts');

@@ -34,11 +37,30 @@ test('acpContentToUserText - preserves exact text content', async t => {
 	t.is(result, specialText);
 });

-test('acpContentToUserText - notes unsupported image attachments instead of dropping', async t => {
-	const result = await acpContentToUserText([
+test('acpContentToUserMessage - extracts supported image blocks as attachments', async t => {
+	const result = await acpContentToUserMessage([
+		{type: 'text', text: 'look at this'},
 		{type: 'image', data: 'abc', mimeType: 'image/png'} as any,
 	]);
-	t.true(result.includes('image'));
+	t.is(result.text, 'look at this');
+	t.deepEqual(result.images, [
+		{data: 'abc', mediaType: 'image/png', source: 'acp'},
+	]);
+});
+
+test('acpContentToUserMessage - notes unsupported image media types instead of sending', async t => {
+	const result = await acpContentToUserMessage([
+		{type: 'image', data: 'abc', mimeType: 'image/tiff'} as any,
+	]);
+	t.is(result.images.length, 0);
+	t.true(result.text.toLowerCase().includes('omitted'));
+});
+
+test('acpContentToUserText - still notes audio attachments instead of dropping', async t => {
+	const result = await acpContentToUserText([
+		{type: 'audio', data: 'abc', mimeType: 'audio/wav'} as any,
+	]);
+	t.true(result.includes('audio'));
 	t.true(result.toLowerCase().includes('omitted'));
 });

source/acp/acp-content.ts

Lines changed: 65 additions & 13 deletions
@@ -6,32 +6,48 @@ import type {
 	EmbeddedResource,
 	ResourceLink,
 } from '@agentclientprotocol/sdk';
+import type {ImageAttachment} from '@/types/core';
 import {getLogger} from '@/utils/logging';

 const logger = getLogger();

+/** Image media types the providers accept; others are noted rather than sent. */
+const SUPPORTED_IMAGE_MEDIA_TYPES = new Set([
+	'image/png',
+	'image/jpeg',
+	'image/gif',
+	'image/webp',
+]);
+
 export interface AcpContentContext {
 	conn: AgentSideConnection;
 	sessionId: string;
 	/** Whether the client advertised the `fs.readTextFile` capability. */
 	canReadTextFile: boolean;
 }

+/** A resolved ACP prompt: model-visible text plus any image attachments. */
+export interface AcpUserMessage {
+	text: string;
+	images: ImageAttachment[];
+}
+
 /**
- * Convert a prompt's content blocks into the plain text the model receives.
+ * Convert a prompt's content blocks into the user message the model receives.
  *
  * Text blocks are concatenated directly (preserving the client's own
- * splitting). Non-text blocks - embedded resources and `@`-mentioned file
- * links - are resolved into readable sections appended after the prompt text so
- * the model can actually see tagged files. Attachments we cannot process
- * (images, audio) are noted rather than silently dropped.
+ * splitting). Embedded resources and `@`-mentioned file links are resolved into
+ * readable sections appended after the prompt text. Image blocks are collected
+ * as multimodal attachments; audio (and unsupported image types) are noted
+ * rather than silently dropped.
  */
-export async function acpContentToUserText(
+export async function acpContentToUserMessage(
 	prompt: ContentBlock[],
 	ctx?: AcpContentContext,
-): Promise<string> {
+): Promise<AcpUserMessage> {
 	let text = '';
 	const sections: string[] = [];
+	const images: ImageAttachment[] = [];

 	for (const block of prompt) {
 		switch (block.type) {
@@ -44,22 +60,58 @@ export async function acpContentToUserText(
 			case 'resource_link':
 				sections.push(await renderResourceLink(block, ctx));
 				break;
-			case 'image':
+			case 'image': {
+				const image = toImageAttachment(block);
+				if (image) {
+					images.push(image);
+				} else {
+					sections.push(
+						`[Attached image omitted: unsupported media type ${
+							'mimeType' in block ? block.mimeType : 'unknown'
+						}]`,
+					);
+				}
+				break;
+			}
 			case 'audio':
 				sections.push(
-					`[Attached ${block.type} omitted: nanocoder cannot process ${block.type} content over ACP yet]`,
+					`[Attached audio omitted: nanocoder cannot process audio content over ACP yet]`,
 				);
 				break;
 			default:
 				break;
 		}
 	}

-	if (sections.length === 0) {
-		return text;
-	}
+	const resolvedText =
+		sections.length === 0
+			? text
+			: [text, ...sections].filter(part => part.length > 0).join('\n\n');
+
+	return {text: resolvedText, images};
+}

-	return [text, ...sections].filter(part => part.length > 0).join('\n\n');
+/**
+ * Text-only view of {@link acpContentToUserMessage}, kept for callers and tests
+ * that only need the model-visible prose.
+ */
+export async function acpContentToUserText(
+	prompt: ContentBlock[],
+	ctx?: AcpContentContext,
+): Promise<string> {
+	return (await acpContentToUserMessage(prompt, ctx)).text;
+}
+
+/** Build an image attachment from an ACP image block, or null if unusable. */
+function toImageAttachment(block: {
+	data?: string;
+	mimeType?: string;
+}): ImageAttachment | null {
+	const {data, mimeType} = block;
+	if (!data || !mimeType || !SUPPORTED_IMAGE_MEDIA_TYPES.has(mimeType)) {
+		return null;
+	}
+	return {data, mediaType: mimeType, source: 'acp'};
 }

 function renderEmbeddedResource(block: EmbeddedResource): string {

source/ai-sdk-client/converters/message-converter.spec.ts

Lines changed: 42 additions & 0 deletions
@@ -84,6 +84,48 @@ test('convertToModelMessages converts user message', t => {
 	t.is(result[0].content, 'Hello');
 });

+test('convertToModelMessages emits image parts for a user message with attachments', t => {
+	const messages: Message[] = [
+		{
+			role: 'user',
+			content: 'what is in this screenshot?',
+			images: [{data: 'BASE64DATA', mediaType: 'image/png'}],
+		},
+	];
+
+	const result = convertToModelMessages(messages);
+	t.is(result.length, 1);
+	t.is(result[0].role, 'user');
+	const content = result[0].content as Array<Record<string, unknown>>;
+	t.true(Array.isArray(content));
+	t.is(content[0].type, 'text');
+	t.is(content[0].text, 'what is in this screenshot?');
+	t.is(content[1].type, 'image');
+	t.is(content[1].image, 'data:image/png;base64,BASE64DATA');
+	t.is(content[1].mediaType, 'image/png');
+});
+
+test('convertToModelMessages keeps image-only user messages without a text part', t => {
+	const messages: Message[] = [
+		{
+			role: 'user',
+			content: '',
+			images: [{data: 'IMG', mediaType: 'image/jpeg'}],
+		},
+	];
+
+	const result = convertToModelMessages(messages);
+	const content = result[0].content as Array<Record<string, unknown>>;
+	t.is(content.length, 1);
+	t.is(content[0].type, 'image');
+});
+
+test('convertToModelMessages leaves text-only user messages as plain strings', t => {
+	const messages: Message[] = [{role: 'user', content: 'plain text'}];
+	const result = convertToModelMessages(messages);
+	t.is(result[0].content, 'plain text');
+});
+
 test('convertToModelMessages converts assistant message with text', t => {
 	const messages: Message[] = [
 		{
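
The three new specs above pin down the user-message behaviour without showing the converter itself. One way that behaviour could be satisfied is sketched below, purely as an illustration. `toUserContent` and `UserContentPart` are hypothetical names, the `ImageAttachment` shape is assumed from the test fixtures, and the body of the real `convertToModelMessages` is not part of this excerpt and may be structured differently.

```typescript
// Assumed shape, taken from the fixtures used in the spec.
interface ImageAttachment {
	data: string; // base64 payload without a data-URL prefix
	mediaType: string;
}

// Hypothetical part types matching the fields the spec asserts on.
type UserContentPart =
	| {type: 'text'; text: string}
	| {type: 'image'; image: string; mediaType: string};

// Hypothetical helper: build the `content` value for one user message.
function toUserContent(
	content: string,
	images: ImageAttachment[] = [],
): string | UserContentPart[] {
	// Text-only messages stay as plain strings.
	if (images.length === 0) {
		return content;
	}

	const parts: UserContentPart[] = [];
	// Image-only messages get no empty text part.
	if (content.length > 0) {
		parts.push({type: 'text', text: content});
	}
	for (const image of images) {
		parts.push({
			type: 'image',
			// The spec expects the base64 data wrapped in a data URL.
			image: `data:${image.mediaType};base64,${image.data}`,
			mediaType: image.mediaType,
		});
	}
	return parts;
}
```

Returning a plain string when there are no images keeps the existing text-only path unchanged, which is what the third spec checks.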
