Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -58,6 +58,8 @@

- Large **refactor and dead-code sweep**: unified tool-call ID generation, extracted a shared `useWizardForm` hook, consolidated session-override managers, added message-factory helpers, deduped config loaders / git exec / command dispatch / conversation-loop flush, extracted `StyledSelectInput` and `makeSimpleToolFormatter`, and deleted several dead modules and orphaned exports (including `fetch-local-models` orphaned by an earlier removal).

- Added **image attachments on user messages**. Paste an image from the clipboard with **Ctrl+V** (via `osascript` on macOS, `wl-paste`/`xclip` on Linux, PowerShell on Windows), drag an image file into the terminal, or type a path — quoted, unquoted, and macOS backslash-escaped paths are all recognised, and `http(s)` URLs are left untouched. Attachments (PNG/JPEG/GIF/WebP, ≤10 MB) show above the input box and **Ctrl+X** removes the last one; references are stripped from the text and sent as image parts to vision-capable models. A missing clipboard tool now logs a debug breadcrumb instead of silently no-op'ing. Thanks to @ragini-pandey. Closes #572.

- Updated the **Nanocoder Battlemap** competitive comparison and refreshed the README and docs.

- Dependency updates: `ai` 6.0.174 -> 6.0.193, `@ai-sdk/openai`, `@ai-sdk/openai-compatible`, `@ai-sdk/anthropic`, `@agentclientprotocol/sdk` 0.22.1 -> 0.25.0, `undici`, `esbuild`, `@biomejs/biome` 2.5.0, `lint-staged` 17, `@types/node`, `@types/vscode`, and other transitive bumps tracked through the lockfile. Added `clipboardy ^5.3.1` for `/copy`.
57 changes: 57 additions & 0 deletions docs/features/image-attachments.md
@@ -0,0 +1,57 @@
---
title: "Image Attachments"
description: "Attach screenshots and images to your messages so vision-capable models can see them"
sidebar_order: 12
---

# Image Attachments

Nanocoder can send images alongside your text so a vision-capable model can look at a screenshot, a diagram, or a design mockup. Attachments are gathered as you compose a message and sent with the next prompt you submit.

## Attaching an Image

There are three ways to attach an image:

| Method | How |
|--------|-----|
| Clipboard paste | Copy an image, then press **Ctrl+V** in the input |
| Drag and drop | Drag an image file into the terminal |
| File path | Type or paste a path to an image file |

For drag-and-drop and typed paths, the path can be **quoted, unquoted, or backslash-escaped**. macOS terminals drop a dragged screenshot in as an unquoted path with escaped spaces (e.g. `Screenshot\ 2026-06-21\ at\ 10.04.32.png`) — that form is recognised without you needing to add quotes. The image reference is stripped from your message text before it's sent, so the model receives the picture rather than a file path.

Remote `http(s)://` URLs that end in an image extension are left as plain text — they are not fetched or treated as local files.
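
As a rough illustration of the path handling described above, the sketch below normalises the three accepted forms and leaves remote URLs alone. `normalizeImagePath` and `IMAGE_EXTENSIONS` are hypothetical names written for this page; this is not Nanocoder's actual parser.

```typescript
// Hypothetical helper for illustration only; not Nanocoder's actual parser.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp'];

function normalizeImagePath(raw: string): string | null {
	let candidate = raw.trim();

	// Remote URLs are left as plain text, never treated as local files.
	if (/^https?:\/\//i.test(candidate)) {
		return null;
	}

	const quoted = /^(["'])(.*)\1$/.exec(candidate);
	if (quoted) {
		// Quoted form: strip the surrounding quotes, keep the inside verbatim.
		candidate = quoted[2] ?? '';
	} else {
		// Unquoted form: undo shell-style backslash escapes such as "\ ".
		candidate = candidate.replace(/\\(.)/g, '$1');
	}

	// Only paths ending in a supported image extension count as attachments.
	const lower = candidate.toLowerCase();
	return IMAGE_EXTENSIONS.some(ext => lower.endsWith(ext)) ? candidate : null;
}
```

Under these assumptions, `Screenshot\ 2026-06-21\ at\ 10.04.32.png` and `"Screenshot 2026-06-21 at 10.04.32.png"` resolve to the same path. The sketch treats every backslash as an escape, so Windows-style paths would need separate handling.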

## Managing Attachments

Pending attachments are listed just above the input box:

```
[image #1: Screenshot 2026-06-21.png] [image #2: clipboard] · ctrl-x remove last
```

- **Ctrl+X** removes the most recently added attachment.
- Attachments are cleared once the message is submitted.
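
The status line and the **Ctrl+X** behaviour above can be sketched as two small pure functions. The `PendingImage` shape and both function names are assumptions for illustration, not Nanocoder's internal types.

```typescript
// Illustrative only: the type and function names are assumptions.
interface PendingImage {
	label: string; // file name, or "clipboard" for a pasted image
}

// Render the line shown above the input box, or nothing when the list is empty.
function renderAttachmentLine(images: PendingImage[]): string {
	if (images.length === 0) {
		return '';
	}
	const chips = images.map(
		(image, index) => `[image #${index + 1}: ${image.label}]`,
	);
	return `${chips.join(' ')} · ctrl-x remove last`;
}

// Ctrl+X: drop the most recently added attachment without mutating the input.
function removeLastAttachment(images: PendingImage[]): PendingImage[] {
	return images.slice(0, -1);
}
```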

## Supported Formats

PNG, JPEG, GIF, and WebP are accepted (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`). Each image must be **10 MB or smaller**; larger files are skipped.

Whether the image is actually understood depends on the model, so attach images only when your provider and model support vision. If a model can't accept images, it will either say so in its reply or return an error of its own.
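
A minimal sketch of the format and size rules, assuming that "10 MB" means 10 × 1024 × 1024 bytes (the documentation does not say which definition is used). `checkImageFile` and the other names are hypothetical.

```typescript
// Illustrative check only; the names below are assumptions, not Nanocoder's API.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // assumes "10 MB" means 10 MiB

const MEDIA_TYPE_BY_EXTENSION: Record<string, string> = {
	'.png': 'image/png',
	'.jpg': 'image/jpeg',
	'.jpeg': 'image/jpeg',
	'.gif': 'image/gif',
	'.webp': 'image/webp',
};

type AttachmentCheck =
	| {ok: true; mediaType: string}
	| {ok: false; reason: 'unsupported-format' | 'too-large'};

function checkImageFile(fileName: string, sizeInBytes: number): AttachmentCheck {
	// Compare extensions case-insensitively so "shot.PNG" is accepted.
	const dot = fileName.lastIndexOf('.');
	const extension = dot === -1 ? '' : fileName.slice(dot).toLowerCase();
	const mediaType = MEDIA_TYPE_BY_EXTENSION[extension];

	if (mediaType === undefined) {
		return {ok: false, reason: 'unsupported-format'};
	}
	if (sizeInBytes > MAX_IMAGE_BYTES) {
		return {ok: false, reason: 'too-large'};
	}
	return {ok: true, mediaType};
}
```

Returning a reason rather than a bare boolean lets a caller log why a file was skipped.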

## Clipboard Requirements by Platform

Clipboard paste (**Ctrl+V**) shells out to a platform tool to read the image. If the tool isn't installed, the paste is a no-op and a one-line note is written to the debug log naming the missing command.

| Platform | Required tool |
|----------|---------------|
| macOS | `osascript` (built in) |
| Linux (Wayland) | `wl-paste` |
| Linux (X11) | `xclip` |
| Windows | PowerShell |

On a minimal Linux container without `wl-paste` or `xclip` — common in dev containers and CI — clipboard paste won't work; attach by drag-and-drop or file path instead, or install one of the tools.
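
The table above amounts to a small lookup. The sketch below takes the platform string and a Wayland flag as plain arguments; how Nanocoder actually detects the session type is not described here, so that part is left to the caller. `pickClipboardTool` is a hypothetical name.

```typescript
// Sketch only: maps a platform to the clipboard tool named in the table above.
// Session-type detection is deliberately left to the caller.
type ClipboardTool = 'osascript' | 'wl-paste' | 'xclip' | 'powershell';

function pickClipboardTool(
	platform: string,
	isWayland: boolean,
): ClipboardTool | null {
	switch (platform) {
		case 'darwin':
			return 'osascript';
		case 'linux':
			return isWayland ? 'wl-paste' : 'xclip';
		case 'win32':
			return 'powershell';
		default:
			// Unknown platform: clipboard paste is unavailable.
			return null;
	}
}
```

In Node.js a caller could pass `process.platform` as the first argument. Checking an environment variable such as `WAYLAND_DISPLAY` for the second is one common convention, stated here as an assumption.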

## See Also

- [Keyboard Shortcuts](keyboard-shortcuts.md) — full shortcut reference, including the image bindings.
5 changes: 5 additions & 0 deletions docs/features/index.md
@@ -41,6 +41,10 @@ Prefix any command with **`!`** to run it directly in your shell without leaving
!npm test
```

### Attaching Images

Press **Ctrl+V** to paste an image from your clipboard, or drag an image file into the terminal, to attach it to your next message for a vision-capable model. Pending attachments show above the input box; **Ctrl+X** removes the last one. See [Image Attachments](image-attachments.md) for supported formats and platform requirements.

### Keyboard Shortcuts

These are the shortcuts you'll use constantly:
@@ -231,6 +235,7 @@ Extend Nanocoder's capabilities by connecting [MCP (Model Context Protocol) serv
| [Session Management](session-management.md) | Automatic session saving and resumption |
| [Task Management](task-management.md) | Tracking multi-step work |
| [File Explorer](file-explorer.md) | Interactive file browser for context selection |
| [Image Attachments](image-attachments.md) | Send screenshots and images to vision-capable models |
| [VS Code Extension](vscode-extension.md) | Editor integration with live diff previews |
| [ACP](acp.md) | Run as an Agent Client Protocol server for editors like Zed |
| [Tune](tune.md) | Runtime model tuning for tool profiles, parameters, and compaction |
9 changes: 9 additions & 0 deletions docs/features/keyboard-shortcuts.md
@@ -50,6 +50,15 @@ This page covers the main chat input and common interactive views. Some speciali

When typing `@` for file mentions or `/` for commands, Tab accepts the current suggestion. If there are multiple command matches, the first Tab shows the completion list and pressing Tab again accepts the first result.

## Image Attachments

| Action | Shortcut |
|--------|----------|
| Paste image from clipboard | Ctrl+V |
| Remove last attached image | Ctrl+X |

Ctrl+V pulls an image off the system clipboard and adds it as an attachment. You can also attach an image by typing, pasting, or dragging an image file path into the input — quoted, unquoted, and macOS backslash-escaped paths (e.g. `Screenshot\ 2026.png`) are all recognised. Attachments appear above the input box as `[image #1: …]`; Ctrl+X drops the most recently added one. See [Image Attachments](image-attachments.md) for the full feature, including supported formats and platform requirements.

## History & Navigation

| Action | Shortcut |
26 changes: 18 additions & 8 deletions source/acp/acp-agent.ts
@@ -27,7 +27,7 @@ import {
getAvailableModes,
negotiateProtocolVersion,
} from '@/acp/acp-capabilities';
import {acpContentToUserText} from '@/acp/acp-content';
import {acpContentToUserMessage} from '@/acp/acp-content';
import {runAcpConversation} from '@/acp/acp-conversation';
import {AcpSession} from '@/acp/acp-session';
import type {AcpInitContext} from '@/acp/acp-types';
@@ -128,16 +128,26 @@ export class AcpAgent implements Agent {
);
}

const userText = await acpContentToUserText(params.prompt, {
conn: this.conn,
sessionId: params.sessionId,
canReadTextFile: this.clientCapabilities?.fs?.readTextFile ?? false,
});
const {text: userText, images} = await acpContentToUserMessage(
params.prompt,
{
conn: this.conn,
sessionId: params.sessionId,
canReadTextFile: this.clientCapabilities?.fs?.readTextFile ?? false,
},
);
logger.info(
`ACP prompt: session=${params.sessionId} text=${userText.slice(0, 100)}`,
`ACP prompt: session=${params.sessionId} text=${userText.slice(0, 100)} images=${images.length}`,
);

session.messages = [...session.messages, {role: 'user', content: userText}];
session.messages = [
...session.messages,
{
role: 'user',
content: userText,
...(images.length > 0 ? {images} : {}),
},
];

const config = getAppConfig();
const nonInteractiveAlwaysAllow = config.alwaysAllow ?? [];
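
The conditional spread in the hunk above keeps the `images` key off text-only messages. Below is a minimal standalone sketch of that pattern. `SketchMessage` and `buildUserMessage` are simplified stand-ins invented for illustration; the project's real message types are not shown in this diff.

```typescript
// Simplified stand-in types for illustration; not the project's definitions.
interface SketchMessage {
	role: 'user';
	content: string;
	images?: string[];
}

function buildUserMessage(content: string, images: string[]): SketchMessage {
	return {
		role: 'user',
		content,
		// Spread an empty object when there are no images, so the key is
		// absent rather than present with an empty array.
		...(images.length > 0 ? {images} : {}),
	};
}
```

One plausible reason for this shape is that downstream code can test for the presence of `images` without also checking its length, and text-only messages stay identical to what they were before the change.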
30 changes: 26 additions & 4 deletions source/acp/acp-content.spec.ts
@@ -3,7 +3,10 @@ import {tmpdir} from 'node:os';
import {join} from 'node:path';
import {pathToFileURL} from 'node:url';
import test from 'ava';
import {acpContentToUserText} from '@/acp/acp-content';
import {
acpContentToUserMessage,
acpContentToUserText,
} from '@/acp/acp-content';

console.log('\nacp-content.spec.ts');

@@ -34,11 +37,30 @@ test('acpContentToUserText - preserves exact text content', async t => {
t.is(result, specialText);
});

test('acpContentToUserText - notes unsupported image attachments instead of dropping', async t => {
const result = await acpContentToUserText([
test('acpContentToUserMessage - extracts supported image blocks as attachments', async t => {
const result = await acpContentToUserMessage([
{type: 'text', text: 'look at this'},
{type: 'image', data: 'abc', mimeType: 'image/png'} as any,
]);
t.true(result.includes('image'));
t.is(result.text, 'look at this');
t.deepEqual(result.images, [
{data: 'abc', mediaType: 'image/png', source: 'acp'},
]);
});

test('acpContentToUserMessage - notes unsupported image media types instead of sending', async t => {
const result = await acpContentToUserMessage([
{type: 'image', data: 'abc', mimeType: 'image/tiff'} as any,
]);
t.is(result.images.length, 0);
t.true(result.text.toLowerCase().includes('omitted'));
});

test('acpContentToUserText - still notes audio attachments instead of dropping', async t => {
const result = await acpContentToUserText([
{type: 'audio', data: 'abc', mimeType: 'audio/wav'} as any,
]);
t.true(result.includes('audio'));
t.true(result.toLowerCase().includes('omitted'));
});

78 changes: 65 additions & 13 deletions source/acp/acp-content.ts
@@ -6,32 +6,48 @@ import type {
EmbeddedResource,
ResourceLink,
} from '@agentclientprotocol/sdk';
import type {ImageAttachment} from '@/types/core';
import {getLogger} from '@/utils/logging';

const logger = getLogger();

/** Image media types the providers accept; others are noted rather than sent. */
const SUPPORTED_IMAGE_MEDIA_TYPES = new Set([
'image/png',
'image/jpeg',
'image/gif',
'image/webp',
]);

export interface AcpContentContext {
conn: AgentSideConnection;
sessionId: string;
/** Whether the client advertised the `fs.readTextFile` capability. */
canReadTextFile: boolean;
}

/** A resolved ACP prompt: model-visible text plus any image attachments. */
export interface AcpUserMessage {
text: string;
images: ImageAttachment[];
}

/**
* Convert a prompt's content blocks into the plain text the model receives.
* Convert a prompt's content blocks into the user message the model receives.
*
* Text blocks are concatenated directly (preserving the client's own
* splitting). Non-text blocks - embedded resources and `@`-mentioned file
* links - are resolved into readable sections appended after the prompt text so
* the model can actually see tagged files. Attachments we cannot process
* (images, audio) are noted rather than silently dropped.
* splitting). Embedded resources and `@`-mentioned file links are resolved into
* readable sections appended after the prompt text. Image blocks are collected
* as multimodal attachments; audio (and unsupported image types) are noted
* rather than silently dropped.
*/
export async function acpContentToUserText(
export async function acpContentToUserMessage(
prompt: ContentBlock[],
ctx?: AcpContentContext,
): Promise<string> {
): Promise<AcpUserMessage> {
let text = '';
const sections: string[] = [];
const images: ImageAttachment[] = [];

for (const block of prompt) {
switch (block.type) {
@@ -44,22 +60,58 @@ export async function acpContentToUserText(
case 'resource_link':
sections.push(await renderResourceLink(block, ctx));
break;
case 'image':
case 'image': {
const image = toImageAttachment(block);
if (image) {
images.push(image);
} else {
sections.push(
`[Attached image omitted: unsupported media type ${
'mimeType' in block ? block.mimeType : 'unknown'
}]`,
);
}
break;
}
case 'audio':
sections.push(
`[Attached ${block.type} omitted: nanocoder cannot process ${block.type} content over ACP yet]`,
`[Attached audio omitted: nanocoder cannot process audio content over ACP yet]`,
);
break;
default:
break;
}
}

if (sections.length === 0) {
return text;
}
const resolvedText =
sections.length === 0
? text
: [text, ...sections].filter(part => part.length > 0).join('\n\n');

return {text: resolvedText, images};
}

return [text, ...sections].filter(part => part.length > 0).join('\n\n');
/**
* Text-only view of {@link acpContentToUserMessage}, kept for callers and tests
* that only need the model-visible prose.
*/
export async function acpContentToUserText(
prompt: ContentBlock[],
ctx?: AcpContentContext,
): Promise<string> {
return (await acpContentToUserMessage(prompt, ctx)).text;
}

/** Build an image attachment from an ACP image block, or null if unusable. */
function toImageAttachment(block: {
data?: string;
mimeType?: string;
}): ImageAttachment | null {
const {data, mimeType} = block;
if (!data || !mimeType || !SUPPORTED_IMAGE_MEDIA_TYPES.has(mimeType)) {
return null;
}
return {data, mediaType: mimeType, source: 'acp'};
}

function renderEmbeddedResource(block: EmbeddedResource): string {
42 changes: 42 additions & 0 deletions source/ai-sdk-client/converters/message-converter.spec.ts
@@ -84,6 +84,48 @@ test('convertToModelMessages converts user message', t => {
t.is(result[0].content, 'Hello');
});

test('convertToModelMessages emits image parts for a user message with attachments', t => {
const messages: Message[] = [
{
role: 'user',
content: 'what is in this screenshot?',
images: [{data: 'BASE64DATA', mediaType: 'image/png'}],
},
];

const result = convertToModelMessages(messages);
t.is(result.length, 1);
t.is(result[0].role, 'user');
const content = result[0].content as Array<Record<string, unknown>>;
t.true(Array.isArray(content));
t.is(content[0].type, 'text');
t.is(content[0].text, 'what is in this screenshot?');
t.is(content[1].type, 'image');
t.is(content[1].image, 'data:image/png;base64,BASE64DATA');
t.is(content[1].mediaType, 'image/png');
});

test('convertToModelMessages keeps image-only user messages without a text part', t => {
const messages: Message[] = [
{
role: 'user',
content: '',
images: [{data: 'IMG', mediaType: 'image/jpeg'}],
},
];

const result = convertToModelMessages(messages);
const content = result[0].content as Array<Record<string, unknown>>;
t.is(content.length, 1);
t.is(content[0].type, 'image');
});

test('convertToModelMessages leaves text-only user messages as plain strings', t => {
const messages: Message[] = [{role: 'user', content: 'plain text'}];
const result = convertToModelMessages(messages);
t.is(result[0].content, 'plain text');
});

test('convertToModelMessages converts assistant message with text', t => {
const messages: Message[] = [
{
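
The three user-message tests above pin down the expected output shape without showing the converter itself. The following sketch is one way to satisfy those assertions. It is not the project's `convertToModelMessages`, and `SketchImage`, `SketchUserMessage`, and `toUserContent` are hypothetical names.

```typescript
// A sketch consistent with the test assertions; not the project's converter.
interface SketchImage {
	data: string; // base64-encoded image bytes
	mediaType: string;
}

interface SketchUserMessage {
	role: 'user';
	content: string;
	images?: SketchImage[];
}

type UserContentPart =
	| {type: 'text'; text: string}
	| {type: 'image'; image: string; mediaType: string};

function toUserContent(message: SketchUserMessage): string | UserContentPart[] {
	const images = message.images ?? [];

	// Text-only messages stay plain strings.
	if (images.length === 0) {
		return message.content;
	}

	const parts: UserContentPart[] = [];
	// Image-only messages get no empty text part.
	if (message.content.length > 0) {
		parts.push({type: 'text', text: message.content});
	}
	for (const image of images) {
		parts.push({
			type: 'image',
			image: `data:${image.mediaType};base64,${image.data}`,
			mediaType: image.mediaType,
		});
	}
	return parts;
}
```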