
feat: support image attachments on user messages #572

Merged
will-lamerton merged 5 commits into Nano-Collective:main from ragini-pandey:feature/image-attachments
Jun 22, 2026

@ragini-pandey

ragini-pandey commented Jun 14, 2026

Contributor

Description

Adds multimodal image input to user messages across all chat surfaces (interactive TUI, VS Code prompt path, and ACP). Images are carried as base64 on the internal Message type and converted to AI SDK image parts (a data: URL) at the provider boundary, which Anthropic, Google, and OpenAI-compatible providers all accept.
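As a rough illustration of the conversion described above, the sketch below turns a base64 attachment into a data: URL and emits text and image parts for a user message. The field names (`mediaType`, `base64`) and the simplified part types are assumptions made for this example; they are not taken from the PR's actual `ImageAttachment` type or from the AI SDK's type definitions.

```typescript
// Hypothetical shapes: field names are assumptions, not the PR's actual types.
interface ImageAttachment {
  mediaType: string; // e.g. "image/png"
  base64: string; // raw base64 payload, without a "data:" prefix
}

// Simplified stand-ins for provider-facing content parts.
interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image";
  image: string; // a data: URL in this sketch
}
type UserContentPart = TextPart | ImagePart;

// Build a data: URL from a base64 attachment.
function toDataUrl(att: ImageAttachment): string {
  return `data:${att.mediaType};base64,${att.base64}`;
}

// Convert message text plus attachments into an ordered list of parts:
// the text first (if any), then one image part per attachment.
function toUserContent(text: string, images: ImageAttachment[]): UserContentPart[] {
  const parts: UserContentPart[] = [];
  if (text.length > 0) parts.push({ type: "text", text });
  for (const img of images) {
    parts.push({ type: "image", image: toDataUrl(img) });
  }
  return parts;
}
```

Keeping the attachment as plain base64 internally and building the URL only at the boundary means the rest of the submit chain never has to know which provider is active.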

Highlights:

  • New ImageAttachment type threaded through the submit chain (useChatHandler, useAppHandlers, app-util, message-builder, chat-input/user-input).
  • Clipboard paste via Ctrl+V (macOS osascript, Linux wl-paste/xclip, Windows PowerShell).
  • Image file paths typed/pasted/dragged into the terminal are resolved to attachments and stripped from the message text.
  • ACP image content blocks collected as attachments; unsupported media types and audio are noted rather than silently dropped.
  • modelSupportsVision() heuristic drives a non-blocking warning when the active model likely can't see images (the image is still sent).
  • UserInput lists pending attachments (Ctrl+X removes the last); UserMessage shows an attached-image count.

Note: the ACP and message-converter portions existed in a partially-applied, non-compiling state on the base; this PR completes them (adds the missing ImageAttachment type they imported and restores a dropped return in the user branch of convertToModelMessages).

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Testing

Automated Tests

  • New features include passing tests in .spec.ts/tsx files
  • All existing tests pass (pnpm test:all completes successfully)
  • Tests cover both success and error scenarios

pnpm test:types and pnpm test:lint pass, and all new/updated specs pass (clipboard-image, vision-support, message-converter, acp-content). pnpm test:all currently fails on 3 pre-existing tests in source/utils/tool-result-display.spec.tsx that also fail on main and are unrelated to this change.

Manual Testing

  • Tested with Ollama
  • Tested with OpenRouter
  • Tested with OpenAI-compatible API
  • Tested MCP integration (if applicable)

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)
  • No breaking changes (or clearly documented)
  • Appropriate logging added using structured logging (see CONTRIBUTING.md)
Screen.Recording.2026-06-14.at.6.40.47.PM.mov

Add multimodal image input across the chat surfaces:

- New ImageAttachment type carried on user messages, converted to AI SDK
  image parts at the provider boundary (data URL accepted by Anthropic,
  Google, and OpenAI-compatible providers).
- Clipboard paste (Ctrl+V) and drag/typed image file paths become
  attachments; image path tokens are stripped from the message text.
- ACP image content blocks are collected as attachments; unsupported
  media types and audio are noted rather than silently dropped.
- modelSupportsVision() heuristic drives a non-blocking warning when the
  active model likely cannot see images (the image is still sent).
- UserInput shows pending attachments with Ctrl+X to remove the last;
  UserMessage shows an attached-image count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread source/utils/clipboard-image.spec.ts Fixed
Resolves the CodeQL "incomplete string escaping" alert: the test built
its escaped path by replacing spaces only, leaving any pre-existing
backslash unescaped. Escape backslashes first, then spaces, so the
encoding is complete for any path. Behavior is unchanged for the
temp paths under test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
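The ordering issue in the commit message above can be illustrated with a small sketch. The helper names here are hypothetical and are not the ones used in clipboard-image.spec.ts; the spaces-only variant is included only for contrast.

```typescript
// Complete escaping: backslashes first, then spaces.
function escapePath(path: string): string {
  return path.replace(/\\/g, "\\\\").replace(/ /g, "\\ ");
}

// Incomplete escaping (spaces only): a pre-existing backslash is left as-is,
// so a decoder will read it as the start of an escape sequence.
function escapeSpacesOnly(path: string): string {
  return path.replace(/ /g, "\\ ");
}

// Inverse: drop the backslash from each backslash-escaped character.
function unescapePath(escaped: string): string {
  return escaped.replace(/\\(.)/g, "$1");
}
```

With the complete version, `unescapePath(escapePath(p))` returns `p` for a path that already contains a backslash; with the spaces-only version the original backslash is consumed during decoding and the round trip loses it.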
@will-lamerton

Member

Hey @ragini-pandey - this is a brilliant PR. Thank you for building this. It works well and below are mostly some quality of life comments :)

  1. The picture emoji with "1 image attached" - if we could remove that, it would be great as we don't tend to use emojis in the design palette of Nanocoder. Something like this could be okay: ■ :)

  2. On my MacBook, dragging in a screenshot doesn't wrap the path in quote marks, so it doesn't recognise that there is an image attached. Dragging needs to either wrap the path in quote marks automatically or work without needing quote marks.

  3. I don't think this warning is needed - the model should report or error if it can't support images. Although I like this warning, it shows every time I attach an image which is overkill.

  4. readClipboardImage in source/utils/clipboard-image.ts returns null silently when the underlying tooling isn't installed - osascript on macOS, wl-paste / xclip on Linux, or PowerShell on Windows. On a minimal Linux container without wl-paste or xclip (which is super common in dev containers and CI boxes) the user gets no feedback that Ctrl+V is a no-op. Could we log a one-liner at debug level naming the missing command, or at least surface a small note in the status bar / footer so it's not a black box?

  5. source/utils/clipboard-image.ts has a 10 MB maxBuffer on the spawnSync calls (around line 160 with MAX_IMAGE_BYTES). If a pasted screenshot exceeds that, the child gets killed and readClipboardImage returns null with only a logger.warn - the user sees nothing. For the typical "drag a screenshot in" flow this is borderline in scope, but a soft "image too large" hint in the input footer would make the failure mode discoverable. Worth a follow-up if not for this PR.

  6. extractImageReferences in source/utils/clipboard-image.ts (around line 93) will happily match https://example.com/chart.png style URLs and call existsSync on them. It's harmless (the existsSync returns false and the token is left in place), but it's a stat per URL-like token and it slightly leaks FS access into message parsing. Not a blocker - more of a tidy-up for a future pass.

Love this though. Cannot wait to merge! Let me know if there are any other questions or thoughts!
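One possible shape for the debug breadcrumb requested in item 4, as a sketch only: the command names come from the comment above, while the function name, the message wording, and the injected `isAvailable` probe are assumptions rather than anything in clipboard-image.ts.

```typescript
type Platform = "darwin" | "linux" | "win32";

// Candidate clipboard commands per platform, as named in the review comment.
const CLIPBOARD_TOOLS: Record<Platform, string[]> = {
  darwin: ["osascript"],
  linux: ["wl-paste", "xclip"],
  win32: ["powershell"],
};

// Returns a one-line debug message when none of the platform's candidate
// commands is available, or null when at least one can be used.
function missingClipboardToolMessage(
  platform: Platform,
  isAvailable: (command: string) => boolean, // hypothetical probe, e.g. a PATH lookup
): string | null {
  const candidates = CLIPBOARD_TOOLS[platform];
  if (candidates.some((command) => isAvailable(command))) return null;
  return `clipboard image paste unavailable: missing ${candidates.join(" / ")}`;
}
```

Passing the probe in as a parameter keeps the decision logic testable without touching the real system; a caller would hand the returned string to whatever debug logger the project uses.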

@will-lamerton

Member

Hey @ragini-pandey - just wondered if my above feedback was all okay for you? Happy to jump in and help if needs be. Looking forward to merging this :)

@ragini-pandey

Contributor Author

Hey, sorry for the delay. I was out of town.
Thanks for taking the time to review and share the feedback. I'm looking forward to addressing the comments and getting this merged. Appreciate the help! 🙂

- Drop the 🖼 emoji from the "image attached" label (use ■ glyph)
- Recognise unquoted macOS dragged paths with backslash-escaped spaces
- Remove the per-attach "model may not support images" warning and the
  now-dead modelSupportsVision/visionSupported plumbing
- Log a debug line naming the missing clipboard tool (osascript /
  wl-paste / xclip / powershell) so Ctrl+V is not a silent no-op
- Skip http(s) URLs in extractImageReferences before touching the FS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ragini-pandey

ragini-pandey commented Jun 21, 2026

Contributor Author

Thank you @will-lamerton! 🙏 All of these made sense

  1. Emoji — dropped the 🖼; the label now reads ■ 1 image attached.

  2. macOS drag without quotes — fixed. On a Mac the terminal drops the path in unquoted with spaces backslash-escaped (Screenshot\ 2026...\ PM.png), and the unquoted-token matcher was stopping at the first space. It now accepts \ sequences inside a path token, so a dragged screenshot is recognised without needing quotes. Added a test for an escaped-space path embedded in prose.

  3. Warning on every attach — removed it entirely, per your point that the model itself will report/error. It was the only consumer of the modelSupportsVision heuristic, so I deleted that whole chain (vision-support.ts + spec + the visionSupported prop plumbing) rather than leave dead code behind. Easy to bring back a once-per-session version if you'd prefer a lighter heads-up.

  4. Silent no-op when tooling missing — now logs a debug one-liner naming the missing command (osascript / wl-paste / xclip / powershell). On Linux it only fires when both wl-paste and xclip are absent, so dev containers/CI boxes get a breadcrumb instead of a black box.

  5. existsSync on URLs: extractImageReferences now bails on http(s):// tokens before touching the filesystem, so https://example.com/chart.png no longer triggers a stat during message parsing. Added a test.

  6. Oversized-image footer hint — left as the follow-up you suggested; it's out of scope for this PR. The oversized path still logs a warn, just no footer surfacing yet. Happy to pick it up in a follow-up.

Thanks again for the thorough review!
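To make items 2 and 5 above concrete, here is a minimal sketch of a token matcher that accepts backslash-escaped characters inside an unquoted path and skips http(s) URLs before any filesystem check. It is not the PR's extractImageReferences: the function name, the regexes, the injected `exists` predicate, and the whitespace collapsing are all assumptions made for illustration, and quoted paths are not handled.

```typescript
// Extensions for the formats listed in the docs commit (PNG/JPEG/GIF/WebP).
const IMAGE_EXT = /\.(png|jpe?g|gif|webp)$/i;

// A token is a run of non-whitespace characters, where "\<char>" (for
// example an escaped space) counts as part of the token.
const TOKEN = /(?:\\.|[^\s\\])+/g;

interface Extracted {
  text: string; // message text with recognised image paths removed
  paths: string[]; // unescaped paths that exist according to `exists`
}

function extractImagePaths(
  input: string,
  exists: (path: string) => boolean, // hypothetical stand-in for a filesystem check
): Extracted {
  const paths: string[] = [];
  const stripped = input.replace(TOKEN, (token) => {
    // Skip URLs before touching the filesystem.
    if (/^https?:\/\//i.test(token)) return token;
    // "Screenshot\ 1.png" becomes "Screenshot 1.png".
    const path = token.replace(/\\(.)/g, "$1");
    if (!IMAGE_EXT.test(path) || !exists(path)) return token;
    paths.push(path);
    return "";
  });
  // Simplification for this sketch: collapse the gap left by removed tokens.
  return { text: stripped.replace(/\s+/g, " ").trim(), paths };
}
```

Because the URL check runs first, the `exists` predicate is never called for a token such as https://example.com/chart.png, which is the behaviour item 5 describes.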

ragini-pandey and others added 2 commits June 22, 2026 00:41
…hments

# Conflicts:
#	source/components/user-input.tsx
- New docs/features/image-attachments.md: clipboard paste (Ctrl+V),
  drag-and-drop, typed paths (quoted/unquoted/escaped), Ctrl+X to remove,
  supported formats (PNG/JPEG/GIF/WebP, ≤10 MB), and per-platform
  clipboard tool requirements
- Add image bindings to the keyboard-shortcuts reference
- Add an "Attaching Images" subsection and reference-table row to the
  features index
- Add a CHANGELOG entry for the feature

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@will-lamerton

Member

Thanks for these changes - brilliant PR :)

@will-lamerton will-lamerton merged commit 7e778d8 into Nano-Collective:main Jun 22, 2026
10 checks passed

3 participants