Commit 310e7b4

patniko and Copilot authored
docs: add image input guide (#719)
* docs: add image input guide

  Add docs/guides/image-input.md documenting how to send images/visual input to Copilot sessions via file attachments. Covers:
  - Quick start examples in all 4 SDK languages (TS, Python, Go, .NET)
  - Supported image formats (JPG, PNG, GIF, and other common types)
  - Automatic image processing by the runtime (resizing, quality reduction)
  - Vision model capability fields for checking support
  - Receiving image results from tool execution content blocks
  - Practical tips and limitations

  All code blocks verified via docs validation. All claims cross-checked against SDK source code.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add docs-validate skip for Python snippet in streaming-events guide

  The Python subscription example uses 'session' without defining it, which fails mypy validation in CI. Add skip directive to match the Go and .NET snippets.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 9595cda commit 310e7b4

2 files changed: +220 -0 lines changed

docs/guides/image-input.md

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
# Image Input

Send images to Copilot sessions by attaching them as file attachments. The runtime reads the file from disk, converts it to base64 internally, and sends it to the LLM as an image content block, so no manual encoding is required.

## Overview

```mermaid
sequenceDiagram
    participant App as Your App
    participant SDK as SDK Session
    participant RT as Copilot Runtime
    participant LLM as Vision Model

    App->>SDK: send({ prompt, attachments: [{ type: "file", path }] })
    SDK->>RT: JSON-RPC with file attachment
    RT->>RT: Read file from disk
    RT->>RT: Detect image, convert to base64
    RT->>RT: Resize if needed (model-specific limits)
    RT->>LLM: image_url content block (base64)
    LLM-->>RT: Response referencing the image
    RT-->>SDK: assistant.message events
    SDK-->>App: event stream
```

| Concept | Description |
|---------|-------------|
| **File attachment** | An attachment with `type: "file"` and an absolute `path` to an image on disk |
| **Automatic encoding** | The runtime reads the image, converts it to base64, and sends it as an `image_url` block |
| **Auto-resize** | The runtime automatically resizes or quality-reduces images that exceed model-specific limits |
| **Vision capability** | The model must have `capabilities.supports.vision = true` to process images |

## Quick Start

Attach an image file to any message using the file attachment type. The path must be an absolute path to an image on disk.

<details open>
<summary><strong>Node.js / TypeScript</strong></summary>

```typescript
import { CopilotClient } from "@github/copilot-sdk";

const client = new CopilotClient();
await client.start();

// Create a session on a model that supports vision.
const session = await client.createSession({
    model: "gpt-4.1",
    onPermissionRequest: async () => ({ kind: "approved" }),
});

// Attach the image by absolute path; the runtime handles encoding.
await session.send({
    prompt: "Describe what you see in this image",
    attachments: [
        {
            type: "file",
            path: "/absolute/path/to/screenshot.png",
        },
    ],
});
```

</details>

<details>
<summary><strong>Python</strong></summary>

```python
# Note: this snippet uses top-level `await`, so it assumes an async context.
from copilot import CopilotClient
from copilot.types import PermissionRequestResult

client = CopilotClient()
await client.start()

# Create a session on a model that supports vision.
session = await client.create_session({
    "model": "gpt-4.1",
    "on_permission_request": lambda req, inv: PermissionRequestResult(kind="approved"),
})

# Attach the image by absolute path; the runtime handles encoding.
await session.send({
    "prompt": "Describe what you see in this image",
    "attachments": [
        {
            "type": "file",
            "path": "/absolute/path/to/screenshot.png",
        },
    ],
})
```

</details>

<details>
<summary><strong>Go</strong></summary>

<!-- docs-validate: skip -->
```go
// Error handling is omitted here for brevity.
ctx := context.Background()
client := copilot.NewClient(nil)
client.Start(ctx)

session, _ := client.CreateSession(ctx, &copilot.SessionConfig{
	Model: "gpt-4.1",
	OnPermissionRequest: func(req copilot.PermissionRequest, inv copilot.PermissionInvocation) (copilot.PermissionRequestResult, error) {
		return copilot.PermissionRequestResult{Kind: copilot.PermissionRequestResultKindApproved}, nil
	},
})

// Path is a pointer field, so take the address of a local variable.
path := "/absolute/path/to/screenshot.png"
session.Send(ctx, copilot.MessageOptions{
	Prompt: "Describe what you see in this image",
	Attachments: []copilot.Attachment{
		{
			Type: copilot.File,
			Path: &path,
		},
	},
})
```

</details>

<details>
<summary><strong>.NET</strong></summary>

<!-- docs-validate: skip -->
```csharp
using GitHub.Copilot.SDK;

await using var client = new CopilotClient();
await using var session = await client.CreateSessionAsync(new SessionConfig
{
    Model = "gpt-4.1",
    OnPermissionRequest = (req, inv) =>
        Task.FromResult(new PermissionRequestResult { Kind = PermissionRequestResultKind.Approved }),
});

// Attach the image by absolute path; the runtime handles encoding.
await session.SendAsync(new MessageOptions
{
    Prompt = "Describe what you see in this image",
    Attachments = new List<UserMessageDataAttachmentsItem>
    {
        new UserMessageDataAttachmentsItemFile
        {
            Path = "/absolute/path/to/screenshot.png",
            DisplayName = "screenshot.png",
        },
    },
});
```

</details>

## Supported Formats

Supported image formats include JPEG (JPG), PNG, GIF, and other common image types. The runtime reads the image from disk and converts it as needed before sending it to the LLM. PNG and JPEG are the most widely supported formats, so prefer them for best results.

The model's `capabilities.limits.vision.supported_media_types` field lists the exact MIME types it accepts.
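
As a rough illustration of checking a file against `supported_media_types` before attaching it, the sketch below guesses a MIME type from the file extension and tests whether the model lists it. The extension table and the `guessMimeType` and `isSupportedImage` helpers are hypothetical and not part of the SDK; only the `supported_media_types` field name comes from this guide.

```typescript
// Hypothetical helpers, not part of the Copilot SDK.
// Illustrative (and incomplete) table of image extensions to MIME types.
const EXTENSION_TO_MIME: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

// Returns the guessed MIME type for a path, or undefined if the extension is unknown.
function guessMimeType(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  return EXTENSION_TO_MIME[filePath.slice(dot + 1).toLowerCase()];
}

// True when the guessed MIME type appears in the model's supported_media_types list.
function isSupportedImage(filePath: string, supportedMediaTypes: string[]): boolean {
  const mime = guessMimeType(filePath);
  return mime !== undefined && supportedMediaTypes.indexOf(mime) !== -1;
}
```

Guessing from the extension is only a heuristic; the runtime inspects the file itself, so treat this as an early warning rather than a guarantee.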

## Automatic Processing

The runtime automatically processes images to fit within the model's constraints. No manual resizing is required.

- Images that exceed the model's dimension or size limits are automatically resized (preserving aspect ratio) or quality-reduced.
- If an image cannot be brought within limits after processing, it is skipped and not sent to the LLM.
- The model's `capabilities.limits.vision.max_prompt_image_size` field indicates the maximum image size in bytes.

You can check these limits at runtime via the model capabilities object. For the best experience, use reasonably sized PNG or JPEG images.

## Vision Model Capabilities

Not all models support vision. Check the model's capabilities before sending images.

### Capability fields

| Field | Type | Description |
|-------|------|-------------|
| `capabilities.supports.vision` | `boolean` | Whether the model can process image inputs |
| `capabilities.limits.vision.supported_media_types` | `string[]` | MIME types the model accepts (e.g., `["image/png", "image/jpeg"]`) |
| `capabilities.limits.vision.max_prompt_images` | `number` | Maximum number of images per prompt |
| `capabilities.limits.vision.max_prompt_image_size` | `number` | Maximum image size in bytes |

### Vision limits type

<!-- docs-validate: skip -->
```typescript
vision?: {
    supported_media_types: string[];
    max_prompt_images: number;
    max_prompt_image_size: number; // bytes
};
```
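
A minimal sketch of a pre-flight check built on these fields might look like the following. The `ModelCapabilities`, `ImageCandidate`, and `checkImages` names are assumptions made for illustration, and the interfaces cover only the fields documented above; how you obtain the capabilities object depends on the SDK and is not shown.

```typescript
// Shapes assumed from the capability fields documented above;
// the SDK's real types may carry additional fields.
interface VisionLimits {
  supported_media_types: string[];
  max_prompt_images: number;
  max_prompt_image_size: number; // bytes
}

interface ModelCapabilities {
  supports: { vision?: boolean };
  limits: { vision?: VisionLimits };
}

interface ImageCandidate {
  mimeType: string;
  sizeBytes: number;
}

// Hypothetical pre-flight check: returns a list of human-readable problems,
// or an empty list when nothing looks wrong for this model.
function checkImages(caps: ModelCapabilities, images: ImageCandidate[]): string[] {
  if (!caps.supports.vision) return ["model does not support vision"];
  const limits = caps.limits.vision;
  if (!limits) return []; // no published limits to check against
  const problems: string[] = [];
  if (images.length > limits.max_prompt_images) {
    problems.push(`too many images: ${images.length} > ${limits.max_prompt_images}`);
  }
  images.forEach((img, i) => {
    if (limits.supported_media_types.indexOf(img.mimeType) === -1) {
      problems.push(`image ${i}: unsupported type ${img.mimeType}`);
    }
    if (img.sizeBytes > limits.max_prompt_image_size) {
      problems.push(`image ${i}: exceeds size limit; the runtime may resize or skip it`);
    }
  });
  return problems;
}
```

An oversized image is reported as a warning, not a hard failure, because the runtime attempts to resize or quality-reduce it before giving up.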

## Receiving Image Results

When tools return images (e.g., screenshots or generated charts), the result contains `"image"` content blocks with base64-encoded data.

| Field | Type | Description |
|-------|------|-------------|
| `type` | `"image"` | Content block type discriminator |
| `data` | `string` | Base64-encoded image data |
| `mimeType` | `string` | MIME type (e.g., `"image/png"`) |

These image blocks appear in `tool.execution_complete` event results. See the [Streaming Events](./streaming-events.md) guide for the full event lifecycle.
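
To make the block shape concrete, here is a small sketch that decodes image blocks into raw bytes. It assumes you have already pulled an array of content blocks out of the event result; where that array lives in the event payload is not covered here, and `decodeImageBlocks` is a hypothetical helper, not an SDK function.

```typescript
import { Buffer } from "node:buffer";

// Shape taken from the field table above.
interface ImageContentBlock {
  type: "image";
  data: string; // base64-encoded image bytes
  mimeType: string;
}

// Hypothetical helper: picks the image blocks out of a mixed content array
// and decodes each one from base64 into a Buffer.
function decodeImageBlocks(
  content: Array<{ type: string }>,
): Array<{ mimeType: string; bytes: Buffer }> {
  const decoded: Array<{ mimeType: string; bytes: Buffer }> = [];
  for (const block of content) {
    if (block.type === "image") {
      const image = block as ImageContentBlock;
      decoded.push({ mimeType: image.mimeType, bytes: Buffer.from(image.data, "base64") });
    }
  }
  return decoded;
}
```

The decoded bytes can then be written to disk or passed to an image library, using `mimeType` to choose a file extension.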

## Tips & Limitations

| Tip | Details |
|-----|---------|
| **Use PNG or JPEG directly** | Avoids conversion overhead; these are sent to the LLM as-is |
| **Keep images reasonably sized** | Large images may be quality-reduced, which can lose important details |
| **Use absolute paths** | The runtime reads files from disk; relative paths may not resolve correctly |
| **Check vision support first** | Sending images to a non-vision model spends tokens on the file path without providing any visual understanding |
| **Multiple images are supported** | Attach several file attachments in one message, up to the model's `max_prompt_images` limit |
| **Images are not base64 in your code** | You provide a file path; the runtime handles encoding, resizing, and format conversion |
| **SVG is not supported** | SVG files are text-based and excluded from image processing |
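
Several of these tips can be applied when building the `attachments` array. The sketch below is one possible approach, assuming the `{ type: "file", path }` attachment shape from the Quick Start; `buildImageAttachments` and its parameters are hypothetical and not part of the SDK.

```typescript
import * as path from "node:path";

// Attachment shape as used in the Quick Start examples.
interface FileAttachment {
  type: "file";
  path: string;
}

// Hypothetical helper: drops SVG files, keeps at most `maxImages` entries,
// and resolves every remaining path to an absolute one against `baseDir`.
function buildImageAttachments(
  paths: string[],
  maxImages: number,
  baseDir: string = process.cwd(),
): FileAttachment[] {
  return paths
    .filter((p) => path.extname(p).toLowerCase() !== ".svg")
    .slice(0, maxImages)
    .map((p) => ({ type: "file" as const, path: path.resolve(baseDir, p) }));
}
```

The returned array can be passed as the `attachments` value of a message, with `maxImages` taken from the model's `max_prompt_images` limit.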

## See Also

- [Streaming Events](./streaming-events.md): event lifecycle including tool result content blocks
- [Steering & Queueing](./steering-and-queueing.md): sending follow-up messages with attachments

docs/guides/streaming-events.md

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ session.on("assistant.message_delta", (event) => {
 <details>
 <summary><strong>Python</strong></summary>

+<!-- docs-validate: skip -->
 ```python
 from copilot.generated.session_events import SessionEventType
