Commit 1479e3e

feat: update Grok models and vision handling (#289)
* Update Grok models and vision handling
* Fix vision path escaping test
1 parent 9a10bf5 commit 1479e3e

19 files changed

Lines changed: 223 additions & 190 deletions

.cursor/rules/development-workflow.mdc

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ bun run start
 |----|----|----|
 | `GROK_API_KEY` | Yes | API key for xAI Grok |
 | `GROK_BASE_URL` | No | Custom API endpoint (default: `https://api.x.ai/v1`) |
-| `GROK_MODEL` | No | Model override (default: `grok-4-1-fast`) |
+| `GROK_MODEL` | No | Model override (default: `grok-4.3`) |
 | `GROK_MAX_TOKENS` | No | Max tokens per response (default: 16384) |

 Copy `.env.example` to `.env` and fill in your values.

.cursor/rules/project-overview.mdc

Lines changed: 5 additions & 6 deletions
@@ -53,12 +53,11 @@ src/

 ## Latest Grok Models

-- grok-4-0709 (flagship reasoning)
-- grok-4.20-beta-0309 (multi-agent, 2M context)
-- grok-4-fast (fast reasoning, 2M context)
-- grok-4-1-fast (latest fast, 2M context)
-- grok-code-fast-1 (code optimized)
-- grok-3, grok-3-mini
+- grok-4.3 (recommended flagship reasoning)
+- grok-4.20-non-reasoning (recommended non-reasoning)
+- grok-4.20-multi-agent-0309 (multi-agent, 2M context)
+- grok-4.20-0309-reasoning (reasoning, 2M context)
+- grok-3-mini (compact model with reasoning effort controls)

 ## CI/CD

.env.example

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ GROK_API_KEY=your_grok_api_key_here
 # Optional: Custom API base URL (default: https://api.x.ai/v1)
 # GROK_BASE_URL=https://api.x.ai/v1

-# Optional: Default model (default: grok-4-1-fast-reasoning)
-# GROK_MODEL=grok-4-1-fast-reasoning
+# Optional: Default model (default: grok-4.3)
+# GROK_MODEL=grok-4.3

 # Optional: Max tokens per response (default: 16384)
 # GROK_MAX_TOKENS=16384
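
The optional variables above read as a fallback chain: use the environment value when it is set, otherwise the documented default. A minimal sketch of that behaviour, assuming a hypothetical `resolveEnvConfig` helper and `GrokEnvConfig` type that are not part of this commit; only the variable names and default values are taken from the diff, and the handling of an unparsable `GROK_MAX_TOKENS` is an assumption:

```typescript
// Hypothetical helper for illustration; the CLI's real settings loader may differ.
const DEFAULT_MODEL = "grok-4.3"; // default documented in .env.example after this commit
const DEFAULT_BASE_URL = "https://api.x.ai/v1";
const DEFAULT_MAX_TOKENS = 16384;

interface GrokEnvConfig {
  model: string;
  baseUrl: string;
  maxTokens: number;
}

function resolveEnvConfig(env: Record<string, string | undefined>): GrokEnvConfig {
  // Treat unset, empty, or whitespace-only values as "not provided".
  const model = env["GROK_MODEL"]?.trim() || DEFAULT_MODEL;
  const baseUrl = env["GROK_BASE_URL"]?.trim() || DEFAULT_BASE_URL;

  // Assumption: fall back to the default when the value is not a positive integer.
  const parsed = Number.parseInt(env["GROK_MAX_TOKENS"] ?? "", 10);
  const maxTokens = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_TOKENS;

  return { model, baseUrl, maxTokens };
}
```

In a Node or Bun process this would typically be called as `resolveEnvConfig(process.env)`.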

README.md

Lines changed: 25 additions & 21 deletions
@@ -1,10 +1,10 @@
-# grok-cli an open-source coding agent for the Grok API
+# grok-cli: an open-source coding agent for the Grok API

-[![CI](https://github.com/superagent-ai/grok-cli/actions/workflows/typecheck.yml/badge.svg)](https://github.com/superagent-ai/grok-cli/actions/workflows/typecheck.yml)
-[![npm](https://img.shields.io/npm/v/grok-dev.svg)](https://www.npmjs.com/package/grok-dev)
-[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
-[![TypeScript](https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
-[![Bun](https://img.shields.io/badge/Bun-1.x-000000?logo=bun&logoColor=white)](https://bun.sh/)
+[CI](https://github.com/superagent-ai/grok-cli/actions/workflows/typecheck.yml)
+[npm](https://www.npmjs.com/package/grok-dev)
+[License: MIT](./LICENSE)
+[TypeScript](https://www.typescriptlang.org/)
+[Bun](https://bun.sh/)

 > **Disclaimer:** This project is community-built, open-source, and **not affiliated with, endorsed by, or sponsored by xAI Corp.** "Grok" is a trademark of xAI Corp. This tool uses the publicly available Grok API.

@@ -173,19 +173,19 @@ You keep using a text model for the session, and Grok saves generated media unde

 | Thing | What it means |
 | --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Built for the Grok API** | Defaults tuned for the xAI API; models like `**grok-code-fast-1`**, `**grok-4-1-fast-reasoning**`, `**grok-4.20-multi-agent-0309**`, plus flagship and fast variants—run `grok models` for the full menu. |
-| **X + web search** | `**search_x`** and `**search_web**` tools—live posts and docs without pretending the internet stopped in 2023. |
-| **Media generation** | Built-in `**generate_image`** and `**generate_video**` tools for text-to-image, image editing, text-to-video, and image-to-video flows. Generated files are saved locally so you can reuse them after the xAI URLs expire. |
-| **Sub-agents (default behavior)** | Foreground `**task`** delegation (e.g. explore, general, or computer) plus background `**delegate**` for read-only deep dives—parallelize like you mean it. |
-| **Verify** | `**/verify`** or `**--verify**` — inspects your app, builds, tests, boots it, and runs browser smoke checks in a sandboxed environment. Screenshots and video included. |
-| **Computer use** | Built-in `**computer`** sub-agent for host desktop automation via `**agent-desktop**`. It prefers semantic accessibility snapshots and stable refs, with screenshots saved under `**.grok/computer/**` when requested. |
-| **Custom sub-agents** | Define named agents with `**subAgents`** in `**~/.grok/user-settings.json**` and manage them from the TUI with `**/agents**`. |
+| **Built for the Grok API** | Defaults tuned for the xAI API; models like `grok-4.3`, `grok-4.20-non-reasoning`, `grok-4.20-multi-agent-0309`, plus current flagship and multi-agent variants—run `grok models` for the full menu. |
+| **X + web search** | `**search_x`** and `**search_web`** tools—live posts and docs without pretending the internet stopped in 2023. |
+| **Media generation** | Built-in `**generate_image`** and `**generate_video`** tools for text-to-image, image editing, text-to-video, and image-to-video flows. Generated files are saved locally so you can reuse them after the xAI URLs expire. |
+| **Sub-agents (default behavior)** | Foreground `**task`** delegation (e.g. explore, general, or computer) plus background `**delegate`** for read-only deep dives—parallelize like you mean it. |
+| **Verify** | `**/verify`** or `**--verify`** — inspects your app, builds, tests, boots it, and runs browser smoke checks in a sandboxed environment. Screenshots and video included. |
+| **Computer use** | Built-in `**computer`** sub-agent for host desktop automation via `**agent-desktop`**. It prefers semantic accessibility snapshots and stable refs, with screenshots saved under `**.grok/computer/**` when requested. |
+| **Custom sub-agents** | Define named agents with `**subAgents`** in `**~/.grok/user-settings.json`** and manage them from the TUI with `**/agents**`. |
 | **Remote control** | Pair **Telegram** from the TUI (`/remote-control` → Telegram): DM your bot, `**/pair`**, approve the code in-terminal. Keep the CLI running while you ping it from your phone. |
 | **No “mystery meat” UI** | OpenTUI React terminal UI—fast, keyboard-driven, not whatever glitchy thing you’re thinking of. |
-| **Skills** | Agent Skills under `**.agents/skills/<name>/SKILL.md`** (project) or `**~/.agents/skills/**` (user). Use `**/skills**` in the TUI to list what’s installed. |
-| **MCPs** | Extend with Model Context Protocol servers—configure via `**/mcps`** in the TUI or `**.grok/settings.json**` (`mcpServers`). |
+| **Skills** | Agent Skills under `**.agents/skills/<name>/SKILL.md`** (project) or `**~/.agents/skills/`** (user). Use `**/skills**` in the TUI to list what’s installed. |
+| **MCPs** | Extend with Model Context Protocol servers—configure via `**/mcps`** in the TUI or `**.grok/settings.json`** (`mcpServers`). |
 | **Sessions** | Conversations persist; `**--session latest`** picks up where you left off. |
-| **Headless** | `**--prompt`** / `**-p**` for non-interactive runs—pipe it, script it, bench it. |
+| **Headless** | `**--prompt`** / `**-p`** for non-interactive runs—pipe it, script it, bench it. |
 | **Hackable** | TypeScript, clear agent loop, bash-first tools—fork it, shamelessly. |


@@ -228,7 +228,7 @@ Optional `**subAgents**` — custom foreground sub-agents. Each entry needs `**n
   "subAgents": [
     {
       "name": "security-review",
-      "model": "grok-code-fast-1",
+      "model": "grok-4.3",
       "instruction": "Prioritize security implications and suggest concrete fixes."
     }
   ]
@@ -320,7 +320,7 @@ Hook commands receive JSON on **stdin** (event details) and can return JSON on *

 ## Instructions & project brain

-- `**AGENTS.md**` — merged from git root down to your cwd (Codex-style; see repo docs). `**AGENTS.override.md**` wins per directory when present.
+- `**AGENTS.md`** — merged from git root down to your cwd (Codex-style; see repo docs). `**AGENTS.override.md**` wins per directory when present.

 ---

@@ -350,7 +350,7 @@ All settings are saved in `~/.grok/user-settings.json` (user) and `.grok/setting

 ### Verify

-Run `**/verify`** in the TUI or `**--verify**` on the CLI to verify your app locally:
+Run `**/verify`** in the TUI or `**--verify`** on the CLI to verify your app locally:

 ```bash
 grok --verify
@@ -370,6 +370,7 @@ Common issues and solutions:
 **Install script fails on macOS**

 Make sure you have a modern shell and `curl` available:
+
 ```bash
 # Verify curl is installed
 which curl
@@ -381,6 +382,7 @@ bash -c "$(curl -fsSL https://raw.githubusercontent.com/superagent-ai/grok-cli/m
 **Bun not found**

 The install script bundles Bun, but if you want to use your own:
+
 ```bash
 curl -fsSL https://bun.sh/install | bash
 bun add -g grok-dev
@@ -391,6 +393,7 @@ bun add -g grok-dev
 **"Missing GROK_API_KEY" error**

 Set your API key using one of these methods:
+
 ```bash
 # Environment variable
 export GROK_API_KEY=your_key_here
@@ -406,6 +409,7 @@ Get your API key from [x.ai](https://x.ai).
 **UI doesn't render correctly**

 Try a different terminal emulator. Recommended:
+
 - WezTerm (cross-platform)
 - Alacritty (cross-platform)
 - Ghostty (macOS/Linux)
@@ -439,7 +443,7 @@ If you're on Intel Mac or Linux, sandbox mode is not available. Use standard mod
 **Slow response times**

 - Check your network connection to x.ai API
-- Try `grok-code-fast-1` model for faster responses
+- Try `grok-4.20-non-reasoning` for non-reasoning workloads
 - Reduce `--max-tool-rounds` for headless runs

 **High memory usage**
@@ -487,4 +491,4 @@ bun run lint

 ## License

-MIT
+MIT

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "grok-dev",
-  "version": "1.1.5",
+  "version": "1.1.6",
   "description": "An open-source AI coding agent powered by Grok, built with Bun and OpenTUI.",
   "type": "module",
   "main": "dist/index.js",

src/agent/agent.ts

Lines changed: 4 additions & 3 deletions
@@ -99,8 +99,8 @@ import { containsEncryptedReasoning, sanitizeModelMessages } from "./reasoning";
 import { buildVisionUserMessages } from "./vision-input";

 const MAX_TOOL_ROUNDS = 400;
-const VISION_MODEL = "grok-4-1-fast-reasoning";
-const COMPUTER_MODEL = "grok-4.20-0309-reasoning";
+const VISION_MODEL = "grok-4.3";
+const COMPUTER_MODEL = "grok-4.3";

 interface AgentOptions {
   persistSession?: boolean;
@@ -1839,7 +1839,8 @@ export class Agent {
     await this.fireHook(promptInput, signal).catch(() => {});

     await this.consumeBackgroundNotifications();
-    const userModelMessage: ModelMessage = { role: "user", content: userMessage };
+    const userModelMessages = await buildVisionUserMessages(userMessage, this.bash.getCwd(), signal);
+    const userModelMessage = userModelMessages[0] ?? ({ role: "user", content: userMessage } satisfies ModelMessage);
     this.messages.push(userModelMessage);
     this.messageSeqs.push(null);


src/agent/batch-mode.test.ts

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ describe("Agent batch mode", () => {
       childMessages: [{ role: "user", content: "Do the thing" }],
       childSystem: "system",
       childRuntime: {
-        modelId: "grok-4-1-fast-reasoning",
+        modelId: "grok-4.3",
         modelInfo: {
           supportsClientTools: false,
           supportsMaxOutputTokens: true,

src/agent/recap.test.ts

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ async function importAgentModuleWithRecapMocks() {

   const generateRecap = vi.fn(async () => ({
     recap: "Recovered the latest session state.",
-    modelId: "grok-4-1-fast-non-reasoning",
+    modelId: "grok-4.20-non-reasoning",
     usage: {
       inputTokens: 10,
       outputTokens: 4,
@@ -83,7 +83,7 @@ describe("Agent session recap", () => {
       workspaceId: "workspace-1",
       title: null,
       recap: null,
-      model: "grok-4-1-fast",
+      model: "grok-4.3",
       mode: "agent",
       cwdAtStart: process.cwd(),
       cwdLast: process.cwd(),

src/agent/vision-input.test.ts

Lines changed: 20 additions & 0 deletions
@@ -36,4 +36,24 @@ describe("buildVisionUserMessages", () => {
       text: `Validate the image at ${imagePath}`,
     });
   });
+
+  it("recognizes shell-escaped screenshot paths", async () => {
+    const imageName = "Screenshot 2026-05-06 at 10.02.18.png";
+    const imagePath = path.join(tempDir, imageName);
+    fs.writeFileSync(imagePath, Buffer.from([1, 2, 3, 4]));
+    const escapedPath = path.join(tempDir, "Screenshot\\ 2026-05-06\\ at\\ 10.02.18.png");
+
+    const messages = await buildVisionUserMessages(`${escapedPath}\nExplain this image`, tempDir);
+
+    const content = messages[0]?.content as Array<Record<string, unknown>>;
+    expect(content[0]).toMatchObject({
+      type: "file",
+      mediaType: "image/png",
+    });
+    expect(content[0]?.data).toBeInstanceOf(Uint8Array);
+    expect(content[1]).toMatchObject({
+      type: "text",
+      text: `${escapedPath}\nExplain this image`,
+    });
+  });
 });
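
The new test expects a path pasted with shell escaping (`Screenshot\ 2026-05-06\ at\ 10.02.18.png`) to resolve to the file on disk, while the text part of the message keeps the original escaped string. The change to `vision-input.ts` is not shown in this excerpt, so the following is only a sketch of one way to get that behaviour. `unescapeShellPath` and `candidatePaths` are hypothetical names, and the sketch assumes POSIX-style paths where a backslash is an escape character, not a separator:

```typescript
// Hypothetical helpers for illustration; not the implementation from this commit.

// Drop the backslash from every backslash-escaped character, e.g. "a\ b" -> "a b".
function unescapeShellPath(candidate: string): string {
  return candidate.replace(/\\(.)/g, "$1");
}

// Return the paths worth checking on disk: the raw text first, then the
// unescaped form if it differs. The caller keeps the raw text for the prompt.
function candidatePaths(rawLine: string): string[] {
  const trimmed = rawLine.trim();
  const unescaped = unescapeShellPath(trimmed);
  return unescaped === trimmed ? [trimmed] : [trimmed, unescaped];
}
```

Trying the raw string first keeps paths that legitimately contain backslashes working, and the unescaped form is only a fallback.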

src/grok/client.test.ts

Lines changed: 18 additions & 18 deletions
@@ -4,7 +4,7 @@ import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
 import * as settings from "../utils/settings";
 import { generateRecap, resolveModelRuntime } from "./client";

-const mockGenerateText = vi.fn();
+const mockGenerateText = vi.hoisted(() => vi.fn());

 vi.mock("ai", () => {
   return {
@@ -34,7 +34,7 @@ describe("client", () => {

       expect(result).toEqual({
         recap: "Wrapped up the parser fix. Next step is wiring the new recap banner.",
-        modelId: "grok-4-1-fast-non-reasoning",
+        modelId: "grok-4.20-non-reasoning",
         usage: { inputTokens: 11, outputTokens: 7, totalTokens: 18 },
       });
       expect(mockGenerateText).toHaveBeenCalledWith(
@@ -54,7 +54,7 @@ describe("client", () => {

       expect(result).toEqual({
         recap: "",
-        modelId: "grok-4-1-fast-non-reasoning",
+        modelId: "grok-4.20-non-reasoning",
       });
     });
   });
@@ -67,21 +67,21 @@ describe("client", () => {
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-4-0709 even though it has reasoning flag", () => {
+    it("normalizes retired flagship reasoning models to grok-4.3", () => {
       const runtime = resolveModelRuntime(mockProvider, "grok-4-0709");
-      expect(runtime.modelId).toBe("grok-4-0709");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-code-fast-1", () => {
+    it("normalizes retired code models to grok-4.3", () => {
       const runtime = resolveModelRuntime(mockProvider, "grok-code-fast-1");
-      expect(runtime.modelId).toBe("grok-code-fast-1");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-4-1-fast-reasoning", () => {
+    it("normalizes retired fast reasoning models to grok-4.3", () => {
       const runtime = resolveModelRuntime(mockProvider, "grok-4-1-fast-reasoning");
-      expect(runtime.modelId).toBe("grok-4-1-fast-reasoning");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });

@@ -91,9 +91,9 @@ describe("client", () => {
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-3", () => {
+    it("normalizes retired non-reasoning models to grok-4.20-non-reasoning", () => {
       const runtime = resolveModelRuntime(mockProvider, "grok-3");
-      expect(runtime.modelId).toBe("grok-3");
+      expect(runtime.modelId).toBe("grok-4.20-non-reasoning");
       expect(runtime.providerOptions).toBeUndefined();
     });
   });
@@ -129,24 +129,24 @@ describe("client", () => {
       });
     });

-    it("does not include providerOptions for grok-4-0709 even when effort is configured", () => {
+    it("does not include providerOptions for retired reasoning aliases even when effort is configured", () => {
       vi.spyOn(settings, "getReasoningEffortForModel").mockReturnValue("high");
       const runtime = resolveModelRuntime(mockProvider, "grok-4-0709");
-      expect(runtime.modelId).toBe("grok-4-0709");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-code-fast-1 even when effort is configured", () => {
+    it("does not include providerOptions for retired code aliases even when effort is configured", () => {
       vi.spyOn(settings, "getReasoningEffortForModel").mockReturnValue("high");
       const runtime = resolveModelRuntime(mockProvider, "grok-code-fast-1");
-      expect(runtime.modelId).toBe("grok-code-fast-1");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });

-    it("does not include providerOptions for grok-4-1-fast-reasoning even when effort is configured", () => {
+    it("does not include providerOptions for grok-4.3 even when effort is configured", () => {
       vi.spyOn(settings, "getReasoningEffortForModel").mockReturnValue("high");
-      const runtime = resolveModelRuntime(mockProvider, "grok-4-1-fast-reasoning");
-      expect(runtime.modelId).toBe("grok-4-1-fast-reasoning");
+      const runtime = resolveModelRuntime(mockProvider, "grok-4.3");
+      expect(runtime.modelId).toBe("grok-4.3");
       expect(runtime.providerOptions).toBeUndefined();
     });
   });
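
The updated expectations describe an alias step: a retired model ID passed to `resolveModelRuntime` comes back as a current one. The change to `client.ts` is not included in this excerpt, so the following is only a sketch of how such a lookup could work. `RETIRED_MODEL_ALIASES` and `normalizeModelId` are hypothetical names; the four mappings are the ones the tests above assert, and any other retired IDs the real code handles are unknown here:

```typescript
// Hypothetical alias table for illustration; entries mirror the test expectations only.
const RETIRED_MODEL_ALIASES = new Map<string, string>([
  ["grok-4-0709", "grok-4.3"],
  ["grok-code-fast-1", "grok-4.3"],
  ["grok-4-1-fast-reasoning", "grok-4.3"],
  ["grok-3", "grok-4.20-non-reasoning"],
]);

// Map a retired model ID to its replacement; current or unknown IDs pass through unchanged.
function normalizeModelId(modelId: string): string {
  return RETIRED_MODEL_ALIASES.get(modelId) ?? modelId;
}
```

Using a `Map` instead of a plain object avoids accidental hits on inherited keys such as `constructor` when a user supplies an arbitrary model string.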
