Merged
7 changes: 7 additions & 0 deletions .changeset/cool-carpets-clap.md
@@ -0,0 +1,7 @@
---
'@transloadit/node': patch
'@transloadit/mcp-server': patch
'transloadit': patch
---

Document and test `gpt-image-2` support in the image generation intent flow.
5 changes: 4 additions & 1 deletion packages/node/README.md
@@ -236,7 +236,7 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
| Flag | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| `--prompt` | `string` | yes | `"A red bicycle in a studio"` | The prompt describing the desired image content. |
-| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. |
+| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. Backend-supported models include gpt-image-2 and Google Nano Banana variants. |
| `--format` | `string` | no | `jpg` | Format of the generated image. |
| `--seed` | `number` | no | — | Seed for the random number generator. |
| `--aspect-ratio` | `string` | no | — | Aspect ratio of the generated image. |
@@ -250,6 +250,8 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
```bash
# Generate an image from text
transloadit image generate --prompt "A red bicycle in a studio" --output output.png
# Generate with OpenAI gpt-image-2
transloadit image generate --model gpt-image-2 --width 1024 --height 1024 --prompt "A ceramic coffee mug on a white sweep" --output output.png
# Guide generation with one input image
transloadit image generate --input subject.jpg --prompt "Place subject.jpg on a magazine cover" --output output.png
# Guide generation with multiple input images
@@ -1860,3 +1862,4 @@ See [CONTRIBUTING](./CONTRIBUTING.md).




4 changes: 3 additions & 1 deletion packages/node/docs/intent-commands.md
@@ -104,7 +104,7 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
| Flag | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| `--prompt` | `string` | yes | `"A red bicycle in a studio"` | The prompt describing the desired image content. |
-| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. |
+| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. Backend-supported models include gpt-image-2 and Google Nano Banana variants. |
| `--format` | `string` | no | `jpg` | Format of the generated image. |
| `--seed` | `number` | no | — | Seed for the random number generator. |
| `--aspect-ratio` | `string` | no | — | Aspect ratio of the generated image. |
@@ -118,6 +118,8 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
```bash
# Generate an image from text
transloadit image generate --prompt "A red bicycle in a studio" --output output.png
# Generate with OpenAI gpt-image-2
transloadit image generate --model gpt-image-2 --width 1024 --height 1024 --prompt "A ceramic coffee mug on a white sweep" --output output.png
# Guide generation with one input image
transloadit image generate --input subject.jpg --prompt "Place subject.jpg on a magazine cover" --output output.png
# Guide generation with multiple input images
8 changes: 4 additions & 4 deletions packages/node/src/alphalib/types/robots/image-generate.ts
@@ -85,7 +85,7 @@ Best practice:
.string()
.optional()
.describe(
-'The AI model to use. Defaults to google/nano-banana. Supported models include flux-1.1-pro-ultra, flux-schnell, recraft-v3, google/nano-banana, google/nano-banana-2, google/nano-banana-pro, and stability-ai/stable-diffusion-inpainting.',
+'The AI model to use. Defaults to google/nano-banana. Supported models include flux-1.1-pro-ultra, flux-schnell, recraft-v3, google/nano-banana, google/nano-banana-2, google/nano-banana-pro, gpt-image-2, and stability-ai/stable-diffusion-inpainting.',
),
prompt: z
.string()
@@ -96,7 +96,7 @@ Best practice:
.enum(['jpeg', 'jpg', 'png', 'gif', 'webp', 'svg'])
.optional()
.describe(
-'Output format. Defaults depend on model: png for Google models, svg for recraft-v3, jpeg for others. Google models currently return PNG only.',
+'Output format. Defaults depend on model: png for Google models and gpt-image-2, svg for recraft-v3, jpeg for others. Google models currently return PNG only.',
),
seed: z.number().optional().describe('Seed for the random number generator.'),
aspect_ratio: z
@@ -108,11 +108,11 @@ Best practice:
height: z
.number()
.optional()
-.describe('Requested output height in pixels (mainly used by Google image models).'),
+.describe('Requested output height in pixels (mainly used by Google image models and gpt-image-2).'),
width: z
.number()
.optional()
-.describe('Requested output width in pixels (mainly used by Google image models).'),
+.describe('Requested output width in pixels (mainly used by Google image models and gpt-image-2).'),
style: z.string().optional().describe('Style of the generated image.'),
num_outputs: z
.number()
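The updated `format` description states a per-model default: png for Google models and gpt-image-2, svg for recraft-v3, jpeg for everything else. A minimal sketch of that rule follows. The helper name `defaultFormatFor` is hypothetical, and treating any model with a `google/` prefix as a "Google model" is an assumption; the backend's actual selection logic is not part of this diff.

```typescript
// Formats accepted by the `format` enum in the schema above.
type ImageFormat = 'jpeg' | 'jpg' | 'png' | 'gif' | 'webp' | 'svg'

// Hypothetical helper that mirrors the defaults described in the help text.
function defaultFormatFor(model: string): ImageFormat {
  // Assumption: "Google models" are the ones named with a `google/` prefix.
  if (model.startsWith('google/') || model === 'gpt-image-2') {
    return 'png'
  }
  if (model === 'recraft-v3') {
    return 'svg'
  }
  // Every other model falls back to jpeg.
  return 'jpeg'
}

console.log(defaultFormatFor('gpt-image-2'))
console.log(defaultFormatFor('recraft-v3'))
console.log(defaultFormatFor('flux-schnell'))
```

An explicit `--format` always takes precedence in practice, so a helper like this would only be consulted when the flag is omitted.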
6 changes: 5 additions & 1 deletion packages/node/src/cli/semanticIntents/imageGenerate.ts
@@ -21,7 +21,7 @@ const imageGenerateOptionDefinitions = [
kind: 'string',
propertyName: 'model',
optionFlags: '--model',
-description: `The AI model to use for image generation. Defaults to ${defaultImageGenerateModel}.`,
+description: `The AI model to use for image generation. Defaults to ${defaultImageGenerateModel}. Backend-supported models include gpt-image-2 and Google Nano Banana variants.`,
required: false,
exampleValue: defaultImageGenerateModel,
},
@@ -93,6 +93,10 @@ const imageGenerateCommandPresentation = {
'Generate an image from text',
'transloadit image generate --prompt "A red bicycle in a studio" --output output.png',
],
[
'Generate with OpenAI gpt-image-2',
'transloadit image generate --model gpt-image-2 --width 1024 --height 1024 --prompt "A ceramic coffee mug on a white sweep" --output output.png',
],
[
'Guide generation with one input image',
'transloadit image generate --input subject.jpg --prompt "Place subject.jpg on a magazine cover" --output output.png',
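The example pairs above (a description followed by a command) line up with the `# comment` plus command lines in the README bash blocks. The sketch below shows one way such pairs could be rendered into that shape. `renderExamples` is a hypothetical name; the package's real docs generator is not shown in this diff and may work differently.

```typescript
// Each example is a [description, command] pair, as in the presentation object above.
type Example = [string, string]

const examples: Example[] = [
  [
    'Generate an image from text',
    'transloadit image generate --prompt "A red bicycle in a studio" --output output.png',
  ],
  [
    'Generate with OpenAI gpt-image-2',
    'transloadit image generate --model gpt-image-2 --width 1024 --height 1024 --prompt "A ceramic coffee mug on a white sweep" --output output.png',
  ],
]

// Hypothetical renderer: emits a shell comment line, then the command line.
function renderExamples(items: Example[]): string {
  return items
    .map(([description, command]) => `# ${description}\n${command}`)
    .join('\n')
}

console.log(renderExamples(examples))
```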
38 changes: 38 additions & 0 deletions packages/node/test/unit/cli/intents.test.ts
@@ -524,6 +524,44 @@ describe('intent commands', () => {
)
})

it('passes through gpt-image-2 and explicit dimensions for image generate', async () => {
const { createSpy } = await runIntentCommand([
'image',
'generate',
'--prompt',
'A ceramic coffee mug on a white sweep',
'--model',
'gpt-image-2',
'--width',
'1024',
'--height',
'1024',
'--format',
'png',
'--output',
'generated.png',
])

expect(process.exitCode).toBeUndefined()
expect(createSpy).toHaveBeenCalledWith(
expect.any(OutputCtl),
expect.anything(),
expect.objectContaining({
stepsData: {
generate: expect.objectContaining({
robot: '/image/generate',
model: 'gpt-image-2',
prompt: 'A ceramic coffee mug on a white sweep',
width: 1024,
height: 1024,
format: 'png',
result: true,
}),
},
}),
)
})

it('bundles image generate inputs into a single /image/generate step', async () => {
const { createSpy } = await runIntentCommand([
'image',
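The new test asserts that string flag values such as `'1024'` arrive in the `/image/generate` step as numbers, while `model`, `prompt`, and `format` pass through unchanged. The sketch below illustrates that mapping under simplifying assumptions: `buildGenerateStep` is a hypothetical helper, it only reads `--flag value` pairs that follow the subcommand, and it ignores `--output` and every other option the real CLI parser handles.

```typescript
// Shape of the step the test expects; only the fields exercised there are listed.
interface ImageGenerateStep {
  robot: '/image/generate'
  result: true
  prompt: string
  model?: string
  format?: string
  width?: number
  height?: number
}

// Hypothetical, simplified flag reader. Not the package's actual parser.
function buildGenerateStep(argv: string[]): ImageGenerateStep {
  const flags: { [flag: string]: string | undefined } = {}
  for (let i = 0; i + 1 < argv.length; i += 2) {
    const key = argv[i]
    const value = argv[i + 1]
    if (key !== undefined && value !== undefined) {
      flags[key] = value
    }
  }

  const prompt = flags['--prompt']
  if (prompt === undefined) {
    throw new Error('--prompt is required')
  }

  const step: ImageGenerateStep = { robot: '/image/generate', result: true, prompt }

  // String-valued options are passed through as-is.
  const model = flags['--model']
  if (model !== undefined) step.model = model
  const format = flags['--format']
  if (format !== undefined) step.format = format

  // Numeric options are coerced from their string form.
  const width = flags['--width']
  if (width !== undefined) step.width = Number(width)
  const height = flags['--height']
  if (height !== undefined) step.height = Number(height)

  return step
}

const step = buildGenerateStep([
  '--prompt', 'A ceramic coffee mug on a white sweep',
  '--model', 'gpt-image-2',
  '--width', '1024',
  '--height', '1024',
  '--format', 'png',
])
console.log(step)
```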
7 changes: 7 additions & 0 deletions packages/node/test/unit/robots.test.ts
@@ -34,4 +34,11 @@ describe('robot catalog helpers', () => {
expect(Array.isArray(help.examples)).toBe(true)
expect(help.examples?.length).toBeGreaterThan(0)
})

it('includes gpt-image-2 in /image/generate model help text', () => {
const help = getRobotHelp({ robotName: '/image/generate', detailLevel: 'full' })
const modelParam = help.optionalParams.find((param) => param.name === 'model')

expect(modelParam?.description).toContain('gpt-image-2')
})
})
5 changes: 4 additions & 1 deletion packages/transloadit/README.md
@@ -236,7 +236,7 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
| Flag | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| `--prompt` | `string` | yes | `"A red bicycle in a studio"` | The prompt describing the desired image content. |
-| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. |
+| `--model` | `string` | no | `google/nano-banana-2` | The AI model to use for image generation. Defaults to google/nano-banana-2. Backend-supported models include gpt-image-2 and Google Nano Banana variants. |
| `--format` | `string` | no | `jpg` | Format of the generated image. |
| `--seed` | `number` | no | — | Seed for the random number generator. |
| `--aspect-ratio` | `string` | no | — | Aspect ratio of the generated image. |
@@ -250,6 +250,8 @@ npx transloadit image generate [--input <path|dir|url|->] [options]
```bash
# Generate an image from text
transloadit image generate --prompt "A red bicycle in a studio" --output output.png
# Generate with OpenAI gpt-image-2
transloadit image generate --model gpt-image-2 --width 1024 --height 1024 --prompt "A ceramic coffee mug on a white sweep" --output output.png
# Guide generation with one input image
transloadit image generate --input subject.jpg --prompt "Place subject.jpg on a magazine cover" --output output.png
# Guide generation with multiple input images
@@ -1860,3 +1862,4 @@ See [CONTRIBUTING](./CONTRIBUTING.md).