
Commit 974a18d

improvement(media-blocks): new versions of image and video gen with latest models + fixes (#4667)
* improvement(media-blocks): new versions of image and video gen with latest models + fixes
* respect versioning for icons
* fix integration routes
* address comments
* address api mismatches
* more ltx 2.3 durations
* typing tightness
1 parent e40c915 commit 974a18d

25 files changed

Lines changed: 3260 additions & 282 deletions


apps/docs/components/ui/icon-mapping.ts

Lines changed: 5 additions & 0 deletions
@@ -268,6 +268,8 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 extend_v2: ExtendIcon,
 fathom: FathomIcon,
 file: DocumentIcon,
+file_v2: DocumentIcon,
+file_v3: DocumentIcon,
 file_v4: DocumentIcon,
 findymail: FindymailIcon,
 firecrawl: FirecrawlIcon,
@@ -313,6 +315,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 iam: IAMIcon,
 identity_center: IdentityCenterIcon,
 image_generator: ImageIcon,
+image_generator_v2: ImageIcon,
 imap: MailServerIcon,
 incidentio: IncidentioIcon,
 infisical: InfisicalIcon,
@@ -345,6 +348,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 microsoft_planner: MicrosoftPlannerIcon,
 microsoft_teams: MicrosoftTeamsIcon,
 mistral_parse: MistralIcon,
+mistral_parse_v2: MistralIcon,
 mistral_parse_v3: MistralIcon,
 monday: MondayIcon,
 mongodb: MongoDBIcon,
@@ -427,6 +431,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 vercel: VercelIcon,
 video_generator: VideoIcon,
 video_generator_v2: VideoIcon,
+video_generator_v3: VideoIcon,
 vision: EyeIcon,
 vision_v2: EyeIcon,
 wealthbox: WealthboxIcon,

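In the docs icon map, every version of a block maps to the same icon, and each version also gets its own key. Examples are `file` through `file_v4` and `video_generator` through `video_generator_v3`. The sketch below shows why the explicit keys matter and how that invariant could be checked. It assumes a plain exact-key lookup and a trailing `_vN` naming convention. `baseBlockType`, `findInconsistentVersions`, and `iconFor` are hypothetical helpers. Icon components are replaced by their names so the example runs without React.

```typescript
// Illustrative stand-in for blockTypeToIconMap (icon names instead of components).
const iconMap: Record<string, string> = {
  file: "DocumentIcon",
  file_v2: "DocumentIcon",
  file_v3: "DocumentIcon",
  file_v4: "DocumentIcon",
  image_generator: "ImageIcon",
  image_generator_v2: "ImageIcon",
  mistral_parse: "MistralIcon",
  mistral_parse_v2: "MistralIcon",
  mistral_parse_v3: "MistralIcon",
  video_generator: "VideoIcon",
  video_generator_v2: "VideoIcon",
  video_generator_v3: "VideoIcon",
}

// Strip a trailing "_v<digits>" suffix: "video_generator_v3" -> "video_generator".
function baseBlockType(type: string): string {
  return type.replace(/_v\d+$/, "")
}

// Return base block types whose versions do not all share one icon.
function findInconsistentVersions(map: Record<string, string>): string[] {
  const iconsByBase: Record<string, string[]> = {}
  for (const type of Object.keys(map)) {
    const base = baseBlockType(type)
    const icons = iconsByBase[base] ?? (iconsByBase[base] = [])
    // Record each distinct icon once per base type.
    if (icons.indexOf(map[type]) === -1) icons.push(map[type])
  }
  return Object.keys(iconsByBase).filter((base) => iconsByBase[base].length > 1)
}

// Exact-key lookup (assumed): a versioned type resolves only if it has its own entry.
function iconFor(map: Record<string, string>, type: string): string | undefined {
  return map[type]
}
```

With this map, `findInconsistentVersions(iconMap)` returns an empty array. `iconFor(iconMap, "image_generator_v3")` returns `undefined` because no such key exists. That failure mode is what adding each new version's key avoids.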
apps/docs/content/docs/en/tools/image_generator.mdx

Lines changed: 42 additions & 30 deletions
@@ -6,63 +6,75 @@ description: Generate images
 import { BlockInfoCard } from "@/components/ui/block-info-card"

 <BlockInfoCard
-type="image_generator"
+type="image_generator_v2"
 color="#4D5FFF"
 />

 {/* MANUAL-CONTENT-START:intro */}
-[DALL-E](https://openai.com/dall-e-3) is OpenAI's advanced AI system designed to generate realistic images and art from natural language descriptions. As a state-of-the-art image generation model, DALL-E can create detailed and creative visuals based on text prompts, allowing users to transform their ideas into visual content without requiring artistic skills.
+The Image Generator block creates images from text prompts using leading image generation providers. Choose OpenAI for GPT Image models, Google Gemini for Nano Banana models, or Fal.ai for a multi-model catalog that includes Nano Banana, GPT Image, Seedream, FLUX, and Grok Imagine.

-With DALL-E, you can:
+Use it to:

-- **Generate realistic images**: Create photorealistic visuals from textual descriptions
-- **Design conceptual art**: Transform abstract ideas into visual representations
-- **Produce variations**: Generate multiple interpretations of the same prompt
-- **Control artistic style**: Specify artistic styles, mediums, and visual aesthetics
-- **Create detailed scenes**: Describe complex scenes with multiple elements and relationships
-- **Visualize products**: Generate product mockups and design concepts
-- **Illustrate ideas**: Turn written concepts into visual illustrations
+- **Generate production images**: Create polished visuals from workflow prompts
+- **Choose the right provider**: Route requests to OpenAI, Gemini, or Fal.ai based on model availability and cost
+- **Control output shape**: Set provider-specific size, aspect ratio, resolution, quality, background, and output format options
+- **Use advanced Fal.ai features**: Configure safety tolerance, safety checking, web search grounding, seeds, and thinking level when supported
+- **Pass generated files downstream**: Use the returned image file or URL in later workflow steps

-In Sim, the DALL-E integration enables your agents to generate images programmatically as part of their workflows. This allows for powerful automation scenarios such as content creation, visual design, and creative ideation. Your agents can formulate detailed prompts, generate corresponding images, and incorporate these visuals into their outputs or downstream processes. This integration bridges the gap between natural language processing and visual content creation, enabling your agents to communicate not just through text but also through compelling imagery. By connecting Sim with DALL-E, you can create agents that produce visual content on demand, illustrate concepts, generate design assets, and enhance user experiences with rich visual elements - all without requiring human intervention in the creative process.
+In Sim, the Image Generator block lets agents create visual assets programmatically as part of automated workflows. This is useful for content creation, design mockups, product visuals, creative ideation, and any flow that needs generated imagery without a manual handoff.
 {/* MANUAL-CONTENT-END */}


 ## Usage Instructions

-Integrate Image Generator into the workflow. Can generate images using DALL-E 3, GPT Image 1, or GPT Image 2.
+Generate images using OpenAI GPT Image, Google Nano Banana, or Fal.ai image models.



 ## Tools

-### `openai_image`
+### `image_generate`

-Generate images using OpenAI
+Generate images with OpenAI GPT Image, Google Nano Banana, or Fal.ai image models

 #### Input

 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
-| `model` | string | Yes | The model to use \(dall-e-3, gpt-image-1, or gpt-image-2\) |
-| `prompt` | string | Yes | A text description of the desired image |
-| `size` | string | Yes | Image size. dall-e-3: 1024x1024, 1024x1792, or 1792x1024. gpt-image-1: auto, 1024x1024, 1536x1024, or 1024x1536. gpt-image-2: auto or any size with edges ≤3840px and multiples of 16 \(e.g. 1024x1024, 1536x1024, 1024x1536, 2560x1440, 3840x2160\). |
-| `quality` | string | No | Quality. dall-e-3: standard\|hd. gpt-image-1/gpt-image-2: auto\|low\|medium\|high |
-| `style` | string | No | The style of the image \(vivid or natural\), only for dall-e-3 |
-| `background` | string | No | Background. gpt-image-1: auto\|transparent\|opaque. gpt-image-2: auto\|opaque \(transparent not supported\) |
-| `outputFormat` | string | No | Output image format \(png, jpeg, webp\), only for gpt-image-1 and gpt-image-2 |
-| `moderation` | string | No | Moderation level \(auto or low\), only for gpt-image-1 and gpt-image-2 |
-| `n` | number | No | The number of images to generate \(1-10\) |
-| `apiKey` | string | Yes | Your OpenAI API key |
+| `provider` | string | Yes | Image generation provider: openai, gemini, or falai |
+| `apiKey` | string | Yes | Provider API key |
+| `model` | string | Yes | Provider model ID, such as gpt-image-1.5, gemini-3.1-flash-image-preview, or nano-banana-2 |
+| `prompt` | string | Yes | Text prompt describing the image to generate |
+| `size` | string | No | Provider-specific image size |
+| `aspectRatio` | string | No | Aspect ratio, such as auto, 1:1, 16:9, or 9:16 |
+| `resolution` | string | No | Provider-specific image resolution, such as 1K, 2K, 4K, 1k, or 2k |
+| `quality` | string | No | Provider-specific image quality |
+| `background` | string | No | Background setting when supported |
+| `outputFormat` | string | No | Output image format: png, jpeg, or webp where supported |
+| `moderation` | string | No | OpenAI moderation level: auto or low |
+| `safetyTolerance` | string | No | Fal.ai safety tolerance when supported |
+| `numImages` | number | No | Number of images to generate, subject to provider limits |
+| `seed` | number | No | Random seed when supported |
+| `enableSafetyChecker` | boolean | No | Enable the Fal.ai safety checker when supported |
+| `enableWebSearch` | boolean | No | Enable web search grounding when supported by the selected Fal.ai model |
+| `thinkingLevel` | string | No | Fal.ai thinking level when supported: minimal or high |

 #### Output

 | Parameter | Type | Description |
 | --------- | ---- | ----------- |
-| `success` | boolean | Operation success status |
-| `output` | object | Generated image data |
-|`content` | string | Image URL or identifier |
-|`image` | string | Base64 encoded image data |
-|`metadata` | object | Image generation metadata |
-|`model` | string | Model used for image generation |
+| `content` | string | Generated image URL or identifier |
+| `image` | file | Generated image file |
+| `imageUrl` | string | Generated image URL |
+| `provider` | string | Provider used |
+| `model` | string | Model used |
+| `metadata` | json | Generation metadata |
+|`provider` | string | Provider used |
+|`model` | string | Model used |
+|`description` | string | Provider description |
+|`revisedPrompt` | string | Revised prompt |
+|`seed` | number | Seed used for generation |
+|`jobId` | string | Provider job ID |
+|`contentType` | string | Image MIME type |


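The new `image_generate` tables list one shared parameter set in which some options are documented as provider-specific. `moderation` is OpenAI-only. `safetyTolerance`, `enableSafetyChecker`, `enableWebSearch`, and `thinkingLevel` are Fal.ai options. The nested rows under `metadata` describe a sub-object. The sketch below transcribes both tables into TypeScript types and adds a pre-flight check. The type names and `checkImageParams` are hypothetical and not Sim's source. Whether the real tool rejects or ignores an option meant for another provider is not stated in the diff, so the check only reports it. `image` is typed as `unknown` because the diff documents it only as a file.

```typescript
type ImageProvider = "openai" | "gemini" | "falai"

// Input shape transcribed from the `image_generate` input table.
interface ImageGenerateParams {
  provider: ImageProvider
  apiKey: string
  model: string
  prompt: string
  size?: string
  aspectRatio?: string
  resolution?: string
  quality?: string
  background?: string
  outputFormat?: "png" | "jpeg" | "webp"
  moderation?: "auto" | "low"
  safetyTolerance?: string
  numImages?: number
  seed?: number
  enableSafetyChecker?: boolean
  enableWebSearch?: boolean
  thinkingLevel?: "minimal" | "high"
}

// Output shape transcribed from the output table; the indented rows form `metadata`.
interface ImageGenerateOutput {
  content: string
  image: unknown
  imageUrl: string
  provider: string
  model: string
  metadata: {
    provider: string
    model: string
    description?: string
    revisedPrompt?: string
    seed?: number
    jobId?: string
    contentType?: string
  }
}

// Options the table attributes to a single provider.
const PROVIDER_ONLY: Record<string, ImageProvider> = {
  moderation: "openai",
  safetyTolerance: "falai",
  enableSafetyChecker: "falai",
  enableWebSearch: "falai",
  thinkingLevel: "falai",
}

// Return human-readable problems; an empty array means nothing was flagged.
function checkImageParams(p: ImageGenerateParams): string[] {
  const problems: string[] = []
  for (const key of ["apiKey", "model", "prompt"] as const) {
    if (!p[key].trim()) problems.push(`${key} is required`)
  }
  const values = p as unknown as Record<string, unknown>
  for (const key of Object.keys(PROVIDER_ONLY)) {
    const owner = PROVIDER_ONLY[key]
    if (values[key] !== undefined && owner !== p.provider) {
      problems.push(`${key} applies to ${owner}, not ${p.provider}`)
    }
  }
  if (p.numImages !== undefined && (!Number.isInteger(p.numImages) || p.numImages < 1)) {
    problems.push("numImages must be a positive integer")
  }
  return problems
}

// Example: a Fal.ai-only flag on an OpenAI request is reported.
const exampleIssues = checkImageParams({
  provider: "openai",
  apiKey: "<your-openai-key>",
  model: "gpt-image-1.5",
  prompt: "a lighthouse at dusk",
  enableWebSearch: true,
})
// exampleIssues: ["enableWebSearch applies to falai, not openai"]
```

Upper limits such as a maximum `numImages` are left out on purpose, because the new table only says "subject to provider limits".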
apps/docs/content/docs/en/tools/video_generator.mdx

Lines changed: 16 additions & 16 deletions
@@ -6,37 +6,35 @@ description: Generate videos from text using AI
 import { BlockInfoCard } from "@/components/ui/block-info-card"

 <BlockInfoCard
-type="video_generator_v2"
+type="video_generator_v3"
 color="#181C1E"
 />

 {/* MANUAL-CONTENT-START:intro */}
-Create videos from text prompts using cutting-edge AI models from top providers. Sim's Video Generator brings powerful, creative video synthesis capabilities to your workflow—supporting diverse models, aspect ratios, resolutions, camera controls, native audio, and advanced style and consistency features.
+Create videos from text prompts using leading AI video providers. Sim's Video Generator supports direct provider integrations for Runway, Google Veo, Luma, and MiniMax, plus a Fal.ai multi-model provider for newer and specialized models.

 **Supported Providers & Models:**

-- **[Runway Gen-4](https://research.runwayml.com/gen2/)** (Runway ML):
-Runway is a pioneer in text-to-video generation, known for powerful models like Gen-2, Gen-3, and Gen-4. The latest [Gen-4](https://research.runwayml.com/gen2/) model (and Gen-4 Turbo for faster results) supports more realistic motion, greater world consistency, and visual references for character, object, style, and location. Supports 16:9, 9:16, and 1:1 aspect ratios, 5–10 second durations, up to 4K resolution, style presets, and direct upload of reference images for consistent generations. Runway powers creative tools for filmmakers, studios, and content creators worldwide.
+- **[Runway Gen-4](https://docs.dev.runwayml.com/)**: Generate image-to-video clips with a required reference image, 5 or 10 second durations, and landscape, portrait, or square output.

-- **[Google Veo](https://deepmind.google/technologies/veo/)** (Google DeepMind):
-[Veo](https://deepmind.google/technologies/veo/) is Google’s next-generation video generation model, offering high-quality, native-audio videos up to 1080p and 16 seconds. Supports advanced motion, cinematic effects, and nuanced text understanding. Veo can generate videos with built-in sound—activating native audio as well as silent clips. Options include 16:9 aspect, variable duration, different models (veo-3, veo-3.1), and prompt-based controls. Ideal for storytelling, advertising, research, and ideation.
+- **[Google Veo](https://ai.google.dev/gemini-api/docs/video)**: Generate text-to-video clips with Veo 3 and Veo 3.1 models, portrait or landscape aspect ratios, 4, 6, or 8 second durations, and 720p or 1080p output.

-- **[Luma Dream Machine](https://lumalabs.ai/dream-machine)** (Luma AI):
-[Dream Machine](https://lumalabs.ai/dream-machine) delivers jaw-droppingly realistic and fluid video from text. It incorporates advanced camera control, cinematography prompts, and supports both ray-1 and ray-2 models. Dream Machine supports precise aspect ratios (16:9, 9:16, 1:1), variable durations, and the specification of camera paths for intricate visual direction. Luma is renowned for breakthrough visual fidelity and is backed by leading AI vision researchers.
+- **[Luma Dream Machine](https://docs.lumalabs.ai/docs/video-generation)**: Generate Ray 2 videos with 5 or 9 second durations, common aspect ratios, multiple resolutions, and optional camera concept controls.

-- **[MiniMax Hailuo-02](https://minimax.chat/)** (via [Fal.ai](https://fal.ai/)):
-[MiniMax Hailuo-02](https://minimax.chat/) is a sophisticated Chinese generative video model, available globally through [Fal.ai](https://fal.ai/). Generate videos up to 16 seconds in landscape or portrait format, with options for prompt optimization to improve clarity and creativity. Pro and standard endpoints available, supporting high resolutions (up to 1920×1080). Well-suited for creative projects needing prompt translation and optimization, commercial storytelling, and rapid prototyping of visual ideas.
+- **[MiniMax Hailuo](https://platform.minimax.io/docs/api-reference/video-generation-t2v)**: Generate Hailuo 2.3 or Hailuo-02 videos through MiniMax's platform API, with standard or pro quality endpoints and prompt optimization.
+
+- **[Fal.ai Multi-Model](https://fal.ai/docs/model-api-reference/video-generation-api/overview)**: Access Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0 and O3, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported Fal.ai models from one provider option.

 **How to Choose:**
-Pick your provider and model based on your needs for quality, speed, duration, audio, cost, and unique features. Runway and Veo offer world-leading realism and cinematic capabilities; Luma excels in fluid motion and camera control; MiniMax is ideal for Chinese-language prompts and offers fast, affordable access. Consider reference support, style presets, audio requirements, and pricing when selecting your tool.
+Pick the provider and model based on quality, speed, duration, audio support, reference image needs, resolution, and cost. Runway is best when you have a visual reference, Veo and Luma are strong general text-to-video options, MiniMax offers a direct Hailuo API path, and Fal.ai is the best choice when you need access to the broadest model catalog.

 For more details on features, restrictions, pricing, and model advances, see each provider’s official documentation above.
 {/* MANUAL-CONTENT-END */}


 ## Usage Instructions

-Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.
+Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.



@@ -141,9 +139,10 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced
 | --------- | ---- | -------- | ----------- |
 | `provider` | string | Yes | Video provider \(minimax\) |
 | `apiKey` | string | Yes | MiniMax API key from platform.minimax.io |
-| `model` | string | No | MiniMax model: hailuo-02 \(default\) |
+| `model` | string | No | MiniMax model: hailuo-2.3 \(default\) or hailuo-02 |
 | `prompt` | string | Yes | Text prompt describing the video to generate |
 | `duration` | number | No | Video duration in seconds \(6 or 10, default: 6\) |
+| `endpoint` | string | No | Quality endpoint: standard \(768P\) or pro \(1080P for 6s videos\) |
 | `promptOptimizer` | boolean | No | Enable prompt optimization for better results \(default: true\) |

 #### Output
@@ -161,20 +160,21 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced

 ### `video_falai`

-Generate videos using Fal.ai platform with access to multiple models including Veo 3.1, Sora 2, Kling 2.5, MiniMax Hailuo, and more
+Generate videos using Fal.ai with access to Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported models

 #### Input

 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `provider` | string | Yes | Video provider \(falai\) |
 | `apiKey` | string | Yes | Fal.ai API key |
-| `model` | string | Yes | Fal.ai model: veo-3.1 \(Google Veo 3.1\), sora-2 \(OpenAI Sora 2\), kling-2.5-turbo-pro \(Kling 2.5 Turbo Pro\), kling-2.1-pro \(Kling 2.1 Master\), minimax-hailuo-2.3-pro \(MiniMax Hailuo Pro\), minimax-hailuo-2.3-standard \(MiniMax Hailuo Standard\), wan-2.1 \(WAN T2V\), ltxv-0.9.8 \(LTXV 13B\) |
+| `model` | string | Yes | Fal.ai model: veo-3.1, veo-3.1-fast, sora-2, sora-2-pro, seedance-2.0, seedance-2.0-fast, kling-v3-pro, kling-v3-4k, kling-o3-pro, kling-o3-4k, minimax-hailuo-2.3-pro, minimax-hailuo-2.3-standard, wan-2.2-a14b-turbo, ltx-2.3, ltx-2.3-fast, plus previously supported model IDs |
 | `prompt` | string | Yes | Text prompt describing the video to generate |
 | `duration` | number | No | Video duration in seconds \(varies by model\) |
 | `aspectRatio` | string | No | Aspect ratio \(varies by model\): 16:9, 9:16, 1:1 |
-| `resolution` | string | No | Video resolution \(varies by model\): 540p, 720p, 1080p |
+| `resolution` | string | No | Video resolution \(varies by model\): 480p, 580p, 720p, 1080p, true_1080p, 1440p, 2160p, 4k |
 | `promptOptimizer` | boolean | No | Enable prompt optimization for MiniMax models \(default: true\) |
+| `generateAudio` | boolean | No | Generate native audio when supported by the selected Fal.ai model |

 #### Output

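The rewritten provider list and the two parameter tables spread several per-provider constraints across prose and table cells. The sketch below collects them in one place: allowed durations, MiniMax defaults, the MiniMax endpoint resolutions, and the newly listed Fal.ai model IDs. Only `minimax` and `falai` appear as `provider` values in this diff, so the keys `runway`, `veo`, and `luma` are hypothetical. All helper names are also illustrative. The sketch encodes only what the text states. It returns `undefined` where the docs are silent, such as the pro endpoint at 10 seconds, and it does not assume an `endpoint` default.

```typescript
// Hypothetical provider keys; durations come from the provider list and MiniMax table.
type DirectVideoProvider = "runway" | "veo" | "luma" | "minimax"

const ALLOWED_DURATIONS: Record<DirectVideoProvider, readonly number[]> = {
  runway: [5, 10], // Runway also requires a reference image
  veo: [4, 6, 8],
  luma: [5, 9],
  minimax: [6, 10],
}

function isAllowedDuration(provider: DirectVideoProvider, seconds: number): boolean {
  return ALLOWED_DURATIONS[provider].indexOf(seconds) !== -1
}

// MiniMax parameters as documented in the MiniMax input table.
interface MiniMaxVideoParams {
  provider: "minimax"
  apiKey: string
  prompt: string
  model?: "hailuo-2.3" | "hailuo-02"
  duration?: 6 | 10
  endpoint?: "standard" | "pro"
  promptOptimizer?: boolean
}

// Apply only the documented defaults: hailuo-2.3, 6 seconds, optimizer on.
function withMiniMaxDefaults(p: MiniMaxVideoParams): MiniMaxVideoParams {
  return {
    ...p,
    model: p.model ?? "hailuo-2.3",
    duration: p.duration ?? 6,
    promptOptimizer: p.promptOptimizer ?? true,
  }
}

// Standard renders 768P; pro renders 1080P for 6 s clips; pro at 10 s is undocumented.
function expectedMiniMaxResolution(endpoint: "standard" | "pro", duration: 6 | 10): string | undefined {
  if (endpoint === "standard") return "768P"
  return duration === 6 ? "1080P" : undefined
}

// Model IDs newly named in the `video_falai` table. Previously supported IDs are
// still accepted by the tool, so a false result here does not mean "rejected".
const FALAI_NEW_VIDEO_MODELS = [
  "veo-3.1", "veo-3.1-fast", "sora-2", "sora-2-pro", "seedance-2.0", "seedance-2.0-fast",
  "kling-v3-pro", "kling-v3-4k", "kling-o3-pro", "kling-o3-4k",
  "minimax-hailuo-2.3-pro", "minimax-hailuo-2.3-standard",
  "wan-2.2-a14b-turbo", "ltx-2.3", "ltx-2.3-fast",
] as const
type FalaiNewVideoModel = (typeof FALAI_NEW_VIDEO_MODELS)[number]

function isListedFalaiModel(id: string): id is FalaiNewVideoModel {
  return (FALAI_NEW_VIDEO_MODELS as readonly string[]).indexOf(id) !== -1
}
```

Fal.ai durations, aspect ratios, and resolutions are documented as varying by model, so the sketch leaves them out rather than guessing per-model limits.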
apps/sim/app/(landing)/integrations/data/icon-mapping.ts

Lines changed: 2 additions & 2 deletions
@@ -302,7 +302,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 hunter: HunterIOIcon,
 iam: IAMIcon,
 identity_center: IdentityCenterIcon,
-image_generator: ImageIcon,
+image_generator_v2: ImageIcon,
 imap: MailServerIcon,
 incidentio: IncidentioIcon,
 infisical: InfisicalIcon,
@@ -404,7 +404,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
 typeform: TypeformIcon,
 upstash: UpstashIcon,
 vercel: VercelIcon,
-video_generator_v2: VideoIcon,
+video_generator_v3: VideoIcon,
 vision_v2: EyeIcon,
 wealthbox: WealthboxIcon,
 webflow: WebflowIcon,

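Unlike the docs map, the landing-page map replaces the old keys instead of adding new ones. Each `type` in `integrations.json` therefore has to be renamed in the same commit, or its icon lookup would miss. The sketch below is a hypothetical consistency check between the two files. It assumes each entry's `type` is looked up directly in the landing map and compared with its `iconName`. The data is a small excerpt transcribed from the hunks, with icon components represented by their names.

```typescript
// Minimal slice of an integrations.json entry (other fields omitted).
interface IntegrationIconEntry {
  type: string
  slug: string
  iconName: string
}

// Excerpt of the landing blockTypeToIconMap after this commit.
const landingIconMap: Record<string, string> = {
  image_generator_v2: "ImageIcon",
  video_generator_v3: "VideoIcon",
  vision_v2: "EyeIcon",
}

// Report entries whose type has no icon, or whose iconName disagrees with the map.
function findIconMismatches(entries: IntegrationIconEntry[], icons: Record<string, string>): string[] {
  const problems: string[] = []
  for (const entry of entries) {
    const icon = icons[entry.type]
    if (icon === undefined) {
      problems.push(`${entry.slug}: no icon for type ${entry.type}`)
    } else if (icon !== entry.iconName) {
      problems.push(`${entry.slug}: map has ${icon}, entry says ${entry.iconName}`)
    }
  }
  return problems
}

const updatedEntries: IntegrationIconEntry[] = [
  { type: "image_generator_v2", slug: "image-generator", iconName: "ImageIcon" },
  { type: "video_generator_v3", slug: "video-generator", iconName: "VideoIcon" },
]
// findIconMismatches(updatedEntries, landingIconMap) returns []
```

Running the same check against the pre-commit `"type": "video_generator_v2"` entry would report `video-generator: no icon for type video_generator_v2`.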
apps/sim/app/(landing)/integrations/data/integrations.json

Lines changed: 5 additions & 5 deletions
@@ -6749,11 +6749,11 @@
 "tags": ["enrichment", "sales-engagement"]
 },
 {
-"type": "image_generator",
+"type": "image_generator_v2",
 "slug": "image-generator",
 "name": "Image Generator",
 "description": "Generate images",
-"longDescription": "Integrate Image Generator into the workflow. Can generate images using DALL-E 3, GPT Image 1, or GPT Image 2.",
+"longDescription": "Generate images using OpenAI GPT Image, Google Nano Banana, or Fal.ai image models.",
 "bgColor": "#4D5FFF",
 "iconName": "ImageIcon",
 "docsUrl": "https://docs.sim.ai/tools/image_generator",
@@ -14236,14 +14236,14 @@
 "tags": ["cloud", "ci-cd"]
 },
 {
-"type": "video_generator_v2",
+"type": "video_generator_v3",
 "slug": "video-generator",
 "name": "Video Generator",
 "description": "Generate videos from text using AI",
-"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.",
+"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.",
 "bgColor": "#181C1E",
 "iconName": "VideoIcon",
-"docsUrl": "https://docs.sim.ai/tools/video-generator",
+"docsUrl": "https://docs.sim.ai/tools/video_generator",
 "operations": [],
 "operationCount": 0,
 "triggers": [],

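The "fix integration routes" change switches the video entry's `docsUrl` from the hyphenated slug (`video-generator`) to the underscored path that matches the `video_generator.mdx` page. The image entry already used `image_generator`. Both entries follow one pattern: the docs route is the block `type` with its version suffix removed, while the `slug` keeps hyphens. The sketch below turns that pattern into a check. The rule is an assumption inferred from these two entries, not a documented convention, and the helper names are hypothetical.

```typescript
const DOCS_BASE = "https://docs.sim.ai/tools/"

// Assumed rule: drop a trailing "_v<digits>" from the block type to get the docs page.
function expectedDocsUrl(type: string): string {
  return DOCS_BASE + type.replace(/_v\d+$/, "")
}

// Minimal slice of an integrations.json entry (other fields omitted).
interface IntegrationRouteEntry {
  type: string
  slug: string
  docsUrl: string
}

// Describe every entry whose docsUrl does not follow the assumed rule.
function findDocsUrlMismatches(entries: IntegrationRouteEntry[]): string[] {
  return entries
    .filter((e) => e.docsUrl !== expectedDocsUrl(e.type))
    .map((e) => `${e.slug}: expected ${expectedDocsUrl(e.type)}, got ${e.docsUrl}`)
}

// The two entries as they stand after this commit.
const entriesAfterCommit: IntegrationRouteEntry[] = [
  { type: "image_generator_v2", slug: "image-generator", docsUrl: "https://docs.sim.ai/tools/image_generator" },
  { type: "video_generator_v3", slug: "video-generator", docsUrl: "https://docs.sim.ai/tools/video_generator" },
]
// findDocsUrlMismatches(entriesAfterCommit) returns []
```

Applied to the pre-commit video entry (`.../tools/video-generator`), the check reports the hyphen/underscore mismatch that this commit fixes.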