improvement(media-blocks): new versions of image and video gen with latest models + fixes (#4667)
* improvement(media-blocks): new versions of image and video gen with latest models + fixes
* respect versioning for icons
* fix integration routes
* address comments
* address api mismatches
* more ltx 2.3 durations
* typing tightness
-[DALL-E](https://openai.com/dall-e-3) is OpenAI's advanced AI system designed to generate realistic images and art from natural language descriptions. As a state-of-the-art image generation model, DALL-E can create detailed and creative visuals based on text prompts, allowing users to transform their ideas into visual content without requiring artistic skills.
+The Image Generator block creates images from text prompts using leading image generation providers. Choose OpenAI for GPT Image models, Google Gemini for Nano Banana models, or Fal.ai for a multi-model catalog that includes Nano Banana, GPT Image, Seedream, FLUX, and Grok Imagine.
 
-With DALL-E, you can:
+Use it to:
 
-- **Generate realistic images**: Create photorealistic visuals from textual descriptions
-- **Design conceptual art**: Transform abstract ideas into visual representations
-- **Produce variations**: Generate multiple interpretations of the same prompt
-- **Control artistic style**: Specify artistic styles, mediums, and visual aesthetics
-- **Create detailed scenes**: Describe complex scenes with multiple elements and relationships
-- **Visualize products**: Generate product mockups and design concepts
-- **Illustrate ideas**: Turn written concepts into visual illustrations
+- **Generate production images**: Create polished visuals from workflow prompts
+- **Choose the right provider**: Route requests to OpenAI, Gemini, or Fal.ai based on model availability and cost
+- **Control output shape**: Set provider-specific size, aspect ratio, resolution, quality, background, and output format options
+- **Use advanced Fal.ai features**: Configure safety tolerance, safety checking, web search grounding, seeds, and thinking level when supported
+- **Pass generated files downstream**: Use the returned image file or URL in later workflow steps
 
-In Sim, the DALL-E integration enables your agents to generate images programmatically as part of their workflows. This allows for powerful automation scenarios such as content creation, visual design, and creative ideation. Your agents can formulate detailed prompts, generate corresponding images, and incorporate these visuals into their outputs or downstream processes. This integration bridges the gap between natural language processing and visual content creation, enabling your agents to communicate not just through text but also through compelling imagery. By connecting Sim with DALL-E, you can create agents that produce visual content on demand, illustrate concepts, generate design assets, and enhance user experiences with rich visual elements - all without requiring human intervention in the creative process.
+In Sim, the Image Generator block lets agents create visual assets programmatically as part of automated workflows. This is useful for content creation, design mockups, product visuals, creative ideation, and any flow that needs generated imagery without a manual handoff.
 {/* MANUAL-CONTENT-END */}
 
 
 ## Usage Instructions
 
-Integrate Image Generator into the workflow. Can generate images using DALL-E 3, GPT Image 1, or GPT Image 2.
+Generate images using OpenAI GPT Image, Google Nano Banana, or Fal.ai image models.
 
 
 
 ## Tools
 
-### `openai_image`
+### `image_generate`
 
-Generate images using OpenAI
+Generate images with OpenAI GPT Image, Google Nano Banana, or Fal.ai image models
 
 #### Input
 
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
-| `model` | string | Yes | The model to use \(dall-e-3, gpt-image-1, or gpt-image-2\) |
-| `prompt` | string | Yes | A text description of the desired image |
-| `size` | string | Yes | Image size. dall-e-3: 1024x1024, 1024x1792, or 1792x1024. gpt-image-1: auto, 1024x1024, 1536x1024, or 1024x1536. gpt-image-2: auto or any size with edges ≤3840px and multiples of 16 \(e.g. 1024x1024, 1536x1024, 1024x1536, 2560x1440, 3840x2160\). |
-Create videos from text prompts using cutting-edge AI models from top providers. Sim's Video Generator brings powerful, creative video synthesis capabilities to your workflow—supporting diverse models, aspect ratios, resolutions, camera controls, native audio, and advanced style and consistency features.
+Create videos from text prompts using leading AI video providers. Sim's Video Generator supports direct provider integrations for Runway, Google Veo, Luma, and MiniMax, plus a Fal.ai multi-model provider for newer and specialized models.
-Runway is a pioneer in text-to-video generation, known for powerful models like Gen-2, Gen-3, and Gen-4. The latest [Gen-4](https://research.runwayml.com/gen2/) model (and Gen-4 Turbo for faster results) supports more realistic motion, greater world consistency, and visual references for character, object, style, and location. Supports 16:9, 9:16, and 1:1 aspect ratios, 5–10 second durations, up to 4K resolution, style presets, and direct upload of reference images for consistent generations. Runway powers creative tools for filmmakers, studios, and content creators worldwide.
+- **[Runway Gen-4](https://docs.dev.runwayml.com/)**: Generate image-to-video clips with a required reference image, 5 or 10 second durations, and landscape, portrait, or square output.
-[Veo](https://deepmind.google/technologies/veo/) is Google’s next-generation video generation model, offering high-quality, native-audio videos up to 1080p and 16 seconds. Supports advanced motion, cinematic effects, and nuanced text understanding. Veo can generate videos with built-in sound—activating native audio as well as silent clips. Options include 16:9 aspect, variable duration, different models (veo-3, veo-3.1), and prompt-based controls. Ideal for storytelling, advertising, research, and ideation.
+- **[Google Veo](https://ai.google.dev/gemini-api/docs/video)**: Generate text-to-video clips with Veo 3 and Veo 3.1 models, portrait or landscape aspect ratios, 4, 6, or 8 second durations, and 720p or 1080p output.
-[Dream Machine](https://lumalabs.ai/dream-machine) delivers jaw-droppingly realistic and fluid video from text. It incorporates advanced camera control, cinematography prompts, and supports both ray-1 and ray-2 models. Dream Machine supports precise aspect ratios (16:9, 9:16, 1:1), variable durations, and the specification of camera paths for intricate visual direction. Luma is renowned for breakthrough visual fidelity and is backed by leading AI vision researchers.
+- **[Luma Dream Machine](https://docs.lumalabs.ai/docs/video-generation)**: Generate Ray 2 videos with 5 or 9 second durations, common aspect ratios, multiple resolutions, and optional camera concept controls.
-[MiniMax Hailuo-02](https://minimax.chat/) is a sophisticated Chinese generative video model, available globally through [Fal.ai](https://fal.ai/). Generate videos up to 16 seconds in landscape or portrait format, with options for prompt optimization to improve clarity and creativity. Pro and standard endpoints available, supporting high resolutions (up to 1920×1080). Well-suited for creative projects needing prompt translation and optimization, commercial storytelling, and rapid prototyping of visual ideas.
+- **[MiniMax Hailuo](https://platform.minimax.io/docs/api-reference/video-generation-t2v)**: Generate Hailuo 2.3 or Hailuo-02 videos through MiniMax's platform API, with standard or pro quality endpoints and prompt optimization.
+
+- **[Fal.ai Multi-Model](https://fal.ai/docs/model-api-reference/video-generation-api/overview)**: Access Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0 and O3, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported Fal.ai models from one provider option.
 
 **How to Choose:**
-Pick your provider and model based on your needs for quality, speed, duration, audio, cost, and unique features. Runway and Veo offer world-leading realism and cinematic capabilities; Luma excels in fluid motion and camera control; MiniMax is ideal for Chinese-language prompts and offers fast, affordable access. Consider reference support, style presets, audio requirements, and pricing when selecting your tool.
+Pick the provider and model based on quality, speed, duration, audio support, reference image needs, resolution, and cost. Runway is best when you have a visual reference, Veo and Luma are strong general text-to-video options, MiniMax offers a direct Hailuo API path, and Fal.ai is the best choice when you need access to the broadest model catalog.
 
 For more details on features, restrictions, pricing, and model advances, see each provider’s official documentation above.
 {/* MANUAL-CONTENT-END */}
 
 
 ## Usage Instructions
 
-Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.
+Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.
 
 
 
@@ -141,9 +139,10 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced
 | --------- | ---- | -------- | ----------- |
 | `provider` | string | Yes | Video provider \(minimax\) |
 | `apiKey` | string | Yes | MiniMax API key from platform.minimax.io |
-| `model` | string | No | MiniMax model: hailuo-02 \(default\) |
+| `model` | string | No | MiniMax model: hailuo-2.3 \(default\) or hailuo-02 |
 | `prompt` | string | Yes | Text prompt describing the video to generate |
 | `duration` | number | No | Video duration in seconds \(6 or 10, default: 6\) |
+| `endpoint` | string | No | Quality endpoint: standard \(768P\) or pro \(1080P for 6s videos\) |
 | `promptOptimizer` | boolean | No | Enable prompt optimization for better results \(default: true\) |
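
The updated MiniMax input rows can be read as a small validation contract. The TypeScript sketch below is illustrative only: `MiniMaxVideoInput`, `validateMiniMaxInput`, and `withDefaults` are hypothetical names and are not taken from Sim's codebase. The allowed values and defaults come from the table rows in this hunk, and no default is assumed for `endpoint` because the table does not state one.

```typescript
// Hypothetical input shape mirroring the MiniMax parameter table in this diff.
interface MiniMaxVideoInput {
  provider: "minimax";
  apiKey: string;
  prompt: string;
  model?: string;
  duration?: number;
  endpoint?: string;
  promptOptimizer?: boolean;
}

// Allowed values as documented in the table rows.
const MODELS: readonly string[] = ["hailuo-2.3", "hailuo-02"];
const DURATIONS: readonly number[] = [6, 10];
const ENDPOINTS: readonly string[] = ["standard", "pro"];

// Returns a list of human-readable problems; an empty list means the input is acceptable.
function validateMiniMaxInput(input: MiniMaxVideoInput): string[] {
  const errors: string[] = [];
  if (!input.apiKey) errors.push("apiKey is required");
  if (!input.prompt) errors.push("prompt is required");
  if (input.model !== undefined && MODELS.indexOf(input.model) === -1) {
    errors.push("model must be hailuo-2.3 or hailuo-02");
  }
  if (input.duration !== undefined && DURATIONS.indexOf(input.duration) === -1) {
    errors.push("duration must be 6 or 10 seconds");
  }
  if (input.endpoint !== undefined && ENDPOINTS.indexOf(input.endpoint) === -1) {
    errors.push("endpoint must be standard or pro");
  }
  return errors;
}

// Fills in the defaults stated in the table; `endpoint` is left as provided.
function withDefaults(input: MiniMaxVideoInput) {
  return {
    ...input,
    model: input.model ?? "hailuo-2.3",
    duration: input.duration ?? 6,
    promptOptimizer: input.promptOptimizer ?? true,
  };
}
```

Calling `withDefaults` on an input that only sets `provider`, `apiKey`, and `prompt` yields `hailuo-2.3`, a 6 second duration, and prompt optimization enabled, which matches the defaults the new rows document.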
 
 #### Output
@@ -161,20 +160,21 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced
 
 ### `video_falai`
 
-Generate videos using Fal.ai platform with access to multiple models including Veo 3.1, Sora 2, Kling 2.5, MiniMax Hailuo, and more
+Generate videos using Fal.ai with access to Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported models
 
 #### Input
 
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `provider` | string | Yes | Video provider \(falai\) |
"description": "Generate videos from text using AI",
-"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.",
+"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.",