Commit b0b2d36

Merge pull request #147 from owndev/145-add-support-for-gemini-veo-models
Add support for Gemini Veo models
2 parents fd39f05 + 4f05f10 commit b0b2d36

3 files changed

Lines changed: 744 additions & 15 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -165,6 +165,7 @@ The functions include a built-in encryption mechanism for sensitive information:
- **Configurable Parameters**: Environment variables for image optimization (quality, max dimensions, format conversion).
- **Multi-Image History**: Configurable history image limit, hash-based deduplication, and automatic `[Image N]` labels so the model can reference earlier images.
- **Image Generation (Gemini 3)**: Configurable aspect ratio (e.g. `16:9`, `1:1`) and resolution (`1K`/`2K`/`4K`) for Gemini 3 image models; per-user valve overrides supported.
- **Video Generation (Veo)**: Generate videos with Google Veo models (3.1, 3, 2). Configurable aspect ratio, resolution, duration, negative prompt, and person generation controls. Supports text-to-video and image-to-video for all supported Veo models. Videos are automatically uploaded and embedded with playback controls.
- **Token Usage Tracking**: Returns prompt, completion, and total token counts to Open WebUI for automatic saving to the database.
- **Model Whitelist & Additional Models**: Restrict the visible model list via `GOOGLE_MODEL_WHITELIST` and add SDK-unsupported models via `GOOGLE_MODEL_ADDITIONAL`.
- Grounding with Google search with [google_search_tool.py filter](./filters/google_search_tool.py)

docs/google-gemini-integration.md

Lines changed: 113 additions & 0 deletions
@@ -37,6 +37,9 @@ This integration enables **Open WebUI** to interact with **Google Gemini** model
- **Advanced Image Generation**
Support for text-to-image and image-to-image generation with Gemini 2.5 Flash Image Preview models.

- **Video Generation with Google Veo**
Generate videos using Veo 3.1, 3, and 2 models with configurable aspect ratio, resolution, duration, and more. Supports text-to-video and image-to-video on all supported Veo models. Videos are automatically uploaded and embedded with playback controls.

- **Flexible Error Handling**
Retries failed requests and logs errors for transparency.

@@ -339,6 +342,116 @@ for part in response.parts:
| Other gemini-3-\* models | ❌ Not image generation models |
| Other models | ❌ Not image generation models |

## Video Generation Configuration

The Google Gemini pipeline supports video generation using **Google Veo models** (Veo 3.1, 3, and 2). Veo models appear automatically in the model list with a 🎬 indicator.

> [!IMPORTANT]
> Video generation uses a different API path than text/image generation. Requests are **always non-streaming**: the pipeline submits a video generation job, polls for completion, and returns the result with embedded video playback.

### Supported Models

| Model ID                        | Description                                    |
| ------------------------------- | ---------------------------------------------- |
| `veo-3.1-generate-preview`      | Veo 3.1: highest quality, 4k, reference images |
| `veo-3.1-fast-generate-preview` | Veo 3.1 Fast: faster generation                |
| `veo-3-generate-preview`        | Veo 3: balanced quality                        |
| `veo-3.0-fast-generate-001`     | Veo 3 Fast                                     |
| `veo-2.0-generate-001`          | Veo 2: legacy model                            |

### Per-Model Feature Support

Not all parameters are supported by every Veo model. The pipeline automatically gates features based on the model used. Unsupported parameters are silently skipped to avoid API errors.

| Feature              | Veo 3.1         | Veo 3.1 Fast    | Veo 3       | Veo 3 Fast  | Veo 2       |
| -------------------- | --------------- | --------------- | ----------- | ----------- | ----------- |
| Aspect Ratio         | 16:9, 9:16      | 16:9, 9:16      | 16:9, 9:16  | 16:9, 9:16  | 16:9, 9:16  |
| Resolution           | 720p, 1080p, 4k | 720p, 1080p, 4k | 720p, 1080p | 720p, 1080p | No          |
| Duration (seconds)   | 4, 6, 8         | 4, 6, 8         | 8 only      | 8 only      | 5, 6, 8     |
| Negative Prompt      | Yes             | Yes             | Yes         | Yes         | Yes         |
| Person Generation    | Yes             | Yes             | Yes         | Yes         | Yes         |
| Enhance Prompt       | Yes             | No              | Yes         | No          | No          |
| Image-to-Video       | Yes             | Yes             | Yes         | Yes         | Yes         |
| Reference Images     | ⚠️ API only¹    | ⚠️ API only¹    | No          | No          | No          |
| Last Frame (interp.) | ⚠️ Not yet²     | ⚠️ Not yet²     | ⚠️ Not yet² | ⚠️ Not yet² | ⚠️ Not yet² |
| Video Extension      | ⚠️ Not yet²     | ⚠️ Not yet²     | No          | No          | No          |
| Audio                | Native          | Native          | Native      | Native      | Silent only |
| Max Videos/Request   | 1               | 1               | 1           | 1           | 2           |

> ¹ The Veo API supports up to 3 reference images for Veo 3.1, but the pipeline currently only forwards a single attached image via the `image` parameter.
>
> ² Last-frame interpolation and video extension are Veo API capabilities not yet exposed by the pipeline.

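The per-model gating described in this section could be sketched as a capability lookup followed by a filter. This is only an illustration, not the pipeline's actual code: the `VEO_FEATURES` table, the `gate_video_params` helper, and the parameter key names are hypothetical, and the supported values are transcribed from the feature table. Dropping an unsupported *value* (for example `4k` on Veo 3) in the same silent way as an unsupported parameter is an assumption.

```python
from typing import Any

# Hypothetical capability table, transcribed from the feature matrix in the docs.
_VEO_31 = {"resolution": {"720p", "1080p", "4k"}, "duration": {4, 6, 8}}
_VEO_3 = {"resolution": {"720p", "1080p"}, "duration": {8}}
VEO_FEATURES: dict[str, dict[str, Any]] = {
    "veo-3.1-generate-preview": {**_VEO_31, "enhance_prompt": True},
    "veo-3.1-fast-generate-preview": {**_VEO_31, "enhance_prompt": False},
    "veo-3-generate-preview": {**_VEO_3, "enhance_prompt": True},
    "veo-3.0-fast-generate-001": {**_VEO_3, "enhance_prompt": False},
    "veo-2.0-generate-001": {
        "resolution": set(),  # resolution is not configurable on Veo 2
        "duration": {5, 6, 8},
        "enhance_prompt": False,
    },
}
ASPECT_RATIOS = {"16:9", "9:16"}  # accepted by every Veo model


def gate_video_params(model_id: str, requested: dict[str, Any]) -> dict[str, Any]:
    """Keep only the parameters the model supports; silently drop the rest."""
    features = VEO_FEATURES[model_id]
    accepted: dict[str, Any] = {}
    if requested.get("aspect_ratio") in ASPECT_RATIOS:
        accepted["aspect_ratio"] = requested["aspect_ratio"]
    if requested.get("resolution") in features["resolution"]:
        accepted["resolution"] = requested["resolution"]
    if requested.get("duration") in features["duration"]:
        accepted["duration"] = requested["duration"]
    if features["enhance_prompt"] and "enhance_prompt" in requested:
        accepted["enhance_prompt"] = requested["enhance_prompt"]
    # Negative prompt and person generation are listed as supported everywhere.
    for key in ("negative_prompt", "person_generation"):
        if requested.get(key):
            accepted[key] = requested[key]
    return accepted
```

With this sketch, a Veo 2 request that asks for `1080p` and prompt enhancement would keep only its aspect ratio and duration.
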
### Environment Variables

```bash
# Default aspect ratio for videos (16:9 landscape or 9:16 portrait)
# Supported by: all Veo models
# Default: "default" (API decides)
GOOGLE_VIDEO_GENERATION_ASPECT_RATIO="default"

# Default video resolution (720p, 1080p, or 4k)
# Supported by: Veo 3.1/3 only (ignored for Veo 2; 4k only on Veo 3.1)
# Default: "default" (API decides)
GOOGLE_VIDEO_GENERATION_RESOLUTION="default"

# Default video duration in seconds
# Veo 3.1: 4, 6, 8 | Veo 3: 8 only | Veo 2: 5, 6, 8
# Default: "default" (API decides)
GOOGLE_VIDEO_GENERATION_DURATION="default"

# Negative prompt: describes what to avoid in the generated video
# Supported by: all Veo models
# Default: "" (empty)
GOOGLE_VIDEO_GENERATION_NEGATIVE_PROMPT=""

# Controls generation of people in videos
# Valid values: "allow_all", "allow_adult", "dont_allow"
# Default: "default" (API decides)
GOOGLE_VIDEO_GENERATION_PERSON_GENERATION="default"

# Enable prompt enhancement for video generation
# Supported by: Veo 3.1 and Veo 3 (non-Fast variants only; ignored for Fast models and Veo 2)
# Default: true
GOOGLE_VIDEO_GENERATION_ENHANCE_PROMPT=true

# Polling interval in seconds when waiting for video generation
# Default: 10
GOOGLE_VIDEO_POLL_INTERVAL=10

# Maximum time in seconds to wait for video generation before timing out
# Set to 0 to disable timeout (not recommended)
# Default: 600
GOOGLE_VIDEO_POLL_TIMEOUT=600
```

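One way to read these variables is to treat `"default"` (and an empty string) as "send nothing and let the API decide", and `0` as "no timeout". The sketch below illustrates that convention only; `read_setting` and `read_poll_timeout` are hypothetical helper names, and the real pipeline may parse its configuration differently.

```python
import os
from typing import Mapping, Optional


def read_setting(env: Mapping[str, str], name: str) -> Optional[str]:
    """Return the configured value, or None when the API should decide."""
    value = env.get(name, "default").strip()
    return None if value.lower() in ("", "default") else value


def read_poll_timeout(env: Mapping[str, str]) -> Optional[float]:
    """Return the polling timeout in seconds, or None when it is disabled with 0."""
    seconds = float(env.get("GOOGLE_VIDEO_POLL_TIMEOUT", "600"))
    return None if seconds == 0 else seconds


if __name__ == "__main__":
    # Reads the live process environment; prints None for unset or "default" values.
    print(read_setting(os.environ, "GOOGLE_VIDEO_GENERATION_RESOLUTION"))
    print(read_poll_timeout(os.environ))
```

Returning `None` rather than the literal string `"default"` makes it easy to omit the parameter from the request entirely.
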
### User-Configurable Settings

Users can override the following settings per-user via Open WebUI valve overrides:

- **Aspect Ratio**: `VIDEO_GENERATION_ASPECT_RATIO`
- **Resolution**: `VIDEO_GENERATION_RESOLUTION`
- **Duration**: `VIDEO_GENERATION_DURATION`

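The precedence implied by these overrides (user valve first, then the global setting, then the API's own default) might look like the sketch below. The `effective_setting` helper is hypothetical, and treating a user value of `"default"` or an empty value as "no override" is an assumption rather than documented behavior.

```python
from typing import Optional


def effective_setting(user_value: Optional[str], global_value: Optional[str]) -> Optional[str]:
    """Prefer the per-user valve, then the global value; None means the API decides."""
    for candidate in (user_value, global_value):
        if candidate and candidate.strip().lower() != "default":
            return candidate.strip()
    return None
```

For example, a user valve of `9:16` would win over a global `16:9`, while a user valve left at `default` would fall through to the global value.
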
### Image-to-Video

Attach an image to your message when using any Veo model to use it as the starting frame for video generation. The pipeline automatically detects attached images and passes the first one to the Veo API via the `image` parameter.

> [!NOTE]
> All Veo models support single-image image-to-video. **Multi-reference images** (up to 3 style/content guides, Veo 3.1 only) and **last-frame interpolation** are Veo API capabilities not yet exposed by the pipeline.

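Picking "the first attached image" could be sketched as follows. The sketch assumes the message uses OpenAI-style content parts, where an image arrives as an `image_url` part holding a base64 data URI; that message shape and the `first_attached_image` helper are assumptions made for illustration, not a description of the pipeline's internals.

```python
import base64
from typing import Any, Optional


def first_attached_image(message: dict[str, Any]) -> Optional[tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for the first data-URI image part, or None."""
    content = message.get("content")
    if not isinstance(content, list):
        return None  # plain-text message, so there are no attachments
    for part in content:
        if part.get("type") != "image_url":
            continue
        url = part.get("image_url", {}).get("url", "")
        if url.startswith("data:") and ";base64," in url:
            header, payload = url.split(";base64,", 1)
            return header[len("data:"):], base64.b64decode(payload)
    return None
```

Any further images in the same message are ignored here, which mirrors the single-image behavior described in this section.
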
### How It Works

1. Select a Veo model (marked with 🎬) from the model list
2. Type your video description prompt
3. Optionally attach an image for image-to-video (supported by all Veo models)
4. The pipeline submits the request and shows polling status updates
5. Once complete, the video is uploaded to Open WebUI and embedded with a `<video>` player

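The submit-then-poll step can be illustrated with a generic polling loop that honors an interval and an optional timeout. Everything here is a sketch under stated assumptions: `poll_until_done` is a hypothetical helper, `refresh` stands in for whatever SDK call re-fetches the long-running operation, and `on_status` stands in for the status updates shown in the chat. A timeout of `0` or `None` disables the limit, matching the semantics described for `GOOGLE_VIDEO_POLL_TIMEOUT`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(
    refresh: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 10.0,
    timeout: Optional[float] = 600.0,
    on_status: Callable[[float], None] = lambda elapsed: None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Re-fetch the job state every `interval` seconds until it reports completion."""
    start = clock()
    while True:
        state = refresh()
        if is_done(state):
            return state
        elapsed = clock() - start
        if timeout and elapsed >= timeout:  # 0 or None means no time limit
            raise TimeoutError(f"video generation not finished after {elapsed:.0f}s")
        on_status(elapsed)  # e.g. emit a "still generating" status update
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults apply.
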
### Vertex AI Note

When using Vertex AI, video download via `files.download()` is not available. If the Veo API returns a GCS URI instead of raw bytes, the current pipeline does not yet surface that URI or attach the video output in the chat. You may need to retrieve the generated video directly from Vertex AI or the underlying GCS bucket.

## Model Configuration

The Google Gemini pipeline provides two complementary mechanisms for controlling which models appear in the model list: `MODEL_ADDITIONAL` and `MODEL_WHITELIST`.
