
Commit 9deb545

CypherNaught-0x, Pfannkuchensack, and lstein authored
External models (Gemini Nano Banana & OpenAI GPT Image) (#8633) (#8884)
* feat: initial external model support
* feat: support reference images for external models
* fix: sorting lint error
* chore: hide Reidentify button for external models
* review: enable auto-install/remove for external models
* feat: show external model name during install
* review: model descriptions
* review: implemented review comments
* review: added optional seed control for external models
* chore: fix linter warning
* review: save api keys to a separate file
* docs: updated external model docs
* chore: fix linter errors
* fix: sync configured external starter models on startup
* feat(ui): add provider-specific external generation nodes
* feat: expose external panel schemas in model configs
* feat(ui): drive external panels from panel schema
* docs: sync app config docstring order
* feat: add gemini 3.1 flash image preview starter model
* feat: update gemini image model limits
* fix: resolve TypeScript errors and move external provider config to api_keys.yaml

  Add 'external', 'external_image_generator', and 'external_api' to Zod enum schemas (zBaseModelType, zModelType, zModelFormat) to match the generated OpenAPI types. Remove redundant union workarounds from component prop types and Record definitions. Fix type errors in ModelEdit (react-hook-form Control invariance), parsing.tsx (model identifier narrowing), buildExternalGraph (edge typing), and ModelSettings import/export buttons.

  Move external_gemini_base_url and external_openai_base_url into api_keys.yaml alongside the API keys so all external provider config lives in one dedicated file, separate from invokeai.yaml.

* feat: add resolution presets and imageConfig support for Gemini 3 models

  Add combined resolution preset selector for external models that maps aspect ratio + image size to fixed dimensions. Gemini 3 Pro and 3.1 Flash now send imageConfig (aspectRatio + imageSize) via generationConfig instead of text-based aspect ratio hints used by Gemini 2.5 Flash.

  Backend: ExternalResolutionPreset model, resolution_presets capability field, image_size on ExternalGenerationRequest, and Gemini provider imageConfig logic. Frontend: ExternalSettingsAccordion with combo resolution select, dimension slider disabling for fixed-size models, and panel schema constraint wiring for Steps/Guidance/Seed controls.

* Remove unused external model fields and add provider-specific parameters
  - Remove negative_prompt, steps, guidance, reference_image_weights, reference_image_modes from external model nodes (unused by any provider)
  - Remove supports_negative_prompt, supports_steps, supports_guidance from ExternalModelCapabilities
  - Add provider_options dict to ExternalGenerationRequest for provider-specific parameters
  - Add OpenAI-specific fields: quality, background, input_fidelity
  - Add Gemini-specific fields: temperature, thinking_level
  - Add new OpenAI starter models: GPT Image 1.5, GPT Image 1 Mini, DALL-E 3, DALL-E 2
  - Fix OpenAI provider to use output_format (GPT Image) vs response_format (DALL-E) and send model ID in requests
  - Add fixed aspect ratio sizes for OpenAI models (bucketing)
  - Add ExternalProviderRateLimitError with retry logic for 429 responses
  - Add provider-specific UI components in ExternalSettingsAccordion
  - Simplify ParamSteps/ParamGuidance by removing dead external overrides
  - Update all backend and frontend tests
* Chore Ruff check & format
* Chore typegen
* feat: full canvas workflow integration for external models
  - Add missing aspect ratios (4:5, 5:4, 8:1, 4:1, 1:4, 1:8) to type system for external model support
  - Sync canvas bbox when external model resolution preset is selected
  - Use params preset dimensions in buildExternalGraph to prevent "unsupported aspect ratio" errors
  - Lock all bbox controls (resize handles, aspect ratio select, width/height sliders, swap/optimal buttons) for external models with fixed dimension presets
  - Disable denoise strength slider for external models (not applicable)
  - Sync bbox aspect ratio changes back to paramsSlice for external models
  - Initialize bbox dimensions when switching to an external model
* Chore typegen Linux separator
* feat: full canvas workflow integration for external models
  - Update buildExternalGraph test to include dimensions in mock params
* Merge remote-tracking branch 'upstream/main' into external-models
* Chore pnpm fix
* add missing parameter
* docs: add External Models guide with Gemini and OpenAI provider pages
* fix(external-models): address PR review feedback
  - Gemini recall: write temperature, thinking_level, image_size to image metadata; wire external graph as metadata receiver; add recall handlers.
  - Canvas: gate regional guidance, inpaint mask, and control layer for external models.
  - Canvas: throw a clear error on outpainting for external models (was falling back to inpaint and hitting an API-side mask/image size mismatch).
  - Workflow editor: add ui_model_provider_id filter so OpenAI and Gemini nodes only list their own provider's models.
  - Workflow editor: silently drop seed when the selected model does not support it instead of raising a capability error.
  - Remove the legacy external_image_generation invocation and the graph-builder fallback; providers must register a dedicated node.
  - Regenerate schema.ts.
  - remove Gemini debug dumps to outputs/external_debug
* fix(external-models): resolve TSC errors in metadata parsing and external graph
  - Export imageSizeChanged from paramsSlice (required by the new ImageSize recall handler).
  - Emit the external graph's metadata model entry via zModelIdentifierField since ExternalApiModelConfig is not part of the AnyModelConfig union.
* chore: prettier format ModelIdentifierFieldInputComponent
* fix: remove unsupported thinkingConfig from Gemini image models and restrict GPT Image models to txt2img
* chore typegen
* chore(docs): regenerate settings.json for external provider fields
* fix(external): fix mask handling and mode support for external providers
  - Remove img2img and inpaint modes from Gemini models (Gemini has no bitmap mask or dedicated edit API; image editing works via reference images in the UI)
  - Fix DALL-E 2 inpainting: convert grayscale mask to RGBA with alpha channel transparency (OpenAI expects transparent=edit area) and convert init image to RGBA when mask is present
* fix(external): update mode support and UI for external providers
  - Remove DALL-E 2 from starter models (deprecated, shutdown May 12 2026)
  - Enable img2img for GPT Image 1/1.5/1-mini (supports edits endpoint)
  - Set Gemini models to txt2img only (no mask/edit API; editing via ref images)
  - Hide mode/init_image/mask_image fields on Gemini node (not usable)
  - Hide mask_image field on OpenAI node (no model supports inpaint)
* Chore typegen
* fix(external): improve OpenAI node UX and disable cache by default
  - Hide OpenAI node's mode and init_image fields: OpenAI's API has no img2img/inpaint distinction (the edits endpoint is invoked automatically when reference images are provided). init_image is functionally identical to a reference image and was misleading users.
  - Default use_cache to False for external image generation nodes: external API calls are non-deterministic and incur usage costs. Cache hits returned stale image references that did not produce new gallery entries on repeat invokes.
* fix(external): duplicate cached images on cache hit instead of skipping

  External image generation nodes use the standard invocation cache, but returning the cached output (with stale image_name references) on cache hits resulted in no new gallery entries — the Invoke button would spin indefinitely on repeat invokes with identical parameters.

  Override invoke_internal so that on cache hit, the cached images are loaded and re-saved as new gallery entries. The expensive API call is still skipped (cost saving), but the user sees a new image as expected.

* Chore typegen + ruff
* Chore ruff format
* fix(external): restore OpenAI advanced settings on Remix recall

  Remix recall iterates through ImageMetadataHandlers but only Gemini's temperature handler was wired up — OpenAI's quality, background, and input_fidelity were stored in image metadata but never parsed back into the params slice. Add the three missing handlers so Remix restores these settings as expected.

---------

Co-authored-by: Alexander Eichhorn <alex@eichhorn.dev>
Co-authored-by: Alexander Eichhorn <alex@code-with.us>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
1 parent f621bc8 commit 9deb545

140 files changed

Lines changed: 7420 additions & 399 deletions


Some content is hidden

Large Commits have some content hidden by default.

docs-old/contributing/index.md

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ We welcome contributions, whether features, bug fixes, code cleanup, testing, co
 
 If you’d like to help with development, please see our [development guide](contribution_guides/development.md).
 
+## External Providers
+
+If you are adding external image generation providers or configs, see our [external provider integration guide](EXTERNAL_PROVIDERS.md).
+
 **New Contributors:** If you’re unfamiliar with contributing to open source projects, take a look at our [new contributor guide](contribution_guides/newContributorChecklist.md).
 
 ## Nodes
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
# External Provider Integration

This guide covers:

1. Adding a new **external model** (most common; existing provider).
2. Adding a brand-new **external provider** (adapter + config + UI wiring).

## 1) Add a New External Model (Existing Provider)

For provider-backed models (for example, OpenAI or Gemini), the source of truth is
`invokeai/backend/model_manager/starter_models.py`.

### Required model fields

Define a `StarterModel` with:

- `base=BaseModelType.External`
- `type=ModelType.ExternalImageGenerator`
- `format=ModelFormat.ExternalApi`
- `source="external://<provider_id>/<provider_model_id>"`
- `name`, `description`
- `capabilities=ExternalModelCapabilities(...)`
- optional `default_settings=ExternalApiModelDefaultSettings(...)`

Example:

```python
# Lives in invokeai/backend/model_manager/starter_models.py, where StarterModel and the
# model-manager enums and capability/settings classes used below are in scope.
new_external_model = StarterModel(
    name="Provider Model Name",
    base=BaseModelType.External,
    source="external://openai/my-model-id",
    description=(
        "Provider model (external API). "
        "Requires a configured OpenAI API key and may incur provider usage costs."
    ),
    type=ModelType.ExternalImageGenerator,
    format=ModelFormat.ExternalApi,
    capabilities=ExternalModelCapabilities(
        modes=["txt2img", "img2img", "inpaint"],
        # supports_negative_prompt / supports_steps / supports_guidance were removed
        # from ExternalModelCapabilities in this change, so they are not set here.
        supports_seed=False,
        supports_reference_images=True,
        max_images_per_request=4,
    ),
    default_settings=ExternalApiModelDefaultSettings(
        width=1024,
        height=1024,
        num_images=1,
    ),
)
```

Then append it to `STARTER_MODELS`.

### Required description text

External starter model descriptions must clearly state:

- an API key is required
- usage may incur provider-side costs

### Capabilities must be accurate

These fields directly control UI visibility and request payload contents:

- `modes`
- `supports_seed`
- `supports_reference_images`
- `max_images_per_request`

For example, if `supports_seed` is `False`, the seed control is hidden for that model, and a seed set in the workflow editor is silently dropped instead of raising a capability error.

### Source string stability

Starter overrides are matched by `source` (`external://provider/model-id`). Keep this stable:

- runtime capability/default overrides depend on it
- installation detection in starter-model APIs depends on it

`STARTER_MODELS` enforces unique `source` values with an assertion.
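
The `external://<provider_id>/<provider_model_id>` convention and the uniqueness rule above can be sketched in a few lines of Python. This is an illustration, not Invoke's code: `ExternalSource`, `parse_external_source`, and `assert_unique_sources` are hypothetical helpers, and splitting on the first `/` (so model IDs may contain further slashes) is an assumption of this sketch.

```python
from typing import Iterable, NamedTuple

EXTERNAL_SCHEME = "external://"


class ExternalSource(NamedTuple):
    provider_id: str
    provider_model_id: str


def parse_external_source(source: str) -> ExternalSource:
    """Split external://<provider_id>/<provider_model_id> (hypothetical helper)."""
    if not source.startswith(EXTERNAL_SCHEME):
        raise ValueError(f"not an external source: {source!r}")
    # Split on the first "/" only; later slashes stay in the model ID (assumption).
    provider_id, sep, model_id = source[len(EXTERNAL_SCHEME):].partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"expected external://<provider_id>/<provider_model_id>, got {source!r}")
    return ExternalSource(provider_id, model_id)


def assert_unique_sources(sources: Iterable[str]) -> None:
    """Fail on the first duplicate, mirroring the STARTER_MODELS uniqueness assertion."""
    seen: set[str] = set()
    for source in sources:
        assert source not in seen, f"duplicate starter model source: {source}"
        seen.add(source)


print(parse_external_source("external://openai/my-model-id"))
# ExternalSource(provider_id='openai', provider_model_id='my-model-id')
```

Because overrides and install detection match on the full string, changing either segment breaks that match.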

### Install behavior notes

- External starter models are managed in **External Providers** setup (not the regular Starter Models tab).
- External starter models auto-install when a provider is configured.
- Removing a provider API key removes installed external models for that provider.

## 2) Credentials and Config

External provider API keys are stored separately from `invokeai.yaml`:

- default file: `~/invokeai/api_keys.yaml`
- resolved path: `<INVOKEAI_ROOT>/api_keys.yaml`

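Non-secret provider settings such as base URL overrides (`external_<provider>_base_url`) live in `api_keys.yaml` as well, so all external provider configuration is kept in one file, separate from `invokeai.yaml`.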

Environment variables are still supported, e.g.:

- `INVOKEAI_EXTERNAL_GEMINI_API_KEY`
- `INVOKEAI_EXTERNAL_OPENAI_API_KEY`
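
As the generated `settings.json` entries in this commit show, each setting's environment variable is its name upper-cased with an `INVOKEAI_` prefix (for example `external_gemini_api_key` → `INVOKEAI_EXTERNAL_GEMINI_API_KEY`). The sketch below derives that name and resolves a value. Two details are assumptions, not documented behavior: that the environment variable takes precedence over `api_keys.yaml`, and that the parsed YAML is available as a plain dict (used here to keep the example dependency-free). `env_var_name` and `resolve_setting` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional


def env_var_name(setting: str) -> str:
    """external_gemini_api_key -> INVOKEAI_EXTERNAL_GEMINI_API_KEY"""
    return f"INVOKEAI_{setting.upper()}"


def resolve_setting(
    setting: str,
    file_values: Mapping[str, Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the env var value if set and non-empty, else the api_keys.yaml value.

    The env-over-file precedence is an assumption made for this sketch.
    """
    from_env = environ.get(env_var_name(setting))
    if from_env:
        return from_env
    return file_values.get(setting)


api_keys = {"external_openai_api_key": "sk-from-file"}  # stands in for parsed api_keys.yaml
print(resolve_setting("external_openai_api_key", api_keys, environ={}))  # sk-from-file
```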

## 3) Add a New Provider (Only If Needed)

If your model uses a provider that is not already integrated:

1. Add config fields in `invokeai/app/services/config/config_default.py`:
   `external_<provider>_api_key` and optional `external_<provider>_base_url`.
2. Add provider field mapping in `invokeai/app/api/routers/app_info.py`
   (`EXTERNAL_PROVIDER_FIELDS`).
3. Implement provider adapter in `invokeai/app/services/external_generation/providers/`
   by subclassing `ExternalProvider`.
4. Register the provider in `invokeai/app/api/dependencies.py` when building
   `ExternalGenerationService`.
5. Add a dedicated invocation node for the provider. The generic `external_image_generation` invocation and its graph-builder fallback have been removed, so each provider must register its own node.
6. Add starter model entries using `source="external://<provider>/<model-id>"`.
7. Optional UI ordering tweak:
   `invokeai/frontend/web/src/features/modelManagerV2/subpanels/AddModelPanel/ExternalProviders/ExternalProvidersForm.tsx`
   (`PROVIDER_SORT_ORDER`).
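
The adapter and registration steps above follow a familiar pattern: one class per provider behind a shared interface, looked up by the `<provider>` segment of a model's `source`. The sketch below is entirely hypothetical and self-contained. The real `ExternalProvider` base class and `ExternalGenerationService` are defined in the files listed above, and their method names and signatures are not shown in this diff; `ExampleRequest`, `ExampleProvider`, `AcmeProvider`, and `ProviderRegistry` are invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExampleRequest:
    # Hypothetical request shape; the real ExternalGenerationRequest carries more fields.
    model_id: str
    prompt: str
    width: int
    height: int
    provider_options: dict = field(default_factory=dict)


class ExampleProvider(ABC):
    """Hypothetical stand-in for ExternalProvider."""

    provider_id: str  # the <provider> segment of external://<provider>/<model-id>

    def __init__(self, api_key: Optional[str], base_url: Optional[str] = None) -> None:
        # Values would come from external_<provider>_api_key / external_<provider>_base_url.
        self.api_key = api_key
        self.base_url = base_url

    def is_configured(self) -> bool:
        return bool(self.api_key)

    @abstractmethod
    def generate(self, request: ExampleRequest) -> list[bytes]:
        """Call the provider's API and return encoded images."""


class AcmeProvider(ExampleProvider):
    provider_id = "acme"

    def generate(self, request: ExampleRequest) -> list[bytes]:
        # A real adapter would send an HTTP request here; this stub returns placeholder bytes.
        return [b"fake-png-bytes"]


class ProviderRegistry:
    """Hypothetical analogue of registering providers while building the generation service."""

    def __init__(self) -> None:
        self._providers: dict[str, ExampleProvider] = {}

    def register(self, provider: ExampleProvider) -> None:
        self._providers[provider.provider_id] = provider

    def for_source(self, source: str) -> ExampleProvider:
        provider_id = source.removeprefix("external://").split("/", 1)[0]
        return self._providers[provider_id]


registry = ProviderRegistry()
registry.register(AcmeProvider(api_key="test-key"))
images = registry.for_source("external://acme/model-x").generate(
    ExampleRequest(model_id="model-x", prompt="a lighthouse at dusk", width=1024, height=1024)
)
```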

## 4) Optional Manual Installation

You can also install external models directly via:

`POST /api/v2/models/install?source=external://<provider_id>/<provider_model_id>`

If `path`, `source`, or `hash` are omitted from an external model config, they are populated automatically.
Set capabilities conservatively; the external generation service enforces capability checks at runtime.
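
A minimal sketch of that call using only Python's standard library follows; it builds the request without sending it. Several details are assumptions rather than documented behavior: that Invoke is listening on `http://127.0.0.1:9090` (a common local default; adjust for your setup) and that an empty JSON body is accepted so the auto-populated fields are filled in. `build_install_request` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

INVOKE_URL = "http://127.0.0.1:9090"  # assumption: local Invoke server address


def build_install_request(provider_id: str, provider_model_id: str, base_url: str = INVOKE_URL) -> Request:
    """Build (but do not send) POST /api/v2/models/install for an external model."""
    source = f"external://{provider_id}/{provider_model_id}"
    # ':' and '/' are legal in a query string, so keep them unescaped for readability.
    query = urlencode({"source": source}, safe=":/")
    return Request(
        f"{base_url}/api/v2/models/install?{query}",
        data=json.dumps({}).encode("utf-8"),  # assumption: an empty config body is accepted
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_install_request("openai", "my-model-id")
print(req.get_method(), req.full_url)
# POST http://127.0.0.1:9090/api/v2/models/install?source=external://openai/my-model-id
```

To actually send it, pass the request to `urllib.request.urlopen`.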
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
---
title: Google Gemini
---

# :material-google: Google Gemini

Invoke supports Google's Gemini image generation models through the Gemini API. This provider is a good fit if you want high-quality text-to-image and reference-based image edits without running a local model.

## Getting an API Key

1. Open [Google AI Studio](https://aistudio.google.com/) and sign in with your Google account.
2. Generate a new API key.
3. Note the key — it will only be shown once.

## Configuration

Add your key to `api_keys.yaml` in your Invoke root directory:

```yaml
external_gemini_api_key: "your-gemini-api-key"

# Optional — only set this if you need to route requests through a different endpoint
external_gemini_base_url: "https://generativelanguage.googleapis.com"
```

Restart Invoke for the change to take effect.

## Available Models

| Model | Modes | Reference Images | Notes |
| --- | --- | --- | --- |
| **Gemini 2.5 Flash Image** | txt2img | Yes | 10 aspect ratios, fixed per-ratio resolutions. |
| **Gemini 3 Pro Image Preview** | txt2img | Up to 14 (6 object + 5 character) | 1K / 2K / 4K resolution presets. |
| **Gemini 3.1 Flash Image Preview** | txt2img | Up to 14 (10 object + 4 character) | 512 / 1K / 2K / 4K resolution presets. |

Gemini has no mask or dedicated edit API, so these models are exposed for text-to-image only; to edit an image, pass it in as a reference image.

All Gemini models are single-image-per-request — batch size is fixed at 1. To generate multiple variations, queue multiple invocations.

## Provider-Specific Options

Gemini exposes a **temperature** control in the parameters panel. Lower values make outputs more deterministic; higher values increase variability.

## Tips

- **Reference images** are sent directly to the API as inlined PNG data. Large references increase request latency and cost — crop tightly where possible.
- **Aspect ratios** are mapped to the closest Gemini-supported ratio. For Gemini 3 models, use the resolution presets to stay at the provider's native output sizes and avoid unnecessary rescaling.
- **Pricing** varies by model and region. Check Google's documentation before running large batches.
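
The mapping code itself is not part of this diff. As a sketch of what "closest supported ratio" can mean, the Python below snaps a requested width and height to the nearest ratio in a list. The ratio list is an assumption based on Google's published ratios for Gemini 2.5 Flash Image (ten entries, matching the table above), and comparing ratios in log space, so that 2:1 and 1:2 are equally far from 1:1, is a design choice of this sketch rather than Invoke's documented behavior. `closest_gemini_ratio` is a hypothetical helper.

```python
import math

# Assumed list of Gemini-supported aspect ratios; verify against Google's current docs.
GEMINI_ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def ratio_value(ratio: str) -> float:
    w, h = (int(part) for part in ratio.split(":"))
    return w / h


def closest_gemini_ratio(width: int, height: int) -> str:
    """Return the supported ratio nearest to width:height, measured in log space."""
    target = math.log(width / height)
    return min(GEMINI_ASPECT_RATIOS, key=lambda r: abs(math.log(ratio_value(r)) - target))


print(closest_gemini_ratio(1920, 1080))  # 16:9
print(closest_gemini_ratio(1200, 500))   # 21:9
```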
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
---
title: External Models
---

# :material-cloud-outline: External Models

External models let you generate images in Invoke by calling third-party image generation APIs instead of running a model locally. This is useful when:

- You don't have the GPU or VRAM to run a model locally.
- You want access to closed-source models (e.g. GPT Image, Gemini).
- You need a specific provider capability (very high resolutions, fast batches, bilingual text rendering, etc.).

External models appear in the model picker alongside locally installed models. Generations are routed to the provider's API, billed against your provider account, and the resulting images are imported back into Invoke like any other generation.

## Supported Providers

- [Google Gemini](gemini.md) — Gemini 2.5 Flash Image, Gemini 3 Pro Image Preview, Gemini 3.1 Flash Image Preview
- [OpenAI](openai.md) — GPT Image 1 / 1.5 / 1-mini, DALL·E 3, DALL·E 2

## Configuring API Keys

External provider credentials are stored in a dedicated `api_keys.yaml` file alongside `invokeai.yaml` in your Invoke root directory.

```yaml
# api_keys.yaml
external_gemini_api_key: "your-gemini-api-key"
external_openai_api_key: "your-openai-api-key"

# Optional: override the provider base URL (e.g. for a compatible proxy or regional endpoint)
external_gemini_base_url: "https://generativelanguage.googleapis.com"
external_openai_base_url: "https://api.openai.com"
```

Restart Invoke after editing `api_keys.yaml` so the new values are picked up.

!!! warning "Keep your keys private"
    `api_keys.yaml` contains secrets. Do not commit it to version control and do not share it with other users of your machine.

## Installing External Models

External models are managed in the **External Providers** section of the model manager rather than the regular Starter Models tab. When you configure a provider's API key, its starter models are installed automatically, and removing the key removes them again. Invoke records a model reference but does not download weights (there are no weights to download).

Once installed, external models show up everywhere a model can be selected. Choose one, set the usual parameters (prompt, dimensions, num images, etc.), and invoke as normal.

## Capabilities and Settings Visibility

Each external model declares its own **capabilities** — for example:

- Which generation modes it supports (`txt2img`, `img2img`, `inpaint`).
- Whether it accepts reference images, and how many.
- Which aspect ratios and resolutions it allows.
- Whether it supports a seed or batch size > 1.

Invoke uses these capabilities to drive the UI: only the settings a given model actually supports will be shown in the parameters panel. If a field you expect is missing, it's because the selected model does not support it.
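
The rule above (show a control only if the selected model supports it) can be sketched as a pure function from capabilities to visible controls. Invoke's real logic lives in the TypeScript frontend and is not shown here; `Capabilities`, `visible_controls`, and the control names are hypothetical, although the capability field names mirror the `StarterModel` example in the contributor guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical subset of ExternalModelCapabilities.
    supports_seed: bool = False
    supports_reference_images: bool = False
    max_images_per_request: int = 1


def visible_controls(caps: Capabilities) -> set[str]:
    """Return the parameter controls to render for a model with these capabilities."""
    controls = {"prompt", "dimensions"}  # assumed to be shown for every model
    if caps.supports_seed:
        controls.add("seed")
    if caps.supports_reference_images:
        controls.add("reference_images")
    if caps.max_images_per_request > 1:
        controls.add("num_images")
    return controls


example = Capabilities(supports_reference_images=True, max_images_per_request=10)
print(sorted(visible_controls(example)))
# ['dimensions', 'num_images', 'prompt', 'reference_images']
```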

## Costs and Rate Limits

External providers charge for each request. Check the provider's pricing page before running large batches. Rate-limit errors from the provider are surfaced in Invoke as generation failures — wait a moment and try again, or lower your concurrent batch size.
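
The commit message mentions an `ExternalProviderRateLimitError` with retry logic for HTTP 429 responses, but its policy (attempt count, delays) is not shown. The sketch below illustrates a common approach: exponential backoff with jitter that prefers a server-supplied retry hint. `RateLimitError` and `call_with_backoff` are hypothetical names, not Invoke's, and the default attempt count and delays are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when a provider answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("provider rate limit exceeded")
        self.retry_after = retry_after  # seconds, e.g. from a Retry-After header


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of retries: the generation fails
            # Prefer the provider's hint; otherwise wait base_delay * 1, 2, 4, ...
            delay = err.retry_after if err.retry_after is not None else base_delay * 2**attempt
            sleep(delay + random.uniform(0, 0.1 * delay))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting.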
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
title: OpenAI
---

# :material-alpha-o-circle-outline: OpenAI

Invoke supports OpenAI's image generation models — both the GPT Image family and the older DALL·E models — through the OpenAI API.

## Getting an API Key

1. Open the [OpenAI API Platform](https://platform.openai.com/api-keys) and sign in.
2. Create a new secret API key.
3. Make sure your account has billing set up — image endpoints are paid per request.

## Configuration

Add your key to `api_keys.yaml` in your Invoke root directory:

```yaml
external_openai_api_key: "sk-..."

# Optional — use this to point at a compatible proxy or Azure OpenAI deployment
external_openai_base_url: "https://api.openai.com"
```

Restart Invoke for the change to take effect.

## Available Models

| Model | Modes | Aspect Ratios | Batch | Notes |
| --- | --- | --- | --- | --- |
| **GPT Image 1.5** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | Faster and cheaper than GPT Image 1. |
| **GPT Image 1** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | Highest quality of the GPT Image family. |
| **GPT Image 1 Mini** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | ~80% cheaper than GPT Image 1. |
| **DALL·E 3** | txt2img only | 1:1, 7:4, 4:7 | 1 | No reference-image / edit support. |
| **DALL·E 2** | txt2img, img2img, inpaint | 1:1 | up to 10 | Square only. Deprecated by OpenAI and no longer offered as a starter model. |
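
The commit message mentions fixed aspect ratio sizes ("bucketing") for OpenAI models. The sketch below illustrates the idea by snapping requested dimensions to the supported size whose aspect ratio is closest. The family keys and `bucket_size` are hypothetical, and the size lists are assumptions based on the `size` values OpenAI documents for its Images API; Invoke's actual buckets are not shown in this diff.

```python
import math

# Assumed size buckets per model family; they correspond to the aspect ratios in the table above.
OPENAI_SIZES = {
    "gpt-image": [(1024, 1024), (1536, 1024), (1024, 1536)],  # 1:1, 3:2, 2:3
    "dall-e-3": [(1024, 1024), (1792, 1024), (1024, 1792)],  # 1:1, 7:4, 4:7
    "dall-e-2": [(1024, 1024)],  # square only
}


def bucket_size(family: str, width: int, height: int) -> str:
    """Snap width x height to the family's supported size with the closest aspect ratio."""
    target = math.log(width / height)
    w, h = min(OPENAI_SIZES[family], key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
    return f"{w}x{h}"  # OpenAI's `size` parameter uses the WIDTHxHEIGHT form


print(bucket_size("gpt-image", 1920, 1080))  # 1536x1024
```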

## Provider-Specific Options

For **GPT Image** models, Invoke surfaces provider-specific options in the parameters panel:

- **Quality** — `low`, `medium`, `high`, or `auto`. Higher quality costs more and takes longer.
- **Background** — `auto`, `transparent`, or `opaque`. Use `transparent` for PNG output with an alpha channel.

An **Input fidelity** option is also available; it controls how closely edits preserve details of the input images.

DALL·E 2 and DALL·E 3 do not expose these options.

## How Requests Are Routed

- Pure text-to-image requests hit `/v1/images/generations`.
- Any request with an init image or reference images is sent to `/v1/images/edits` instead. This is done transparently — you don't need to pick an endpoint.
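
The routing rule above amounts to a single check on whether the request carries any input image. The sketch below is illustrative only: `OpenAIImageRequest`, `choose_endpoint`, and `endpoint_url` are hypothetical, and joining the configured base URL with the `/v1/...` path assumes the base URL has no `/v1` suffix, as the documented default `https://api.openai.com` suggests.

```python
from dataclasses import dataclass, field
from typing import Optional

GENERATIONS_ENDPOINT = "/v1/images/generations"
EDITS_ENDPOINT = "/v1/images/edits"


@dataclass
class OpenAIImageRequest:
    # Hypothetical request shape for illustration.
    prompt: str
    init_image: Optional[bytes] = None
    reference_images: list[bytes] = field(default_factory=list)


def choose_endpoint(request: OpenAIImageRequest) -> str:
    """Pure text-to-image goes to generations; any input image goes to edits."""
    if request.init_image is not None or request.reference_images:
        return EDITS_ENDPOINT
    return GENERATIONS_ENDPOINT


def endpoint_url(base_url: str, request: OpenAIImageRequest) -> str:
    # base_url would come from external_openai_base_url (default https://api.openai.com).
    return base_url.rstrip("/") + choose_endpoint(request)


print(endpoint_url("https://api.openai.com", OpenAIImageRequest("a red bicycle")))
# https://api.openai.com/v1/images/generations
```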

## Tips

- **Batching** on GPT Image and DALL·E 2 tops out at 10 per request. Larger batches are split into multiple API calls.
- **Costs** can climb quickly with high-quality GPT Image generations. Start with GPT Image 1 Mini when iterating on prompts.
- **Rate limits** from OpenAI surface as failed invocations — retry after a short wait.
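
Splitting a large batch across requests is simple arithmetic. The sketch below assumes requests are filled greedily up to the per-request cap (10 for GPT Image and DALL·E 2, 1 for DALL·E 3, per the table above); `split_batch` is a hypothetical helper, and Invoke's actual splitting code is not shown in this diff.

```python
def split_batch(num_images: int, max_per_request: int = 10) -> list[int]:
    """Split num_images into per-request counts, each at most max_per_request."""
    if num_images < 1 or max_per_request < 1:
        raise ValueError("num_images and max_per_request must be positive")
    full, remainder = divmod(num_images, max_per_request)
    return [max_per_request] * full + ([remainder] if remainder else [])


print(split_batch(23))    # [10, 10, 3]
print(split_batch(3, 1))  # [1, 1, 1], e.g. for a one-image-per-request model
```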

docs/src/generated/settings.json

Lines changed: 44 additions & 0 deletions
@@ -723,6 +723,50 @@
       "required": false,
       "type": "<class 'bool'>",
       "validation": {}
+    },
+    {
+      "category": "EXTERNAL PROVIDERS",
+      "default": null,
+      "description": "API key for Gemini image generation.",
+      "env_var": "INVOKEAI_EXTERNAL_GEMINI_API_KEY",
+      "literal_values": [],
+      "name": "external_gemini_api_key",
+      "required": false,
+      "type": "typing.Optional[str]",
+      "validation": {}
+    },
+    {
+      "category": "EXTERNAL PROVIDERS",
+      "default": null,
+      "description": "API key for OpenAI image generation.",
+      "env_var": "INVOKEAI_EXTERNAL_OPENAI_API_KEY",
+      "literal_values": [],
+      "name": "external_openai_api_key",
+      "required": false,
+      "type": "typing.Optional[str]",
+      "validation": {}
+    },
+    {
+      "category": "EXTERNAL PROVIDERS",
+      "default": null,
+      "description": "Base URL override for Gemini image generation.",
+      "env_var": "INVOKEAI_EXTERNAL_GEMINI_BASE_URL",
+      "literal_values": [],
+      "name": "external_gemini_base_url",
+      "required": false,
+      "type": "typing.Optional[str]",
+      "validation": {}
+    },
+    {
+      "category": "EXTERNAL PROVIDERS",
+      "default": null,
+      "description": "Base URL override for OpenAI image generation.",
+      "env_var": "INVOKEAI_EXTERNAL_OPENAI_BASE_URL",
+      "literal_values": [],
+      "name": "external_openai_base_url",
+      "required": false,
+      "type": "typing.Optional[str]",
+      "validation": {}
     }
   ]
 }
