🤖 feat: update Gemini Flash to Gemini 3.5 Flash (#3334)

ThomasK33 · web-flow · commit 58a06c35c063 · 2026-05-20T08:07:19.000Z
## Summary

Updates the curated Gemini Flash slot so the stable \ alias now resolves to \, with matching local metadata, docs, and provider thinking controls.

## Background

Gemini Flash is a stable user-facing alias in Mux. The new Gemini 3.5 Flash release should be the first-class Flash target without adding a separate curated preview entry for the older Gemini 3 Flash Preview model.

## Implementation

- Repointed \ to \ while keeping the existing \ alias.
- Added local token/capability metadata for Gemini 3.5 Flash in \.
- Added a narrow Gemini Flash thinking-policy helper shared by policy and Google provider options.
- Mapped Mux \ to Google \ for Gemini 3.5 Flash, while preserving \ / \ / \ with thoughts included.
- Regenerated model docs and built-in skill content.

## Validation

- \bun test v1.2.15 (df017990)
- \
- Dogfooded in a dev-server sandbox with provider config copied from \: selected Gemini 3.5 Flash, sent a prompt, and received a successful Gemini 3.5 Flash response.

## Risks

Low-to-moderate risk, scoped to model selection, model metadata, and Google thinking options. Existing Gemini 3.1 Pro behavior is covered by tests and left unchanged.

---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: Repoint Gemini Flash to Gemini 3.5 Flash

## Decision

Use **Option A**: update the existing first-class Flash slot so `gemini-flash` tracks the latest Flash tier.

- Replace the curated Gemini Flash model ID from `google:gemini-3-flash-preview` to the Gemini 3.5 Flash API model ID after verifying the exact ID from Google API/AI Studio (`gemini-3.5-flash` is the likely ID, but the implementer must confirm against an API model list or official developer docs before committing metadata).
- Keep `gemini-flash` as the stable user-facing alias.
- Do **not** add a separate first-class selector entry for `gemini-3-flash-preview` unless verification shows the old preview must remain curated for compatibility.

**Recommended approach net product LoC estimate:** ~45–75 LoC if local `models-extra.ts` metadata is needed; ~20–35 LoC if `bun scripts/update_models.ts` now pulls complete LiteLLM metadata. This excludes tests, docs, and generated `models.json` churn.

## Evidence and constraints

- Current curated model registry is `src/common/constants/knownModels.ts`; `KNOWN_MODELS`, aliases, tokenizer overrides, and selector built-ins derive from `MODEL_DEFINITIONS`.
- Current Gemini entries:
  - `GEMINI_31_PRO` → `google:gemini-3.1-pro-preview`, aliases `gemini`, `gemini-pro`.
  - `GEMINI_3_FLASH` → `google:gemini-3-flash-preview`, alias `gemini-flash`.
- Prior Gemini history supports this alias policy:
  - Gemini 3.1 Pro replaced the earlier Pro entry and kept bare aliases on latest Pro.
  - Gemini Flash alias was normalized to `gemini-flash`, implying it should track latest Flash.
- Current `src/common/utils/tokens/models.json` probe found `gemini-3-flash-preview`, but not `gemini-3.5-flash`.
- `src/common/constants/knownModels.test.ts` will fail unless the new `providerModelId` exists in either `models.json` or `models-extra.ts`.
- Current thinking policy is wrong for a `gemini-3.5-flash`-style ID: `includes("gemini-3-flash")` misses it, while generic `includes("gemini-3")` catches it as Pro-style.
- Google/DeepMind currently describe Gemini 3.5 Flash as available in Gemini API / AI Studio, with 1M input tokens, 64k output tokens, January 2025 knowledge cutoff, multimodal inputs, text output, and tool use including function calling and structured output.

## Phase 0 — Verify exact provider facts before editing

1. Confirm the exact Gemini API model ID from one of:
   - Google AI Studio model picker / API model list.
   - Official Gemini API developer docs if updated.
   - A safe read-only `listModels` call using a configured Google API key, if available.
2. Confirm pricing source:
   - Prefer official Gemini API pricing docs if updated for Gemini 3.5 Flash.
   - If official pricing is not yet published in developer docs, either:
     - use verified LiteLLM metadata from `bun scripts/update_models.ts`, or
     - add conservative local metadata with a comment that it must be revisited once Google publishes official pricing.
3. Confirm thinking semantics:
   - Gemini Flash family should expose `minimal`, `low`, `medium`, `high` on the Google API side.
   - Mux should continue exposing user-facing `off`, `low`, `medium`, `high`, mapping `off` to Google `minimal` for Flash models that do not support true thinking-off.

**Quality gate:** record the exact source used for model ID, limits, pricing, and thinking levels in code comments near local metadata or provider mapping if official docs are incomplete/ambiguous.

## Phase 1 — Repoint the curated model registry

Edit `src/common/constants/knownModels.ts`:

1. Keep the existing `GEMINI_3_FLASH` key by default for a minimal Option A diff. Add or update its comment to say it tracks the latest Flash tier. Only rename to `GEMINI_35_FLASH` if `rg "GEMINI_3_FLASH"` shows negligible references and the resulting diff is smaller/clearer.
2. Set `providerModelId` to the verified API ID, expected:

   ```ts
   providerModelId: "gemini-3.5-flash"
   ```

3. Keep only the stable alias unless product explicitly wants version-specific slash aliases:

   ```ts
   aliases: ["gemini-flash"]
   ```

   Users can still select the exact full model string with `/model google:gemini-3.5-flash`; avoiding a version alias minimizes future cleanup.
4. Keep tokenizer override unless `ai-tokenizer` has added a better exact tokenizer:

   ```ts
   tokenizerOverride: "google/gemini-2.5-pro"
   ```

**Quality gate:** run `bun test src/common/constants/knownModels.test.ts` after metadata work; alias uniqueness and token metadata coverage should pass. Add a targeted alias assertion if not already covered by nearby tests: `MODEL_ABBREVIATIONS["gemini-flash"] === "google:<verified-id>"` or `resolveModelAlias("gemini-flash") === "google:<verified-id>"`.
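
As a rough illustration of the alias assertion in the Phase 1 quality gate, the sketch below derives an alias map from a trimmed-down registry and rejects duplicate aliases. `ModelDefinition`, `DEFINITIONS`, and `buildAliasMap` are hypothetical names for this example only; the real `MODEL_DEFINITIONS` and `MODEL_ABBREVIATIONS` in `knownModels.ts` may be shaped differently, and `gemini-3.5-flash` is the expected ID pending verification.

```typescript
// Hypothetical, trimmed-down registry entry; the real definitions carry more fields.
interface ModelDefinition {
  provider: string;
  providerModelId: string;
  aliases: string[];
}

// Illustrative subset of a curated registry (assumed shape, not the actual file).
const DEFINITIONS: Record<string, ModelDefinition> = {
  GEMINI_31_PRO: {
    provider: "google",
    providerModelId: "gemini-3.1-pro-preview",
    aliases: ["gemini", "gemini-pro"],
  },
  GEMINI_FLASH: {
    provider: "google",
    providerModelId: "gemini-3.5-flash", // expected ID; confirm before committing
    aliases: ["gemini-flash"],
  },
};

// Builds alias -> "provider:modelId" and throws if two entries claim the same alias.
function buildAliasMap(defs: Record<string, ModelDefinition>): Map<string, string> {
  const map = new Map<string, string>();
  for (const key of Object.keys(defs)) {
    const def = defs[key];
    const id = `${def.provider}:${def.providerModelId}`;
    for (const alias of def.aliases) {
      const existing = map.get(alias);
      if (existing !== undefined) {
        throw new Error(`Duplicate alias "${alias}" for ${existing} and ${id}`);
      }
      map.set(alias, id);
    }
  }
  return map;
}

const aliasMap = buildAliasMap(DEFINITIONS);
```

Because the map is derived from the registry, repointing `providerModelId` is enough to move the `gemini-flash` alias; no separate alias table needs editing in this sketch.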

## Phase 2 — Add or refresh token/capability metadata

Preferred path:

1. Run `bun scripts/update_models.ts` before adding manual metadata.
2. Inspect the generated diff. Keep it only if the churn is acceptable and it adds a **bare** key for the verified model ID, expected `"gemini-3.5-flash"`, with complete pricing/context/capability fields.
3. If the refresh only adds provider-scoped keys such as `gemini/gemini-3.5-flash`, `knownModels.test.ts` will still fail for a `google:` known model; add a bare-key fallback in `models-extra.ts` instead of relying on scoped-only metadata.

Fallback path if LiteLLM is not updated, creates broad unrelated churn, or lacks a bare key:

1. Add an entry to `src/common/utils/tokens/models-extra.ts` keyed by the bare provider model ID, expected `"gemini-3.5-flash"`.
2. Include at minimum:
   - `max_input_tokens: 1048576`
   - `max_output_tokens: 65536`
   - `input_cost_per_token` and `output_cost_per_token` from a verified pricing source
   - `cache_read_input_token_cost` only if the verified pricing source confirms context-cache pricing
   - `litellm_provider: "vertex_ai-language-models"`
   - `mode: "chat"`
   - `supports_function_calling: true`
   - `supports_vision: true`
   - `supports_pdf_input: true`
   - `supports_reasoning: true`
   - `supports_response_schema: true`
   - `knowledge_cutoff: "2025-01"`
3. If storing official multimodal support locally, extend the local `ModelData` interface in `models-extra.ts` to include:
   - `supports_audio_input?: boolean`
   - `supports_video_input?: boolean`

**Quality gate:** add/adjust `src/common/utils/tokens/modelStats.test.ts` and `src/common/utils/ai/modelCapabilities.test.ts` only around behavior that matters: context size, nonzero pricing, and media support. Avoid tautological tests that only repeat static prose.

## Phase 3 — Fix Gemini Flash thinking policy and provider mapping

Edit `src/common/utils/thinking/policy.ts`:

1. Replace literal substring detection for Flash with a narrow helper that matches only verified chat Flash IDs, for example:

   ```ts
   function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
     return (
       modelName === "gemini-3-flash-preview" ||
       modelName === "gemini-3.5-flash" ||
       modelName === "gemini-3.5-flash-preview" // only keep if this ID is verified
     );
   }
   ```

   Use the helper before the generic Gemini 3/3.1 Pro branch. Avoid a broad regex that accidentally treats `gemini-3.1-flash-lite-preview`, image, TTS, or other non-chat variants as the same model.
2. Return Mux levels for verified Flash chat models:

   ```ts
   ["off", "low", "medium", "high"]
   ```

3. Keep Pro behavior separate. If current docs now say Gemini 3.1 Pro supports `medium`, decide whether to broaden Pro in a separate change; do not conflate that with Gemini 3.5 Flash support unless required by failing tests or verified product behavior.

Edit `src/common/utils/ai/providerOptions.ts` as a required part of this change:

1. Reuse the same Flash detection helper, or extract a tiny shared helper, so policy and provider option mapping cannot drift.
2. The current Google branch sends `thinkingConfig.thinkingLevel` for `capModelName.includes("gemini-3")`; `gemini-3.5-flash` should still enter that branch.
3. For verified Flash chat models, map Mux `off` to Google `minimal` and **do not** set `includeThoughts` for that lowest mode unless verified docs require it:

   ```ts
   thinkingConfig = { thinkingLevel: "minimal" };
   ```

   Do not rely on omitting `thinkingConfig`; Gemini 3.5 Flash may default to `medium`, which would make Mux `off` misleading.
4. For Flash `low`, `medium`, and `high`, pass through the level and keep `includeThoughts: true`:

   ```ts
   thinkingConfig = { includeThoughts: true, thinkingLevel: effectiveThinking };
   ```

5. If `xhigh` or `max` somehow reaches provider mapping despite policy enforcement, defensively map to `high` rather than throwing in the request path. Add a short comment that policy should clamp before provider options, but the provider adapter avoids sending invalid Google values.

**Quality gate:** extend `src/common/utils/thinking/policy.test.ts` and `src/common/utils/ai/providerOptions.test.ts` to prove:

- `google:gemini-3.5-flash` gets `off/low/medium/high`.
- gateway form like `mux-gateway:google/gemini-3.5-flash` behaves the same.
- Optional explicit gateway form like `openrouter:google/gemini-3.5-flash` behaves correctly if current normalization supports it.
- Flash `off` maps to `{ thinkingConfig: { thinkingLevel: "minimal" } }` without `includeThoughts` unless docs prove otherwise.
- Flash `medium` maps to `{ thinkingConfig: { includeThoughts: true, thinkingLevel: "medium" } }`.
- Gemini 3.1 Pro behavior remains unchanged.
- Optional custom model mapping: a provider model entry `mappedToModel: "google:gemini-3.5-flash"` uses Flash mapping for policy/provider options.

## Phase 4 — Update docs and generated/model-adjacent outputs

1. Run or update `scripts/gen_docs.ts` output so `docs/config/models.mdx` lists:
   - `Gemini 3.5 Flash`
   - `google:<verified-id>`
   - alias `gemini-flash`
2. If display output is unexpectedly wrong, add a focused `src/common/utils/ai/modelDisplay.test.ts` case. The current generic Gemini formatter likely needs no production change, but a dotted-version expectation is cheap if touched nearby.
3. Search for stale `KNOWN_MODELS.GEMINI_3_FLASH` references only if the key is renamed. If the key is kept, no reference churn is expected.

**Quality gate:** do not hand-edit generated docs if an existing generation script owns the table; run the generator and keep only expected diffs.
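
The Phase 3 provider mapping could look roughly like the sketch below. It assumes only Gemini 3-family models that take `thinkingLevel`; the Gemini 2.5 `thinkingBudget` path is deliberately left out. `toGoogleThinkingConfig` and the two level types are hypothetical names for illustration, not the actual `buildProviderOptions` code.

```typescript
// Mux-side levels, including ones the policy layer should already have clamped away.
type MuxThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";
// Levels the plan expects the Google API to accept for Flash chat models.
type GoogleThinkingLevel = "minimal" | "low" | "medium" | "high";

interface GoogleThinkingConfig {
  includeThoughts?: boolean;
  thinkingLevel?: GoogleThinkingLevel;
}

// Hypothetical helper: translate a Mux level into a Google thinkingConfig.
// isFlashChatModel is assumed to come from a shared narrow Flash detector.
function toGoogleThinkingConfig(
  level: MuxThinkingLevel,
  isFlashChatModel: boolean
): GoogleThinkingConfig | undefined {
  if (level === "off") {
    // Flash has no true thinking-off, so send the lowest level explicitly
    // and leave includeThoughts unset. Non-Flash models send no config.
    return isFlashChatModel ? { thinkingLevel: "minimal" } : undefined;
  }
  // Defensive clamp in case a caller bypasses policy enforcement.
  const thinkingLevel: GoogleThinkingLevel =
    level === "xhigh" || level === "max" ? "high" : level;
  return { includeThoughts: true, thinkingLevel };
}

const flashOff = toGoogleThinkingConfig("off", true);
const flashMedium = toGoogleThinkingConfig("medium", true);
const flashXhigh = toGoogleThinkingConfig("xhigh", true);
const proOff = toGoogleThinkingConfig("off", false);
```

Keeping the clamp in the adapter means an out-of-range level degrades to `high` instead of producing a request Google would reject, while policy enforcement stays the primary guard.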

## Phase 5 — Validation

Run targeted tests first:

```bash
bun test src/common/constants/knownModels.test.ts
bun test src/common/utils/thinking/policy.test.ts
bun test src/common/utils/ai/providerOptions.test.ts
bun test src/common/utils/tokens/modelStats.test.ts
bun test src/common/utils/ai/modelCapabilities.test.ts
bun test src/common/utils/ai/modelDisplay.test.ts
```

Then run broader checks:

```bash
make typecheck
make fmt-check
make static-check
```

If `bun scripts/update_models.ts` produces broad generated churn, inspect whether it is acceptable; if too broad, prefer `models-extra.ts` for this targeted launch support.

## Phase 6 — Dogfooding plan

Because this is a model-selection/provider behavior change, dogfood in the desktop app with a configured Google provider.

1. Start Mux:

   ```bash
   make dev
   ```

2. In Settings → Providers, confirm Google is configured and enabled.
3. Use the model selector and confirm:
   - `Gemini 3.5 Flash` appears.
   - `gemini-flash` resolves to `google:<verified-id>`.
   - old `Gemini 3 Flash Preview` is no longer the curated `gemini-flash` target.
4. Send smoke prompts at all Flash thinking levels:
   - `off` / numeric `0`
   - `low`
   - `medium`
   - `high`
5. Use `agent-browser` to capture reviewer evidence:
   - Screenshot of the model selector showing Gemini 3.5 Flash.
   - Screenshot of a successful response using `gemini-flash`.
   - Screenshot of thinking-level control or slash-command usage.
   - Video recording of selecting the model and sending one prompt.
6. Multimodal smoke check if provider/API key allows it:
   - Attach a small image or PDF and verify the send path is allowed.
   - Capture a screenshot of the attachment flow and successful response.

## Acceptance criteria

- `gemini-flash` resolves to the verified Gemini 3.5 Flash Google model ID.
- No new version-specific alias is added unless product explicitly asks for it.
- The first-class model selector lists Gemini 3.5 Flash when Google/direct or configured gateway routing makes it available.
- The known-model metadata invariant passes with a bare metadata key for the verified provider model ID in `models.json` or `models-extra.ts`.
- Token meter/context warnings use Gemini 3.5 Flash limits and costs.
- Gemini 3.5 Flash thinking policy exposes Mux levels `off/low/medium/high`.
- Provider options translate Mux `off` to Google `minimal` for Gemini 3.5 Flash instead of accidentally using the API default, and omit `includeThoughts` for this lowest mode unless docs prove otherwise.
- Provider options pass Flash `low/medium/high` through with `includeThoughts: true`.
- Existing Gemini Pro behavior is unchanged unless explicitly verified and intentionally updated.
- Docs table reflects Gemini 3.5 Flash.
- Targeted tests, typecheck, fmt-check, and static-check pass.
- Dogfooding screenshots and a video recording are captured for reviewer verification.

## Risks and mitigations

- **API model ID ambiguity:** block implementation until exact ID is verified from official API/AI Studio, not inferred only from marketing copy.
- **Pricing docs lag:** prefer LiteLLM refresh if available; otherwise add local metadata with a clear source/revisit comment. Do not commit press/blog-derived pricing unless official API pricing, LiteLLM, or another trusted provider metadata source confirms it.
- **Thinking-level drift:** keep tests focused on observed provider behavior, especially `off` → `minimal` and absence of `includeThoughts` for the lowest mode unless docs require it.
- **Overbroad Flash matching:** use a narrow verified-ID helper so image, TTS, Flash Lite, or future non-chat variants do not inherit chat-model thinking behavior accidentally.
- **Generated metadata churn:** if `models.json` refresh touches many unrelated entries or lacks a bare key, use `models-extra.ts` for a surgical release.
- **Alias compatibility:** existing users selecting `google:gemini-3-flash-preview` explicitly can still use it as a custom model; only the curated `gemini-flash` alias changes.
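
The bare-key risk above can be made concrete with a small check, sketched here under the assumption that each metadata source is a plain object keyed by model ID. `findMissingBareKeys`, `generatedMetadata`, and `extraMetadata` are hypothetical stand-ins for the real invariant in `knownModels.test.ts` and for the contents of `models.json` and `models-extra.ts`.

```typescript
interface ModelMetadata {
  max_input_tokens: number;
  max_output_tokens: number;
}

type MetadataSource = Record<string, ModelMetadata>;

// Stand-in for a refresh that only added a provider-scoped key (assumed scenario).
const generatedMetadata: MetadataSource = {
  "gemini/gemini-3.5-flash": { max_input_tokens: 1048576, max_output_tokens: 65536 },
};

// Stand-in for a local fallback keyed by the bare provider model ID.
const extraMetadata: MetadataSource = {
  "gemini-3.5-flash": { max_input_tokens: 1048576, max_output_tokens: 65536 },
};

// Returns the provider model IDs that have no bare key in any of the given sources.
function findMissingBareKeys(
  providerModelIds: string[],
  ...sources: MetadataSource[]
): string[] {
  return providerModelIds.filter(
    (id) => !sources.some((source) => Object.prototype.hasOwnProperty.call(source, id))
  );
}

// Scoped-only metadata leaves the bare ID uncovered; the local fallback closes the gap.
const withoutFallback = findMissingBareKeys(["gemini-3.5-flash"], generatedMetadata);
const withFallback = findMissingBareKeys(["gemini-3.5-flash"], generatedMetadata, extraMetadata);
```

In this sketch the scoped key `gemini/gemini-3.5-flash` does not satisfy the lookup, which mirrors why the plan prefers a bare-key entry over relying on provider-scoped metadata.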

</details>

---

_Generated with [\](https://github.com/coder/mux) • Model: \ • Thinking: \ • Cost: \_
diff --git a/docs/config/models.mdx b/docs/config/models.mdx
@@ -25,7 +25,7 @@ Mux ships with curated models kept up to date with the frontier. Use any custom
 | Codex Mini 5.1         | openai:gpt-5.1-codex-mini     | `codex-mini`                                                 |         |
 | Codex Max 5.1          | openai:gpt-5.1-codex-max      | `codex-max`                                                  |         |
 | Gemini 3.1 Pro Preview | google:gemini-3.1-pro-preview | `gemini`, `gemini-pro`                                       |         |
-| Gemini 3 Flash Preview | google:gemini-3-flash-preview | `gemini-flash`                                               |         |
+| Gemini 3.5 Flash       | google:gemini-3.5-flash       | `gemini-flash`                                               |         |
 | Grok 4 1 Fast          | xai:grok-4-1-fast             | `grok`, `grok-4`, `grok-4.1`, `grok-4-1`                     |         |
 | Grok Code Fast 1       | xai:grok-code-fast-1          | `grok-code`                                                  |         |
 | DeepSeek V4 Pro        | deepseek:deepseek-v4-pro      | `deepseek`, `deepseek-pro`, `deepseek-v4`, `deepseek-v4-pro` |         |
diff --git a/src/common/constants/knownModels.test.ts b/src/common/constants/knownModels.test.ts
@@ -29,6 +29,10 @@ describe("Known Models Integration", () => {
     }
   });
 
+  test("gemini-flash resolves to the stable Gemini 3.5 Flash model", () => {
+    expect(MODEL_ABBREVIATIONS["gemini-flash"]).toBe("google:gemini-3.5-flash");
+  });
+
   test("known model ids and aliases stay unique across the curated registry", () => {
     const seenIds = new Set<string>();
     const seenAliases = new Set<string>();
diff --git a/src/common/constants/knownModels.ts b/src/common/constants/knownModels.ts
@@ -115,9 +115,10 @@ const MODEL_DEFINITIONS = {
     aliases: ["gemini", "gemini-pro"],
     tokenizerOverride: "google/gemini-2.5-pro",
   },
-  GEMINI_3_FLASH: {
+  // Gemini Flash alias tracks the latest stable Flash tier.
+  GEMINI_FLASH: {
     provider: "google",
-    providerModelId: "gemini-3-flash-preview",
+    providerModelId: "gemini-3.5-flash",
     aliases: ["gemini-flash"],
     tokenizerOverride: "google/gemini-2.5-pro",
   },
diff --git a/src/common/utils/ai/modelCapabilities.test.ts b/src/common/utils/ai/modelCapabilities.test.ts
@@ -47,6 +47,15 @@ describe("getModelCapabilities", () => {
     expect(caps?.maxPdfSizeMb).toBeGreaterThan(0);
   });
 
+  it("returns multimodal capabilities for Gemini 3.5 Flash", () => {
+    const caps = getModelCapabilities("google:gemini-3.5-flash");
+    expect(caps).not.toBeNull();
+    expect(caps?.supportsPdfInput).toBe(true);
+    expect(caps?.supportsVision).toBe(true);
+    expect(caps?.supportsAudioInput).toBe(true);
+    expect(caps?.supportsVideoInput).toBe(true);
+  });
+
   it("returns null for unknown models", () => {
     expect(getModelCapabilities("anthropic:this-model-does-not-exist")).toBeNull();
   });
diff --git a/src/common/utils/ai/modelDisplay.test.ts b/src/common/utils/ai/modelDisplay.test.ts
@@ -45,6 +45,7 @@ describe("formatModelDisplayName", () => {
   describe("Gemini models", () => {
     test("formats Gemini models", () => {
       expect(formatModelDisplayName("gemini-2-0-flash-exp")).toBe("Gemini 2.0 Flash Exp");
+      expect(formatModelDisplayName("gemini-3.5-flash")).toBe("Gemini 3.5 Flash");
       expect(formatModelDisplayName("gemini-3.1-pro-preview")).toBe("Gemini 3.1 Pro Preview");
     });
   });
diff --git a/src/common/utils/ai/providerOptions.test.ts b/src/common/utils/ai/providerOptions.test.ts
@@ -749,6 +749,155 @@ describe("buildProviderOptions - OpenAI", () => {
   });
 });
 
+describe("buildProviderOptions - Google", () => {
+  test("maps Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3.5-flash", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps gateway Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("mux-gateway:google/gemini-3.5-flash", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps namespaced Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:models/gemini-3.5-flash", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps versioned Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3.5-flash-001", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps Gemini 3.5 Flash medium to thinkingLevel medium with thoughts", () => {
+    expect(buildProviderOptions("mux-gateway:google/gemini-3.5-flash", "medium")).toEqual({
+      google: {
+        thinkingConfig: {
+          includeThoughts: true,
+          thinkingLevel: "medium",
+        },
+      },
+    });
+  });
+
+  test("uses mapped model capabilities for custom Gemini 3.5 Flash aliases", () => {
+    const providersConfig = createMockProvidersConfig({
+      "google:custom-flash": "google:gemini-3.5-flash",
+    });
+
+    expect(
+      buildProviderOptions(
+        "google:custom-flash",
+        "off",
+        undefined,
+        undefined,
+        undefined,
+        undefined,
+        undefined,
+        providersConfig
+      )
+    ).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps non-preview Gemini 3 Flash off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3-flash", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps Gemini 3 Flash Preview off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3-flash-preview", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("maps versioned Gemini 3 Flash Preview off to minimal thinking without thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3-flash-preview-latest", "off")).toEqual({
+      google: {
+        thinkingConfig: {
+          thinkingLevel: "minimal",
+        },
+      },
+    });
+  });
+
+  test("defensively maps unsupported Gemini 3.5 Flash xhigh to high", () => {
+    expect(buildProviderOptions("google:gemini-3.5-flash", "xhigh")).toEqual({
+      google: {
+        thinkingConfig: {
+          includeThoughts: true,
+          thinkingLevel: "high",
+        },
+      },
+    });
+  });
+
+  test("passes Gemini 3.1 Pro low through as thinkingLevel low with thoughts", () => {
+    expect(buildProviderOptions("google:gemini-3.1-pro-preview", "low")).toEqual({
+      google: {
+        thinkingConfig: {
+          includeThoughts: true,
+          thinkingLevel: "low",
+        },
+      },
+    });
+  });
+
+  test("defensively maps unsupported Gemini 3.5 Flash max to high", () => {
+    expect(buildProviderOptions("google:gemini-3.5-flash", "max")).toEqual({
+      google: {
+        thinkingConfig: {
+          includeThoughts: true,
+          thinkingLevel: "high",
+        },
+      },
+    });
+  });
+
+  test("keeps Gemini 3.1 Pro off without provider thinking config", () => {
+    expect(buildProviderOptions("google:gemini-3.1-pro-preview", "off")).toEqual({
+      google: {
+        thinkingConfig: undefined,
+      },
+    });
+  });
+});
+
 describe("buildRequestHeaders", () => {
   for (const { name, model, options, expected } of [
     {
diff --git a/src/common/utils/ai/providerOptions.ts b/src/common/utils/ai/providerOptions.ts
@@ -23,6 +23,7 @@ import {
   OPENAI_REASONING_EFFORT,
   OPENROUTER_REASONING_EFFORT,
 } from "@/common/types/thinking";
+import { isGeminiFlashThinkingLevelModelName } from "@/common/utils/thinking/policy";
 import { resolveModelForMetadata } from "@/common/utils/providers/modelEntries";
 import { log } from "@/node/services/log";
 import type { MuxMessage } from "@/common/types/message";
@@ -409,22 +410,25 @@ export function buildProviderOptions(
 
   // Build Google-specific options
   if (formatProvider === "google") {
-    const isGemini3 = capModelName.includes("gemini-3");
+    const capBareModelName = capModelName.split("/").at(-1) ?? capModelName;
+    const usesGeminiThinkingLevelConfig = capBareModelName.includes("gemini-3");
+    const isGeminiFlashThinkingModel = isGeminiFlashThinkingLevelModelName(capBareModelName);
     let thinkingConfig: GoogleGenerativeAIProviderOptions["thinkingConfig"];
 
-    if (effectiveThinking !== "off") {
+    if (isGeminiFlashThinkingModel && effectiveThinking === "off") {
+      // Gemini Flash chat models default to medium and do not support true thinking-off;
+      // send minimal explicitly so Mux's "off" setting means lowest-effort behavior.
+      thinkingConfig = { thinkingLevel: "minimal" };
+    } else if (effectiveThinking !== "off") {
       thinkingConfig = {
         includeThoughts: true,
       };
 
-      if (isGemini3) {
-        // Policy enforcement already clamped to valid levels for Flash/Pro,
-        // so effectiveThinking is guaranteed in the model's allowed set.
-        // Flash: off/low/medium/high; Pro: low/high. "xhigh" can't reach here.
-        thinkingConfig.thinkingLevel = effectiveThinking as Exclude<
-          ThinkingLevel,
-          "off" | "xhigh" | "max"
-        >;
+      if (usesGeminiThinkingLevelConfig) {
+        // Policy enforcement should clamp to valid Google levels before this adapter runs.
+        // Avoid leaking xhigh/max to Google if a caller bypasses policy.
+        thinkingConfig.thinkingLevel =
+          effectiveThinking === "xhigh" || effectiveThinking === "max" ? "high" : effectiveThinking;
       } else {
         // Gemini 2.5 uses thinkingBudget
         const budget = GEMINI_THINKING_BUDGETS[effectiveThinking];
diff --git a/src/common/utils/thinking/policy.test.ts b/src/common/utils/thinking/policy.test.ts
@@ -1,5 +1,10 @@
 import { describe, expect, test } from "bun:test";
-import { getThinkingPolicyForModel, enforceThinkingPolicy, resolveThinkingInput } from "./policy";
+import {
+  getThinkingPolicyForModel,
+  enforceThinkingPolicy,
+  resolveThinkingInput,
+  isGeminiFlashThinkingLevelModelName,
+} from "./policy";
 
 describe("getThinkingPolicyForModel", () => {
   test("returns 5 levels including xhigh for gpt-5.1-codex-max", () => {
@@ -386,6 +391,55 @@ describe("getThinkingPolicyForModel", () => {
     expect(getThinkingPolicyForModel("google:gemini-3.1-pro-preview")).toEqual(["low", "high"]);
   });
 
+  test("returns off/low/medium/high for stable Gemini 3.5 Flash", () => {
+    expect(getThinkingPolicyForModel("google:gemini-3.5-flash")).toEqual([
+      "off",
+      "low",
+      "medium",
+      "high",
+    ]);
+    expect(getThinkingPolicyForModel("mux-gateway:google/gemini-3.5-flash")).toEqual([
+      "off",
+      "low",
+      "medium",
+      "high",
+    ]);
+  });
+
+  test("returns off/low/medium/high for versioned stable Gemini 3.5 Flash IDs", () => {
+    for (const model of [
+      "google:gemini-3.5-flash-001",
+      "google:gemini-3.5-flash-latest",
+      "google:gemini-3.5-flash-preview",
+    ]) {
+      expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+    }
+  });
+
+  test("returns off/low/medium/high for stable Gemini 3.5 Flash behind OpenRouter", () => {
+    expect(getThinkingPolicyForModel("openrouter:google/gemini-3.5-flash")).toEqual([
+      "off",
+      "low",
+      "medium",
+      "high",
+    ]);
+  });
+
+  test("returns off/low/medium/high for non-preview Gemini 3 Flash IDs", () => {
+    for (const model of ["google:gemini-3-flash", "google:gemini-3-flash-001"]) {
+      expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+    }
+  });
+
+  test("returns off/low/medium/high for versioned Gemini 3 Flash Preview IDs", () => {
+    for (const model of [
+      "google:gemini-3-flash-preview-20251217",
+      "google:gemini-3-flash-preview-latest",
+    ]) {
+      expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+    }
+  });
+
   test("returns off/low/medium/high for Gemini 3 Flash", () => {
     expect(getThinkingPolicyForModel("google:gemini-3-flash-preview")).toEqual([
       "off",
@@ -411,6 +465,13 @@ describe("getThinkingPolicyForModel", () => {
   });
 });
 
+describe("isGeminiFlashThinkingLevelModelName", () => {
+  test("does not classify Gemini Flash Lite variants as Flash thinking-level chat models", () => {
+    expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);
+    expect(isGeminiFlashThinkingLevelModelName("gemini-3.5-flash-lite")).toBe(false);
+  });
+});
+
 describe("enforceThinkingPolicy", () => {
   describe("single-option policy models (gpt-5-pro)", () => {
     test("enforces high for any requested level", () => {
diff --git a/src/common/utils/thinking/policy.ts b/src/common/utils/thinking/policy.ts
@@ -25,6 +25,20 @@ import {
  */
 export type ThinkingPolicy = readonly ThinkingLevel[];
 
+/**
+ * True when modelName is a bare Gemini Flash chat model ID using Google's
+ * thinkingLevel config (minimal/low/medium/high) instead of Gemini 2.x thinkingBudget.
+ * @param modelName Provider model ID without the provider prefix (e.g. "gemini-3.5-flash", not "google:gemini-3.5-flash").
+ */
+export function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
+  const normalized = modelName.trim().toLowerCase();
+  return (
+    ((normalized === "gemini-3-flash" || normalized.startsWith("gemini-3-flash-")) &&
+      !normalized.startsWith("gemini-3-flash-lite")) ||
+    (normalized.startsWith("gemini-3.5-flash") && !normalized.startsWith("gemini-3.5-flash-lite"))
+  );
+}
+
 /**
  * Returns the thinking policy for a given model.
  *
@@ -36,7 +50,8 @@ export type ThinkingPolicy = readonly ThinkingLevel[];
  * - openai:gpt-5.2 / openai:gpt-5.5 → ["off", "low", "medium", "high", "xhigh"]
  * - openai:gpt-5.2-pro / openai:gpt-5.5-pro → ["medium", "high", "xhigh"] (3 levels)
  * - openai:gpt-5-pro → ["high"] (only supported level, legacy)
- * - gemini-3 → ["low", "high"] (thinking level only)
+ * - Gemini Flash chat variants → ["off", "low", "medium", "high"]
+ * - gemini-3 Pro variants → ["low", "high"] (thinking level only)
  * - default → ["off", "low", "medium", "high"] (standard 4 levels; xhigh is opt-in per model)
  *
  * Tolerates version suffixes (e.g., gpt-5-pro-2025-10-06).
@@ -95,8 +110,8 @@ export function getThinkingPolicyForModel(modelString: string): ThinkingPolicy {
     return ["high"];
   }
 
-  // Gemini 3 Flash supports 4 levels: off (minimal), low, medium, high
-  if (withoutProviderNamespace.includes("gemini-3-flash")) {
+  // Gemini Flash chat models support minimal/low/medium/high. Mux exposes minimal as "off".
+  if (isGeminiFlashThinkingLevelModelName(withoutProviderNamespace)) {
     return ["off", "low", "medium", "high"];
   }
 
diff --git a/src/common/utils/tokens/modelStats.test.ts b/src/common/utils/tokens/modelStats.test.ts
@@ -43,6 +43,15 @@ describe("getModelStats", () => {
     expect(stats.tiered_pricing_threshold_tokens).toBeUndefined();
   });
 
+  test("resolves Gemini 3.5 Flash with published standard pricing and limits", () => {
+    const stats = expectStats(KNOWN_MODELS.GEMINI_FLASH.id);
+    expect(stats.max_input_tokens).toBe(1048576);
+    expect(stats.max_output_tokens).toBe(65536);
+    expect(stats.input_cost_per_token).toBe(0.0000015);
+    expect(stats.output_cost_per_token).toBe(0.000009);
+    expect(stats.cache_read_input_token_cost).toBe(0.00000015);
+  });
+
   test("defaults tiered pricing threshold to 200K when metadata only ships *_above_200k rates", () => {
     const stats = expectStats("google:gemini-3.1-pro-preview");
     expect(stats.tiered_pricing_threshold_tokens).toBe(200000);
diff --git a/src/common/utils/tokens/models-extra.ts b/src/common/utils/tokens/models-extra.ts
diff --git a/src/node/services/agentSkills/builtInSkillContent.generated.ts b/src/node/services/agentSkills/builtInSkillContent.generated.ts