
Commit 58a06c3

🤖 feat: update Gemini Flash to Gemini 3.5 Flash (#3334)
## Summary

Updates the curated Gemini Flash slot so the stable \ alias now resolves to \, with matching local metadata, docs, and provider thinking controls.

## Background

Gemini Flash is a stable user-facing alias in Mux. The new Gemini 3.5 Flash release should be the first-class Flash target, without adding a separate curated preview entry for the older Gemini 3 Flash Preview model.

## Implementation

- Repointed \ to \ while keeping the existing \ alias.
- Added local token/capability metadata for Gemini 3.5 Flash in \.
- Added a narrow Gemini Flash thinking-policy helper shared by policy and Google provider options.
- Mapped Mux \ to Google \ for Gemini 3.5 Flash, while preserving \ / \ / \ with thoughts included.
- Regenerated model docs and built-in skill content.

## Validation

- \bun test v1.2.15 (df017990)
- \
- Dogfooded in a dev-server sandbox with provider config copied from \: selected Gemini 3.5 Flash, sent a prompt, and received a successful Gemini 3.5 Flash response.

## Risks

Low-to-moderate risk, scoped to model selection, model metadata, and Google thinking options. Existing Gemini 3.1 Pro behavior is covered by tests and left unchanged.

---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: Repoint Gemini Flash to Gemini 3.5 Flash

## Decision

Use **Option A**: update the existing first-class Flash slot so `gemini-flash` tracks the latest Flash tier.

- Change the curated Gemini Flash model ID from `google:gemini-3-flash-preview` to the Gemini 3.5 Flash API model ID after verifying the exact ID against the Google API or AI Studio (`gemini-3.5-flash` is the likely ID, but the implementer must confirm it against an API model list or official developer docs before committing metadata).
- Keep `gemini-flash` as the stable user-facing alias.
- Do **not** add a separate first-class selector entry for `gemini-3-flash-preview` unless verification shows the old preview must remain curated for compatibility.
**Recommended approach net product LoC estimate:** ~45–75 LoC if local `models-extra.ts` metadata is needed; ~20–35 LoC if `bun scripts/update_models.ts` now pulls complete LiteLLM metadata. This excludes tests, docs, and generated `models.json` churn.

## Evidence and constraints

- The current curated model registry is `src/common/constants/knownModels.ts`; `KNOWN_MODELS`, aliases, tokenizer overrides, and selector built-ins all derive from `MODEL_DEFINITIONS`.
- Current Gemini entries:
  - `GEMINI_31_PRO` → `google:gemini-3.1-pro-preview`, aliases `gemini`, `gemini-pro`.
  - `GEMINI_3_FLASH` → `google:gemini-3-flash-preview`, alias `gemini-flash`.
- Prior Gemini history supports this alias policy:
  - Gemini 3.1 Pro replaced the earlier Pro entry and kept the bare aliases on the latest Pro.
  - The Gemini Flash alias was normalized to `gemini-flash`, implying it should track the latest Flash.
- A probe of the current `src/common/utils/tokens/models.json` found `gemini-3-flash-preview`, but not `gemini-3.5-flash`.
- `src/common/constants/knownModels.test.ts` will fail unless the new `providerModelId` exists in either `models.json` or `models-extra.ts`.
- The current thinking policy is wrong for a `gemini-3.5-flash`-style ID: `includes("gemini-3-flash")` misses it, while the generic `includes("gemini-3")` check catches it and treats it as Pro.
- Google/DeepMind currently describe Gemini 3.5 Flash as available in the Gemini API / AI Studio, with 1M input tokens, 64k output tokens, a January 2025 knowledge cutoff, multimodal inputs, text output, and tool use including function calling and structured output.

## Phase 0 — Verify exact provider facts before editing

1. Confirm the exact Gemini API model ID from one of:
   - The Google AI Studio model picker / API model list.
   - Official Gemini API developer docs, if updated.
   - A safe read-only `listModels` call using a configured Google API key, if available.
2. Confirm the pricing source:
   - Prefer official Gemini API pricing docs if they have been updated for Gemini 3.5 Flash.
   - If official pricing is not yet published in developer docs, either:
     - use verified LiteLLM metadata from `bun scripts/update_models.ts`, or
     - add conservative local metadata with a comment that it must be revisited once Google publishes official pricing.
3. Confirm thinking semantics:
   - The Gemini Flash family should expose `minimal`, `low`, `medium`, `high` on the Google API side.
   - Mux should continue exposing user-facing `off`, `low`, `medium`, `high`, mapping `off` to Google `minimal` for Flash models that do not support true thinking-off.

**Quality gate:** if official docs are incomplete or ambiguous, record the exact source used for model ID, limits, pricing, and thinking levels in code comments near the local metadata or provider mapping.

## Phase 1 — Repoint the curated model registry

Edit `src/common/constants/knownModels.ts`:

1. Keep the existing `GEMINI_3_FLASH` key by default for a minimal Option A diff. Add or update its comment to say it tracks the latest Flash tier. Only rename it to `GEMINI_35_FLASH` if `rg "GEMINI_3_FLASH"` shows negligible references and the resulting diff is smaller/clearer.
2. Set `providerModelId` to the verified API ID, expected to be:
   ```ts
   providerModelId: "gemini-3.5-flash"
   ```
3. Keep only the stable alias unless product explicitly wants version-specific slash aliases:
   ```ts
   aliases: ["gemini-flash"]
   ```
   Users can still select the exact full model string with `/model google:gemini-3.5-flash`; avoiding a version alias minimizes future cleanup.
4. Keep the tokenizer override unless `ai-tokenizer` has added a better exact tokenizer:
   ```ts
   tokenizerOverride: "google/gemini-2.5-pro"
   ```

**Quality gate:** run `bun test src/common/constants/knownModels.test.ts` after the metadata work; alias uniqueness and token metadata coverage should pass. Add a targeted alias assertion if not already covered by nearby tests: `MODEL_ABBREVIATIONS["gemini-flash"] === "google:<verified-id>"` or `resolveModelAlias("gemini-flash") === "google:<verified-id>"`.
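To make the Phase 1 alias behavior concrete, here is a minimal sketch of how a curated registry shaped like `MODEL_DEFINITIONS` could derive full model IDs and an alias table, including the uniqueness invariant the known-model tests check. It assumes a simplified entry shape (`provider`, `providerModelId`, `aliases`); `ModelDefinition`, `buildAliasMap`, `resolveAlias`, and the two-entry registry are hypothetical illustrations, not the actual Mux source.

```typescript
// Hypothetical, simplified shape of a curated model entry (not the real Mux type).
interface ModelDefinition {
  provider: string;
  providerModelId: string;
  aliases: readonly string[];
}

// Illustrative subset of the registry, mirroring the Gemini entries after this commit.
const MODEL_DEFINITIONS: Record<string, ModelDefinition> = {
  GEMINI_31_PRO: {
    provider: "google",
    providerModelId: "gemini-3.1-pro-preview",
    aliases: ["gemini", "gemini-pro"],
  },
  GEMINI_FLASH: {
    provider: "google",
    providerModelId: "gemini-3.5-flash",
    aliases: ["gemini-flash"],
  },
};

// Full model id is "<provider>:<providerModelId>", e.g. "google:gemini-3.5-flash".
function modelId(def: ModelDefinition): string {
  return `${def.provider}:${def.providerModelId}`;
}

// Build alias -> full id, throwing if a model id or alias is claimed twice.
function buildAliasMap(defs: Record<string, ModelDefinition>): Map<string, string> {
  const aliases = new Map<string, string>();
  const seenIds = new Set<string>();
  for (const def of Object.values(defs)) {
    const id = modelId(def);
    if (seenIds.has(id)) throw new Error(`duplicate model id: ${id}`);
    seenIds.add(id);
    for (const alias of def.aliases) {
      if (aliases.has(alias)) throw new Error(`duplicate alias: ${alias}`);
      aliases.set(alias, id);
    }
  }
  return aliases;
}

// Aliases resolve to full ids; any other string passes through unchanged.
function resolveAlias(input: string, aliases: Map<string, string>): string {
  return aliases.get(input) ?? input;
}

const ALIASES = buildAliasMap(MODEL_DEFINITIONS);
console.log(resolveAlias("gemini-flash", ALIASES)); // google:gemini-3.5-flash
```

Because only `providerModelId` changes, the `gemini-flash` alias follows it automatically, while an explicit `google:gemini-3-flash-preview` string still passes through untouched as a custom model.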
## Phase 2 — Add or refresh token/capability metadata

Preferred path:

1. Run `bun scripts/update_models.ts` before adding manual metadata.
2. Inspect the generated diff. Keep it only if the churn is acceptable and it adds a **bare** key for the verified model ID (expected `"gemini-3.5-flash"`) with complete pricing/context/capability fields.
3. If the refresh only adds provider-scoped keys such as `gemini/gemini-3.5-flash`, `knownModels.test.ts` will still fail for a `google:` known model; add a bare-key fallback in `models-extra.ts` instead of relying on scoped-only metadata.

Fallback path, if LiteLLM is not updated, creates broad unrelated churn, or lacks a bare key:

1. Add an entry to `src/common/utils/tokens/models-extra.ts` keyed by the bare provider model ID, expected `"gemini-3.5-flash"`.
2. Include at minimum:
   - `max_input_tokens: 1048576`
   - `max_output_tokens: 65536`
   - `input_cost_per_token` and `output_cost_per_token` from a verified pricing source
   - `cache_read_input_token_cost`, only if the verified pricing source confirms context-cache pricing
   - `litellm_provider: "vertex_ai-language-models"`
   - `mode: "chat"`
   - `supports_function_calling: true`
   - `supports_vision: true`
   - `supports_pdf_input: true`
   - `supports_reasoning: true`
   - `supports_response_schema: true`
   - `knowledge_cutoff: "2025-01"`
3. If storing official multimodal support locally, extend the local `ModelData` interface in `models-extra.ts` to include:
   - `supports_audio_input?: boolean`
   - `supports_video_input?: boolean`

**Quality gate:** add/adjust `src/common/utils/tokens/modelStats.test.ts` and `src/common/utils/ai/modelCapabilities.test.ts` only around behavior that matters: context size, nonzero pricing, and media support. Avoid tautological tests that only repeat static prose.

## Phase 3 — Fix Gemini Flash thinking policy and provider mapping

Edit `src/common/utils/thinking/policy.ts`:

1. Replace the literal substring detection for Flash with a narrow helper that matches only verified chat Flash IDs, for example:
   ```ts
   function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
     return (
       modelName === "gemini-3-flash-preview" ||
       modelName === "gemini-3.5-flash" ||
       modelName === "gemini-3.5-flash-preview" // only keep if this ID is verified
     );
   }
   ```
   Use the helper before the generic Gemini 3/3.1 Pro branch. Avoid a broad regex that accidentally treats `gemini-3.1-flash-lite-preview`, image, TTS, or other non-chat variants as the same model.
2. Return Mux levels for verified Flash chat models:
   ```ts
   ["off", "low", "medium", "high"]
   ```
3. Keep Pro behavior separate. If current docs now say Gemini 3.1 Pro supports `medium`, decide whether to broaden Pro in a separate change; do not conflate that with Gemini 3.5 Flash support unless required by failing tests or verified product behavior.

Edit `src/common/utils/ai/providerOptions.ts` as a required part of this change:

1. Reuse the same Flash detection helper, or extract a tiny shared helper, so policy and provider option mapping cannot drift.
2. The current Google branch sends `thinkingConfig.thinkingLevel` when `capModelName.includes("gemini-3")`; `gemini-3.5-flash` should still enter that branch.
3. For verified Flash chat models, map Mux `off` to Google `minimal` and **do not** set `includeThoughts` for that lowest mode unless verified docs require it:
   ```ts
   thinkingConfig = { thinkingLevel: "minimal" };
   ```
   Do not rely on omitting `thinkingConfig`; Gemini 3.5 Flash may default to `medium`, which would make Mux `off` misleading.
4. For Flash `low`, `medium`, and `high`, pass the level through and keep `includeThoughts: true`:
   ```ts
   thinkingConfig = { includeThoughts: true, thinkingLevel: effectiveThinking };
   ```
5. If `xhigh` or `max` somehow reaches provider mapping despite policy enforcement, defensively map it to `high` rather than throwing in the request path. Add a short comment that policy should clamp before provider options, but the provider adapter avoids sending invalid Google values.

**Quality gate:** extend `src/common/utils/thinking/policy.test.ts` and `src/common/utils/ai/providerOptions.test.ts` to prove that:

- `google:gemini-3.5-flash` gets `off/low/medium/high`.
- The gateway form, e.g. `mux-gateway:google/gemini-3.5-flash`, behaves the same.
- Optionally, an explicit gateway form like `openrouter:google/gemini-3.5-flash` behaves correctly if current normalization supports it.
- Flash `off` maps to `{ thinkingConfig: { thinkingLevel: "minimal" } }` without `includeThoughts` unless docs prove otherwise.
- Flash `medium` maps to `{ thinkingConfig: { includeThoughts: true, thinkingLevel: "medium" } }`.
- Gemini 3.1 Pro behavior remains unchanged.
- Optionally, for custom model mapping: a provider model entry with `mappedToModel: "google:gemini-3.5-flash"` uses the Flash mapping for policy/provider options.

## Phase 4 — Update docs and generated/model-adjacent outputs

1. Run or update `scripts/gen_docs.ts` output so `docs/config/models.mdx` lists:
   - `Gemini 3.5 Flash`
   - `google:<verified-id>`
   - alias `gemini-flash`
2. If display output is unexpectedly wrong, add a focused `src/common/utils/ai/modelDisplay.test.ts` case. The current generic Gemini formatter likely needs no production change, but a dotted-version expectation is cheap to add if the file is touched nearby.
3. Search for stale `KNOWN_MODELS.GEMINI_3_FLASH` references only if the key is renamed. If the key is kept, no reference churn is expected.

**Quality gate:** do not hand-edit generated docs if an existing generation script owns the table; run the generator and keep only the expected diffs.
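For Phase 4, a hedged sketch of rendering one row of the `docs/config/models.mdx` table from a registry entry. The real generator is `scripts/gen_docs.ts`, whose internals are not shown in this commit; `DocsModelEntry` and `renderModelRow` are hypothetical, and the four-column layout (display name, full ID, backticked aliases, trailing empty cell) is copied from the table rows in the docs diff.

```typescript
// Hypothetical input for one docs row; the real gen_docs.ts data shape may differ.
interface DocsModelEntry {
  displayName: string;
  id: string; // "<provider>:<providerModelId>"
  aliases: readonly string[];
}

// Render a row in the existing models.mdx layout:
// | Name | provider:model | `alias1`, `alias2` | |
function renderModelRow(entry: DocsModelEntry): string {
  const aliasCell = entry.aliases.map((alias) => `\`${alias}\``).join(", ");
  return `| ${entry.displayName} | ${entry.id} | ${aliasCell} | |`;
}

console.log(
  renderModelRow({
    displayName: "Gemini 3.5 Flash",
    id: "google:gemini-3.5-flash",
    aliases: ["gemini-flash"],
  }),
);
```

With the Gemini 3.5 Flash entry this prints the same row the commit adds to `models.mdx`, which is why regenerating rather than hand-editing keeps the table consistent.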
## Phase 5 — Validation

Run targeted tests first:

```bash
bun test src/common/constants/knownModels.test.ts
bun test src/common/utils/thinking/policy.test.ts
bun test src/common/utils/ai/providerOptions.test.ts
bun test src/common/utils/tokens/modelStats.test.ts
bun test src/common/utils/ai/modelCapabilities.test.ts
bun test src/common/utils/ai/modelDisplay.test.ts
```

Then run broader checks:

```bash
make typecheck
make fmt-check
make static-check
```

If `bun scripts/update_models.ts` produces broad generated churn, inspect whether it is acceptable; if it is too broad, prefer `models-extra.ts` for this targeted launch support.

## Phase 6 — Dogfooding plan

Because this is a model-selection/provider behavior change, dogfood in the desktop app with a configured Google provider.

1. Start Mux:
   ```bash
   make dev
   ```
2. In Settings → Providers, confirm Google is configured and enabled.
3. Use the model selector and confirm that:
   - `Gemini 3.5 Flash` appears.
   - `gemini-flash` resolves to `google:<verified-id>`.
   - The old `Gemini 3 Flash Preview` is no longer the curated `gemini-flash` target.
4. Send smoke prompts at all Flash thinking levels:
   - `off` / numeric `0`
   - `low`
   - `medium`
   - `high`
5. Use `agent-browser` to capture reviewer evidence:
   - Screenshot of the model selector showing Gemini 3.5 Flash.
   - Screenshot of a successful response using `gemini-flash`.
   - Screenshot of the thinking-level control or slash-command usage.
   - Video recording of selecting the model and sending one prompt.
6. Multimodal smoke check, if the provider/API key allows it:
   - Attach a small image or PDF and verify the send path is allowed.
   - Capture a screenshot of the attachment flow and successful response.

## Acceptance criteria

- `gemini-flash` resolves to the verified Gemini 3.5 Flash Google model ID.
- No new version-specific alias is added unless product explicitly asks for it.
- The first-class model selector lists Gemini 3.5 Flash when Google/direct or configured gateway routing makes it available.
- The known-model metadata invariant passes with a bare metadata key for the verified provider model ID in `models.json` or `models-extra.ts`.
- Token meter/context warnings use Gemini 3.5 Flash limits and costs.
- The Gemini 3.5 Flash thinking policy exposes Mux levels `off/low/medium/high`.
- Provider options translate Mux `off` to Google `minimal` for Gemini 3.5 Flash instead of silently falling back to the API default, and omit `includeThoughts` for this lowest mode unless docs prove otherwise.
- Provider options pass Flash `low/medium/high` through with `includeThoughts: true`.
- Existing Gemini Pro behavior is unchanged unless explicitly verified and intentionally updated.
- The docs table reflects Gemini 3.5 Flash.
- Targeted tests, typecheck, fmt-check, and static-check pass.
- Dogfooding screenshots and a video recording are captured for reviewer verification.

## Risks and mitigations

- **API model ID ambiguity:** block implementation until the exact ID is verified from the official API/AI Studio, not inferred only from marketing copy.
- **Pricing docs lag:** prefer a LiteLLM refresh if available; otherwise add local metadata with a clear source/revisit comment. Do not commit press/blog-derived pricing unless official API pricing, LiteLLM, or another trusted provider metadata source confirms it.
- **Thinking-level drift:** keep tests focused on observed provider behavior, especially `off` → `minimal` and the absence of `includeThoughts` for the lowest mode unless docs require it.
- **Overbroad Flash matching:** use a narrow verified-ID helper so image, TTS, Flash Lite, or future non-chat variants do not accidentally inherit chat-model thinking behavior.
- **Generated metadata churn:** if the `models.json` refresh touches many unrelated entries or lacks a bare key, use `models-extra.ts` for a surgical release.
- **Alias compatibility:** existing users who explicitly select `google:gemini-3-flash-preview` can still use it as a custom model; only the curated `gemini-flash` alias changes.
</details>

---

_Generated with [\](https://github.com/coder/mux) • Model: \ • Thinking: \ • Cost: \_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=47.10 -->
1 parent 6e9a0c0 commit 58a06c3

12 files changed

Lines changed: 295 additions & 18 deletions


docs/config/models.mdx

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Mux ships with curated models kept up to date with the frontier. Use any custom
 | Codex Mini 5.1 | openai:gpt-5.1-codex-mini | `codex-mini` | |
 | Codex Max 5.1 | openai:gpt-5.1-codex-max | `codex-max` | |
 | Gemini 3.1 Pro Preview | google:gemini-3.1-pro-preview | `gemini`, `gemini-pro` | |
-| Gemini 3 Flash Preview | google:gemini-3-flash-preview | `gemini-flash` | |
+| Gemini 3.5 Flash | google:gemini-3.5-flash | `gemini-flash` | |
 | Grok 4 1 Fast | xai:grok-4-1-fast | `grok`, `grok-4`, `grok-4.1`, `grok-4-1` | |
 | Grok Code Fast 1 | xai:grok-code-fast-1 | `grok-code` | |
 | DeepSeek V4 Pro | deepseek:deepseek-v4-pro | `deepseek`, `deepseek-pro`, `deepseek-v4`, `deepseek-v4-pro` | |

src/common/constants/knownModels.test.ts

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ describe("Known Models Integration", () => {
 }
 });
 
+test("gemini-flash resolves to the stable Gemini 3.5 Flash model", () => {
+expect(MODEL_ABBREVIATIONS["gemini-flash"]).toBe("google:gemini-3.5-flash");
+});
+
 test("known model ids and aliases stay unique across the curated registry", () => {
 const seenIds = new Set<string>();
 const seenAliases = new Set<string>();

src/common/constants/knownModels.ts

Lines changed: 3 additions & 2 deletions
@@ -115,9 +115,10 @@ const MODEL_DEFINITIONS = {
 aliases: ["gemini", "gemini-pro"],
 tokenizerOverride: "google/gemini-2.5-pro",
 },
-GEMINI_3_FLASH: {
+// Gemini Flash alias tracks the latest stable Flash tier.
+GEMINI_FLASH: {
 provider: "google",
-providerModelId: "gemini-3-flash-preview",
+providerModelId: "gemini-3.5-flash",
 aliases: ["gemini-flash"],
 tokenizerOverride: "google/gemini-2.5-pro",
 },

src/common/utils/ai/modelCapabilities.test.ts

Lines changed: 9 additions & 0 deletions
@@ -47,6 +47,15 @@ describe("getModelCapabilities", () => {
 expect(caps?.maxPdfSizeMb).toBeGreaterThan(0);
 });
 
+it("returns multimodal capabilities for Gemini 3.5 Flash", () => {
+const caps = getModelCapabilities("google:gemini-3.5-flash");
+expect(caps).not.toBeNull();
+expect(caps?.supportsPdfInput).toBe(true);
+expect(caps?.supportsVision).toBe(true);
+expect(caps?.supportsAudioInput).toBe(true);
+expect(caps?.supportsVideoInput).toBe(true);
+});
+
 it("returns null for unknown models", () => {
 expect(getModelCapabilities("anthropic:this-model-does-not-exist")).toBeNull();
 });
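A minimal sketch of how a `models-extra.ts`-style metadata entry could feed the capability flags asserted in the capability test above. It assumes LiteLLM-style `supports_*` fields (the ones listed in Phase 2 of the plan, including the proposed `supports_audio_input`/`supports_video_input`) and the `supportsPdfInput`/`supportsVision`/`supportsAudioInput`/`supportsVideoInput` names the test uses. `ModelData`, `ModelCapabilities`, `MODELS_EXTRA`, and `capabilitiesFor` are simplified, hypothetical stand-ins; the real `getModelCapabilities` may look up and merge metadata differently.

```typescript
// Hypothetical subset of a models-extra.ts-style entry (LiteLLM field names from the plan).
interface ModelData {
  max_input_tokens: number;
  max_output_tokens: number;
  mode: "chat";
  supports_vision?: boolean;
  supports_pdf_input?: boolean;
  supports_audio_input?: boolean; // optional field the plan proposes adding
  supports_video_input?: boolean; // optional field the plan proposes adding
}

// Capability names as asserted in modelCapabilities.test.ts.
interface ModelCapabilities {
  supportsVision: boolean;
  supportsPdfInput: boolean;
  supportsAudioInput: boolean;
  supportsVideoInput: boolean;
}

// Keyed by the bare provider model ID, which the known-model invariant requires.
const MODELS_EXTRA = new Map<string, ModelData>([
  [
    "gemini-3.5-flash",
    {
      max_input_tokens: 1048576,
      max_output_tokens: 65536,
      mode: "chat",
      supports_vision: true,
      supports_pdf_input: true,
      supports_audio_input: true,
      supports_video_input: true,
    },
  ],
]);

// Strip the "provider:" prefix, look up the bare ID, and return null when metadata is missing.
function capabilitiesFor(modelString: string): ModelCapabilities | null {
  const bare = modelString.slice(modelString.indexOf(":") + 1);
  const data = MODELS_EXTRA.get(bare);
  if (!data) return null;
  // Absent flags are treated as "not supported".
  return {
    supportsVision: data.supports_vision === true,
    supportsPdfInput: data.supports_pdf_input === true,
    supportsAudioInput: data.supports_audio_input === true,
    supportsVideoInput: data.supports_video_input === true,
  };
}
```

Under these assumptions, `capabilitiesFor("google:gemini-3.5-flash")` reports all four media flags as `true`, and an unknown model such as `anthropic:this-model-does-not-exist` yields `null`, matching the two test expectations.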

src/common/utils/ai/modelDisplay.test.ts

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ describe("formatModelDisplayName", () => {
 describe("Gemini models", () => {
 test("formats Gemini models", () => {
 expect(formatModelDisplayName("gemini-2-0-flash-exp")).toBe("Gemini 2.0 Flash Exp");
+expect(formatModelDisplayName("gemini-3.5-flash")).toBe("Gemini 3.5 Flash");
 expect(formatModelDisplayName("gemini-3.1-pro-preview")).toBe("Gemini 3.1 Pro Preview");
 });
 });
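The plan expected the generic Gemini formatter to need no production change. As a hedged illustration, here is one way a formatter could satisfy all three expectations above: split on `-`, join adjacent integer tokens with a dot (`2-0` becomes `2.0`), keep already-dotted versions such as `3.5` as-is, and title-case the remaining words. `formatGeminiDisplayName` is hypothetical and is not Mux's actual `formatModelDisplayName`.

```typescript
// Hypothetical formatter; not Mux's actual formatModelDisplayName implementation.
function formatGeminiDisplayName(modelName: string): string {
  const out: string[] = [];
  let prevWasInteger = false;
  for (const token of modelName.split("-")) {
    const isInteger = /^\d+$/.test(token);
    const isVersion = /^\d+(\.\d+)*$/.test(token);
    if (isInteger && prevWasInteger) {
      // Dash-separated version numbers: "2", "0" -> "2.0".
      const previous = out.pop() ?? "";
      out.push(`${previous}.${token}`);
      prevWasInteger = false;
      continue;
    }
    // Versions stay as written; words are title-cased ("flash" -> "Flash").
    out.push(isVersion ? token : token.charAt(0).toUpperCase() + token.slice(1));
    prevWasInteger = isInteger;
  }
  return out.join(" ");
}
```

Because `3.5` arrives as a single dotted token, the dash-joining rule never touches it, which is why a dotted-version expectation is cheap to add alongside the existing `gemini-2-0-flash-exp` case.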

src/common/utils/ai/providerOptions.test.ts

Lines changed: 149 additions & 0 deletions
@@ -749,6 +749,155 @@ describe("buildProviderOptions - OpenAI", () => {
 });
 });
 
+describe("buildProviderOptions - Google", () => {
+test("maps Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:gemini-3.5-flash", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps gateway Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("mux-gateway:google/gemini-3.5-flash", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps namespaced Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:models/gemini-3.5-flash", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps versioned Gemini 3.5 Flash off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:gemini-3.5-flash-001", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps Gemini 3.5 Flash medium to thinkingLevel medium with thoughts", () => {
+expect(buildProviderOptions("mux-gateway:google/gemini-3.5-flash", "medium")).toEqual({
+google: {
+thinkingConfig: {
+includeThoughts: true,
+thinkingLevel: "medium",
+},
+},
+});
+});
+
+test("uses mapped model capabilities for custom Gemini 3.5 Flash aliases", () => {
+const providersConfig = createMockProvidersConfig({
+"google:custom-flash": "google:gemini-3.5-flash",
+});
+
+expect(
+buildProviderOptions(
+"google:custom-flash",
+"off",
+undefined,
+undefined,
+undefined,
+undefined,
+undefined,
+providersConfig
+)
+).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps non-preview Gemini 3 Flash off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:gemini-3-flash", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps Gemini 3 Flash Preview off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:gemini-3-flash-preview", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("maps versioned Gemini 3 Flash Preview off to minimal thinking without thoughts", () => {
+expect(buildProviderOptions("google:gemini-3-flash-preview-latest", "off")).toEqual({
+google: {
+thinkingConfig: {
+thinkingLevel: "minimal",
+},
+},
+});
+});
+
+test("defensively maps unsupported Gemini 3.5 Flash xhigh to high", () => {
+expect(buildProviderOptions("google:gemini-3.5-flash", "xhigh")).toEqual({
+google: {
+thinkingConfig: {
+includeThoughts: true,
+thinkingLevel: "high",
+},
+},
+});
+});
+
+test("passes Gemini 3.1 Pro low through as thinkingLevel low with thoughts", () => {
+expect(buildProviderOptions("google:gemini-3.1-pro-preview", "low")).toEqual({
+google: {
+thinkingConfig: {
+includeThoughts: true,
+thinkingLevel: "low",
+},
+},
+});
+});
+
+test("defensively maps unsupported Gemini 3.5 Flash max to high", () => {
+expect(buildProviderOptions("google:gemini-3.5-flash", "max")).toEqual({
+google: {
+thinkingConfig: {
+includeThoughts: true,
+thinkingLevel: "high",
+},
+},
+});
+});
+
+test("keeps Gemini 3.1 Pro off without provider thinking config", () => {
+expect(buildProviderOptions("google:gemini-3.1-pro-preview", "off")).toEqual({
+google: {
+thinkingConfig: undefined,
+},
+});
+});
+});
+
 describe("buildRequestHeaders", () => {
 for (const { name, model, options, expected } of [
 {
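The custom-alias test in this file relies on resolving a user-defined model to its `mappedToModel` target before any Flash detection runs. A minimal sketch, assuming a providers config that maps a full custom model ID to a full target ID; `ProvidersConfig` and `resolveForMetadata` are hypothetical simplifications of `resolveModelForMetadata`, whose real signature is not shown in this diff.

```typescript
// Hypothetical: full custom model id -> entry that may point at a known model.
type ProvidersConfig = Record<string, { mappedToModel?: string }>;

// Return the model whose metadata/capabilities should drive request options.
// Only one hop is followed, so a misconfigured mapping cycle cannot loop forever.
function resolveForMetadata(modelString: string, config?: ProvidersConfig): string {
  return config?.[modelString]?.mappedToModel ?? modelString;
}

const providersConfig: ProvidersConfig = {
  "google:custom-flash": { mappedToModel: "google:gemini-3.5-flash" },
};
```

Here `resolveForMetadata("google:custom-flash", providersConfig)` returns `"google:gemini-3.5-flash"`, so a Flash check run on the resolved ID (rather than on `custom-flash`) selects `thinkingLevel: "minimal"` for `off`; unmapped IDs come back unchanged.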

src/common/utils/ai/providerOptions.ts

Lines changed: 14 additions & 10 deletions
@@ -23,6 +23,7 @@ import {
 OPENAI_REASONING_EFFORT,
 OPENROUTER_REASONING_EFFORT,
 } from "@/common/types/thinking";
+import { isGeminiFlashThinkingLevelModelName } from "@/common/utils/thinking/policy";
 import { resolveModelForMetadata } from "@/common/utils/providers/modelEntries";
 import { log } from "@/node/services/log";
 import type { MuxMessage } from "@/common/types/message";
@@ -409,22 +410,25 @@ export function buildProviderOptions(
 
 // Build Google-specific options
 if (formatProvider === "google") {
-const isGemini3 = capModelName.includes("gemini-3");
+const capBareModelName = capModelName.split("/").at(-1) ?? capModelName;
+const usesGeminiThinkingLevelConfig = capBareModelName.includes("gemini-3");
+const isGeminiFlashThinkingModel = isGeminiFlashThinkingLevelModelName(capBareModelName);
 let thinkingConfig: GoogleGenerativeAIProviderOptions["thinkingConfig"];
 
-if (effectiveThinking !== "off") {
+if (isGeminiFlashThinkingModel && effectiveThinking === "off") {
+// Gemini Flash chat models default to medium and do not support true thinking-off;
+// send minimal explicitly so Mux's "off" setting means lowest-effort behavior.
+thinkingConfig = { thinkingLevel: "minimal" };
+} else if (effectiveThinking !== "off") {
 thinkingConfig = {
 includeThoughts: true,
 };
 
-if (isGemini3) {
-// Policy enforcement already clamped to valid levels for Flash/Pro,
-// so effectiveThinking is guaranteed in the model's allowed set.
-// Flash: off/low/medium/high; Pro: low/high. "xhigh" can't reach here.
-thinkingConfig.thinkingLevel = effectiveThinking as Exclude<
-ThinkingLevel,
-"off" | "xhigh" | "max"
->;
+if (usesGeminiThinkingLevelConfig) {
+// Policy enforcement should clamp to valid Google levels before this adapter runs.
+// Avoid leaking xhigh/max to Google if a caller bypasses policy.
+thinkingConfig.thinkingLevel =
+effectiveThinking === "xhigh" || effectiveThinking === "max" ? "high" : effectiveThinking;
 } else {
 // Gemini 2.5 uses thinkingBudget
 const budget = GEMINI_THINKING_BUDGETS[effectiveThinking];
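The new Google branch can be read as a small pure function. A minimal sketch, assuming Mux levels `off`/`low`/`medium`/`high`/`xhigh`/`max` and Google levels `minimal`/`low`/`medium`/`high`; `mapGoogleThinking`, `isFlash`, and the local type declarations are illustrative, `isFlash` is a copy of the helper added to `policy.ts`, and the Gemini 2.5 `thinkingBudget` path is deliberately left out (those models get only `includeThoughts` here). It mirrors the diff: take the last `/` segment of the model name, send `minimal` without `includeThoughts` for Flash `off`, omit the config for other `off` cases, and clamp `xhigh`/`max` to `high`.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";
type GoogleThinkingLevel = "minimal" | "low" | "medium" | "high";

interface GoogleThinkingConfig {
  includeThoughts?: boolean;
  thinkingLevel?: GoogleThinkingLevel;
}

// Copy of the Flash helper from policy.ts (Flash Lite excluded).
function isFlash(bareModelName: string): boolean {
  const n = bareModelName.trim().toLowerCase();
  return (
    ((n === "gemini-3-flash" || n.startsWith("gemini-3-flash-")) &&
      !n.startsWith("gemini-3-flash-lite")) ||
    (n.startsWith("gemini-3.5-flash") && !n.startsWith("gemini-3.5-flash-lite"))
  );
}

// capModelName is the model name after the "provider:" prefix, e.g.
// "gemini-3.5-flash", "google/gemini-3.5-flash", or "models/gemini-3.5-flash".
function mapGoogleThinking(
  capModelName: string,
  level: ThinkingLevel,
): GoogleThinkingConfig | undefined {
  const segments = capModelName.split("/");
  const bare = segments[segments.length - 1] ?? capModelName;
  if (level === "off") {
    // Flash cannot fully disable thinking, so "off" becomes the lowest Google level.
    return isFlash(bare) ? { thinkingLevel: "minimal" } : undefined;
  }
  const config: GoogleThinkingConfig = { includeThoughts: true };
  if (bare.includes("gemini-3")) {
    // Defensive clamp: Google has no xhigh/max thinking level.
    config.thinkingLevel = level === "xhigh" || level === "max" ? "high" : level;
  }
  return config;
}
```

Keeping the Flash check on the bare name is what lets the gateway (`google/…`) and namespaced (`models/…`) forms in the tests take the same path as `google:gemini-3.5-flash`.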

src/common/utils/thinking/policy.test.ts

Lines changed: 62 additions & 1 deletion
@@ -1,5 +1,10 @@
 import { describe, expect, test } from "bun:test";
-import { getThinkingPolicyForModel, enforceThinkingPolicy, resolveThinkingInput } from "./policy";
+import {
+getThinkingPolicyForModel,
+enforceThinkingPolicy,
+resolveThinkingInput,
+isGeminiFlashThinkingLevelModelName,
+} from "./policy";
 
 describe("getThinkingPolicyForModel", () => {
 test("returns 5 levels including xhigh for gpt-5.1-codex-max", () => {
@@ -386,6 +391,55 @@ describe("getThinkingPolicyForModel", () => {
 expect(getThinkingPolicyForModel("google:gemini-3.1-pro-preview")).toEqual(["low", "high"]);
 });
 
+test("returns off/low/medium/high for stable Gemini 3.5 Flash", () => {
+expect(getThinkingPolicyForModel("google:gemini-3.5-flash")).toEqual([
+"off",
+"low",
+"medium",
+"high",
+]);
+expect(getThinkingPolicyForModel("mux-gateway:google/gemini-3.5-flash")).toEqual([
+"off",
+"low",
+"medium",
+"high",
+]);
+});
+
+test("returns off/low/medium/high for versioned stable Gemini 3.5 Flash IDs", () => {
+for (const model of [
+"google:gemini-3.5-flash-001",
+"google:gemini-3.5-flash-latest",
+"google:gemini-3.5-flash-preview",
+]) {
+expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+}
+});
+
+test("returns off/low/medium/high for stable Gemini 3.5 Flash behind OpenRouter", () => {
+expect(getThinkingPolicyForModel("openrouter:google/gemini-3.5-flash")).toEqual([
+"off",
+"low",
+"medium",
+"high",
+]);
+});
+
+test("returns off/low/medium/high for non-preview Gemini 3 Flash IDs", () => {
+for (const model of ["google:gemini-3-flash", "google:gemini-3-flash-001"]) {
+expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+}
+});
+
+test("returns off/low/medium/high for versioned Gemini 3 Flash Preview IDs", () => {
+for (const model of [
+"google:gemini-3-flash-preview-20251217",
+"google:gemini-3-flash-preview-latest",
+]) {
+expect(getThinkingPolicyForModel(model)).toEqual(["off", "low", "medium", "high"]);
+}
+});
+
 test("returns off/low/medium/high for Gemini 3 Flash", () => {
 expect(getThinkingPolicyForModel("google:gemini-3-flash-preview")).toEqual([
 "off",
@@ -411,6 +465,13 @@ describe("getThinkingPolicyForModel", () => {
 });
 });
 
+describe("isGeminiFlashThinkingLevelModelName", () => {
+test("does not classify Gemini Flash Lite variants as Flash thinking-level chat models", () => {
+expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);
+expect(isGeminiFlashThinkingLevelModelName("gemini-3.5-flash-lite")).toBe(false);
+});
+});
+
 describe("enforceThinkingPolicy", () => {
 describe("single-option policy models (gpt-5-pro)", () => {
 test("enforces high for any requested level", () => {
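Phase 3 of the plan assumes policy enforcement clamps a requested level into the model's allowed set before provider options run. The real `enforceThinkingPolicy` rules are not part of this diff, so the following is only a hypothetical clamp (`clampToPolicy`) under one assumed rule: keep an allowed level; otherwise fall back to the highest allowed level below the request; if there is none, use the lowest allowed level. Under that rule it agrees with the `gpt-5-pro` expectation above (any request becomes `high`) and with the provider adapter's `xhigh`/`max` → `high` fallback for Flash.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";

// Mux levels ordered from least to most effort.
const LEVEL_ORDER: readonly ThinkingLevel[] = ["off", "low", "medium", "high", "xhigh", "max"];

const rank = (level: ThinkingLevel): number => LEVEL_ORDER.indexOf(level);

// Hypothetical clamp rule; not the real enforceThinkingPolicy implementation.
function clampToPolicy(requested: ThinkingLevel, policy: readonly ThinkingLevel[]): ThinkingLevel {
  if (policy.length === 0) throw new Error("policy must allow at least one level");
  if (policy.includes(requested)) return requested;
  const sorted = [...policy].sort((a, b) => rank(a) - rank(b));
  // Highest allowed level that is still below the request, if any.
  const below = sorted.filter((level) => rank(level) < rank(requested));
  return below.length > 0 ? below[below.length - 1]! : sorted[0]!;
}

const FLASH_POLICY: readonly ThinkingLevel[] = ["off", "low", "medium", "high"];
```

With `FLASH_POLICY`, `clampToPolicy("xhigh", FLASH_POLICY)` returns `"high"`, so under this assumed rule the adapter's defensive clamp only matters when a caller bypasses policy.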

src/common/utils/thinking/policy.ts

Lines changed: 18 additions & 3 deletions
@@ -25,6 +25,20 @@ import {
 */
 export type ThinkingPolicy = readonly ThinkingLevel[];
 
+/**
+* True when modelName is a bare Gemini Flash chat model ID using Google's
+* thinkingLevel config (minimal/low/medium/high) instead of Gemini 2.x thinkingBudget.
+* @param modelName Provider model ID without the provider prefix (e.g. "gemini-3.5-flash", not "google:gemini-3.5-flash").
+*/
+export function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
+const normalized = modelName.trim().toLowerCase();
+return (
+((normalized === "gemini-3-flash" || normalized.startsWith("gemini-3-flash-")) &&
+!normalized.startsWith("gemini-3-flash-lite")) ||
+(normalized.startsWith("gemini-3.5-flash") && !normalized.startsWith("gemini-3.5-flash-lite"))
+);
+}
+
 /**
 * Returns the thinking policy for a given model.
 *
@@ -36,7 +50,8 @@ export type ThinkingPolicy = readonly ThinkingLevel[];
 * - openai:gpt-5.2 / openai:gpt-5.5 → ["off", "low", "medium", "high", "xhigh"]
 * - openai:gpt-5.2-pro / openai:gpt-5.5-pro → ["medium", "high", "xhigh"] (3 levels)
 * - openai:gpt-5-pro → ["high"] (only supported level, legacy)
-* - gemini-3 → ["low", "high"] (thinking level only)
+* - Gemini Flash chat variants → ["off", "low", "medium", "high"]
+* - gemini-3 Pro variants → ["low", "high"] (thinking level only)
 * - default → ["off", "low", "medium", "high"] (standard 4 levels; xhigh is opt-in per model)
 *
 * Tolerates version suffixes (e.g., gpt-5-pro-2025-10-06).
@@ -95,8 +110,8 @@ export function getThinkingPolicyForModel(modelString: string): ThinkingPolicy {
 return ["high"];
 }
 
-// Gemini 3 Flash supports 4 levels: off (minimal), low, medium, high
-if (withoutProviderNamespace.includes("gemini-3-flash")) {
+// Gemini Flash chat models support minimal/low/medium/high. Mux exposes minimal as "off".
+if (isGeminiFlashThinkingLevelModelName(withoutProviderNamespace)) {
 return ["off", "low", "medium", "high"];
 }
 
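A self-contained sketch of why the check order in `getThinkingPolicyForModel` matters. The Flash helper is copied from the diff; `stripNamespaces` and `geminiPolicy` are hypothetical, simplified stand-ins (the real function covers many more providers, and how it derives `withoutProviderNamespace` is not shown here). Because `"gemini-3.5-flash"` does not contain `"gemini-3-flash"` but does contain `"gemini-3"`, the old substring check routed it to the two-level Pro policy; checking the Flash helper first avoids that.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high";

// Copied from the isGeminiFlashThinkingLevelModelName helper added in this commit.
function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
  const normalized = modelName.trim().toLowerCase();
  return (
    ((normalized === "gemini-3-flash" || normalized.startsWith("gemini-3-flash-")) &&
      !normalized.startsWith("gemini-3-flash-lite")) ||
    (normalized.startsWith("gemini-3.5-flash") && !normalized.startsWith("gemini-3.5-flash-lite"))
  );
}

// Hypothetical namespace stripping: "mux-gateway:google/gemini-3.5-flash" -> "gemini-3.5-flash".
function stripNamespaces(modelString: string): string {
  const afterProvider = modelString.slice(modelString.indexOf(":") + 1);
  const segments = afterProvider.split("/");
  return segments[segments.length - 1] ?? afterProvider;
}

// Simplified, Gemini-only slice of the policy lookup. Flash must be checked first,
// because every Gemini 3.x Flash ID also contains "gemini-3".
function geminiPolicy(modelString: string): readonly ThinkingLevel[] {
  const name = stripNamespaces(modelString);
  if (isGeminiFlashThinkingLevelModelName(name)) return ["off", "low", "medium", "high"];
  if (name.includes("gemini-3")) return ["low", "high"];
  return ["off", "low", "medium", "high"]; // default four levels
}
```

Under these assumptions, `geminiPolicy("mux-gateway:google/gemini-3.5-flash")` and `geminiPolicy("openrouter:google/gemini-3.5-flash")` return the four Flash levels, while `geminiPolicy("google:gemini-3.1-pro-preview")` still returns `["low", "high"]`.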

src/common/utils/tokens/modelStats.test.ts

Lines changed: 9 additions & 0 deletions
@@ -43,6 +43,15 @@ describe("getModelStats", () => {
 expect(stats.tiered_pricing_threshold_tokens).toBeUndefined();
 });
 
+test("resolves Gemini 3.5 Flash with published standard pricing and limits", () => {
+const stats = expectStats(KNOWN_MODELS.GEMINI_FLASH.id);
+expect(stats.max_input_tokens).toBe(1048576);
+expect(stats.max_output_tokens).toBe(65536);
+expect(stats.input_cost_per_token).toBe(0.0000015);
+expect(stats.output_cost_per_token).toBe(0.000009);
+expect(stats.cache_read_input_token_cost).toBe(0.00000015);
+});
+
 test("defaults tiered pricing threshold to 200K when metadata only ships *_above_200k rates", () => {
 const stats = expectStats("google:gemini-3.1-pro-preview");
 expect(stats.tiered_pricing_threshold_tokens).toBe(200000);
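A worked example of what the Gemini 3.5 Flash rates asserted in this test imply: $1.50 per 1M input tokens, $9.00 per 1M output tokens, and $0.15 per 1M cache-read tokens, with a 1,048,576-token input window. The sketch assumes flat per-token billing (no tiered pricing) and that cached tokens are billed at the cache-read rate instead of the input rate; `PricingStats`, `Usage`, `estimateCostUsd`, and `contextFraction` are hypothetical helpers, not Mux's token-meter code.

```typescript
// Subset of the metadata fields asserted in modelStats.test.ts.
interface PricingStats {
  max_input_tokens: number;
  input_cost_per_token: number;
  output_cost_per_token: number;
  cache_read_input_token_cost?: number;
}

// Values copied from the Gemini 3.5 Flash test expectations.
const GEMINI_35_FLASH_STATS: PricingStats = {
  max_input_tokens: 1048576,
  input_cost_per_token: 0.0000015,
  output_cost_per_token: 0.000009,
  cache_read_input_token_cost: 0.00000015,
};

interface Usage {
  inputTokens: number; // total prompt tokens, including cached ones
  cachedInputTokens: number; // subset of inputTokens served from the context cache
  outputTokens: number;
}

// Flat-rate estimate: cached tokens at the cache-read rate (or the input rate if none
// is published), the remaining prompt tokens at the input rate, plus output tokens.
function estimateCostUsd(stats: PricingStats, usage: Usage): number {
  if (usage.cachedInputTokens > usage.inputTokens) {
    throw new Error("cachedInputTokens cannot exceed inputTokens");
  }
  const uncached = usage.inputTokens - usage.cachedInputTokens;
  const cacheRate = stats.cache_read_input_token_cost ?? stats.input_cost_per_token;
  return (
    uncached * stats.input_cost_per_token +
    usage.cachedInputTokens * cacheRate +
    usage.outputTokens * stats.output_cost_per_token
  );
}

// Fraction of the context window a prompt occupies, e.g. for a context warning.
function contextFraction(stats: PricingStats, inputTokens: number): number {
  return inputTokens / stats.max_input_tokens;
}
```

For a 200,000-token prompt with 150,000 cached tokens and an 8,000-token reply, this gives $0.075 + $0.0225 + $0.072 ≈ $0.17, and the prompt fills about 19% of the input window.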
