[Together] Default text-to-speech voice (#2184)

hanouticelina · web-flow · commit da1107d64c02 · 2026-05-21T15:15:53.000+02:00
> Comment from @hanouticelina: this PR should fix the widget for text-to-speech with Together

## Summary

Together's `/v1/audio/speech` requires a `voice` field, but the SDK didn't set one, so any TTS call that didn't pass `parameters.voice` failed with HTTP 400 `"voice is required"`. This also caused the periodic HF mapping validator to flip `hexgrad/Kokoro-82M` to `status: "error"`.

Defaults `voice` to `af_alloy` (a valid Kokoro voice) **only when the target model is Kokoro**, the only TTS model currently registered for Together. User-supplied parameters always override. Other model families (Orpheus, Cartesia, …) get no default and continue to surface Together's clear `"voice is required"` error if the caller omits one.

## Behavior matrix

| Call | Before | After |
|---|---|---|
| Kokoro, no `voice` | 400 "voice is required" | `voice: "af_alloy"` ✓ |
| Kokoro, `parameters: { voice: undefined }` | 400 | `voice: "af_alloy"` ✓ |
| Kokoro, `parameters: { voice: "af_bella" }` | `af_bella` ✓ | `af_bella` ✓ (user override) |
| Future non-Kokoro, no `voice` | 400 | 400 "voice is required" (no wrong default) |
| Future non-Kokoro, `parameters: { voice: "tara" }` | `tara` ✓ | `tara` ✓ |

## Test plan

- [x] `pnpm --filter @huggingface/inference run check` (tsc): clean
- [x] `pnpm --filter @huggingface/inference run lint:check`: clean
- [x] `pnpm --filter @huggingface/inference run format`: clean
- [x] Live against `api.together.xyz` with `hexgrad/Kokoro-82M`:
  - no `parameters` → 108.9 KB WAV (default `af_alloy`)
  - `parameters: {}` → 108.5 KB WAV (default `af_alloy`)
  - `parameters: { voice: undefined }` → 108.6 KB WAV (default `af_alloy`)
  - `parameters: { voice: "af_bella" }` → 92.9 KB WAV (user override)
- [x] Mock-fetch on a synthetic non-Kokoro model:
  - no `voice` → `body.voice` absent (no wrong default)
  - `voice: "tara"` → `body.voice: "tara"` (user value passes through)
- [x] Reproduced the failure on `main` for direct comparison

diff --git a/packages/inference/src/providers/together.ts b/packages/inference/src/providers/together.ts
@@ -523,9 +523,17 @@ export class TogetherTextToSpeechTask extends TaskProviderHelper implements Text
 	}
 
 	preparePayload(params: BodyParams): Record<string, unknown> {
+		const userParams = (params.args.parameters as Record<string, unknown> | undefined) ?? {};
+		// Together's /v1/audio/speech requires a `voice` field. Voices are model-specific
+		// (Kokoro accepts `af_*`, Orpheus uses different names, etc.), so we only default
+		// when the target model is Kokoro — the only TTS model currently registered.
+		const isKokoro = params.model.toLowerCase().includes("kokoro");
+		const voice = userParams.voice ?? (isKokoro ? "af_alloy" : undefined);
+
 		return {
 			...omit(params.args, ["inputs", "parameters"]),
-			...(params.args.parameters as Record<string, unknown> | undefined),
+			...userParams,
+			...(voice !== undefined ? { voice } : {}),
 			input: params.args.inputs,
 			model: params.model,
 		};
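
The defaulting rule in the hunk above can be tried in isolation. What follows is a minimal standalone sketch, not the SDK's code: `SketchBodyParams`, `omitKeys`, and `prepareTtsPayload` are hypothetical stand-ins for `BodyParams`, the SDK's `omit` helper, and `preparePayload`, and `example-org/orpheus-tts` is a made-up model ID used only to exercise the non-Kokoro path.

```typescript
// Hypothetical stand-in for the SDK's BodyParams; only the fields used here.
interface SketchBodyParams {
	model: string;
	args: { inputs: string; parameters?: Record<string, unknown>; [key: string]: unknown };
}

// Hypothetical replacement for the SDK's `omit` helper (assumed to drop the listed keys).
function omitKeys(obj: Record<string, unknown>, keys: string[]): Record<string, unknown> {
	const result: Record<string, unknown> = {};
	for (const key of Object.keys(obj)) {
		if (keys.indexOf(key) === -1) {
			result[key] = obj[key];
		}
	}
	return result;
}

// Mirrors the logic of the diff: default the voice only for Kokoro models,
// and let a user-supplied voice win in every case.
function prepareTtsPayload(params: SketchBodyParams): Record<string, unknown> {
	const userParams = params.args.parameters ?? {};
	const isKokoro = params.model.toLowerCase().includes("kokoro");
	const voice = userParams.voice ?? (isKokoro ? "af_alloy" : undefined);

	return {
		...omitKeys(params.args, ["inputs", "parameters"]),
		...userParams,
		// Spread last so the default replaces an explicit `voice: undefined`.
		...(voice !== undefined ? { voice } : {}),
		input: params.args.inputs,
		model: params.model,
	};
}

// Kokoro without a voice picks up the default.
const kokoroDefault = prepareTtsPayload({ model: "hexgrad/Kokoro-82M", args: { inputs: "Hello" } });
console.log(kokoroDefault.voice); // "af_alloy"

// A non-Kokoro model gets no default, so the provider's own error would surface.
const otherModel = prepareTtsPayload({ model: "example-org/orpheus-tts", args: { inputs: "Hello" } });
console.log("voice" in otherModel); // false
```

Spreading the conditional `{ voice }` after `userParams` is what covers the `parameters: { voice: undefined }` row of the behavior matrix: the explicit `undefined` is spread first and then replaced by the default.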