Commit da1107d

[Together] Default text-to-speech voice (#2184)
> Comment from @hanouticelina: this PR should fix the widget for text-to-speech with Together

## Summary

Together's `/v1/audio/speech` requires a `voice` field, but the SDK didn't set one, so any TTS call that didn't pass `parameters.voice` failed with HTTP 400 `"voice is required"`. This also caused the periodic HF mapping validator to flip `hexgrad/Kokoro-82M` to `status: "error"`.

Defaults `voice` to `af_alloy` (a valid Kokoro voice) **only when the target model is Kokoro**, the only TTS model currently registered for Together. User-supplied parameters always override. Other model families (Orpheus, Cartesia, …) get no default and continue to surface Together's clear `"voice is required"` error if the caller omits one.

## Behavior matrix

| Call | Before | After |
|---|---|---|
| Kokoro, no `voice` | 400 "voice is required" | `voice: "af_alloy"` ✓ |
| Kokoro, `parameters: { voice: undefined }` | 400 | `voice: "af_alloy"` ✓ |
| Kokoro, `parameters: { voice: "af_bella" }` | `af_bella` ✓ | `af_bella` ✓ (user override) |
| Future non-Kokoro, no `voice` | 400 | 400 "voice is required" (no wrong default) |
| Future non-Kokoro, `parameters: { voice: "tara" }` | `tara` ✓ | `tara` ✓ |

## Test plan

- [x] `pnpm --filter @huggingface/inference run check` (tsc): clean
- [x] `pnpm --filter @huggingface/inference run lint:check`: clean
- [x] `pnpm --filter @huggingface/inference run format`: clean
- [x] Live against `api.together.xyz` with `hexgrad/Kokoro-82M`:
  - no `parameters` → 108.9 KB WAV (default `af_alloy`)
  - `parameters: {}` → 108.5 KB WAV (default `af_alloy`)
  - `parameters: { voice: undefined }` → 108.6 KB WAV (default `af_alloy`)
  - `parameters: { voice: "af_bella" }` → 92.9 KB WAV (user override)
- [x] Mock-fetch on a synthetic non-Kokoro model:
  - no `voice` → `body.voice` absent (no wrong default)
  - `voice: "tara"` → `body.voice: "tara"` (user value passes through)
- [x] Reproduced the failure on `main` for direct comparison
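
The behavior matrix can be read as a small pure function. The following is a minimal sketch, not the SDK's code: `resolveVoice` is a hypothetical helper name, and `example-org/orpheus-tts` is a made-up model id standing in for a "future non-Kokoro" model.

```typescript
// Hypothetical helper (not part of @huggingface/inference): mirrors the
// voice-resolution rule described in the summary.
function resolveVoice(model: string, userVoice?: unknown): unknown {
	// Case-insensitive substring match on the model id, as in the diff.
	const isKokoro = model.toLowerCase().includes("kokoro");
	// `??` falls through on both `undefined` and `null`, so an explicit
	// `voice: undefined` still receives the Kokoro default.
	return userVoice ?? (isKokoro ? "af_alloy" : undefined);
}

// Rows of the matrix; the non-Kokoro model id is an invented placeholder.
console.log(resolveVoice("hexgrad/Kokoro-82M")); // "af_alloy"
console.log(resolveVoice("hexgrad/Kokoro-82M", undefined)); // "af_alloy"
console.log(resolveVoice("hexgrad/Kokoro-82M", "af_bella")); // "af_bella"
console.log(resolveVoice("example-org/orpheus-tts")); // undefined
console.log(resolveVoice("example-org/orpheus-tts", "tara")); // "tara"
```

Returning `undefined` for unknown model families, rather than guessing a voice, is what lets Together's own `"voice is required"` error reach the caller.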
1 parent c755461 commit da1107d

1 file changed

Lines changed: 9 additions & 1 deletion

packages/inference/src/providers/together.ts

@@ -523,9 +523,17 @@ export class TogetherTextToSpeechTask extends TaskProviderHelper implements Text
 	}
 
 	preparePayload(params: BodyParams): Record<string, unknown> {
+		const userParams = (params.args.parameters as Record<string, unknown> | undefined) ?? {};
+		// Together's /v1/audio/speech requires a `voice` field. Voices are model-specific
+		// (Kokoro accepts `af_*`, Orpheus uses different names, etc.), so we only default
+		// when the target model is Kokoro — the only TTS model currently registered.
+		const isKokoro = params.model.toLowerCase().includes("kokoro");
+		const voice = userParams.voice ?? (isKokoro ? "af_alloy" : undefined);
+
 		return {
 			...omit(params.args, ["inputs", "parameters"]),
-			...(params.args.parameters as Record<string, unknown> | undefined),
+			...userParams,
+			...(voice !== undefined ? { voice } : {}),
 			input: params.args.inputs,
 			model: params.model,
 		};
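
The spread order in the diff matters for the `parameters: { voice: undefined }` case. The sketch below contrasts it with a more obvious "default first" merge. Both function names are hypothetical, `speed` is an invented parameter used only for illustration, and the comparison assumes the request body is serialized with `JSON.stringify`, which drops keys whose value is `undefined`.

```typescript
type Params = Record<string, unknown>;

// Naive ordering: default first, user params second. An explicit
// `voice: undefined` in userParams overwrites the default.
function naiveMerge(userParams: Params, defaultVoice: string): Params {
	return { voice: defaultVoice, ...userParams };
}

// Ordering used in the diff: user params first, then the resolved voice,
// spread in only when it is actually defined.
function mergeLikeDiff(userParams: Params, defaultVoice: string | undefined): Params {
	const voice = userParams.voice ?? defaultVoice;
	return { ...userParams, ...(voice !== undefined ? { voice } : {}) };
}

const explicitUndefined: Params = { voice: undefined };

// The naive merge loses the default, so the serialized body has no voice.
console.log(JSON.stringify(naiveMerge(explicitUndefined, "af_alloy"))); // {}
// The diff's ordering keeps the default.
console.log(JSON.stringify(mergeLikeDiff(explicitUndefined, "af_alloy"))); // {"voice":"af_alloy"}
// With no default (non-Kokoro case), no voice key is added.
console.log(JSON.stringify(mergeLikeDiff({ speed: 1.2 }, undefined))); // {"speed":1.2}
```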
