QVAC-20983 feat: add analytic gradchecked backward pass for the Supertonic vocoder by freddy311082 · Pull Request #60 · tetherto/qvac-ext-lib-whisper.cpp

freddy311082 · 2026-06-19T21:42:38Z

What

Makes the Supertonic vocoder differentiable for voice-clone enrollment by adding
an analytic, model-free C++ backward pass that returns d(loss)/d(latent),
validated against the Task 2 finite-difference gradcheck harness.

Follows the same pattern as the sibling tickets already on master
(#55 text-encoder tail / QVAC-20978, #58 vector estimator / QVAC-20982): a pure
double-precision reference backward, gradchecked component-wise, with the
op×backend gap documented.

The "transposed convolution" risk

The ticket flags the transposed convolution as the main risk op. In the
Supertonic vocoder there is no conv_transpose op: the time upsampling
(ttl_chunk_compress_factor) is a fixed reshape + permute + cont (the latent
unpack), so its backward is a pure permutation (latent_unpack_backward), with
no conv-transpose kernel risk.
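
A minimal sketch of why the backward of a reshape + permute is itself a
permutation. The packed layout ([C * f, T] row-major, unpacked to [C, T * f]),
the function names and the index helpers below are assumptions for
illustration, not the layout or API used by latent_unpack_backward in this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: packed is [C * f, T] row-major, unpacked is [C, T * f].
// Packed element (c * f + k, t) moves to unpacked element (c, t * f + k).
static std::size_t packed_index(std::size_t c, std::size_t k, std::size_t t,
                                std::size_t f, std::size_t T) {
    return (c * f + k) * T + t;
}

static std::size_t unpacked_index(std::size_t c, std::size_t k, std::size_t t,
                                  std::size_t f, std::size_t T) {
    return c * (T * f) + t * f + k;
}

// Forward: copy every element to its permuted position (no arithmetic).
std::vector<double> unpack_forward(const std::vector<double>& packed,
                                   std::size_t C, std::size_t f, std::size_t T) {
    std::vector<double> out(packed.size());
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < f; ++k)
            for (std::size_t t = 0; t < T; ++t)
                out[unpacked_index(c, k, t, f, T)] =
                    packed[packed_index(c, k, t, f, T)];
    return out;
}

// Backward: route each upstream gradient back through the inverse permutation.
std::vector<double> unpack_backward(const std::vector<double>& d_out,
                                    std::size_t C, std::size_t f, std::size_t T) {
    std::vector<double> d_packed(d_out.size());
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < f; ++k)
            for (std::size_t t = 0; t < T; ++t)
                d_packed[packed_index(c, k, t, f, T)] =
                    d_out[unpacked_index(c, k, t, f, T)];
    return d_packed;
}
```

Because the forward only moves elements, applying the backward to the forward
output returns the original buffer, which is a cheap sanity check alongside
the finite-difference gradcheck.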

Changes

- src/supertonic_vocoder_backward.{h,cpp}: new VocoderBackward class.
  Owns the frozen weights and caches per-call activations as state; public
  surface is forward(latent) / backward(d_wav). Private instance helpers
  implement the vocoder-specific primitives: latent denorm, causal conv1d,
  causal depthwise, affine batch norm, leaky-relu (PReLU), latent unpack and the
  scalar-gamma ConvNeXt block. Channel layer norm, erf-GELU and 1×1 conv are
  reused from ve_grad (identical math).
- test/test_supertonic_vocoder_backward.cpp: gradchecks every primitive,
  the ConvNeXt block and the full chain (8 checks) against central finite
  differences. Registered in the always-on unit ctest tier (no model/fixtures,
  no-skip policy).
- docs/voiceclone-backward-vocoder.md: op×backend gap matrix and
  CPU-fallback behavior for enrollment.
- CMakeLists.txt: register the test-supertonic-vocoder-backward target.
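
For readers unfamiliar with the approach, here is a rough sketch of a central
finite-difference gradcheck applied to one primitive, a single-channel causal
dilated conv1d. Everything in it is an assumption made for illustration: the
function names, the single-channel simplification, the weight ordering and the
step size are not taken from the PR's test file or from the Task 2 harness.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical single-channel causal conv with implicit zero left padding:
// y[t] = sum_k w[k] * x[t - (K - 1 - k) * dilation].
Vec causal_conv1d(const Vec& x, const Vec& w, std::size_t dilation) {
    const std::size_t T = x.size(), K = w.size();
    Vec y(T, 0.0);
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t k = 0; k < K; ++k) {
            const std::size_t shift = (K - 1 - k) * dilation;
            if (t >= shift) y[t] += w[k] * x[t - shift];
        }
    return y;
}

// Analytic backward w.r.t. the input: each tap scatters w[k] * d_y[t]
// back to the input sample it read in the forward pass.
Vec causal_conv1d_backward_x(const Vec& d_y, const Vec& w, std::size_t dilation) {
    const std::size_t T = d_y.size(), K = w.size();
    Vec d_x(T, 0.0);
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t k = 0; k < K; ++k) {
            const std::size_t shift = (K - 1 - k) * dilation;
            if (t >= shift) d_x[t - shift] += w[k] * d_y[t];
        }
    return d_x;
}

// Compares an analytic gradient of L(x) = sum_i r[i] * f(x)[i] against
// central differences (L(x + eps e_i) - L(x - eps e_i)) / (2 eps).
// Returns the largest absolute discrepancy over all input coordinates.
double gradcheck_max_error(const std::function<Vec(const Vec&)>& f,
                           const Vec& x, const Vec& r, const Vec& analytic,
                           double eps = 1e-5) {
    auto loss = [&](const Vec& v) {
        const Vec y = f(v);
        double s = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) s += r[i] * y[i];
        return s;
    };
    double max_err = 0.0;
    Vec xp = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xp[i] = x[i] + eps;
        const double lp = loss(xp);
        xp[i] = x[i] - eps;
        const double lm = loss(xp);
        xp[i] = x[i];  // restore before perturbing the next coordinate
        const double fd = (lp - lm) / (2.0 * eps);
        max_err = std::max(max_err, std::fabs(fd - analytic[i]));
    }
    return max_err;
}
```

With the weighted-sum loss above, the upstream gradient d_y is just r, so the
analytic gradient is causal_conv1d_backward_x(r, w, dilation). A check passes
when the returned error is below a chosen tolerance; the tolerance is a
per-op judgment call, since nonlinear ops carry O(eps²) truncation error that
a linear op like this one does not.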

CPU fallback (documented)

NORM, GELU_ERF and the custom ggml_supertonic_* ops have no backward in the
vendored ggml, so the enrollment backward cannot use ggml autodiff on any
backend. It is provided as the analytic C++ backward and runs on CPU on every
backend (enrollment is offline; the realtime synthesis GPU fast paths are
untouched). See the doc for the full matrix.

Make the vocoder differentiable for voice-clone enrollment via an analytic,
model-free C++ backward that returns d(loss)/d(latent), gradchecked against the
Task 2 finite-difference harness.

The "transposed convolution" flagged as the main risk is realized in the
vocoder as a fixed reshape+permute (the latent unpack), so its backward is a
pure permutation rather than a conv-transpose kernel.

- Add VocoderBackward (src/supertonic_vocoder_backward.{h,cpp}): owns the
  frozen weights, caches per-call activations as state, exposes
  forward/backward. Private instance helpers implement denorm, causal conv1d,
  causal depthwise, affine batch norm, leaky-relu, latent unpack and the
  scalar-gamma ConvNeXt block; channel layer norm, erf-GELU and 1x1 conv are
  reused from ve_grad.
- Add test/test_supertonic_vocoder_backward.cpp: gradchecks every primitive,
  the ConvNeXt block and the full chain (8 checks). Registered in the
  always-on `unit` ctest tier (no model/fixtures, no-skip policy).
- Document the op x backend gap and CPU-fallback behavior for enrollment in
  docs/voiceclone-backward-vocoder.md (NORM / GELU_ERF / custom ops have no
  ggml backward, so the backward runs on CPU on every backend).

Tests: test-supertonic-vocoder-backward -> 8/8 gradchecks pass.

github-actions · 2026-06-19T21:42:48Z

Review Status

Current Status: ❌ PENDING
Approvals so far: none

Pending reviews: Needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.

…AC-20983)

The existing vocoder gradcheck is self-referential: it validates the analytic
backward against the in-file VocoderBackward::forward, so any drift between
that reference forward and the production vocoder goes undetected (this is how
the per-channel `gamma` modeled as a scalar slipped past it).

Add a model-free parity test that builds a synthetic CPU-backed
supertonic_model from deterministic weights and feeds the identical raw buffers
to both supertonic_vocoder_forward_cpu (production) and
VocoderBackward::forward (reference). It runs the full chain (denorm, embed,
10x ConvNeXt with the real {1,2,4,1,2,4,1,1,1,1} dilation schedule and
per-channel gamma, batch norm, leaky-relu head) and asserts the waveforms match
within 1e-5 (observed ~2e-8). Catches gamma layout, dilation, and weight-index
drift the gradcheck cannot see.

Also fix the stale scalar-gamma comment in supertonic_vocoder_backward.h.
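
A toy illustration of the failure mode described above, assuming a residual
scale applied to a [C, T] row-major buffer. The functions scale_per_channel,
scale_scalar, max_abs_diff and parity_ok are hypothetical stand-ins, not the
production or reference code; only the 1e-5 tolerance is taken from the commit
message.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the production path: one gamma per channel,
// x laid out as [C, T] row-major, so channel = i / T.
std::vector<double> scale_per_channel(const std::vector<double>& x,
                                      const std::vector<double>& gamma,
                                      std::size_t T) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = gamma[i / T] * x[i];
    return y;
}

// Hypothetical stand-in for a drifted reference that models gamma as a scalar.
std::vector<double> scale_scalar(const std::vector<double>& x, double gamma) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = gamma * x[i];
    return y;
}

// Largest element-wise absolute difference; compares only the common prefix,
// so callers should check sizes separately (parity_ok does).
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    double m = 0.0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}

// Parity holds when both buffers have equal length and agree within tol.
bool parity_ok(const std::vector<double>& production,
               const std::vector<double>& reference, double tol = 1e-5) {
    return production.size() == reference.size() &&
           max_abs_diff(production, reference) <= tol;
}
```

Each of the two scaling functions is differentiable and would pass a gradcheck
against its own forward, yet parity_ok fails as soon as the per-channel gammas
differ from the scalar. That is the class of drift a forward parity test can
catch and a self-referential gradcheck cannot.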

…file

freddy311082 requested review from a team as code owners June 19, 2026 21:42

freddy311082 mentioned this pull request Jun 19, 2026

QVAC-20984 feat: add analytic gradchecked backward pass for the CAMPPlus speaker encoder #61

Merged

GustavoA1604 requested changes Jun 22, 2026

Comment thread tts-cpp/test/test_supertonic_vocoder_backward_parity.cpp

test: include <algorithm> header in Supertonic vocoder parity test …

c8596a5

…file

GustavoA1604 approved these changes Jun 22, 2026

pratiknarola-t approved these changes Jun 22, 2026

Zbig9000 approved these changes Jun 22, 2026

freddy311082 wants to merge 3 commits into master from QVAC-20983/ggml-backward-vocoder

4 participants
