QVAC-20983 feat: add analytic gradchecked backward pass for the Supertonic vocoder #60

Open
freddy311082 wants to merge 3 commits into master from QVAC-20983/ggml-backward-vocoder

@freddy311082

What

Makes the Supertonic vocoder differentiable for voice-clone enrollment by adding
an analytic, model-free C++ backward pass that returns d(loss)/d(latent),
validated against the Task 2 finite-difference gradcheck harness.

Follows the same pattern as the sibling tickets already on master
(#55 text-encoder tail / QVAC-20978, #58 vector estimator / QVAC-20982): a pure
double-precision reference backward, gradchecked component-wise, with the
op×backend gap documented.

The "transposed convolution" risk

The ticket flags the transposed convolution as the main risk op. In the
Supertonic vocoder there is no conv_transpose op: the time upsampling
(ttl_chunk_compress_factor) is a fixed reshape + permute + cont (the latent
unpack), so its backward is a pure permutation (latent_unpack_backward), with
no conv-transpose kernel risk.
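
To illustrate why a reshape + permute has a permutation as its backward, here is a minimal sketch. The layout is an assumption made only for this example (packed `[C*r, T]` row-major unpacked to `[C, T*r]`, with `r` standing in for ttl_chunk_compress_factor), and the `*_sketch` functions are hypothetical; they are not the PR's latent_unpack_backward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (assumption, not taken from the PR):
//   packed   is [C*r, T] row-major
//   unpacked is [C, T*r] row-major, with unpacked[c][t*r + k] = packed[c*r + k][t]
std::vector<double> latent_unpack_sketch(const std::vector<double>& packed,
                                         std::size_t C, std::size_t r, std::size_t T) {
    assert(packed.size() == C * r * T);
    std::vector<double> out(C * T * r);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < r; ++k)
            for (std::size_t t = 0; t < T; ++t)
                out[c * (T * r) + t * r + k] = packed[(c * r + k) * T + t];
    return out;
}

// Backward of a permutation: route each output gradient back to the input
// slot it was read from. No arithmetic, only index shuffling.
std::vector<double> latent_unpack_backward_sketch(const std::vector<double>& d_out,
                                                  std::size_t C, std::size_t r, std::size_t T) {
    assert(d_out.size() == C * T * r);
    std::vector<double> d_packed(C * r * T);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < r; ++k)
            for (std::size_t t = 0; t < T; ++t)
                d_packed[(c * r + k) * T + t] = d_out[c * (T * r) + t * r + k];
    return d_packed;
}
```

Because the forward only moves elements, applying the backward to the forward output recovers the input exactly, which makes this step cheap to check.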

Changes

  • src/supertonic_vocoder_backward.{h,cpp} — new VocoderBackward class.
    Owns the frozen weights and caches per-call activations as state; public
    surface is forward(latent) / backward(d_wav). Private instance helpers
    implement the vocoder-specific primitives: latent denorm, causal conv1d,
    causal depthwise, affine batch norm, leaky-relu (PReLU), latent unpack and the
    scalar-gamma ConvNeXt block. Channel layer norm, erf-GELU and 1×1 conv are
    reused from ve_grad (identical math).
  • test/test_supertonic_vocoder_backward.cpp — gradchecks every primitive,
    the ConvNeXt block and the full chain (8 checks) against central finite
    differences. Registered in the always-on unit ctest tier (no model/fixtures,
    no-skip policy).
  • docs/voiceclone-backward-vocoder.md — op×backend gap matrix and
    CPU-fallback behavior for enrollment.
  • CMakeLists.txt — register the test-supertonic-vocoder-backward target.
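
As a rough picture of what one such gradcheck does, the sketch below compares an analytic backward against central finite differences for a toy single-channel causal dilated conv. Everything here is hypothetical: the `*_sketch` names, the zero left padding, and the `gradcheck_max_err` helper are assumptions for illustration and do not reproduce the Task 2 harness or the VocoderBackward helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Toy causal dilated conv, one channel, zero left padding (assumption):
//   y[t] = sum_k w[k] * x[t - (K-1-k)*dilation]
Vec causal_conv1d_sketch(const Vec& x, const Vec& w, std::size_t dilation) {
    const std::size_t T = x.size(), K = w.size();
    Vec y(T, 0.0);
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t k = 0; k < K; ++k) {
            const std::size_t off = (K - 1 - k) * dilation;
            if (t >= off) y[t] += w[k] * x[t - off];
        }
    return y;
}

// Analytic backward w.r.t. x: scatter each dy[t] to the inputs it read.
Vec causal_conv1d_backward_sketch(const Vec& dy, const Vec& w, std::size_t dilation) {
    const std::size_t T = dy.size(), K = w.size();
    Vec dx(T, 0.0);
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t k = 0; k < K; ++k) {
            const std::size_t off = (K - 1 - k) * dilation;
            if (t >= off) dx[t - off] += w[k] * dy[t];
        }
    return dx;
}

// Max abs error between `analytic` and central finite differences of
// loss(x) = sum_t dy[t] * f(x)[t].
double gradcheck_max_err(const std::function<Vec(const Vec&)>& f, const Vec& x,
                         const Vec& dy, const Vec& analytic, double eps = 1e-6) {
    assert(analytic.size() == x.size());
    auto loss = [&](const Vec& v) {
        const Vec y = f(v);
        double s = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) s += dy[i] * y[i];
        return s;
    };
    double max_err = 0.0;
    Vec xp = x;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xp[i] = x[i] + eps;
        const double lp = loss(xp);
        xp[i] = x[i] - eps;
        const double lm = loss(xp);
        xp[i] = x[i];  // restore before perturbing the next element
        max_err = std::max(max_err, std::fabs((lp - lm) / (2.0 * eps) - analytic[i]));
    }
    return max_err;
}
```

Running the check in double precision keeps the finite-difference noise small enough that a tight tolerance is meaningful.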

CPU fallback (documented)

NORM, GELU_ERF and the custom ggml_supertonic_* ops have no backward in the
vendored ggml, so the enrollment backward cannot use ggml autodiff on any
backend. It is provided as the analytic C++ backward and runs on CPU on every
backend (enrollment is offline; the realtime synthesis GPU fast paths are
untouched). See the doc for the full matrix.

Make the vocoder differentiable for voice-clone enrollment via an analytic,
model-free C++ backward that returns d(loss)/d(latent), gradchecked against
the Task 2 finite-difference harness.

The "transposed convolution" flagged as the main risk is realized in the
vocoder as a fixed reshape+permute (the latent unpack), so its backward is a
pure permutation rather than a conv-transpose kernel.

- Add VocoderBackward (src/supertonic_vocoder_backward.{h,cpp}): owns the
  frozen weights, caches per-call activations as state, exposes forward/backward.
  Private instance helpers implement denorm, causal conv1d, causal depthwise,
  affine batch norm, leaky-relu, latent unpack and the scalar-gamma ConvNeXt
  block; channel layer norm, erf-GELU and 1x1 conv are reused from ve_grad.
- Add test/test_supertonic_vocoder_backward.cpp: gradchecks every primitive,
  the ConvNeXt block and the full chain (8 checks). Registered in the always-on
  `unit` ctest tier (no model/fixtures, no-skip policy).
- Document the op x backend gap and CPU-fallback behavior for enrollment in
  docs/voiceclone-backward-vocoder.md (NORM / GELU_ERF / custom ops have no ggml
  backward, so the backward runs on CPU on every backend).

Tests: test-supertonic-vocoder-backward -> 8/8 gradchecks pass.
@freddy311082 freddy311082 requested review from a team as code owners June 19, 2026 21:42
@github-actions

github-actions Bot commented Jun 19, 2026

Review Status

Current Status: ❌ PENDING
Approvals so far: none

Pending reviews: Needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.

…AC-20983)

The existing vocoder gradcheck is self-referential: it validates the
analytic backward against the in-file VocoderBackward::forward, so any
drift between that reference forward and the production vocoder goes
undetected (this is how modeling the per-channel `gamma` as a scalar
slipped past it).

Add a model-free parity test that builds a synthetic CPU-backed
supertonic_model from deterministic weights and feeds the identical raw
buffers to both supertonic_vocoder_forward_cpu (production) and
VocoderBackward::forward (reference). It runs the full chain (denorm,
embed, 10x ConvNeXt with the real {1,2,4,1,2,4,1,1,1,1} dilation
schedule and per-channel gamma, batch norm, leaky-relu head) and asserts
the waveforms match within 1e-5 (observed ~2e-8). Catches gamma layout,
dilation, and weight-index drift the gradcheck cannot see.
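
A toy sketch of the failure mode, under assumptions: the residual form `x + gamma * branch` and all function names below are hypothetical stand-ins, not the production or reference code. Each variant is internally consistent, so a gradcheck against its own forward would pass; only comparing the two outputs exposes the mismatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Residual scaling with one gamma per channel; x and branch are [C, T] row-major.
Vec scale_per_channel_sketch(const Vec& x, const Vec& branch, const Vec& gamma, std::size_t T) {
    assert(x.size() == branch.size() && x.size() == gamma.size() * T);
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + gamma[i / T] * branch[i];
    return out;
}

// The drifted variant: a single gamma shared by all channels.
Vec scale_scalar_sketch(const Vec& x, const Vec& branch, double gamma) {
    assert(x.size() == branch.size());
    Vec out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + gamma * branch[i];
    return out;
}

// Parity metric: largest element-wise absolute difference.
double max_abs_diff(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}
```

A parity assertion such as `max_abs_diff(production, reference) < 1e-5` fails as soon as the two gamma layouts diverge, independent of whether either backward matches its own forward.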

Also fix the stale scalar-gamma comment in supertonic_vocoder_backward.h.
Comment thread tts-cpp/test/test_supertonic_vocoder_backward_parity.cpp

4 participants