QVAC-20983 feat: add analytic gradchecked backward pass for the Supertonic vocoder #60
Open
freddy311082 wants to merge 3 commits into
Make the vocoder differentiable for voice-clone enrollment via an analytic,
model-free C++ backward that returns d(loss)/d(latent), gradchecked against
the Task 2 finite-difference harness.
The "transposed convolution" flagged as the main risk is realized in the
vocoder as a fixed reshape+permute (the latent unpack), so its backward is a
pure permutation rather than a conv-transpose kernel.
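A minimal sketch of why the unpack backward is only a permutation. The memory layout below (packed `[C * r, T]`, unpacked `[C, T * r]`) and both signatures are assumptions made for illustration; the real layout in `VocoderBackward` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not taken from the PR):
//   packed    x[(c * r + k) * T + t]        shape [C * r, T]
//   unpacked  y[c * (T * r) + t * r + k]    shape [C, T * r]
std::vector<double> latent_unpack(const std::vector<double>& x,
                                  std::size_t C, std::size_t r, std::size_t T) {
    assert(x.size() == C * r * T);
    std::vector<double> y(C * r * T);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < r; ++k)
            for (std::size_t t = 0; t < T; ++t)
                y[c * (T * r) + t * r + k] = x[(c * r + k) * T + t];
    return y;
}

// Backward: every output element is a copy of exactly one input element,
// so the gradient is routed back through the inverse index map.
std::vector<double> latent_unpack_backward(const std::vector<double>& dy,
                                           std::size_t C, std::size_t r, std::size_t T) {
    assert(dy.size() == C * r * T);
    std::vector<double> dx(C * r * T);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t k = 0; k < r; ++k)
            for (std::size_t t = 0; t < T; ++t)
                dx[(c * r + k) * T + t] = dy[c * (T * r) + t * r + k];
    return dx;
}
```

Because the map is a bijection on indices, no accumulation is needed in the backward, and applying `latent_unpack_backward` to the output of `latent_unpack` returns the original buffer.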
- Add VocoderBackward (src/supertonic_vocoder_backward.{h,cpp}): owns the
frozen weights, caches per-call activations as state, exposes forward/backward.
Private instance helpers implement denorm, causal conv1d, causal depthwise,
affine batch norm, leaky-relu, latent unpack and the scalar-gamma ConvNeXt
block; channel layer norm, erf-GELU and 1x1 conv are reused from ve_grad.
- Add test/test_supertonic_vocoder_backward.cpp: gradchecks every primitive,
the ConvNeXt block and the full chain (8 checks). Registered in the always-on
`unit` ctest tier (no model/fixtures, no-skip policy).
- Document the op x backend gap and CPU-fallback behavior for enrollment in
docs/voiceclone-backward-vocoder.md (NORM / GELU_ERF / custom ops have no ggml
backward, so the backward runs on CPU on every backend).
Tests: test-supertonic-vocoder-backward -> 8/8 gradchecks pass.
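For reference, a self-contained sketch of what one such gradcheck could look like for a causal depthwise primitive. The function names, signatures, tap ordering and tolerance are hypothetical and are not taken from `test/test_supertonic_vocoder_backward.cpp` or the Task 2 harness.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Causal depthwise conv over x[C][T] with per-channel taps w[C][K] and dilation d:
//   y[c][t] = sum_k w[c][k] * x[c][t - (K - 1 - k) * d], zero padding on the left.
Vec causal_depthwise(const Vec& x, const Vec& w,
                     std::size_t C, std::size_t T, std::size_t K, std::size_t d) {
    Vec y(C * T, 0.0);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t t = 0; t < T; ++t)
            for (std::size_t k = 0; k < K; ++k) {
                const std::size_t shift = (K - 1 - k) * d;
                if (t >= shift) y[c * T + t] += w[c * K + k] * x[c * T + t - shift];
            }
    return y;
}

// Analytic backward w.r.t. x: scatter dy back through the same taps.
Vec causal_depthwise_backward(const Vec& dy, const Vec& w,
                              std::size_t C, std::size_t T, std::size_t K, std::size_t d) {
    Vec dx(C * T, 0.0);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t t = 0; t < T; ++t)
            for (std::size_t k = 0; k < K; ++k) {
                const std::size_t shift = (K - 1 - k) * d;
                if (t >= shift) dx[c * T + t - shift] += w[c * K + k] * dy[c * T + t];
            }
    return dx;
}

// Largest absolute gap between the analytic gradient and central finite
// differences of a scalar loss, perturbing one input element at a time.
double gradcheck(const std::function<double(const Vec&)>& loss, Vec x,
                 const Vec& analytic, double eps = 1e-5) {
    assert(x.size() == analytic.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double saved = x[i];
        x[i] = saved + eps;
        const double up = loss(x);
        x[i] = saved - eps;
        const double down = loss(x);
        x[i] = saved;
        worst = std::max(worst, std::fabs((up - down) / (2.0 * eps) - analytic[i]));
    }
    return worst;
}
```

A check would pick a fixed upstream gradient `g`, define the loss as the dot product of `g` with the forward output, and require `gradcheck(loss, x, causal_depthwise_backward(g, ...))` to stay below a small tolerance.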
Review Status
Current Status: ❌ PENDING
Pending reviews: Needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.
…AC-20983)
The existing vocoder gradcheck is self-referential: it validates the
analytic backward against the in-file VocoderBackward::forward, so any
drift between that reference forward and the production vocoder goes
undetected (this is how modeling the per-channel `gamma` as a scalar
slipped past it).
Add a model-free parity test that builds a synthetic CPU-backed
supertonic_model from deterministic weights and feeds the identical raw
buffers to both supertonic_vocoder_forward_cpu (production) and
VocoderBackward::forward (reference). It runs the full chain (denorm,
embed, 10x ConvNeXt with the real {1,2,4,1,2,4,1,1,1,1} dilation
schedule and per-channel gamma, batch norm, leaky-relu head) and asserts
the waveforms match within 1e-5 (observed ~2e-8). Catches gamma layout,
dilation, and weight-index drift the gradcheck cannot see.
Also fix the stale scalar-gamma comment in supertonic_vocoder_backward.h.
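A toy illustration of the gap described above, assuming the ConvNeXt residual step has the usual layer-scale form `y = x + gamma[c] * h`. All names here are hypothetical stand-ins, not the production `supertonic_vocoder_forward_cpu` or `VocoderBackward::forward`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Stand-in for the production forward: one gamma per channel, x and h are [C][T].
Vec residual_scale_per_channel(const Vec& x, const Vec& h, const Vec& gamma,
                               std::size_t C, std::size_t T) {
    assert(gamma.size() == C && x.size() == C * T && h.size() == C * T);
    Vec y(C * T);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t t = 0; t < T; ++t)
            y[c * T + t] = x[c * T + t] + gamma[c] * h[c * T + t];
    return y;
}

// Stand-in for a drifted reference forward that collapses gamma to a scalar.
Vec residual_scale_scalar(const Vec& x, const Vec& h, double gamma) {
    assert(x.size() == h.size());
    Vec y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] + gamma * h[i];
    return y;
}

// Parity metric: largest element-wise disagreement between two outputs.
double max_abs_diff(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

A gradcheck of the scalar variant against its own forward passes regardless, since the backward is consistent with that forward. Only a comparison such as `max_abs_diff(production, reference) < 1e-5`, run with gamma values that differ across channels, exposes the mismatch.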
GustavoA1604 requested changes on Jun 22, 2026
GustavoA1604 approved these changes on Jun 22, 2026
pratiknarola-t approved these changes on Jun 22, 2026
Zbig9000 approved these changes on Jun 22, 2026
What
Makes the Supertonic vocoder differentiable for voice-clone enrollment by adding
an analytic, model-free C++ backward pass that returns
`d(loss)/d(latent)`, validated against the Task 2 finite-difference gradcheck harness.
Follows the same pattern as the sibling tickets already on
`master` (#55 text-encoder tail / QVAC-20978, #58 vector estimator / QVAC-20982): a pure
`double` reference backward, gradchecked component-wise, with the op×backend gap documented.
The "transposed convolution" risk
The ticket flags the transposed convolution as the main risk op. In the
Supertonic vocoder there is no `conv_transpose` op: the time upsampling
(`ttl_chunk_compress_factor`) is a fixed `reshape + permute + cont` (the latent
unpack), so its backward is a pure permutation (`latent_unpack_backward`), with
no conv-transpose kernel risk.
Changes
- `src/supertonic_vocoder_backward.{h,cpp}`: new `VocoderBackward` class.
  Owns the frozen weights and caches per-call activations as state; public
  surface is `forward(latent)` / `backward(d_wav)`. Private instance helpers
  implement the vocoder-specific primitives: latent denorm, causal conv1d,
  causal depthwise, affine batch norm, leaky-relu (PReLU), latent unpack and the
  scalar-gamma ConvNeXt block. Channel layer norm, erf-GELU and 1×1 conv are
  reused from `ve_grad` (identical math).
- `test/test_supertonic_vocoder_backward.cpp`: gradchecks every primitive,
  the ConvNeXt block and the full chain (8 checks) against central finite
  differences. Registered in the always-on `unit` ctest tier (no model/fixtures,
  no-skip policy).
- `docs/voiceclone-backward-vocoder.md`: op×backend gap matrix and
  CPU-fallback behavior for enrollment.
- `CMakeLists.txt`: register the `test-supertonic-vocoder-backward` target.

CPU fallback (documented)
`NORM`, `GELU_ERF` and the custom `ggml_supertonic_*` ops have no backward in the
vendored `ggml`, so the enrollment backward cannot use ggml autodiff on any
backend. It is provided as the analytic C++ backward and runs on CPU on every
backend (enrollment is offline; the realtime synthesis GPU fast paths are
untouched). See the doc for the full matrix.