This repository was archived by the owner on Apr 27, 2026. It is now read-only.
Normalize per-sample embeddings before averaging centroid#5
Open
ComputelessComputer wants to merge 1 commit into main from
Conversation
Speaker embeddings must be L2-normalized before averaging so high-magnitude samples don't dominate the centroid. The old code summed raw WeSpeaker outputs and only normalized at the end, which biases the centroid toward louder or longer clips. Now each sample is L2-normalized before summation; the resulting mean is re-normalized as before.
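The fix described above can be sketched as follows. This is a minimal illustration, not the code from `speech_bridge.swift`: the real function's signature isn't shown in this PR, so the name, parameter types, and zero-check here are assumptions based on the description.

```swift
// Sketch of per-sample L2 normalization before averaging.
// Names and types are illustrative; the actual implementation in
// speech_bridge.swift may differ.
func normalizedEmbeddingCentroid(_ embeddings: [[Float]]) -> [Float]? {
    guard let dim = embeddings.first?.count, dim > 0 else { return nil }
    var sum = [Float](repeating: 0, count: dim)
    for embedding in embeddings where embedding.count == dim {
        // Normalize each sample first so raw magnitude (clip length or
        // loudness) no longer weights the mean.
        let norm = embedding.reduce(0) { $0 + $1 * $1 }.squareRoot()
        guard norm > 0 else { continue } // skip zero-magnitude samples
        for i in 0..<dim { sum[i] += embedding[i] / norm }
    }
    // Re-normalize the summed vector, as the old code already did at the end.
    let total = sum.reduce(0) { $0 + $1 * $1 }.squareRoot()
    guard total > 0 else { return nil }
    return sum.map { $0 / total }
}
```

With this recipe, each clip contributes a unit vector to the sum, so the centroid is the direction of the mean unit embedding rather than a magnitude-weighted average.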
Summary
Speaker embeddings must be L2-normalized before averaging so samples with larger raw magnitudes (typically longer or louder clips) don't bias the centroid. The previous `normalizedEmbeddingCentroid` summed raw WeSpeaker outputs and only L2-normalized the result at the end.

What changed

`src-tauri/swift-permissions/src/speech_bridge.swift`: per-sample L2 normalization before summation inside `normalizedEmbeddingCentroid`. Zero-magnitude samples are skipped. The final L2 normalization of the summed vector is preserved.

Why it helps
Centroid embeddings drive speaker similarity comparisons (used today for cross-meeting speaker identification and, after #5, for constraining over-segmented diarization). Giving each sample equal weight — regardless of raw magnitude — matches the standard recipe for averaging speaker embeddings and reduces drift when a speaker has one long monologue plus several short contributions.
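Since the centroid is unit-length, the similarity comparison mentioned above reduces to a dot product. A hedged sketch of such a comparison, with an illustrative function name that is not claimed to exist in the repo:

```swift
// Cosine similarity between an utterance embedding and a speaker centroid.
// The function name and its use for matching are illustrative only.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}
```

Because per-sample normalization gives every clip equal weight, a speaker with one long monologue and several short turns gets a centroid that reflects all contributions equally, which is what this similarity score then compares against.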
What's not in this PR
- `constrainDiarizedSegments` embedding-based reassignment (separate PR).
- `selectSpeakerEmbeddingSegments` (H3 in the issue): follow-up.

Testing notes
Swift-only change.
`bun run build` for the frontend still passes. Please verify with the existing Swift test suite and, if available, the in-app speaker-suggestion flow with a known speaker to confirm match quality is the same or better.

Addresses #4.