Commit 516f9c9

prettier
Signed-off-by: Will Manning <will@willmanning.io>
1 parent a33f166 commit 516f9c9

1 file changed

Lines changed: 34 additions & 34 deletions

proposed/0033-block-turboquant.md

@@ -358,7 +358,7 @@ child encodings.
 In the initial implementation, block decomposition is embedded inside
 `TurboQuantArray` — all blocks use TQ MSE-only encoding with independent SORF
 rotations, and TQ-specific children (centroids, rotation signs) are stored
-alongside the blocks. However, the *concept* of block decomposition is
+alongside the blocks. However, the _concept_ of block decomposition is
 encoding-agnostic: a future refactor could extract it into a general-purpose
 `BlockDecomposedFSLArray` that wraps k independently-encoded child arrays. This
 matters for straggler-block support (see below), where the straggler may use a
@@ -370,17 +370,17 @@ power-of-2 TQ array with an independent B-dim SORF rotation.
 
 **Changes vs. Stage 1 (with TQ blocks):**
 
-| Aspect | Stage 1 | Stage 2 |
-| --------------------- | ------------------------------------------- | ---------------------------------------------------------------------------- |
-| Block count | k = 1 (single power-of-2 block) | **k = d/B** (multiple blocks) |
-| SORF dimension | padded_dim (next power-of-2 ≥ dim) | **B** (e.g., 256 for d=768) |
-| Rotation signs | `FSL`, len = R, element dim = padded_dim | **`FSL`, len = k × R**, element dim = B |
-| Centroids | Computed for padded_dim distribution | **Computed for B-dim distribution** (different codebook!) |
-| Norms child | `PrimitiveArray<F>`, 1 per vector | **`PrimitiveArray<F>` (k=1) or `FixedSizeListArray<F>` (k>1)**, same dtype F |
-| Codes list_size | padded_dim | **k × B** (= d) |
-| Scheme compress() | Single SORF → quantize | **Choose B → split → per-block normalize/rotate/quantize** |
-| Quantized dot product | Single sum over padded_dim centroids | **Per-block weighted sum** (Σ_k norm_a_k · norm_b_k · unit_dot_k) |
-| L2 norm readthrough | O(1) — return stored norm | **O(k)** — compute √(Σ_k norm_k²) |
+| Aspect | Stage 1 | Stage 2 |
+| --------------------- | ---------------------------------------- | ---------------------------------------------------------------------------- |
+| Block count | k = 1 (single power-of-2 block) | **k = d/B** (multiple blocks) |
+| SORF dimension | padded_dim (next power-of-2 ≥ dim) | **B** (e.g., 256 for d=768) |
+| Rotation signs | `FSL`, len = R, element dim = padded_dim | **`FSL`, len = k × R**, element dim = B |
+| Centroids | Computed for padded_dim distribution | **Computed for B-dim distribution** (different codebook!) |
+| Norms child | `PrimitiveArray<F>`, 1 per vector | **`PrimitiveArray<F>` (k=1) or `FixedSizeListArray<F>` (k>1)**, same dtype F |
+| Codes list_size | padded_dim | **k × B** (= d) |
+| Scheme compress() | Single SORF → quantize | **Choose B → split → per-block normalize/rotate/quantize** |
+| Quantized dot product | Single sum over padded_dim centroids | **Per-block weighted sum** (Σ_k norm_a_k · norm_b_k · unit_dot_k) |
+| L2 norm readthrough | O(1) — return stored norm | **O(k)** — compute √(Σ_k norm_k²) |
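The "Quantized dot product" and "L2 norm readthrough" rows of the Stage 2 column can be illustrated with a minimal Python sketch. It operates on unquantized vectors, so `unit_dot` here is the exact dot product of the normalized blocks; in TurboQuant that term would instead be estimated from the quantized codes and centroids. The function names (`split_blocks`, `blockwise_dot`, `readthrough_l2_norm`) are hypothetical and do not come from the RFC or its implementation.

```python
import math


def split_blocks(vector, block_dim):
    """Split a d-dimensional vector into k = d / block_dim contiguous blocks."""
    if len(vector) % block_dim != 0:
        raise ValueError("block_dim must divide the vector dimension")
    return [vector[i:i + block_dim] for i in range(0, len(vector), block_dim)]


def normalize(block):
    """Return (norm, unit_block); an all-zero block keeps a zero direction."""
    norm = math.sqrt(sum(x * x for x in block))
    unit = [x / norm for x in block] if norm > 0 else [0.0] * len(block)
    return norm, unit


def blockwise_dot(a, b, block_dim):
    """Per-block weighted sum: sum_k norm_a_k * norm_b_k * unit_dot_k."""
    total = 0.0
    for blk_a, blk_b in zip(split_blocks(a, block_dim), split_blocks(b, block_dim)):
        norm_a, unit_a = normalize(blk_a)
        norm_b, unit_b = normalize(blk_b)
        # Exact here; a quantized implementation would approximate this term.
        unit_dot = sum(x * y for x, y in zip(unit_a, unit_b))
        total += norm_a * norm_b * unit_dot
    return total


def readthrough_l2_norm(block_norms):
    """O(k) norm of the full vector from its k stored per-block norms."""
    return math.sqrt(sum(n * n for n in block_norms))
```

With exact unit vectors the per-block sum reproduces the ordinary dot product, which is why only the `unit_dot` term carries quantization error. The full-vector norm needs one pass over the k stored norms rather than a single lookup.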
 
 **Unchanged from Stage 1:** SORF construction (R-round HD, default R=3),
 Max-Lloyd algorithm, f32 internal quantization, slice/take semantics (per-row
@@ -731,12 +731,12 @@ validated.
 
 If pursued, four strategies should be compared:
 
-| Strategy | Theoretical | Speed | Storage |
-| -------------------- | --------------------- | ---------------- | --------------- |
-| Per-block Gaussian | Correct (Lemma 4 [1]) | O(B²)/block | k×B²×4 bytes |
-| Per-block SORF | Approximate | O(B log B)/block | k×R×B bits |
-| Full-dim SORF | Approximate | O(d log d) total | R×d bits |
-| MSE-only (no QJL) | N/A | 0 | None |
+| Strategy | Theoretical | Speed | Storage |
+| ------------------ | --------------------- | ---------------- | ------------ |
+| Per-block Gaussian | Correct (Lemma 4 [1]) | O(B²)/block | k×B²×4 bytes |
+| Per-block SORF | Approximate | O(B log B)/block | k×R×B bits |
+| Full-dim SORF | Approximate | O(d log d) total | R×d bits |
+| MSE-only (no QJL) | N/A | 0 | None |
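To put the Storage column on a common scale, the formulas in the table can be evaluated in bytes. The sketch below is illustrative only: the function name `rotation_storage_bytes` and the strategy labels are hypothetical, and the `×4` factor is read as 4-byte (f32) matrix entries, which is an assumption about the table rather than something the RFC states here.

```python
def rotation_storage_bytes(strategy, dim, block_dim, rounds=3):
    """Evaluate the table's storage formulas, in bytes, for one strategy."""
    if dim % block_dim != 0:
        raise ValueError("block_dim must divide dim")
    k = dim // block_dim  # number of blocks
    if strategy == "per_block_gaussian":
        # k dense B x B matrices, 4 bytes per entry (assumed f32).
        return k * block_dim * block_dim * 4
    if strategy == "per_block_sorf":
        # k x R x B sign bits, rounded up to whole bytes.
        return (k * rounds * block_dim + 7) // 8
    if strategy == "full_dim_sorf":
        # R x d sign bits, rounded up to whole bytes.
        return (rounds * dim + 7) // 8
    if strategy == "mse_only":
        return 0
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("per_block_gaussian", "per_block_sorf", "full_dim_sorf", "mse_only"):
        print(name, rotation_storage_bytes(name, dim=768, block_dim=256))
```

For d=768, B=256 and R=3, the per-block Gaussian matrices take 786,432 bytes, while either SORF variant takes 288 bytes of sign bits.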
 
 The paper's QJL uses Gaussian S (not SORF); Lemma 4 [1] is proved specifically
 for Gaussian. SORF for QJL is an additional approximation (the
@@ -820,19 +820,19 @@ replace 32 with 64 in the norms row — ratios decrease accordingly):
 
 **At b_mse=8 (default, near-lossless):**
 
-| d | B | k | Per-vec bits | Ratio | Notes |
-| ------------- | ---- | --- | ----------------------- | ----- | ------------------------ |
-| 768 | 256 | 3 | 3×256×8 + 3×32 = 6240 | 3.9× | Block decomp; no padding |
-| 1024 | 1024 | 1 | 1024×8 + 32 = 8224 | 4.0× | Single block (= current) |
-| 768 (padded)| 1024 | 1 | 1024×8 + 32 = 8224 | 3.0× | Padded; 33% overhead |
+| d | B | k | Per-vec bits | Ratio | Notes |
+| ------------ | ---- | --- | --------------------- | ----- | ------------------------ |
+| 768 | 256 | 3 | 3×256×8 + 3×32 = 6240 | 3.9× | Block decomp; no padding |
+| 1024 | 1024 | 1 | 1024×8 + 32 = 8224 | 4.0× | Single block (= current) |
+| 768 (padded) | 1024 | 1 | 1024×8 + 32 = 8224 | 3.0× | Padded; 33% overhead |
 
 **At b_mse=5 (32 centroids):**
 
-| d | B | k | Per-vec bits | Ratio | Notes |
-| ------------- | ---- | --- | ----------------------- | ----- | ------------------------ |
-| 768 | 256 | 3 | 3×256×5 + 3×32 = 3936 | 6.2× | Block decomp; no padding |
-| 1024 | 1024 | 1 | 1024×5 + 32 = 5152 | 6.4× | Single block (= current) |
-| 768 (padded)| 1024 | 1 | 1024×5 + 32 = 5152 | 4.8× | Padded; 33% overhead |
+| d | B | k | Per-vec bits | Ratio | Notes |
+| ------------ | ---- | --- | --------------------- | ----- | ------------------------ |
+| 768 | 256 | 3 | 3×256×5 + 3×32 = 3936 | 6.2× | Block decomp; no padding |
+| 1024 | 1024 | 1 | 1024×5 + 32 = 5152 | 6.4× | Single block (= current) |
+| 768 (padded) | 1024 | 1 | 1024×5 + 32 = 5152 | 4.8× | Padded; 33% overhead |
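The per-vector bit counts and ratios in the two tables above follow from simple arithmetic, sketched below. The sketch assumes uncompressed f32 input (32 bits per element) and one 32-bit norm per block, which matches the `3×32` and `32` terms in the tables. The function names `per_vector_bits` and `compression_ratio` are illustrative and not part of the RFC.

```python
def per_vector_bits(block_dim, num_blocks, bits_per_code, norm_bits=32):
    """Bits stored per vector: k*B codes of b bits each, plus k norms."""
    code_bits = num_blocks * block_dim * bits_per_code
    return code_bits + num_blocks * norm_bits


def compression_ratio(dim, block_dim, num_blocks, bits_per_code,
                      input_bits=32, norm_bits=32):
    """Uncompressed size (dim * input_bits) over the encoded size."""
    encoded = per_vector_bits(block_dim, num_blocks, bits_per_code, norm_bits)
    return dim * input_bits / encoded


if __name__ == "__main__":
    # (d, B, k): block decomposition, single block, and padded single block.
    configs = [(768, 256, 3), (1024, 1024, 1), (768, 1024, 1)]
    for bits_per_code in (8, 5):
        for dim, block_dim, k in configs:
            bits = per_vector_bits(block_dim, k, bits_per_code)
            ratio = compression_ratio(dim, block_dim, k, bits_per_code)
            print(f"b={bits_per_code} d={dim} B={block_dim} k={k}: "
                  f"{bits} bits, {ratio:.1f}x")
```

The padded row uses d=768 in the numerator but B=1024 in the denominator, which is where the padding overhead shows up in the ratio. Passing `norm_bits=64` models the f64-norm variant mentioned in the hunk header.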
 
 Block decomposition improves the compression ratio at both bit widths. At b=8
 for d=768: from ~3.0× (padded) to ~3.9× (block decomp). At b=5 for d=768: from
@@ -986,7 +986,7 @@ For common model dimensions, the most promising configurations are:
 | ---------------------- | --------------------------- | -------------------------------------------------------------------------- |
 | 512, 1024, 2048, 4096 | Single-block MSE-only + PDX | B=d, no decomposition needed. Same as current TQ but with PDX scan layout. |
 | 768, 1536, 3072 | 3-block MSE-only + PDX | B=256 or 512. No padding waste. 3 blocks, shared centroids. |
-| No qualifying B (rare) | Padded single-block | Internal zero-padding to next power-of-2, single SORF. |
+| No qualifying B (rare) | Padded single-block | Internal zero-padding to next power-of-2, single SORF. |
 
 In all cases, MSE-only is the recommended starting point. QJL should only be
 added if experiments demonstrate clear recall@k improvements for the target
@@ -1121,11 +1121,11 @@ TurboQuant.
 
 **Incremental shipping:**
 
-| Stage | Ships to users? | Reads prior stage files? | Notes |
-| --------- | ---------------- | --------------------------- | ---------------------------------- |
-| 1 (MSE) | Yes | N/A (first stable version) | Single block, variable SORF rounds |
-| 2 (blocks) | Yes | Yes (k=1 is identical) | k>1 files need Stage 2+ decoder |
-| 3 (PDX) | Yes | Yes (FSL codes still work) | PDX codes need PDXArray registered |
+| Stage | Ships to users? | Reads prior stage files? | Notes |
+| ---------- | --------------- | -------------------------- | ---------------------------------- |
+| 1 (MSE) | Yes | N/A (first stable version) | Single block, variable SORF rounds |
+| 2 (blocks) | Yes | Yes (k=1 is identical) | k>1 files need Stage 2+ decoder |
+| 3 (PDX) | Yes | Yes (FSL codes still work) | PDX codes need PDXArray registered |
 
 Each stage is independently shippable. Users can upgrade incrementally. Files
 written by earlier stages are always readable by later decoders.
