Commit a74a7bf

prettier
Signed-off-by: Will Manning <will@willmanning.io>
1 parent: 35cfc33

1 file changed

Lines changed: 2 additions & 1 deletion

proposed/0033-block-turboquant.md

@@ -661,7 +661,7 @@ dot_contribution_k = ‖query_k‖ × ‖data_k‖ × Σ_j dist_table[q_code[j]]
 On GPU, this becomes:
 
 1. **Stage distance table in shared memory**: `dist_table[i][j] = centroids[i] ×
-centroids[j]`, 16×16 = 1 KB at b=4. Fits trivially in shared memory.
+centroids[j]`, 16×16 = 1 KB at b=4. Fits trivially in shared memory.
 
 2. **Stream code bytes from HBM**: For each 64-vector × 64-dim tile (matching
 the PDX layout), gather from the distance table and accumulate in registers.
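The following CPU-side Python sketch illustrates steps 1 and 2 of the first hunk. It builds the distance table and emulates one tile's gather-and-accumulate loop. It is not the Vortex or GPU implementation. The function names and centroid values are hypothetical. The sketch also makes three assumptions:

- Table entries are float32, which is what makes 16×16 entries come to 1 KB.
- The truncated hunk-header formula is completed with the data code `d_code[j]` as the second table index.
- The tile is stored dimension-major, which is how "the PDX layout" is read here.

```python
import math
import random

B = 4                    # bits per code (the "b=4" case in the text)
NUM_CENTROIDS = 1 << B   # 2**b = 16 centroids
TILE_VECS = 64           # vectors per tile
TILE_DIMS = 64           # dimensions per tile


def build_dist_table(centroids):
    """dist_table[i][j] = centroids[i] * centroids[j] (the table staged in shared memory)."""
    return [[ci * cj for cj in centroids] for ci in centroids]


def table_bytes(dist_table, bytes_per_entry=4):
    """Table size assuming float32 entries: 16 * 16 * 4 = 1024 bytes at b=4."""
    return len(dist_table) * len(dist_table[0]) * bytes_per_entry


def dot_contribution(q_norm, d_norm, q_codes, d_codes, dist_table):
    """‖query_k‖ × ‖data_k‖ × Σ_j dist_table[q_code[j]][d_code[j]] for one block k.

    Assumption: the second index is the data code; the hunk header is truncated.
    """
    acc = 0.0
    for qc, dc in zip(q_codes, d_codes):
        acc += dist_table[qc][dc]  # a table gather replaces the multiply
    return q_norm * d_norm * acc


def tile_dot(q_norm, q_codes, d_norms, tile_codes, dist_table):
    """CPU emulation of one tile.

    Assumption: tile_codes[dim][vec] is dimension-major (PDX-style), so the codes
    of every vector in the tile for one dimension are contiguous.
    """
    acc = [0.0] * len(d_norms)  # one accumulator per vector ("registers" on GPU)
    for dim, codes_for_dim in enumerate(tile_codes):
        lut_row = dist_table[q_codes[dim]]  # query code is fixed for this dimension
        for v, dc in enumerate(codes_for_dim):
            acc[v] += lut_row[dc]
    return [q_norm * n * a for n, a in zip(d_norms, acc)]


if __name__ == "__main__":
    rng = random.Random(0)
    centroids = sorted(rng.uniform(-1.0, 1.0) for _ in range(NUM_CENTROIDS))
    table = build_dist_table(centroids)
    print(table_bytes(table))  # 1024

    q_norm = 1.0
    q_codes = [rng.randrange(NUM_CENTROIDS) for _ in range(TILE_DIMS)]
    tile = [[rng.randrange(NUM_CENTROIDS) for _ in range(TILE_VECS)]
            for _ in range(TILE_DIMS)]
    d_norms = [rng.uniform(0.5, 2.0) for _ in range(TILE_VECS)]
    scores = tile_dot(q_norm, q_codes, d_norms, tile, table)

    # Cross-check vector 0 against a dequantize-then-multiply dot product.
    direct = sum((q_norm * centroids[q]) * (d_norms[0] * centroids[tile[d][0]])
                 for d, q in enumerate(q_codes))
    print(math.isclose(scores[0], direct, rel_tol=1e-9, abs_tol=1e-9))  # True
```

Fixing the query code for each dimension selects a single table row. Each of the 64 vectors then contributes one gather and one add per dimension. On a GPU, those per-vector accumulators would plausibly be the registers the text mentions.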
@@ -706,6 +706,7 @@ The GPU decode pipeline reads compressed data from Vortex files:
 The BlockTurboQuant encoding's child arrays (codes, norms, rotation signs) are
 individually compressed by the cascading compressor. For GPU decode, we need
 either:
+
 - Host-side decompression of the cascade, then GPU transfer of the raw children
 - Direct GPU decompression of FastLanes/ALP (if GPU decompression kernels exist)
 
