Commit d86f337

docs(upstream): file Path C tracker B + C as PRs #2146/#2147
After re-authoring against the current PR #2142 macro shape (the previous drafts failed the probe because they targeted a non-existent tileop scheduler hierarchy), both patches now apply cleanly on the jorgecurious metal-gemm-upstream-rebase branch plus the #2142 prerequisite stack.

Filed:

- PR tile-ai/tilelang#2146 (Path C tracker B): fuses the FP8 scale broadcast into the T.fp8_scaled_matmul K-loop. 16/8 LOC delta in tilelang/language/fp8_op.py. Closes the 3-6× audiohacking perf gap on FP8 scaled matmul, per the cppmega.mlx Path C consumer at fp8_vecmat_path_c.py.
- PR tile-ai/tilelang#2147 (Path C tracker C): adds the T.BlockScaledLayout.e8m0_k32 and T.e8m0_to_float DSL primitives. 5 files touched (new tilelang/language/blockscaled_layout.py, extended fp8_op.py, re-export in __init__.py, Metal lowering in metal_quant.py, e8m0 layout test). Unblocks the Sparse-MLA blockscaled Path C QK reducer.

Both stack on PR #2142 (the T.fp8_scaled_matmul intrinsic), which in turn stacks on PR #2130 (the jorgecurious base). They are independent of each other: different gaps, different files (B touches the macro body, C adds the layout primitive).

Receipt _filed_prs_2026_05_04.md updated with rows 13-14. Total filed PRs: 14 (across ml-explore/mlx, apache/tvm, tile-ai/tilelang, tile-ai/tvm), all OPEN.

Path C tracker A (pipelined_32x32) shipped in commits 3cb6457 and 6746ff9; trackers B (#2146) and C (#2147) are now filed upstream. All three Path C follow-up entries from docs/upstream/_path_c_blockers_tracker.md now have landing receipts.
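As a rough illustration of what an E8M0 block-scaled layout with a 32-wide K block likely means, here is a minimal plain-Python reference sketch. It assumes the OCP Microscaling (MX) convention for E8M0 (an unsigned biased exponent, value 2**(bits - 127), 0xFF reserved for NaN) and reads "k32" as one shared scale per 32 consecutive K elements; both are inferences from the primitive names, not details taken from PR #2147. `e8m0_to_float` and `blockscaled_dot` below are hypothetical helpers, not the tilelang DSL API, and FP8 elements are assumed to be already decoded to floats. The reduction applies the scales once per K block inside the loop, which is the general shape of fusing a scale broadcast into a K-loop.

```python
import math

# Assumption: E8M0 follows the OCP Microscaling (MX) convention:
# an unsigned 8-bit biased exponent, value = 2**(bits - 127), 0xFF = NaN.
E8M0_BIAS = 127
E8M0_NAN = 0xFF

# Assumption: "k32" means one shared scale per 32 consecutive elements along K.
BLOCK_K = 32


def e8m0_to_float(bits: int) -> float:
    """Hypothetical reference decoder for a single E8M0 scale byte."""
    if not 0 <= bits <= 0xFF:
        raise ValueError(f"E8M0 scale must fit in one byte, got {bits}")
    if bits == E8M0_NAN:
        return math.nan
    # ldexp(1.0, e) == 2.0 ** e, computed exactly.
    return math.ldexp(1.0, bits - E8M0_BIAS)


def blockscaled_dot(a, b, a_scales, b_scales):
    """Hypothetical reference dot product over K with per-block E8M0 scales.

    `a` and `b` hold already-decoded FP8 element values as Python floats;
    `a_scales[i]` and `b_scales[i]` are the E8M0 bytes for K block i.
    """
    if len(a) != len(b) or len(a) % BLOCK_K != 0:
        raise ValueError("K must match and be a multiple of BLOCK_K")
    n_blocks = len(a) // BLOCK_K
    if len(a_scales) != n_blocks or len(b_scales) != n_blocks:
        raise ValueError("need exactly one scale per K block")

    total = 0.0
    for blk in range(n_blocks):
        lo, hi = blk * BLOCK_K, (blk + 1) * BLOCK_K
        # Unscaled partial sum for this K block.
        partial = sum(x * y for x, y in zip(a[lo:hi], b[lo:hi]))
        # Apply both block scales inside the K loop, once per block,
        # instead of rescaling every element up front.
        total += partial * e8m0_to_float(a_scales[blk]) * e8m0_to_float(b_scales[blk])
    return total


if __name__ == "__main__":
    k = 64
    a = [1.0] * k
    b = [0.5] * k
    # Block 0 scaled by 2**0 * 2**1, block 1 by 2**-1 * 2**0.
    print(blockscaled_dot(a, b, a_scales=[127, 126], b_scales=[128, 127]))  # prints 40.0
```

Keeping the scale multiply per block rather than per element is the design point: it costs two multiplies per 32-element block instead of one per element, which is the kind of saving a fused K-loop is after.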
1 parent edcc523 commit d86f337

7 files changed

Lines changed: 693 additions & 852 deletions


docs/upstream/_filed_prs_2026_05_04.md

Lines changed: 14 additions & 0 deletions
@@ -103,6 +103,20 @@ GitHub user: apstenku123 (David Gornshtein, davidgornshtein@gmail.com)
 <td>[Metal] FP8 vector cast lanes 2/3/4 (extends storage-only FP8) (TVM-mirror half)</td>
 <td>filed against tilelang_main @ 0e15b274, depends on #38 + paired with #2145</td>
 </tr>
+<tr>
+<td>13</td>
+<td>tile-ai/tilelang</td>
+<td>[#2146](https://github.com/tile-ai/tilelang/pull/2146)</td>
+<td>[Metal] fuse per-load FP8 scale broadcast into T.fp8_scaled_matmul K-loop (Path C tracker B)</td>
+<td>filed against main, **stacks on PR #2142** which stacks on PR #2130; closes audiohacking 3-6× perf gap on FP8 scaled matmul</td>
+</tr>
+<tr>
+<td>14</td>
+<td>tile-ai/tilelang</td>
+<td>[#2147](https://github.com/tile-ai/tilelang/pull/2147)</td>
+<td>[Metal] add T.BlockScaledLayout.e8m0_k32 + T.e8m0_to_float (blockscaled FP8) (Path C tracker C)</td>
+<td>filed against main, **stacks on PR #2142** which stacks on PR #2130; unblocks Sparse-MLA blockscaled Path C QK reducer</td>
+</tr>
 </tbody>
 </table>
