|
| 1 | +# Filed upstream PRs — 2026-05-04 |
| 2 | + |
| 3 | +GitHub user: `apstenku123` (David Gornshtein, davidgornshtein@gmail.com) |
| 4 | + |
| 5 | +## Filed (6 of 9 artifacts) |
| 6 | + |
| 7 | +| # | Repo | PR | Title | Status | |
| 8 | +|---|---|---|---|---| |
| 9 | +| 1 | `ml-explore/mlx` | [#3476](https://github.com/ml-explore/mlx/pull/3476) | [Python] Add mx.from_dlpack(obj) Metal-aware consumer | filed against `main` | |
| 10 | +| 2 | `apache/tvm` | [#19504](https://github.com/apache/tvm/pull/19504) | [Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers | filed against `main` | |
| 11 | +| 3 | `tile-ai/tilelang` | [#2139](https://github.com/tile-ai/tilelang/pull/2139) | [Metal] allow mixed-dtype T.gemm via scalar fallback | filed against `main`, **stacks on PR #2130** | |
| 12 | +| 4 | `tile-ai/tilelang` | [#2140](https://github.com/tile-ai/tilelang/pull/2140) | [Metal] route FP8-input T.gemm to scalar fallback | filed against `main`, **stacks on PR #2130** | |
| 13 | +| 5 | `tile-ai/tilelang` | [#2141](https://github.com/tile-ai/tilelang/pull/2141) | [Metal] thread stage dim through T.access_ptr for T.Pipelined num_stages>1 | filed against `main`, **stacks on PR #2130** | |
| 14 | +| 6 | `tile-ai/tilelang` | [#2142](https://github.com/tile-ai/tilelang/pull/2142) | tilelang: T.fp8_scaled_matmul DSL intrinsic + Metal lowering | filed against `main`, **stacks on PR #2130** | |
| 15 | + |
| 16 | +## Deferred (split needed: tilelang supermodule + TileLang/tvm submodule) |
| 17 | + |
| 18 | +These two patches modify both `src/target/codegen_metal.{cc,h}` (tilelang |
| 19 | +supermodule) and `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}` |
| 20 | +(vendored TileLang/tvm submodule). They cannot be filed as a single |
| 21 | +tilelang PR because the supermodule and submodule have separate review |
| 22 | +chains. Each artifact must be split into companion PRs: |
| 23 | + |
| 24 | +| Artifact | Tilelang side | TileLang/tvm side | |
| 25 | +|---|---|---| |
| 26 | +| `tilelang_metal_fp8` | `src/target/codegen_metal.{cc,h}` (storage-only FP8 helpers) | `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}` (mirror) | |
| 27 | +| `tilelang_metal_fp8_vector` | `src/target/codegen_metal.{cc,h}` (vector cast lanes 2/3/4) | `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}` (mirror) | |
| 28 | + |
| 29 | +To file, each artifact becomes two PRs: |
| 30 | +- One to `tile-ai/tilelang` (supermodule half) |
| 31 | +- One to `TileLang/tvm` (where the vendored TVM fork lives, per `.gitmodules:url = https://github.com/TileLang/tvm`) |
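
As a rough illustration of the split described above, the sketch below partitions a patch's changed paths into a supermodule half and a submodule half by path prefix. The helper name `split_by_repo` and the constant `SUBMODULE_ROOT` are hypothetical and not part of either repository; the paths are the ones listed in the table above.

```python
from pathlib import PurePosixPath

# Assumption: the vendored TileLang/tvm checkout lives at this path
# inside the tilelang supermodule (taken from the table above).
SUBMODULE_ROOT = PurePosixPath("3rdparty/tvm")


def split_by_repo(paths):
    """Partition changed paths into (supermodule, submodule) lists.

    Submodule paths are returned relative to the submodule root, i.e. as
    they would appear in a PR filed against the TileLang/tvm repository.
    """
    supermodule, submodule = [], []
    for raw in paths:
        path = PurePosixPath(raw)
        try:
            submodule.append(str(path.relative_to(SUBMODULE_ROOT)))
        except ValueError:
            # Not under the submodule root, so it belongs to tilelang itself.
            supermodule.append(str(path))
    return supermodule, submodule


changed = [
    "src/target/codegen_metal.cc",
    "src/target/codegen_metal.h",
    "3rdparty/tvm/src/target/source/codegen_metal.cc",
    "3rdparty/tvm/src/target/source/codegen_metal.h",
]
tilelang_half, tvm_half = split_by_repo(changed)
```

Each resulting list corresponds to one of the two companion PRs; the actual patch extraction (for example with `git apply --include`) is left out of this sketch.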
| 32 | + |
| 33 | +`tilelang_metal_shared_dyn` is the no-op investigation artifact, not a |
| 34 | +code PR — stays as `docs/` evidence only. |
| 35 | + |
| 36 | +## Stacking dependency on jorgecurious's branch |
| 37 | + |
| 38 | +Table rows 3-6 (PRs #2139-#2142) explicitly state in their bodies that they stack on `jorgecurious/tilelang:metal-gemm-upstream-rebase` (PR #2130) at HEAD `971c17b`. That branch is the tip of a four-PR chain: |
| 39 | + |
| 40 | +- [PR #1869 (oraluben)](https://github.com/tile-ai/tilelang/pull/1869) — base Metal GEMM landing |
| 41 | +- [PR #2118 (cklxx)](https://github.com/tile-ai/tilelang/pull/2118) — Metal scalar fallback for T.gemm |
| 42 | +- [PR #2121 (SiriusNEO)](https://github.com/tile-ai/tilelang/pull/2121) — multi-backend CodeGen refactor |
| 43 | +- [PR #2130 (jorgecurious)](https://github.com/tile-ai/tilelang/pull/2130) — metal-gemm-upstream-rebase |
| 44 | + |
| 45 | +All four are currently OPEN at time of filing (2026-05-04). When upstream merges that 4-PR chain into `tile-ai/tilelang:main`, our PRs #2139-#2142 become independently reviewable against main. |
| 46 | + |
| 47 | +## What each PR depends on (textually, for stacked review) |
| 48 | + |
| 49 | +``` |
| 50 | + ┌─────────────────────────────────────────────────────┐ |
| 51 | + │ Apple Metal landing chain (existing OPEN PRs) │ |
| 52 | + │ #1869 → #2118 → #2121 → #2130 │ |
| 53 | + └─────────────────────────────────────────────────────┘ |
| 54 | + │ |
| 55 | + ▼ |
| 56 | + ┌─────────────────────────────────────────────────────┐ |
| 57 | + │ Our 4 filed PRs (independent of each other) │ |
| 58 | + │ #2139 mixed-dtype │ |
| 59 | + │ #2140 FP8 gemm software path │ |
| 60 | + │ #2141 pipelined 3D buffer │ |
| 61 | + │ #2142 T.fp8_scaled_matmul │ |
| 62 | + └─────────────────────────────────────────────────────┘ |
| 63 | +``` |
| 64 | + |
| 65 | +(The four PRs are independent of each other: each branches directly from #2130, so they can be reviewed in any order.) |
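
The dependency structure in the diagram can also be written down as data. The sketch below is a minimal model, assuming the landing chain is strictly linear as drawn (#1869 → #2118 → #2121 → #2130); the names `DEPENDS_ON` and `unmerged_ancestors` are hypothetical. It reports which base PRs still block a given PR from being reviewed against `main`.

```python
# Each PR number maps to the PRs it stacks on directly.
# Assumption: the landing chain is linear, as the diagram draws it.
DEPENDS_ON = {
    1869: [],
    2118: [1869],
    2121: [2118],
    2130: [2121],
    2139: [2130],
    2140: [2130],
    2141: [2130],
    2142: [2130],
}


def unmerged_ancestors(pr, merged):
    """Return the sorted base PRs of `pr` that are not yet in `merged`."""
    pending = []
    stack = list(DEPENDS_ON[pr])
    while stack:
        base = stack.pop()
        if base not in merged:
            pending.append(base)
        # Keep walking down the chain regardless of merge state.
        stack.extend(DEPENDS_ON[base])
    return sorted(pending)


# Hypothetical state: only the first two chain PRs have merged.
blockers = unmerged_ancestors(2139, merged={1869, 2118})

# Once the whole chain is merged, nothing blocks review against main.
clear = unmerged_ancestors(2142, merged={1869, 2118, 2121, 2130})
```

An empty result means the PR can be rebased onto `main` and reviewed on its own.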
| 66 | + |
| 67 | +## Path C blocker tracker |
| 68 | + |
| 69 | +For future Path C unblocks, see `docs/upstream/_path_c_blockers_tracker.md`. Five future patches have been identified (A: 32×32 pipelined extension; B: fused FP8 scaled scheduler; C: e8m0 blockscaled DSL; D: simdgroup_reduce primitive; E: chunked bwd language extension). |