Commit 811393c

docs(upstream): receipt of 6 PRs filed 2026-05-04 by apstenku123
Files documenting the actual PRs we just opened upstream:

- PR #1: ml-explore/mlx#3476, from_dlpack Metal-aware consumer (against main, clean)
- PR #2: apache/tvm#19504, TVM_METAL_STORAGE_MODE env opt-in (against main, clean)
- PR #3: tile-ai/tilelang#2139, mixed-dtype T.gemm via scalar fallback (stacks on PR #2130)
- PR #4: tile-ai/tilelang#2140, FP8-input T.gemm scalar fallback routing (stacks on PR #2130)
- PR #5: tile-ai/tilelang#2141, T.Pipelined num_stages>1 3D buffer fix (stacks on PR #2130)
- PR #6: tile-ai/tilelang#2142, T.fp8_scaled_matmul DSL intrinsic (stacks on PR #2130)

Deferred (must be split into companion PRs): tilelang_metal_fp8 and tilelang_metal_fp8_vector each touch both the tilelang supermodule and the vendored TileLang/tvm submodule. Each needs 2 PRs, one to tile-ai/tilelang and one to TileLang/tvm, in a separate filing round.

PRs #3-#6 are independent of each other; each branches directly from jorgecurious/tilelang:metal-gemm-upstream-rebase at HEAD 971c17b, so they can be reviewed in any order. They DO depend on the upstream 4-PR Apple Metal landing chain (#1869, #2118, #2121, #2130) merging first; once those have landed on main, ours can be retargeted at main.
1 parent 8a1b3bb commit 811393c

1 file changed

Lines changed: 69 additions & 0 deletions

# Filed upstream PRs: 2026-05-04

GitHub user: `apstenku123` (David Gornshtein, davidgornshtein@gmail.com)

## Filed (6 of 9 artifacts)

| # | Repo | PR | Title | Status |
|---|---|---|---|---|
| 1 | `ml-explore/mlx` | [#3476](https://github.com/ml-explore/mlx/pull/3476) | [Python] Add mx.from_dlpack(obj) Metal-aware consumer | filed against `main` |
| 2 | `apache/tvm` | [#19504](https://github.com/apache/tvm/pull/19504) | [Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers | filed against `main` |
| 3 | `tile-ai/tilelang` | [#2139](https://github.com/tile-ai/tilelang/pull/2139) | [Metal] allow mixed-dtype T.gemm via scalar fallback | filed against `main`, **stacks on PR #2130** |
| 4 | `tile-ai/tilelang` | [#2140](https://github.com/tile-ai/tilelang/pull/2140) | [Metal] route FP8-input T.gemm to scalar fallback | filed against `main`, **stacks on PR #2130** |
| 5 | `tile-ai/tilelang` | [#2141](https://github.com/tile-ai/tilelang/pull/2141) | [Metal] thread stage dim through T.access_ptr for T.Pipelined num_stages>1 | filed against `main`, **stacks on PR #2130** |
| 6 | `tile-ai/tilelang` | [#2142](https://github.com/tile-ai/tilelang/pull/2142) | tilelang: T.fp8_scaled_matmul DSL intrinsic + Metal lowering | filed against `main`, **stacks on PR #2130** |

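PRs #3 and #4 both send GEMMs that the default Metal lowering does not handle to the scalar fallback introduced in PR #2118. As a rough illustration only (plain Python, not the generated Metal code; all helper names are hypothetical), a scalar fallback computes each output element as a dot product, converting each operand from its own storage type as it is loaded. This sketch assumes a float32 accumulator. The per-operand conversion is what allows A and B to carry different dtypes.

```python
import struct
from typing import Callable, List

Matrix = List[List[float]]


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE half bit pattern (e.g. 0x3C00) to a Python float."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def scalar_gemm(a, b, load_a: Callable, load_b: Callable) -> Matrix:
    """C = A @ B, computed one output element at a time.

    a is M x K and b is K x N (nested lists of stored values). load_a and
    load_b convert a stored element of each operand to a float, so the two
    operands may use different storage types. The accumulator is rounded to
    float32 after every multiply-add, approximating an fp32 accumulator.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc = to_f32(acc + load_a(a[i][p]) * load_b(b[p][j]))
            c[i][j] = acc
    return c


# Mixed dtypes: A holds raw float16 bit patterns, B holds float32 values.
A = [[0x3C00, 0x4000]]  # [[1.0, 2.0]] stored as half bits
B = [[3.0], [0.5]]
print(scalar_gemm(A, B, half_bits_to_float, to_f32))  # [[4.0]]
```

Replacing `half_bits_to_float` with an FP8 byte decoder gives the FP8-input case that PR #4 routes to this path. The arithmetic does not change; only the load conversion does.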

## Deferred (split needed: tilelang supermodule + TileLang/tvm submodule)

These two patches modify both `src/target/codegen_metal.{cc,h}` (tilelang
supermodule) and `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}`
(vendored TileLang/tvm submodule). They cannot be filed as a single
tilelang PR because the supermodule and submodule have separate review
chains, so each must be split into companion PRs:

| Artifact | tilelang side | TileLang/tvm side |
|---|---|---|
| `tilelang_metal_fp8` | `src/target/codegen_metal.{cc,h}` (storage-only FP8 helpers) | `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}` (mirror) |
| `tilelang_metal_fp8_vector` | `src/target/codegen_metal.{cc,h}` (vector cast lanes 2/3/4) | `3rdparty/tvm/src/target/source/codegen_metal.{cc,h}` (mirror) |

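The table does not say which FP8 layout the storage-only helpers target. Assuming the OCP E4M3 "fn" encoding (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities), the byte-to-float widening such a helper has to perform can be sketched in Python. `decode_e4m3fn` is a hypothetical name. This is a reference for the arithmetic, not the Metal code in the patch, and an E5M2 helper would use different field widths and bias.

```python
import math


def decode_e4m3fn(byte: int) -> float:
    """Widen one FP8 E4M3 ("fn" variant) byte to a Python float.

    Assumed layout: bit 7 = sign, bits 6..3 = exponent (bias 7),
    bits 2..0 = mantissa. There are no infinities; S.1111.111 is NaN.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * math.ldexp(mant / 8.0, 1 - 7)
    # Normal: implicit leading 1.
    return sign * math.ldexp(1.0 + mant / 8.0, exp - 7)


for b in (0x38, 0xB8, 0x7E, 0x01):
    print(hex(b), decode_e4m3fn(b))
```

Under this encoding, 0x38 decodes to 1.0 and 0x7E to 448.0, the largest finite E4M3 value.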

To file, each artifact becomes 2 PRs:
- One to `tile-ai/tilelang` (supermodule half)
- One to `TileLang/tvm` (where the vendored TVM fork lives, per the `url = https://github.com/TileLang/tvm` entry in `.gitmodules`)

`tilelang_metal_shared_dyn`, the ninth artifact, is a no-op investigation
result rather than a code PR; it stays in `docs/` as evidence only.

## Stacking dependency on jorgecurious's branch

PRs #3-#6 state explicitly in their bodies that they stack on `jorgecurious/tilelang:metal-gemm-upstream-rebase` (PR #2130) at HEAD `971c17b`. That branch is the top of a four-PR chain:

- [PR #1869 (oraluben)](https://github.com/tile-ai/tilelang/pull/1869): base Metal GEMM landing
- [PR #2118 (cklxx)](https://github.com/tile-ai/tilelang/pull/2118): Metal scalar fallback for T.gemm
- [PR #2121 (SiriusNEO)](https://github.com/tile-ai/tilelang/pull/2121): multi-backend CodeGen refactor
- [PR #2130 (jorgecurious)](https://github.com/tile-ai/tilelang/pull/2130): metal-gemm-upstream-rebase

All four were still OPEN at the time of filing (2026-05-04). Once upstream merges that 4-PR chain into `tile-ai/tilelang:main`, our PRs #2139-#2142 become independently reviewable against main.

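The stacking described above can be modelled as a small graph. This sketch treats the Apple Metal chain as linear, in the order this doc's dependency diagram draws it (#1869 → #2118 → #2121 → #2130). For a given PR, it reports which open PRs still sit underneath it. The helper names are hypothetical, and the merged set is something you would fill in by hand.

```python
from typing import Dict, List, Optional, Set

# Each PR maps to the PR it stacks on (None = based directly on main).
STACKS_ON: Dict[int, Optional[int]] = {
    1869: None,   # base Metal GEMM landing
    2118: 1869,   # Metal scalar fallback for T.gemm
    2121: 2118,   # multi-backend CodeGen refactor
    2130: 2121,   # metal-gemm-upstream-rebase (HEAD 971c17b)
    2139: 2130,   # mixed-dtype T.gemm
    2140: 2130,   # FP8-input T.gemm
    2141: 2130,   # T.Pipelined num_stages>1
    2142: 2130,   # T.fp8_scaled_matmul
}


def unmerged_below(pr: int, merged: Set[int]) -> List[int]:
    """Open PRs underneath `pr` in its stack, nearest first."""
    pending = []
    base = STACKS_ON[pr]
    while base is not None:
        if base not in merged:
            pending.append(base)
        base = STACKS_ON[base]
    return pending


def reviewable_against_main(pr: int, merged: Set[int]) -> bool:
    """True once every PR below `pr` has merged into main."""
    return not unmerged_below(pr, merged)


ours = [2139, 2140, 2141, 2142]
print(unmerged_below(2141, merged=set()))  # [2130, 2121, 2118, 1869]
print(all(reviewable_against_main(p, {1869, 2118, 2121, 2130}) for p in ours))  # True
```

None of #2139-#2142 is the base of another, so `unmerged_below` never lists one of our PRs under another. That is the independence the text claims.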

## What each PR depends on (textually, for stacked review)

```
┌───────────────────────────────────────────────────┐
│ Apple Metal landing chain (existing OPEN PRs)     │
│ #1869 → #2118 → #2121 → #2130                     │
└───────────────────────────────────────────────────┘
                          │
                          ▼
┌───────────────────────────────────────────────────┐
│ Our 4 filed PRs (independent of each other)       │
│ #2139 mixed-dtype                                 │
│ #2140 FP8 gemm software path                      │
│ #2141 pipelined 3D buffer                         │
│ #2142 T.fp8_scaled_matmul                         │
└───────────────────────────────────────────────────┘
```

(The 4 are independent: each branches from #2130 directly, not from one another, so each can be reviewed on its own.)

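According to its title, #2141 threads the stage dimension through `T.access_ptr`. Here is one way to see why that matters. Assume the pipelined shared buffer is multi-buffered as a row-major 3D array of shape `(num_stages, rows, cols)`. The flat element offset then needs a `stage * rows * cols` term. A pointer computed from the 2D `(row, col)` part alone would make every stage alias stage 0, which is one plausible failure mode the fix guards against. Both helpers are hypothetical illustrations of the arithmetic, not TileLang code.

```python
def stage_offset(stage: int, row: int, col: int, rows: int, cols: int) -> int:
    """Row-major flat offset into a (num_stages, rows, cols) buffer."""
    return (stage * rows + row) * cols + col


def offset_ignoring_stage(stage: int, row: int, col: int, rows: int, cols: int) -> int:
    """Offset if the stage index is dropped: every stage maps onto stage 0."""
    del stage  # unused on purpose: this models the failure mode
    return row * cols + col


num_stages, rows, cols = 3, 32, 32
starts = [stage_offset(s, 0, 0, rows, cols) for s in range(num_stages)]
aliased = [offset_ignoring_stage(s, 0, 0, rows, cols) for s in range(num_stages)]
print(starts)   # [0, 1024, 2048]
print(aliased)  # [0, 0, 0]
```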

## Path C blocker tracker

For future Path C unblocks, see `docs/upstream/_path_c_blockers_tracker.md`. It identifies 5 future patches (A: 32×32 pipelined extension; B: fused FP8 scaled scheduler; C: e8m0 blockscaled DSL; D: simdgroup_reduce primitive; E: chunked bwd language extension).
