Commit 9d11e85
Add out parameter to kbit_scalar_gemv_tiled for CUDA graph compat
Adds kbit_scalar_gemv_tiled_ op that writes to a pre-allocated output
buffer, eliminating the allocate+copy in the kbit_linear dispatch path.
The CUDA kernel already accepted an output pointer — this just wires
it through the torch.library op layer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>1 parent 8eadb50 commit 9d11e85
3 files changed
+67
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
783 | 783 | | |
784 | 784 | | |
785 | 785 | | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1332 | 1332 | | |
1333 | 1333 | | |
1334 | 1334 | | |
| 1335 | + | |
| 1336 | + | |
| 1337 | + | |
| 1338 | + | |
| 1339 | + | |
| 1340 | + | |
| 1341 | + | |
| 1342 | + | |
| 1343 | + | |
| 1344 | + | |
| 1345 | + | |
| 1346 | + | |
| 1347 | + | |
| 1348 | + | |
| 1349 | + | |
| 1350 | + | |
| 1351 | + | |
| 1352 | + | |
| 1353 | + | |
| 1354 | + | |
| 1355 | + | |
| 1356 | + | |
| 1357 | + | |
| 1358 | + | |
| 1359 | + | |
| 1360 | + | |
| 1361 | + | |
| 1362 | + | |
| 1363 | + | |
| 1364 | + | |
| 1365 | + | |
| 1366 | + | |
| 1367 | + | |
| 1368 | + | |
| 1369 | + | |
| 1370 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1299 | 1299 | | |
1300 | 1300 | | |
1301 | 1301 | | |
1302 | | - | |
1303 | | - | |
1304 | | - | |
1305 | | - | |
1306 | | - | |
| 1302 | + | |
| 1303 | + | |
| 1304 | + | |
1307 | 1305 | | |
1308 | 1306 | | |
1309 | 1307 | | |
| |||
0 commit comments