Commit 3fffa55
[PyTorch] Debug CPU offloading in grouped linear and grouped MLP (#3047)
* Support selective offload for fused grouped MLP
Signed-off-by: hongbinl <hongbinl@nvidia.com>
* Add no_offload_activation to grouped MLP ops
Signed-off-by: hongbinl <hongbinl@nvidia.com>
* Use offload_activation API for activation offload control
Signed-off-by: hongbinl <hongbinl@nvidia.com>
* Fix CPU offloading correctness in ops layer
- Revert per-module offload_activation API added in commits 376d28c
and 933d64b; that belongs in a separate PR.
- ops/basic/grouped_linear: add start_offload on input tensors before
the GEMM, and mark_activation_offload / mark_not_offload in
fuser_forward_save_ctx for both the split-quantize and grouped-tensor
paths.
- ops/fused/forward_grouped_mlp: remove no_offload_activation attribute
lookups and the activation mark_not_offload calls that gated on them;
add start_offload + mark_activation_offload for all saved activation
tensors (grouped_fc1_x, activation_in, saved_grouped_fc2_x) and keep
mark_not_offload only for weight tensors. Document why grouped_fc1_x
is repacked into GroupedTensorStorage.
- ops/basic/basic_linear: no change needed beyond the existing
mark_activation_offload. Unlike te.Linear, there is no persistent
weight cache, so the quantized weight workspace can be freely
offloaded.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Construct internal grouped tensors within grouped linear and grouped MLP
GroupedTensor should only be used for tensors that are exposed externally; for internal tensors, GroupedTensorStorage has lower CPU overhead. There also appears to be an issue with CPU offloading that has not yet been root-caused.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: hongbinl <hongbinl@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 0dd1af2 commit 3fffa55
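The offload-marking rules in the commit message can be illustrated with a toy model. This is a minimal pure-Python sketch, not Transformer Engine code: `ToyTensor`, `ToyOffloadRegistry`, `save_ctx`, and `toy_grouped_mlp_forward` are hypothetical, and the methods only borrow the names `start_offload`, `mark_activation_offload`, and `mark_not_offload` from the commit message; their signatures and behavior here are assumptions. The sketch also assumes the marking decision is persistence-based: tensors that outlive a training step (parameters, cached quantized weights) stay on the GPU, while per-step temporaries, including a quantized weight workspace rebuilt on every forward pass, may be offloaded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor saved for backward (holds no data)."""

    name: str
    # Assumption: True for parameters and weight caches reused across steps,
    # False for per-step temporaries such as activations.
    persistent: bool = False


@dataclass
class ToyOffloadRegistry:
    """Hypothetical recorder of offload decisions, not TE's implementation."""

    started: List[str] = field(default_factory=list)
    offload: List[str] = field(default_factory=list)
    keep_on_gpu: List[str] = field(default_factory=list)

    def start_offload(self, *tensors: ToyTensor) -> None:
        # Models starting the device-to-host copy early so that it can
        # overlap with the GEMM that consumes the tensor.
        self.started.extend(t.name for t in tensors)

    def mark_activation_offload(self, *tensors: ToyTensor) -> None:
        # The saved tensor may live on the CPU until backward needs it.
        self.offload.extend(t.name for t in tensors)

    def mark_not_offload(self, *tensors: ToyTensor) -> None:
        # The saved tensor must stay resident on the GPU.
        self.keep_on_gpu.extend(t.name for t in tensors)


def save_ctx(registry: ToyOffloadRegistry, saved: List[ToyTensor]) -> None:
    """Mark every saved tensor: temporaries are offloaded, persistent ones stay."""
    registry.mark_activation_offload(*[t for t in saved if not t.persistent])
    registry.mark_not_offload(*[t for t in saved if t.persistent])


def toy_grouped_mlp_forward(
    registry: ToyOffloadRegistry, fc1_weight: ToyTensor, fc2_weight: ToyTensor
) -> None:
    """Records offload decisions for a two-GEMM MLP; no math is performed."""
    grouped_fc1_x = ToyTensor("grouped_fc1_x")
    registry.start_offload(grouped_fc1_x)  # assumed placement: before the FC1 GEMM
    activation_in = ToyTensor("activation_in")
    saved_grouped_fc2_x = ToyTensor("saved_grouped_fc2_x")
    # Assumed placement: before the FC2 GEMM.
    registry.start_offload(activation_in, saved_grouped_fc2_x)
    save_ctx(
        registry,
        [grouped_fc1_x, activation_in, saved_grouped_fc2_x, fc1_weight, fc2_weight],
    )
```

Calling `toy_grouped_mlp_forward` with both weights marked `persistent=True` records the three activations for offload and keeps only the weights on the GPU, matching the grouped MLP bullet. Calling `save_ctx` on an input plus a non-persistent weight workspace offloads both, which is the `basic_linear` case. Keying the decision on persistence rather than on "is this a weight" is what lets the toy model treat a freshly quantized workspace like an activation while keeping cached weights resident.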
5 files changed
Lines changed: 108 additions & 20 deletions
File tree
- transformer_engine/pytorch
  - module
  - ops
    - basic
    - fused
  - tensor/storage
Changed hunks:
@@ -16,7 +16,10 @@
@@ -135,9 +138,9 @@
@@ -154,13 +157,13 @@
Changed hunks:
@@ -1050,6 +1050,9 @@
Changed hunks:
@@ -23,6 +23,7 @@
@@ -783,7 +784,7 @@
@@ -800,7 +801,7 @@
@@ -814,7 +815,7 @@
@@ -866,8 +867,8 @@
@@ -888,7 +889,7 @@
@@ -1026,6 +1027,25 @@
@@ -1110,6 +1130,10 @@
@@ -1205,7 +1229,7 @@
@@ -1215,6 +1239,9 @@
@@ -1238,7 +1265,7 @@
@@ -1566,7 +1593,7 @@
@@ -1602,7 +1629,7 @@
Lines changed: 44 additions & 4 deletions
Changed hunks:
@@ -13,6 +13,7 @@
@@ -23,6 +24,7 @@
@@ -316,14 +318,44 @@
@@ -587,7 +619,7 @@
@@ -616,7 +648,7 @@
@@ -695,6 +727,7 @@
@@ -716,6 +749,13 @@
@@ -755,7 +795,7 @@
Lines changed: 15 additions & 0 deletions
Changed hunks:
@@ -387,6 +387,21 @@