Commit ebaa187
Eliminate GPU sync overhead and CPU→GPU transfers across LTX2 pipeline (#13564)
* Remove unnecessary CUDA synchronization points and avoid CPU→GPU tensor creation
across the LTX2 pipeline, transformer, scheduler, and connector logic.
- Add set_begin_index(0) to schedulers to eliminate DtoH sync in _init_step_index
- Replace torch.tensor(..., device=...) with on-device tensor construction for decode scaling
- Move RoPE-related tensor creation to GPU to avoid memcpy overhead
- Refactor connector padding logic using vectorized masking instead of list-based ops
* Apply style fixes
* Revert low-impact CUDA synchronization changes and remove redundant `hasattr` check
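
The `set_begin_index(0)` item above can be illustrated with a minimal sketch. It is not the code from this commit: `ToyScheduler`, `_index_for_timestep`, and the `lookups` counter are hypothetical names used only for illustration, and the control flow is an assumption modelled on how a scheduler's `_init_step_index` typically falls back to a value-based lookup when no begin index is set. That lookup reads a position back to the host, which is a device-to-host sync when the timesteps live on a GPU.

```python
import torch


class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not the real library class."""

    def __init__(self, timesteps: torch.Tensor):
        self.timesteps = timesteps
        self._begin_index = None
        self._step_index = None
        self.lookups = 0  # counts value-based index lookups (illustration only)

    def set_begin_index(self, begin_index: int = 0) -> None:
        # Record the starting index up front so no lookup is needed later.
        self._begin_index = begin_index

    def _index_for_timestep(self, timestep: torch.Tensor) -> int:
        # .item() copies the matching position to the host; with CUDA tensors
        # the CPU has to wait for the GPU here.
        self.lookups += 1
        return int((self.timesteps == timestep).nonzero()[0].item())

    def _init_step_index(self, timestep: torch.Tensor) -> None:
        if self._begin_index is None:
            # Fallback path: search the schedule for the current timestep.
            self._step_index = self._index_for_timestep(timestep)
        else:
            # Fast path: plain Python assignment, no tensor readback.
            self._step_index = self._begin_index


# Runs on CPU as well; the sync cost only shows up on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
timesteps = torch.linspace(1000.0, 1.0, 8, device=device)

slow = ToyScheduler(timesteps)
slow._init_step_index(timesteps[0])  # takes the lookup path

fast = ToyScheduler(timesteps)
fast.set_begin_index(0)
fast._init_step_index(timesteps[0])  # skips the lookup entirely
```

Both schedulers end up at step index 0. Only the one without a begin index performs the lookup, which is the cost the commit message describes removing.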
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 4ca8633 commit ebaa187
2 files changed
Lines changed: 13 additions & 11 deletions
File 1 of 2 (file name and changed line contents not captured)
@@ -2,7 +2,6 @@ 1 deletion
@@ -295,22 +294,21 @@ 9 additions, 10 deletions
File 2 of 2 (file name and changed line contents not captured)
@@ -1189,6 +1189,10 @@ 4 additions