-TPU v7x exposes dual chiplets as two JAX devices. For `ulysses_ring`, expose only the total sequence sharding through `context`; the attention kernel derives a private ring and Ulysses split from that axis.
+TPU v7x exposes dual chiplets as two JAX devices. For `ulysses_ring`, expose only the total sequence sharding through `context`; the attention kernel derives a private ring and Ulysses split from that axis. To tune that split explicitly, set `ulysses_ring_ulysses_parallelism`; ring shards are derived as `ici_context_parallelism / ulysses_ring_ulysses_parallelism`.
 - `4x4` uses tensor `4`, so the dual-chip pairing is still inside the Ulysses side.
 
 The plain `ring` baseline has no Ulysses group, so it cannot preserve that property by construction.
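As a minimal sketch of the derivation described in the added line, assuming the total context shards factor exactly as Ulysses shards times ring shards (so `ulysses_ring_ulysses_parallelism` must divide `ici_context_parallelism` evenly). The helper name `split_ulysses_ring` is hypothetical and not part of maxdiffusion; only the two config names come from the diff.

```python
def split_ulysses_ring(ici_context_parallelism: int,
                       ulysses_ring_ulysses_parallelism: int) -> tuple[int, int]:
    """Hypothetical helper: return (ulysses_shards, ring_shards).

    Assumes total context shards == ulysses_shards * ring_shards, so the
    Ulysses degree must divide ici_context_parallelism exactly.
    """
    if ulysses_ring_ulysses_parallelism < 1:
        raise ValueError("ulysses_ring_ulysses_parallelism must be >= 1")
    if ici_context_parallelism % ulysses_ring_ulysses_parallelism != 0:
        raise ValueError(
            f"ici_context_parallelism={ici_context_parallelism} is not divisible by "
            f"ulysses_ring_ulysses_parallelism={ulysses_ring_ulysses_parallelism}"
        )
    # Ring shards are whatever remains of the context axis after the Ulysses split.
    ring_shards = ici_context_parallelism // ulysses_ring_ulysses_parallelism
    return ulysses_ring_ulysses_parallelism, ring_shards


# Example: 8 context shards with a Ulysses degree of 2 leaves 4 ring shards.
print(split_ulysses_ring(8, 2))  # (2, 4)
```

Under this reading, leaving `ulysses_ring_ulysses_parallelism` unset simply defers the choice to the attention kernel, as the original line states.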
docs/tpu_wan_bench_guide.md
+1 -1 (1 addition & 1 deletion)
@@ -143,7 +143,7 @@ Set in `src/maxdiffusion/configs/base_wan_27b.yml` or overridden on the command
 **Parallelism rule**: product of all ICI axes must equal 8 (chips per host):
 - `ici_dp × ici_fsdp × ici_cp × ici_tp = 8`
 
-For `ulysses_ring`, set the desired total sequence shards with `ici_context_parallelism`; the internal ring and Ulysses split is selected by the attention kernel.
+For `ulysses_ring`, set the desired total sequence shards with `ici_context_parallelism`; the internal ring and Ulysses split is selected by the attention kernel. To tune it manually, set `ulysses_ring_ulysses_parallelism=<ulysses_shards>` and the ring shard count is derived as `ici_context_parallelism / ulysses_ring_ulysses_parallelism`.
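The parallelism rule in this hunk can be checked before launching a run. This is a sketch that assumes 8 chips per host, as the rule states, and uses the guide's abbreviated names (`ici_dp`, `ici_fsdp`, `ici_cp`, `ici_tp`) as plain parameters rather than the actual config keys. Both function names are hypothetical. `candidate_ulysses_ring_splits` only checks divisibility; the real attention kernel may impose further constraints (for example, on attention head counts) that are not modeled here.

```python
import math

CHIPS_PER_HOST = 8  # taken from the guide's parallelism rule


def check_ici_axes(ici_dp: int, ici_fsdp: int, ici_cp: int, ici_tp: int,
                   chips_per_host: int = CHIPS_PER_HOST) -> None:
    """Hypothetical check: raise if the ICI axes do not multiply to chips_per_host."""
    product = math.prod((ici_dp, ici_fsdp, ici_cp, ici_tp))
    if product != chips_per_host:
        raise ValueError(
            f"ici_dp * ici_fsdp * ici_cp * ici_tp = {product}, expected {chips_per_host}"
        )


def candidate_ulysses_ring_splits(ici_cp: int) -> list[tuple[int, int]]:
    """Hypothetical helper: every (ulysses_shards, ring_shards) pair whose product is ici_cp.

    Only divisibility is checked; the attention kernel may still reject some pairs.
    """
    return [(u, ici_cp // u) for u in range(1, ici_cp + 1) if ici_cp % u == 0]


check_ici_axes(ici_dp=1, ici_fsdp=1, ici_cp=8, ici_tp=1)  # passes: 1 * 1 * 8 * 1 == 8
print(candidate_ulysses_ring_splits(8))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
```

Each pair in the printed list corresponds to one possible value of `ulysses_ring_ulysses_parallelism` for that `ici_context_parallelism`, with the ring shard count derived by division as the added line describes.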