Commit 3b12a0b

update
1 parent 4505645 commit 3b12a0b

1 file changed

Lines changed: 4 additions & 4 deletions

src/diffusers/models/_modeling_parallel.py

@@ -50,10 +50,10 @@ class ContextParallelConfig:
 for long sequences with limited memory/bandwidth. Number of devices to use for ring attention within a
 context parallel region. Must be a divisor of the total number of devices in the context parallel mesh.
 ulysses_degree (`int`, *optional*, defaults to `1`):
-Number of devices to use for Ulysses Attention. Sequence split across devices. Each device computes local
-QKV, then all-gathers all KV chunks to compute full attention in one pass. Higher memory (stores all KV),
-requires high-bandwidth all-to-all communication, but lower latency. Best for moderate sequences with good
-interconnect bandwidth.
+Number of devices to use for Ulysses Attention. Sequence split is across devices. Each device computes
+local QKV, then all-gathers all KV chunks to compute full attention in one pass. Higher memory (stores all
+KV), requires high-bandwidth all-to-all communication, but lower latency. Best for moderate sequences with
+good interconnect bandwidth.
 convert_to_fp32 (`bool`, *optional*, defaults to `True`):
 Whether to convert output and LSE to float32 for ring attention numerical stability.
 rotate_method (`str`, *optional*, defaults to `"allgather"`):
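The docstring contrasts two ways of splitting a sequence across devices. As a rough, single-process illustration (not the diffusers implementation), the sketch below simulates `n_devices` sequence shards with plain Python lists. `gather_kv_attention` follows the docstring's description of the Ulysses path: each shard keeps its local queries and sees all K/V chunks at once. `ring_attention` visits one K/V chunk per step and merges partial outputs through their log-sum-exp (LSE), which are the "output and LSE" values that `convert_to_fp32` refers to. All function names are hypothetical. No real communication takes place, and the actual DeepSpeed-Ulysses scheme uses an all-to-all that re-shards the data by attention head, which this sketch does not model.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def partial_attention(q_rows, k_rows, v_rows):
    """Softmax attention of each query over the given K/V rows.

    Returns (outputs, lses): normalized outputs plus the log-sum-exp of the
    scaled scores for each query, which a ring step needs in order to merge.
    """
    scale = 1.0 / math.sqrt(len(q_rows[0]))
    dim = len(v_rows[0])
    outs, lses = [], []
    for q in q_rows:
        scores = [dot(q, k) * scale for k in k_rows]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        outs.append([sum(wi * v[j] for wi, v in zip(w, v_rows)) / z for j in range(dim)])
        lses.append(m + math.log(z))
    return outs, lses


def full_attention(q, k, v):
    """Reference result: a single device attends over the whole sequence."""
    return partial_attention(q, k, v)[0]


def split(rows, n):
    """Split a sequence into n equal contiguous chunks (one per simulated device)."""
    assert len(rows) % n == 0, "sequence length must be divisible by the device count"
    size = len(rows) // n
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def gather_kv_attention(q, k, v, n_devices):
    """Each device keeps its local query chunk but holds every K/V chunk at once."""
    q_chunks = split(q, n_devices)
    # Stand-in for the gather: every device ends up with the full K/V (higher memory).
    k_all = [row for chunk in split(k, n_devices) for row in chunk]
    v_all = [row for chunk in split(v, n_devices) for row in chunk]
    out = []
    for q_local in q_chunks:
        out.extend(partial_attention(q_local, k_all, v_all)[0])
    return out


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two normalized partial results for one query using their LSEs."""
    lse = max(lse_a, lse_b) + math.log1p(math.exp(-abs(lse_a - lse_b)))  # log(e^a + e^b)
    wa, wb = math.exp(lse_a - lse), math.exp(lse_b - lse)
    return [wa * a + wb * b for a, b in zip(out_a, out_b)], lse


def ring_attention(q, k, v, n_devices):
    """Each device visits one K/V chunk per step and folds it into a running result."""
    q_chunks, k_chunks, v_chunks = split(q, n_devices), split(k, n_devices), split(v, n_devices)
    out = []
    for rank, q_local in enumerate(q_chunks):
        acc_out, acc_lse = partial_attention(q_local, k_chunks[rank], v_chunks[rank])
        for step in range(1, n_devices):
            # In a real ring the chunk would arrive from a neighbour; here it is just indexed.
            src = (rank - step) % n_devices
            new_out, new_lse = partial_attention(q_local, k_chunks[src], v_chunks[src])
            merged = [merge(ao, al, bo, bl) for ao, al, bo, bl in zip(acc_out, acc_lse, new_out, new_lse)]
            acc_out = [pair[0] for pair in merged]
            acc_lse = [pair[1] for pair in merged]
        out.extend(acc_out)
    return out


random.seed(0)
seq_len, dim = 8, 4
q, k, v = ([[random.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)] for _ in range(3))
ref = full_attention(q, k, v)
for n in (1, 2, 4):
    for name, result in (("gather", gather_kv_attention(q, k, v, n)), ("ring", ring_attention(q, k, v, n))):
        err = max(abs(a - b) for ra, rb in zip(ref, result) for a, b in zip(ra, rb))
        print(name, n, err)
```

Both variants should match the single-device reference up to floating-point error. They differ in what each device must hold at once (all K/V versus one chunk at a time) and in how communication is spread across steps, which is the memory/bandwidth trade-off the docstring describes.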
