Fix transformer sharding and cross-attention flash block sizes #2957