The causal (FAR) 3D Transformer used by `AnyFlowFARPipeline`, the FAR variant of AnyFlow (Yuchao Gu, Guian Fang et al., NUS
ShowLab × NVIDIA). It extends the v0.35.1 Wan2.1 backbone with three additions:
- FAR causal block-mask via `torch.nn.attention.flex_attention`, supporting frame-level autoregressive generation as introduced in FAR (Gu et al., 2025).
- Compressed-frame patch embedding (`far_patch_embedding`) for context (already-generated) frames, warm-started from the full-resolution `patch_embedding` at construction time via trilinear interpolation.
- Dual-timestep flow-map embedding (same as `AnyFlowTransformer3DModel`): every forward call conditions on both the source timestep `t` and the target timestep `r`.

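
The masking rule behind the first addition can be illustrated in plain Python. This is only a sketch under stated assumptions, not the model's implementation: it assumes tokens attend freely within their own chunk and causally to all earlier chunks, and that every frame contributes the same number of tokens (the compressed context frames would in practice contribute fewer). The helper names `token_chunk_ids` and `block_causal_mask` are hypothetical. With `flex_attention`, a predicate of this kind would typically be written as a `mask_mod(b, h, q_idx, kv_idx)` callable and passed to `create_block_mask`.

```python
from itertools import chain


def token_chunk_ids(frames_per_chunk, tokens_per_frame):
    """Map every token position to the index of the chunk it belongs to.

    Assumption: each frame contributes `tokens_per_frame` tokens.
    """
    return list(
        chain.from_iterable(
            [chunk_idx] * (num_frames * tokens_per_frame)
            for chunk_idx, num_frames in enumerate(frames_per_chunk)
        )
    )


def block_causal_mask(chunk_ids):
    """Return mask[q][kv], True when query token q may attend to key token kv.

    A query sees its own chunk and every earlier chunk, never a later one.
    """
    return [[kv_chunk <= q_chunk for kv_chunk in chunk_ids] for q_chunk in chunk_ids]


# Two chunks: one frame, then two frames, with two tokens per frame.
chunk_ids = token_chunk_ids(frames_per_chunk=[1, 2], tokens_per_frame=2)
mask = block_causal_mask(chunk_ids)
```

Tokens of the first chunk see only each other, while tokens of the second chunk see the whole sequence.
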
The chunk schedule (`chunk_partition`) is not baked into the model config. It is a per-call argument to
`forward`, so the same checkpoint handles different `num_frames` configurations without retraining.
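
As a rough illustration of why a per-call schedule is convenient, the sketch below derives a different partition for two clip lengths. It assumes `chunk_partition` is a list of per-chunk frame counts that sums to the number of frames; the exact format `forward` expects is not specified here, and `make_chunk_partition` is a hypothetical helper, not part of the library.

```python
def make_chunk_partition(num_frames, chunk_size):
    """Split `num_frames` into consecutive chunks of at most `chunk_size` frames.

    Hypothetical helper: the last chunk holds the remainder, if any.
    """
    if num_frames <= 0 or chunk_size <= 0:
        raise ValueError("num_frames and chunk_size must be positive")
    full_chunks, remainder = divmod(num_frames, chunk_size)
    return [chunk_size] * full_chunks + ([remainder] if remainder else [])


# The same model weights could be driven with either schedule.
short_clip = make_chunk_partition(num_frames=9, chunk_size=3)
long_clip = make_chunk_partition(num_frames=21, chunk_size=4)
```

Because the partition is computed outside the model, changing the clip length only changes the argument passed at call time.
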
```python
from diffusers import AnyFlowFARTransformer3DModel

# Causal AnyFlow checkpoint (FAR):
transformer = AnyFlowFARTransformer3DModel.from_pretrained(
    "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", subfolder="transformer"
)
```

[[autodoc]] AnyFlowFARTransformer3DModel

[[autodoc]] models.transformers.transformer_anyflow_far.AnyFlowFARTransformerOutput