⚠️ SeedVR2 optimizations check: Flash Attention ❌ | Triton ❌
💡 For best performance: pip install flash-attn triton
...
got prompt
[13:40:20.649]
[13:40:20.650] ╔══════════════════════════════════════════════════════════╗
[13:40:20.651] ║ ███████ ███████ ███████ ██████ ██ ██ ██████ ███████ ║
[13:40:20.651] ║ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ║
[13:40:20.651] ║ ███████ █████ █████ ██ ██ ██ ██ ██████ █████ ║
[13:40:20.652] ║ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ║
[13:40:20.652] ║ ███████ ███████ ███████ ██████ ████ ██ ██ ███████ ║
[13:40:20.652] ║ v2.5.10 © ByteDance Seed · NumZ · AInVFX ║
[13:40:20.652] ╚══════════════════════════════════════════════════════════╝
[13:40:20.652]
[13:40:20.653] 🔧 Validating seedvr2_ema_3b-Q3_K_M.gguf...
[13:40:32.326] 🏃 Creating new runner: DiT=seedvr2_ema_3b-Q3_K_M.gguf, VAE=ema_vae_fp16.safetensors
[13:40:32.417] 🚀 Creating DiT model structure on meta device
[13:40:32.966] 🎨 Creating VAE model structure on meta device
[13:40:33.363]
[13:40:33.364] 🎬 Starting upscaling generation...
[13:40:33.364] 🎬 Input: 1 frame, 512x512px → Output: 576x576px (shortest edge: 576px)
[13:40:33.364] 🎬 Batch size: 1, Seed: 42, Channels: RGB
[13:40:33.364]
[13:40:33.365] ━━━━━━━━ Phase 1: VAE encoding ━━━━━━━━
[13:40:33.365] 🎨 Materializing VAE weights to CPU (offload device): /home/hum/comfy-0_3_68/ComfyUI-0.3.68/models/SEEDVR2/ema_vae_fp16.safetensors
[13:40:36.782] 🎨 Encoding batch 1/1
[13:40:36.814] 📹 Sequence of 1 frames
[13:40:40.843]
[13:40:40.843] ━━━━━━━━ Phase 2: DiT upscaling ━━━━━━━━
[13:40:40.863] 🚀 Materializing DiT weights to CPU (offload device): /home/hum/comfy-0_3_68/ComfyUI-0.3.68/models/SEEDVR2/seedvr2_ema_3b-Q3_K_M.gguf
[13:40:41.289] 🔀 BlockSwap: 32 transformer blocks + I/O components offloaded to CPU
[13:40:41.360] 🎬 Upscaling batch 1/1
EulerSampler: 0%| | 0/1 [00:00<?, ?it/s][13:40:42.822] ⚠️ [WARNING] Flash Attention failed for blocks.0.attn.attn, using original: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
[13:40:42.824] ⚠️ [WARNING] Flash Attention failed for blocks.0.attn, using original: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
[13:40:42.884] ⚠️ [WARNING] Flash Attention failed for blocks.0.attn.attn, using original: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
[13:40:42.887] ❌ [ERROR] Forward pass error: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
[13:40:42.887] ℹ️ torch.float16 model - no conversion applied
[13:40:42.887] ❌ [ERROR] Error in Phase 2 (Upscaling): CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
!!! Exception during processing !!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Traceback (most recent call last):
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 437, in flash_attention_forward
return self._sdpa_attention_forward(original_forward, module, *args, **kwargs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 472, in _sdpa_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 154, in forward
return pytorch_varlen_attention(
q, k, v, cu_seqlens_q, cu_seqlens_k,
max_seqlen_q, max_seqlen_k, **kwargs
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 50, in pytorch_varlen_attention
output_i = F.scaled_dot_product_attention(
q_i, k_i, v_i,
dropout_p=dropout_p if not deterministic else 0.0,
is_causal=causal
)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 437, in flash_attention_forward
return self._sdpa_attention_forward(original_forward, module, *args, **kwargs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 472, in _sdpa_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/nablocks/attention/mmattn.py", line 245, in forward
out = self.attn(
~~~~~~~~~^
q=concat_win(vid_q, txt_q),
^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<9 lines>...
max_seqlen_k=cache_win("vid_max_seqlen_k", lambda: all_len_win.max()),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
).type_as(vid_q)
^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 441, in flash_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 154, in forward
return pytorch_varlen_attention(
q, k, v, cu_seqlens_q, cu_seqlens_k,
max_seqlen_q, max_seqlen_k, **kwargs
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 50, in pytorch_varlen_attention
output_i = F.scaled_dot_product_attention(
q_i, k_i, v_i,
dropout_p=dropout_p if not deterministic else 0.0,
is_causal=causal
)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 437, in flash_attention_forward
return self._sdpa_attention_forward(original_forward, module, *args, **kwargs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 472, in _sdpa_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 154, in forward
return pytorch_varlen_attention(
q, k, v, cu_seqlens_q, cu_seqlens_k,
max_seqlen_q, max_seqlen_k, **kwargs
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 50, in pytorch_varlen_attention
output_i = F.scaled_dot_product_attention(
q_i, k_i, v_i,
dropout_p=dropout_p if not deterministic else 0.0,
is_causal=causal
)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/execution.py", line 510, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/execution.py", line 324, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/execution.py", line 298, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/execution.py", line 286, in process_inputs
result = f(**inputs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/comfy_api/internal/__init__.py", line 149, in wrapped_func
return method(locked_class, **inputs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/comfy_api/latest/_io.py", line 1270, in EXECUTE_NORMALIZED
to_return = cls.execute(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/interfaces/video_upscaler.py", line 569, in execute
raise e
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/interfaces/video_upscaler.py", line 481, in execute
ctx = upscale_all_batches(
runner,
...<5 lines>...
cache_model=dit_cache
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/core/generation_phases.py", line 715, in upscale_all_batches
upscaled_latents = runner.inference(
noises=noises,
conditions=conditions,
**ctx['text_embeds'],
)
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/core/infer.py", line 365, in inference
latents = self.sampler.sample(
x=latents,
...<22 lines>...
),
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/common/diffusion/samplers/euler.py", line 61, in sample
pred = f(SamplerModelArgs(x, t, i))
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/core/infer.py", line 367, in <lambda>
f=lambda args: classifier_free_guidance_dispatcher(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
pos=lambda: self.dit(
^^^^^^^^^^^^^^^^^^^^^
...<19 lines>...
rescale=self.config.diffusion.cfg.rescale,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
),
^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/common/diffusion/utils.py", line 76, in classifier_free_guidance_dispatcher
return pos()
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/core/infer.py", line 368, in <lambda>
pos=lambda: self.dit(
~~~~~~~~^
vid=torch.cat([args.x_t, latents_cond], dim=-1),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
timestep=args.t.repeat(batch_size),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
).vid_sample,
^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 598, in forward
return self.dit_model(*args, **kwargs)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/nadit.py", line 222, in forward
vid, txt, vid_shape, txt_shape = gradient_checkpointing(
~~~~~~~~~~~~~~~~~~~~~~^
enabled=(self.gradient_checkpointing and self.training),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<6 lines>...
cache=cache,
^^^^^^^^^^^^
)
^
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/nadit.py", line 32, in gradient_checkpointing
return module(*args, **kwargs)
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/blockswap.py", line 438, in wrapped_forward
output = original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/nablocks/mmsr_block.py", line 112, in forward
vid_attn, txt_attn = self.attn(vid_attn, txt_attn, vid_shape, txt_shape, cache)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 441, in flash_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/nablocks/attention/mmattn.py", line 245, in forward
out = self.attn(
~~~~~~~~~^
q=concat_win(vid_q, txt_q),
^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<9 lines>...
max_seqlen_k=cache_win("vid_max_seqlen_k", lambda: all_len_win.max()),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
).type_as(vid_q)
^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/hum/comfy-0_3_68/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/optimization/compatibility.py", line 441, in flash_attention_forward
return original_forward(*args, **kwargs)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 154, in forward
return pytorch_varlen_attention(
q, k, v, cu_seqlens_q, cu_seqlens_k,
max_seqlen_q, max_seqlen_k, **kwargs
)
File "/home/hum/comfy-0_3_68/ComfyUI-0.3.68/custom_nodes/ComfyUI-SeedVR2-VideoUpscaler/src/models/dit_3b/attention.py", line 50, in pytorch_varlen_attention
output_i = F.scaled_dot_product_attention(
q_i, k_i, v_i,
dropout_p=dropout_p if not deterministic else 0.0,
is_causal=causal
)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Prompt executed in 23.46 seconds
EulerSampler: 0%| | 0/1 [00:03<?, ?it/s]
I'm trying to upscale a single image on an old, low-spec system, but I can't get Phase 2 to work at all. I've tried both a Q3_K_M and the official Q4_K_M of SeedVR2, with and without offload to CPU; it's the same error every time.
Error log of an attempt with Q3, full offload
Partial log of another try with Q4 and no offload
nvidia-smi
Some relevant installed packages
Pytorch config
My PyTorch is a bit of a hacky build, since the official builds don't support systems without AVX. However, I am able to run most models in ComfyUI without any problems: Flux/Chroma, Qwen-Image-Edit, Wan 2.2 5B/14B all work, and so do LLMs, VLMs, LoRA training, image upscalers, detectors/detailers, etc.
So far only FlashVSR has been a no-go, because block-sparse-attention (or something like it) required a newer compute capability. Now I'm not sure about SeedVR2: is my hardware entirely insufficient for it, or should I build a new PyTorch and try enabling some library that is currently disabled (fbgemm/xnnpack/mkl-dnn)?
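For what it's worth, the failing call in the traceback is a cuBLAS BF16 strided-batched GEMM (`CUDA_R_16BF`). As far as I can tell, cuBLAS only supports that on Ampere (compute capability 8.0) and newer, which would explain `CUBLAS_STATUS_NOT_SUPPORTED` on an older card regardless of offload settings. A quick sanity check (the helper name is mine, and the sm_80 cutoff is my assumption):

```python
def bf16_gemm_supported(major: int, minor: int) -> bool:
    """Guess whether a GPU with this compute capability can run the
    cuBLAS BF16 GEMM from the traceback (assumption: needs sm_80+)."""
    return (major, minor) >= (8, 0)

# With torch: major, minor = torch.cuda.get_device_capability()
print(bf16_gemm_supported(6, 1))   # Pascal GTX 10xx -> False
print(bf16_gemm_supported(7, 5))   # Turing GTX 16xx/RTX 20xx -> False
print(bf16_gemm_supported(8, 6))   # Ampere RTX 30xx -> True
```

If that's the cause, no PyTorch rebuild or extra library would help; the model would need to run its attention math in fp16 or fp32 instead of bf16.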
Or is there some SDPA implementation I could try enabling or disabling, based on the following?
From the comments in torch/nn/functional.py