Add clip-vit-large-patch14 model.safetensors to supported weights (unblocks XLabs Flux IP-Adapter) #331

Description

@ianrothfuss

Problem

The XLabs LoadFluxIPAdapter node (from x-flux-comfyui) requires a 768-dim CLIP ViT-L/14 vision encoder. cog-comfyui currently ships clip-vit-large-patch14.bin in supported weights, but that file is a pickle checkpoint, and under PyTorch 2.6 it fails to load with weights_only=True:

_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the 
default value of the `weights_only` argument in `torch.load` from `False` to `True`.
WeightsUnpickler error: Unsupported operand 168

The code path is:

x-flux-comfyui/nodes.py:517 → load_clip_vision(path_clip)
  → comfy/clip_vision.py:160 → torch.load(ckpt, weights_only=True)

The only .safetensors CLIP vision models available (clip_vision_h.safetensors, IPAdapter_image_encoder_sd15.safetensors) output 1024-dim embeddings, causing a dimension mismatch:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1024 and 768x16384)
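
To make the mismatch concrete, here is a minimal pure-Python sketch of the shape rule behind that error. The 768x16384 weight shape is taken from the error message above; the helper name matmul_shape is hypothetical and is not part of x-flux-comfyui or PyTorch.

```python
def matmul_shape(a, b):
    """Return the result shape of a 2-D matrix product, or raise ValueError.

    a and b are (rows, cols) tuples; the inner dimensions must match.
    """
    (m, k1), (k2, n) = a, b
    if k1 != k2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({m}x{k1} and {k2}x{n})"
        )
    return (m, n)


# Weight shape as reported in the RuntimeError above (assumed to be the
# IP-Adapter's first projection layer).
IP_ADAPTER_PROJ = (768, 16384)

# A 768-dim embedding (ViT-L/14) is compatible with the projection.
print(matmul_shape((1, 768), IP_ADAPTER_PROJ))  # (1, 16384)

# A 1024-dim embedding (e.g. from clip_vision_h) is not.
try:
    matmul_shape((1, 1024), IP_ADAPTER_PROJ)
except ValueError as err:
    print(err)
```
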

Request

Please add the safetensors version of openai/clip-vit-large-patch14 to supported weights:

This is the safetensors equivalent of the clip-vit-large-patch14.bin you already host — same model, safe format that works with PyTorch 2.6+.
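
For anyone who wants to check a candidate file before adding it, the sketch below reads tensor shapes straight from a .safetensors header without needing torch. It relies only on the published file layout (an 8-byte little-endian header length followed by a JSON header). The function names are hypothetical, and which tensor holds the 768-dim projection is an assumption to verify against the actual file (Hugging Face CLIP checkpoints typically call it visual_projection.weight).

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Parse and return the JSON header of a .safetensors file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be safetensors")
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            # A pickle/zip .bin checkpoint usually ends up here.
            raise ValueError("not a safetensors file (bad header length)")
        return json.loads(f.read(header_len))


def tensor_shapes(path):
    """Map tensor name -> shape tuple, skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: tuple(meta["shape"])
        for name, meta in header.items()
        if name != "__metadata__"
    }
```

Calling tensor_shapes("model.safetensors") and looking up the projection weight should show whether one of its dimensions is 768.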

Impact

This unblocks the XLabs Flux IP-Adapter (both v1 and v2) for all cog-comfyui users. IP-Adapter is essential for image-conditioned generation (e.g., clothing/product transfer in e-commerce photography). Currently the only workaround is to strip IP-Adapter from the workflow entirely.

Environment

  • cog-comfyui with x-flux-comfyui custom node
  • flux-ip-adapter.safetensors (v1) or flux-ip-adapter-v2.safetensors (v2) in xlabs/ipadapters/
  • PyTorch 2.6+ (the version in current cog-comfyui builds)

Reproduction

  1. Create a workflow using LoadFluxIPAdapter + ApplyFluxIPAdapter nodes
  2. Set clip_vision to clip-vit-large-patch14.bin → fails with pickle error
  3. Set clip_vision to clip_vision_h.safetensors → fails with dimension mismatch (1024 vs 768)

No available weight in the current supported list provides a 768-dim .safetensors CLIP vision encoder.
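
The gap can be summarized as a small filter over the encoders named in this issue. This is only an illustrative sketch: the Encoder type and usable helper are hypothetical, the dimensions are the ones reported above, and the filename clip-vit-large-patch14.safetensors is a placeholder for whatever name the requested file would be hosted under.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoder:
    filename: str
    embed_dim: int  # output embedding size, as reported in this issue


# Encoders mentioned in this issue, with the dimensions stated above.
CANDIDATES = [
    Encoder("clip-vit-large-patch14.bin", 768),
    Encoder("clip_vision_h.safetensors", 1024),
    Encoder("IPAdapter_image_encoder_sd15.safetensors", 1024),
]


def usable(candidates, required_dim=768):
    """Return filenames that are both .safetensors and the required width."""
    return [
        c.filename
        for c in candidates
        if c.filename.endswith(".safetensors") and c.embed_dim == required_dim
    ]


print(usable(CANDIDATES))  # []

# With the requested file added (placeholder name), one option appears.
requested = Encoder("clip-vit-large-patch14.safetensors", 768)
print(usable(CANDIDATES + [requested]))
```
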

Thank you!
