
Poor performance of image embedding model #111

@bmachek


Hi there,

I'm thinking of using UForm as a replacement for CLIP in my application.
Since UForm supports ONNX out of the box, it would be a great addition to my existing ONNX-based stack.

However, performance seems poor on my Mac (M4 Pro, 24 GB).

I'm using the following code to generate a large number of image embeddings:

```python
import os

import numpy as np
from PIL import Image
from tqdm import tqdm
from uform import Modality, get_model

# EMBEDDINGS_FILE, EMBEDDING_BATCH_SIZE, AVA_LABELS_FILE, AVA_IMAGES_DIR and
# AVADataset are defined elsewhere in my script.


def generate_embeddings():
    """ PHASE 1: Generate and save embeddings using the UForm model. """
    if os.path.exists(EMBEDDINGS_FILE):
        print(f"Embeddings file already exists at {EMBEDDINGS_FILE}. Skipping.")
        return

    print("--- Starting Phase 1: Embedding Generation (UForm) ---")

    processors, models = get_model(
        'unum-cloud/uform3-image-text-multilingual-base',
        device=None,
        modalities=[Modality.IMAGE_ENCODER],
        backend="onnx",
    )

    model_image = models[Modality.IMAGE_ENCODER]
    processor_image = processors[Modality.IMAGE_ENCODER]

    embedding_dim = 256
    ava_dataset = AVADataset(AVA_LABELS_FILE, AVA_IMAGES_DIR)

    all_image_paths, all_scores_list, all_genres_list = [], [], []
    for path, score, genres in tqdm(ava_dataset, desc="Collecting valid dataset items"):
        all_image_paths.append(path)
        all_scores_list.append(score)
        all_genres_list.append(genres)

    print(f"Generating embeddings for {len(all_image_paths)} images...")

    all_embeds = []
    # Dataset indices whose images loaded successfully, so that scores and
    # genres stay aligned with the embeddings when an image is skipped
    kept_indices = []

    for i in tqdm(range(0, len(all_image_paths), EMBEDDING_BATCH_SIZE), desc="Generating embeddings in batches"):
        batch_paths = all_image_paths[i:i+EMBEDDING_BATCH_SIZE]
        batch_images = []
        for offset, image_path in enumerate(batch_paths):
            try:
                # Close the file handle once the RGB copy has been made
                with Image.open(image_path) as image:
                    batch_images.append(image.convert("RGB"))
                kept_indices.append(i + offset)
            except Exception as e:
                print(f"Error processing image {image_path}: {e}")

        if not batch_images:
            continue

        image_data = processor_image(batch_images)
        # The model returns features and pooled embeddings; we only keep the embeddings
        _, image_embeddings = model_image.encode(image_data, return_features=True)
        all_embeds.extend(image_embeddings)

    all_embeds = np.array(all_embeds)

    print(f"Saving {len(all_embeds)} items to {EMBEDDINGS_FILE}...")
    np.savez_compressed(
        EMBEDDINGS_FILE,
        embeddings=all_embeds,
        # Only keep labels for images that were actually embedded
        scores=np.array(all_scores_list)[kept_indices],
        genres=np.array(all_genres_list)[kept_indices],
        embedding_dim=embedding_dim)
    print("--- Phase 1 Finished ---")
```
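Because Phase 1 skips images that fail to open, a quick consistency check on the saved file can catch rows that have drifted out of alignment. The sketch below is a hypothetical helper, not part of UForm; it assumes the `.npz` layout written by `generate_embeddings()` and that all arrays are regular numeric arrays (object arrays would need `allow_pickle=True` in `np.load`).

```python
import numpy as np


def check_embeddings_file(path: str) -> int:
    """Verify that a Phase 1 .npz file has one row per image in every array.

    Returns the number of rows; raises ValueError on any mismatch.
    """
    with np.load(path) as data:
        embeddings = data["embeddings"]
        scores = data["scores"]
        genres = data["genres"]
        embedding_dim = int(data["embedding_dim"])

        # Embeddings should be a 2D matrix whose width matches the stored dimension
        if embeddings.ndim != 2 or embeddings.shape[1] != embedding_dim:
            raise ValueError(f"expected shape (N, {embedding_dim}), got {embeddings.shape}")

        # Every per-image array must have the same number of rows
        if not (len(embeddings) == len(scores) == len(genres)):
            raise ValueError(
                f"row mismatch: {len(embeddings)} embeddings, "
                f"{len(scores)} scores, {len(genres)} genres"
            )
        return len(embeddings)
```

Calling `check_embeddings_file(EMBEDDINGS_FILE)` right after Phase 1 turns a silent label shift into an explicit error.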

I tried `EMBEDDING_BATCH_SIZE` values from 1 to 256, but I can't seem to get past ~1 embedding/s.
The images in the AVA dataset are small, so as far as I understand the process should be faster. With OpenCLIP I got 8 to 16 embeddings/s with similarly sized models.
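To make the batch-size comparison reproducible, a small harness can time any batch-encoding callable per batch size, excluding warm-up batches. This is a generic sketch: `images_per_second` and `fake_encode` are hypothetical names, and wiring it to the script (for example a callable that runs `processor_image` and then `model_image.encode`) is an assumption. Passing images that are already decoded separates model time from JPEG decoding.

```python
import time
from typing import Callable, Dict, Sequence


def images_per_second(
    encode_batch: Callable[[Sequence], object],
    items: Sequence,
    batch_sizes: Sequence[int],
    warmup_batches: int = 1,
) -> Dict[int, float]:
    """Measure throughput (items per second) of `encode_batch` for each batch size.

    The first `warmup_batches` batches per size are run but not timed.
    Batch sizes with nothing left to time after warm-up are omitted.
    """
    results: Dict[int, float] = {}
    for batch_size in batch_sizes:
        batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        for batch in batches[:warmup_batches]:
            encode_batch(batch)
        timed = batches[warmup_batches:]
        if not timed:
            continue
        start = time.perf_counter()
        for batch in timed:
            encode_batch(batch)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
        results[batch_size] = sum(len(batch) for batch in timed) / elapsed
    return results


# Stand-in encoder for illustration only
def fake_encode(batch):
    return [sum(range(1_000)) for _ in batch]


print(images_per_second(fake_encode, list(range(64)), [1, 8, 32]))
```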

This is an example output from my script:

```
--- Starting Phase 1: Embedding Generation (UForm) ---
2025-10-09 09:35:34.196 python[39166:1255416] 2025-10-09 09:35:34.196442 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 112 number of nodes in the graph: 1056 number of nodes supported by CoreML: 739
Loading AVA labels...
Collecting valid dataset items: 255508it [00:01, 243766.45it/s]
Generating embeddings for 255508 images...
Generating embeddings in batches:   0%|          | 0/31939 [00:00<?, ?it/s]
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Generating embeddings in batches:   0%|          | 3/31939 [00:05<15:00:16,  1.69s/it]
```
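The progress bar numbers can be turned into a throughput figure for this particular run. The batch count implies the batch size (`len(range(0, N, B))` equals `ceil(N / B)`), and the 1.69 s/it rate covers only the first three batches, so it includes any warm-up cost.

```python
import math

# Figures copied from the progress bar in the log
total_images = 255_508
total_batches = 31_939
seconds_per_batch = 1.69

# Find the batch size(s) that produce exactly this many batches
implied_batch_sizes = [b for b in range(1, 257) if math.ceil(total_images / b) == total_batches]
print(implied_batch_sizes)  # [8]

batch_size = implied_batch_sizes[0]
throughput = batch_size / seconds_per_batch
print(f"{throughput:.1f} images/s")  # 4.7 images/s

eta_hours = total_batches * seconds_per_batch / 3600
print(f"{eta_hours:.1f} h")  # 15.0 h
```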

As you can see, CoreML is used, which is fine for my Mac. However, asitop shows that only the M4's CPU cores are busy; there is no ANE or GPU load at all.
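One way to check whether CoreML helps at all is to run the same ONNX graph with and without the CoreML execution provider and compare timings. The sketch below only builds the `providers` list that `onnxruntime.InferenceSession` accepts; the model file name is hypothetical, and the code above does not show whether UForm's `get_model` exposes its sessions or a provider choice.

```python
from typing import List, Sequence

COREML = "CoreMLExecutionProvider"
CPU = "CPUExecutionProvider"


def provider_order(available: Sequence[str], use_coreml: bool) -> List[str]:
    """Return an ONNX Runtime providers list: CoreML first (if requested and available), then CPU."""
    wanted = [COREML, CPU] if use_coreml else [CPU]
    return [name for name in wanted if name in available]


# Usage sketch (requires onnxruntime; "image_encoder.onnx" is a hypothetical path):
# import onnxruntime as ort
# for use_coreml in (True, False):
#     session = ort.InferenceSession(
#         "image_encoder.onnx",
#         providers=provider_order(ort.get_available_providers(), use_coreml),
#     )
#     print(session.get_providers())  # providers actually attached to this session
```

If the CPU-only session is about as fast as the CoreML one, the CoreML partitions are not contributing much.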

Any ideas? (aka Help me Obi-Wan Kenobi) 😄

Should the CoreML / ONNX warning give me a hint? It says only 739 of the 1056 graph nodes are supported by CoreML, split across 112 partitions.
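Reading the numbers in that warning literally: about 70% of the nodes can run on CoreML, but they are spread over 112 separate partitions of under 7 nodes each on average, with the rest left to the CPU provider. A fragmented graph like this typically means many hand-offs between providers; whether that explains the slowdown here is not established by the log alone. The snippet extracts and summarizes the figures.

```python
import re

warning = (
    "CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 112 "
    "number of nodes in the graph: 1056 number of nodes supported by CoreML: 739"
)

pattern = re.compile(
    r"number of partitions supported by CoreML: (\d+) "
    r"number of nodes in the graph: (\d+) "
    r"number of nodes supported by CoreML: (\d+)"
)
partitions, total_nodes, coreml_nodes = map(int, pattern.search(warning).groups())

coverage = coreml_nodes / total_nodes          # share of nodes CoreML can run
cpu_nodes = total_nodes - coreml_nodes         # nodes left to the CPU provider
nodes_per_partition = coreml_nodes / partitions

print(f"{coverage:.0%} on CoreML, {cpu_nodes} nodes on CPU, "
      f"{nodes_per_partition:.1f} nodes per CoreML partition")
# 70% on CoreML, 317 nodes on CPU, 6.6 nodes per CoreML partition
```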

Am I doing something wrong?

Best regards,
Bastian
