Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
pointing_up.jpg	pointing_up.jpg
vision_litellm_backend.py	vision_litellm_backend.py
vision_ollama_chat.py	vision_ollama_chat.py
vision_openai_examples.py	vision_openai_examples.py

Name

Last commit message

Last commit date

README.md

pointing_up.jpg

vision_litellm_backend.py

vision_ollama_chat.py

vision_openai_examples.py

Vision/Multimodal Examples

This directory contains examples for working with vision-language models that can process both images and text.

Files

vision_ollama_chat.py

Demonstrates using vision models through Ollama backend with chat interface.

Key Features:

Loading and processing images
Using vision models for image understanding
Chat-based interaction with images

vision_openai_examples.py

Shows how to use OpenAI-compatible vision models (including local VLLM servers).

vision_litellm_backend.py

Examples using LiteLLM backend for vision model access.

Supporting Files

pointing_up.jpg

Sample image used in the examples for testing vision capabilities.

Concepts Demonstrated

Multimodal Input: Combining text and images in prompts
Vision Understanding: Asking questions about image content
Backend Flexibility: Using different backends (Ollama, OpenAI, LiteLLM) for vision
Image Processing: Loading and formatting images for LLM consumption

Basic Usage

from mellea import start_session
from mellea.stdlib.components import Message

# Load image
with open("pointing_up.jpg", "rb") as f:
    image_data = f.read()

# Create session with vision model
m = start_session(model_id="llava:7b")

# Ask about the image
response = m.chat(
    Message(
        role="user",
        content="What do you see in this image?",
        images=[image_data]
    )
)

Supported Models

Ollama: granite3.2-vision, llava, bakllava, llava-phi3, moondream, qwen2.5vl:7b
OpenAI: gpt-4-vision-preview, gpt-4o
LiteLLM: Various vision models through unified interface

Prerequisites

Pull a vision-capable model before running these examples:

ollama pull granite3.2-vision    # ~2.4 GB — primary recommended model
ollama pull qwen2.5vl:7b         # ~4.7 GB — used in vision_openai_examples.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Vision/Multimodal Examples

Files

vision_ollama_chat.py

vision_openai_examples.py

vision_litellm_backend.py

Supporting Files

pointing_up.jpg

Concepts Demonstrated

Basic Usage

Supported Models

Prerequisites

Related Documentation

FilesExpand file tree

image_text_models

Directory actions

More options

Directory actions

More options

Latest commit

History

image_text_models

Folders and files

parent directory

README.md

Vision/Multimodal Examples

Files

vision_ollama_chat.py

vision_openai_examples.py

vision_litellm_backend.py

Supporting Files

pointing_up.jpg

Concepts Demonstrated

Basic Usage

Supported Models

Prerequisites

Related Documentation