This guide provides step-by-step instructions to quickly deploy and test the Multimodal Embedding Serving microservice.
Before you begin, confirm the following:
- System Requirements: Your system meets the minimum requirements.
- Docker Installed: Install Docker if needed. See Get Docker.
This guide assumes basic familiarity with Docker commands and terminal usage.
- EMBEDDING_MODEL_NAME - The model to use (e.g., "CLIP/clip-vit-b-16"). Refer to the Supported Models list for additional choices.
- EMBEDDING_DEVICE - Device for inference (CPU/GPU, default: CPU)
- EMBEDDING_USE_OV - Enable OpenVINO optimization (true/false, default: false)
- EMBEDDING_OV_MODELS_DIR - Directory for OpenVINO models (default: ./ov-models)
- INFER_BATCH_SIZE - Batch size for inference (default: 64). The model is compiled to accept a fixed batch size; inputs are padded or split to fit it.
- PREPROCESS_WORKERS - Number of parallel preprocessing workers (default: min(16, cpu_count * 2)). More workers generally help, with diminishing returns beyond the number of CPU cores.
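The last two defaults come down to simple arithmetic, sketched below in shell. The function names are hypothetical, and how the service actually pads or splits inputs is an assumption; only the fixed batch size and the min(16, cpu_count * 2) formula come from the list above.

```shell
# Hypothetical helpers; they illustrate the arithmetic only, not the service's code.

# Documented default worker count: min(16, cpu_count * 2).
default_preprocess_workers() {
  doubled=$(( $1 * 2 ))
  if [ "$doubled" -lt 16 ]; then echo "$doubled"; else echo 16; fi
}

# Number of fixed-size batches needed for n inputs (ceiling division).
# $1 = number of inputs, $2 = INFER_BATCH_SIZE
batches_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Empty slots in the final batch, assuming it is padded up to the fixed size.
padding_needed() {
  echo $(( ($2 - $1 % $2) % $2 ))
}

default_preprocess_workers 4    # 8
batches_needed 150 64           # 3
padding_needed 150 64           # 42
```

On a real host the CPU count could be passed in with something like default_preprocess_workers "$(getconf _NPROCESSORS_ONLN)". The padding figure suggests why a very large INFER_BATCH_SIZE may waste compute on small requests.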
These variables control the video frame extraction pipeline performance and memory usage.
- VIDEO_FRAME_BATCH_SIZE - Batch size for video frame extraction (default: 64)
- VIDEO_FRAME_DECODER_WORKERS - Number of workers for video frame decoding (default: 8)
- VIDEO_FRAME_QUEUE_SIZE - Queue size for frame extraction pipeline (default: 32)
- VIDEO_FRAME_SHM_POOL_BLOCK_SIZE - Shared memory block size in bytes (default: 1920 × 1080 × 3 = 6,220,800 bytes for 1080p RGB)
- VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER - Multiplier for total shared memory blocks (default: 2)
- Total blocks = VIDEO_FRAME_BATCH_SIZE × VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER
- VIDEO_FRAME_LOG_LEVEL - Logging level for video frame extraction (DEBUG/INFO/WARNING/ERROR/CRITICAL, default: INFO)
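To estimate how much shared memory a given configuration implies, the block count can be multiplied by the block size. This is a rough sketch: shm_pool_bytes is a hypothetical helper, and it assumes the pool reserves exactly total blocks × block size bytes, which the list above suggests but does not state outright.

```shell
# Sketch only: assumes the pool reserves (total blocks x block size) bytes.
# shm_pool_bytes is a hypothetical helper, not part of the service.
shm_pool_bytes() {
  # $1 = VIDEO_FRAME_BATCH_SIZE
  # $2 = VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER
  # $3 = VIDEO_FRAME_SHM_POOL_BLOCK_SIZE (bytes)
  echo $(( $1 * $2 * $3 ))
}

# Documented defaults: 64 frames per batch, multiplier 2, 1080p RGB blocks.
total=$(shm_pool_bytes 64 2 $(( 1920 * 1080 * 3 )))
echo "blocks: $(( 64 * 2 ))"                 # blocks: 128
echo "bytes:  $total"                        # bytes:  796262400
echo "MiB:    $(( total / 1024 / 1024 ))"    # MiB:    759
```

With a batch size of 256, the same multiplier and 1080p blocks, the same arithmetic gives 3,185,049,600 bytes (roughly 3 GB), so it is worth checking this figure against available memory before raising VIDEO_FRAME_BATCH_SIZE.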
Set the required environment variables before launching the service.
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
Refer to the Supported Models list for additional choices.
NOTE: You can change the model, OpenVINO conversion, device, or tokenization parameters by editing setup.sh.
export REGISTRY_URL=intel
export TAG=latest
Basic CPU setup (default):
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
GPU acceleration with OpenVINO:
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export EMBEDDING_DEVICE=GPU
export EMBEDDING_USE_OV=true
High Performance Video Processing:
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export VIDEO_FRAME_BATCH_SIZE=256
export VIDEO_FRAME_DECODER_WORKERS=8
export VIDEO_FRAME_SHM_POOL_BLOCK_SIZE=$((1920 * 1080 * 3)) # 6MB for 1080p
export VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER=2
export INFER_BATCH_SIZE=64
export PREPROCESS_WORKERS=16
Memory-Constrained Environment:
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export VIDEO_FRAME_BATCH_SIZE=64
export VIDEO_FRAME_DECODER_WORKERS=4
export VIDEO_FRAME_SHM_POOL_BLOCK_SIZE=$((1280 * 720 * 3)) # 2.8MB for 720p
export VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER=2
With OpenVINO Optimization on GPU:
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16
export EMBEDDING_USE_OV=true
export EMBEDDING_DEVICE=GPU
export INFER_BATCH_SIZE=64
export PREPROCESS_WORKERS=16
Debug Mode with Detailed Logging:
export VIDEO_FRAME_LOG_LEVEL=DEBUG
export EMBEDDING_DEVICE=CPU
To increase video throughput:
- Increase VIDEO_FRAME_BATCH_SIZE (trades memory for throughput)
- Increase VIDEO_FRAME_DECODER_WORKERS (limited by CPU cores)
- Increase VIDEO_FRAME_QUEUE_SIZE if frames are being dropped
To reduce memory usage:
- Decrease VIDEO_FRAME_BATCH_SIZE
- Decrease VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER
- Reduce VIDEO_FRAME_SHM_POOL_BLOCK_SIZE if processing lower resolutions
To speed up inference:
- Increase INFER_BATCH_SIZE and PREPROCESS_WORKERS
- Enable OpenVINO: EMBEDDING_USE_OV=true
- Use GPU if available: EMBEDDING_DEVICE=GPU
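When lowering VIDEO_FRAME_SHM_POOL_BLOCK_SIZE for smaller resolutions, the value can be derived from the frame dimensions. The sketch below assumes one block needs to hold a single uncompressed frame with 3 bytes per pixel, which matches the 1080p RGB default; frame_block_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: bytes for one uncompressed frame.
# $1 = width, $2 = height, $3 = channels (defaults to 3 for RGB)
frame_block_bytes() {
  echo $(( $1 * $2 * ${3:-3} ))
}

frame_block_bytes 1920 1080   # 6220800
frame_block_bytes 1280 720    # 2764800
frame_block_bytes 640 360     # 691200
```

The result can be exported directly, for example export VIDEO_FRAME_SHM_POOL_BLOCK_SIZE=$(frame_block_bytes 1280 720). If the input videos vary in resolution, sizing for the largest expected frame is the cautious choice.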
Set up the environment with default values by running the command below. Rerun it whenever you change an environment variable; for example, running on GPU requires additional environment variables to be set.
source setup.sh
You can build the Docker image or pull a prebuilt image from the configured registry and tag. For a prebuilt image, the setup script configures the variables needed to pull the right version of the image.
docker compose -f docker/compose.yaml up -d
Verify the deployment by running the command below. A healthy status should be printed to the console.
curl --location --request GET 'http://localhost:9777/health'
# Automatic GPU selection
export EMBEDDING_DEVICE=GPU
# Specific GPU index (if applicable)
export EMBEDDING_DEVICE=GPU.0
source setup.sh
Note: When EMBEDDING_DEVICE=GPU is set, setup.sh applies GPU-friendly defaults, including setting EMBEDDING_USE_OV=true.
docker compose -f docker/compose.yaml up -d
# Check service health
curl --location --request GET 'http://localhost:9777/health'
# Inspect active model capabilities
curl --location --request GET 'http://localhost:9777/model/capabilities'
docker compose -f docker/compose.yaml down
The following samples mirror the accompanying Postman collection. All requests target http://localhost:9777.
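The service may need some time after startup before it answers requests, so it can help to poll the health endpoint before sending any of the samples. The following is a minimal sketch of a generic retry helper; retry is a hypothetical name, and the usage comment assumes the health endpoint refuses connections or returns a non-2xx status until the service is ready.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1 = maximum attempts, $2 = delay in seconds between attempts, rest = command
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$(( i + 1 ))
    # Sleep only if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example usage (not executed here): up to 30 attempts, 2 seconds apart.
#   retry 30 2 curl --silent --fail --output /dev/null 'http://localhost:9777/health'
```

Passing the probe as a command keeps the helper independent of curl, so the same function could wrap a docker or model-capabilities check instead.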
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"input": {
"type": "text",
"text": "Sample input text1"
},
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float"
}'
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"input": {
"type": "text",
"text": ["Sample input text1", "Sample input text2"]
},
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float"
}'
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"input": {
"type": "image_url",
"image_url": "https://i.ytimg.com/vi/H_8J2YfMpY0/sddefault.jpg"
},
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float"
}'
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float",
"input": {
"type": "image_base64",
"image_base64": "<image base64 value here>"
}
}'
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float",
"input": {
"type": "video_frames",
"video_frames": [
{
"type": "image_url",
"image_url": "https://i.ytimg.com/vi/H_8J2YfMpY0/sddefault.jpg"
},
{
"type": "image_base64",
"image_base64": "<image base64 value here>"
}
]
}
}'
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float",
"input": {
"type": "video_url",
"video_url": "https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_10mb.mp4",
"segment_config": {
"startOffsetSec": 0,
"clip_duration": -1,
"num_frames": 64,
"frame_indexes": [1, 10, 20]
}
}
}'
Set num_frames to 0 to process all the frames.
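The image_base64 and video_base64 input types expect the file contents as a base64 string. The sketch below shows one possible way to produce that value and assemble a video_base64 request body from a local file. The helper names are hypothetical, and it assumes the service accepts raw base64 without a data URI prefix, which this guide does not specify.

```shell
# Hypothetical helper: base64-encode a file as a single line.
# Reading from stdin and stripping newlines works with both GNU and BSD base64.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Build a request body for a video_base64 input.
# $1 = model name, $2 = path to a local video file
video_payload() {
  printf '{"model": "%s", "encoding_format": "float", "input": {"type": "video_base64", "segment_config": {"startOffsetSec": 0, "clip_duration": -1, "num_frames": 64}, "video_base64": "%s"}}' \
    "$1" "$(encode_file "$2")"
}

# Example usage (not executed here); clip.mp4 is a placeholder file name.
#   video_payload CLIP/clip-vit-b-32 clip.mp4 > payload.json
#   curl --location 'http://localhost:9777/embeddings' \
#     --header 'Content-Type: application/json' --data @payload.json
```

Writing the body to a file and passing it with --data @payload.json avoids command-line length limits, which encoded videos can easily exceed.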
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"model": "CLIP/clip-vit-b-32",
"encoding_format": "float",
"input": {
"type": "video_base64",
"segment_config": {
"startOffsetSec": 0,
"clip_duration": -1,
"num_frames": 64
},
"video_base64": "<video base64 value here>"
}
}'
# List all available models
curl --location --request GET 'http://localhost:9777/models'
# Inspect the currently loaded model
curl --location --request GET 'http://localhost:9777/model/current'
# View modality support for the active model
curl --location --request GET 'http://localhost:9777/model/capabilities'
- Docker container fails to start
  - Run docker logs multimodal-embedding-serving to inspect failures.
  - Ensure required ports (default 9777) are available.
- Cannot access the microservice
  - Confirm the containers are running:
    docker ps
  - Verify EMBEDDING_MODEL_NAME points to a supported entry and rerun source setup.sh if you make changes.
- GPU runtime errors
  - Check Intel GPU device nodes:
    ls -la /dev/dri
  - Confirm EMBEDDING_USE_OV=true for best performance with OpenVINO on GPU.