This demo application uses the power of AI (a large video-language model) for live video stream captioning in a retail loss-prevention scenario, plus Visual RAG (text-to-image retrieval) backed by the Intel VDMS vector database.
- Ubuntu 24.04 LTS
- Docker (https://docs.docker.com/engine/install/ubuntu/)
- Miniforge Conda (https://conda-forge.org/download/)
- Python 3.10+
- tmux
- Target Hardware:
- Intel® Core™ Processors Platform (Alder Lake, Raptor Lake, Arrow Lake, etc.)
- Intel® Arc™ A-series or Intel® Arc™ B-series graphics card (A770/B580)
- Intel® Core™ Ultra Series Processor Platform (Lunar Lake, Arrow Lake-H)
- For an additional installation guide, see link.
- Refer to the prerequisites section above and follow the instructions on the linked websites to install Docker and Miniforge Conda.
- Install the other Ubuntu dependencies:

```bash
sudo apt update
sudo apt install tmux ffmpeg
```
- Pull the code:

```bash
mkdir -p $HOME/work
cd $HOME/work
git clone https://github.com/intel/edge-developer-kit-reference-scripts edge-ai-devkit
cp -rf edge-ai-devkit/usecases/ai/video_summarization ./video_summarization
cd $HOME/work/video_summarization
```
- Create the conda environment and install the conda packages:

```bash
conda create -n openvino-env python=3.11
conda activate openvino-env
conda update --all
conda install -c conda-forge openvino=2026.0.0
pip install -r requirements.txt
```
- Convert and quantize MiniCPM-V-2_6 to OpenVINO IR format (INT8):

```bash
mkdir -p $HOME/work
cd $HOME/work
optimum-cli export openvino -m openbmb/MiniCPM-V-2_6 --trust-remote-code --weight-format int8 MiniCPM_INT8
```
NOTE: If you are prompted about an access issue for the model, follow the link in the terminal and request access. Then run this to set your Hugging Face token:

```bash
export HF_TOKEN=<token>
```
This demo app comprises the following modules/components:

- VLM API service (port 8000)
- Retriever API service (port 8001)
- Live Summarizer UI (port 8888)
- Video RAG UI (port 9999)
- Simple-RTSP-server (port 8554)
- VDMS Vector DB (port 55555)
Names of the Docker containers and their associated ports. Please make sure these ports are not used by any other locally hosted services.

| Container Name | Exposed Ports |
|---|---|
| vlm_api_service | 8000 |
| rag_api_service | 8001 |
| vector_store | 55555 |
| rtmp_server | 8554 |
| summarizer_ui | 8888 |
| retriever_ui | 9999 |
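Before launching the stack, you can verify that none of these ports is already taken. A minimal sketch; the port list mirrors the table above, and probing via a TCP connect to localhost is an assumption about how the services bind:

```python
# Check that the demo's ports are free before running docker compose.
import socket

DEMO_PORTS = [8000, 8001, 55555, 8554, 8888, 9999]  # from the table above

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

for port in DEMO_PORTS:
    print(f"port {port}: {'IN USE' if port_in_use(port) else 'free'}")
```

Any port reported as IN USE must be freed (or remapped in docker-compose.yml) before starting the demo.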
- Create the chunks folder (the folder that holds the video chunk files) and change its permissions:

```bash
mkdir -p ../chunks
chmod 777 ../chunks
```
- Launch a Linux terminal from the desktop and cd to the project directory:

```bash
cd $HOME/work/video_summarization
```
- Run the commands below:

```bash
docker compose build
docker compose up -d
```
- Start the virtual camera stream. You may use the utility script below to do that:

```bash
./start_virtual_rtsp_cam0.sh /path/to/video.mp4  # replace /path/to/video.mp4 with the absolute path to your own video file
```

Note: The utility script only creates one video stream; copy and edit the script to create more streams. Make sure each video is published to the URL shown in the table below:

| Camera Name | URL |
|---|---|
| CAM0 | rtsp://localhost:8554/live |
| CAM1 | rtsp://localhost:8554/live1 |
| CAM2 | rtsp://localhost:8554/live2 |
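When copying the script for CAM1/CAM2, the publish commands follow the URL scheme in the table above. The sketch below derives those URLs and builds a typical ffmpeg RTSP push command; the ffmpeg flags are an assumption about how such a script usually works, not taken from the bundled script itself:

```python
# Build an ffmpeg command that pushes a looping video file to the RTSP server.
# URL scheme follows the table above (CAM0 -> /live, CAMn -> /live<n>).
def rtsp_url(cam_index: int, host: str = "localhost", port: int = 8554) -> str:
    path = "live" if cam_index == 0 else f"live{cam_index}"
    return f"rtsp://{host}:{port}/{path}"

def ffmpeg_push_cmd(video_path: str, cam_index: int) -> list[str]:
    # -re: read input at native frame rate; -stream_loop -1: loop forever
    return ["ffmpeg", "-re", "-stream_loop", "-1", "-i", video_path,
            "-c", "copy", "-f", "rtsp", rtsp_url(cam_index)]

print(" ".join(ffmpeg_push_cmd("/path/to/video.mp4", 1)))
```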
- clear_database.sh - use this script to clear the VDMS vector store.
- start_virtual_rtsp_cam0.sh - use this script to create a virtual camera stream. Expected input video format: resolution 1920x1080, framerate 15 fps.
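The expected input format (1920x1080 at 15 fps) can be checked with ffprobe, which ships with the ffmpeg package installed earlier. A minimal sketch: the helper validates a video-stream entry shaped like ffprobe's JSON output; the exact ffprobe invocation in the comment is an assumption, not taken from the demo's scripts:

```python
# Validate one ffprobe video-stream entry against the expected input format.
# Obtain a real entry with (assumed invocation, run manually):
#   ffprobe -v error -select_streams v:0 \
#       -show_entries stream=width,height,r_frame_rate -of json video.mp4
from fractions import Fraction

EXPECTED = (1920, 1080, Fraction(15))  # width, height, fps per the note above

def matches_expected(stream: dict) -> bool:
    fps = Fraction(stream["r_frame_rate"])  # ffprobe reports e.g. "15/1"
    return (stream["width"], stream["height"], fps) == EXPECTED

# Example with a stream entry shaped like ffprobe's JSON output:
print(matches_expected({"width": 1920, "height": 1080,
                        "r_frame_rate": "15/1"}))  # -> True
```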
- Open http://127.0.0.1:8888 in a web browser to view the summarizer UI.
- Open http://127.0.0.1:9999 in a web browser to view the retriever UI.
- To stop the demo:

```bash
docker compose down
```
- Other useful commands:

```bash
tmux ls                               # check whether the virtual camera stream is running
docker compose ls                     # check whether all services of the demo are running
docker compose top                    # check the container names
docker compose logs [container_name]  # retrieve the runtime logs of a specific container
```
The docker-compose.yml uses environment variables to pass additional configuration parameters to the containers. Adjust them as needed per the table below:
| Environment Variables | Containers | Default Value | Sample Value |
|---|---|---|---|
| Proxy settings: HTTP_PROXY, HTTPS_PROXY, NO_PROXY | vlm_api_service, rag_api_service, summarizer_ui, retriever_ui | None | http://proxy.domain.com:8080 |
| * AI backend: DEVICE | vlm_api_service | GPU | GPU.1, GPU.0, GPU, CPU, NPU |
*Note:
- DEVICE=NPU is not supported yet.
- You may also use this command to identify all AI accelerators supported on your hardware:

```bash
python -c "import openvino as ov; print(ov.Core().available_devices)"
```
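Building on that command, a small helper can map the reported device list to a DEVICE value for docker-compose.yml. A minimal sketch: the GPU-over-CPU preference order is an assumption, and NPU is skipped because it is not supported yet:

```python
# Choose a DEVICE value from OpenVINO's available_devices list.
# For live use, pass ov.Core().available_devices instead of the sample below.
def pick_device(available: list[str]) -> str:
    # Prefer a GPU (discrete or integrated), then fall back to CPU.
    # NPU is excluded because DEVICE=NPU is not supported yet.
    for prefix in ("GPU", "CPU"):
        for dev in available:
            if dev.startswith(prefix):
                return dev
    return "CPU"

# Sample list resembling a system with an Arc card plus integrated graphics:
print(pick_device(["CPU", "GPU.0", "GPU.1", "NPU"]))  # -> GPU.0
```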
Pre-requisites:

```bash
sudo apt install cargo pkg-config libudev-dev
```
- Install qmassa. Follow the instructions at https://github.com/ulissesf/qmassa.
- Run qmassa on the command line to view the GPU utilization:

```bash
sudo $HOME/.cargo/bin/qmassa
```

Note: If it doesn't correctly show the GPU utilization for your Intel GPU device, pass the parameter "-d bus:device:func" to qmassa. You may look up the BDF (bus:device:func) of your Intel GPU card using the command `lspci`.
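If you need that BDF for qmassa's `-d` flag, it can also be pulled from `lspci` output programmatically. A minimal sketch; matching on "VGA"/"Display" plus "Intel" is a heuristic assumption about lspci's line format, and the sample device name is illustrative:

```python
# Extract the BDF (bus:device:func) of Intel GPUs from `lspci` output.
# For live use, capture the text with:
#   subprocess.run(["lspci"], capture_output=True, text=True).stdout
def intel_gpu_bdfs(lspci_text: str) -> list[str]:
    bdfs = []
    for line in lspci_text.splitlines():
        if "Intel" in line and ("VGA" in line or "Display" in line):
            bdfs.append(line.split()[0])  # first token is the BDF, e.g. 03:00.0
    return bdfs

# Example line in the shape lspci prints (device name is illustrative):
sample = "03:00.0 VGA compatible controller: Intel Corporation Device 56a0"
print(intel_gpu_bdfs(sample))  # -> ['03:00.0']
```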
