
Introduction

This demo application uses a large video-language model (VLM) for live video-stream captioning in a retail loss-prevention scenario, together with Visual RAG (text-to-image retrieval) backed by the Intel VDMS vector database.


Pre-requisites:

  • Ubuntu 24.04 LTS
  • Docker (https://docs.docker.com/engine/install/ubuntu/)
  • Miniforge Conda (https://conda-forge.org/download/)
  • Python 3.10+
  • tmux
  • Target hardware:
    • Intel® Core™ processor platforms (Alder Lake, Raptor Lake, Arrow Lake, etc.)
    • Intel® Arc™ A-series or B-series graphics card (A770/B580)
    • Intel® Core™ Ultra processor platforms (Lunar Lake, Arrow Lake-H)
  • For an additional installation guide, see link.

Environment Setup

  1. Refer to the pre-requisites section and follow the linked instructions to install Docker and Miniforge Conda.
  2. Install the other Ubuntu dependencies:

     sudo apt update
     sudo apt install tmux ffmpeg

  3. Pull the code:

     mkdir -p $HOME/work
     cd $HOME/work
     git clone https://github.com/intel/edge-developer-kit-reference-scripts edge-ai-devkit
     cp -rf edge-ai-devkit/usecases/ai/video_summarization ./video_summarization
     cd $HOME/work/video_summarization

  4. Create a conda environment and install the required packages:

     conda create -n openvino-env python=3.11
     conda activate openvino-env
     conda update --all
     conda install -c conda-forge openvino=2026.0.0
     pip install -r requirements.txt

Model Preparation

Convert and quantize MiniCPM-V-2_6 to OpenVINO IR format (INT8):

mkdir -p $HOME/work
cd $HOME/work
optimum-cli export openvino -m openbmb/MiniCPM-V-2_6 --trust-remote-code --weight-format int8 MiniCPM_INT8

NOTE: If the command reports an access issue for the model, follow the link shown in the terminal and request access. Then set your Hugging Face token before re-running: export HF_TOKEN=<token>
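After the export finishes, you can sanity-check the output directory. A minimal sketch; the directory name MiniCPM_INT8 comes from the command above, but the helper function itself is a hypothetical convenience, not part of the demo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that an OpenVINO export directory contains IR files.
# An optimum-cli export writes one or more .xml/.bin pairs into the target folder.
check_ir_dir() {
    local dir="$1"
    local xml_count
    xml_count=$(find "$dir" -maxdepth 1 -name '*.xml' 2>/dev/null | wc -l)
    if [ "$xml_count" -gt 0 ]; then
        echo "OK: $xml_count IR model file(s) found in $dir"
        return 0
    fi
    echo "ERROR: no .xml IR files found in $dir" >&2
    return 1
}

# usage: check_ir_dir "$HOME/work/MiniCPM_INT8"
```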

Starting Demo App

This demo app comprises the following modules/components:

  • VLM API service (port 8000)

  • Retriever API service (port 8001)

  • Live Summarizer UI (port 8888)

  • Video RAG UI (port 9999)

  • Simple-RTSP-server (port 8554)

  • VDMS Vector DB (port 55555)
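Before launching, you may want to confirm that none of these ports is already taken. A minimal sketch using bash's built-in /dev/tcp pseudo-device (the helper name port_in_use is made up for illustration and is not part of the demo):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether something is already listening on a local TCP port.
# Bash's /dev/tcp redirection attempts a connect; success means the port is occupied.
port_in_use() {
    (echo > "/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in 8000 8001 8888 9999 8554 55555; do
    if port_in_use "$port"; then
        echo "port $port: IN USE"
    else
        echo "port $port: free"
    fi
done
```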

Start using docker

  1. The table below lists the Docker containers and their exposed ports. Make sure these ports are not used by any other locally hosted services.

     Container Name    Exposed Ports
     vlm_api_service   8000
     rag_api_service   8001
     vector_store      55555
     rtmp_server       8554
     summarizer_ui     8888
     retriever_ui      9999
  2. Create a chunks folder (to hold the video chunk files) and change its permissions.

    mkdir -p ../chunks
    chmod 777 ../chunks
    
  3. Launch a Linux terminal from the desktop and cd to the project directory:

    cd $HOME/work/video_summarization
    
  4. Run the commands below:

    docker compose build
    docker compose up -d
    
  5. Start a virtual camera stream. You may use the utility script below to do that.

     ./start_virtual_rtsp_cam0.sh /path/to/video.mp4   # replace /path/to/video.mp4 with the absolute path to your own video file

     Note: The utility script only creates one video stream; copy and edit the script to create additional streams. Make sure each video is published to the URL shown in the table below:

     Camera Name   URL
     CAM0          rtsp://localhost:8554/live
     CAM1          rtsp://localhost:8554/live1
     CAM2          rtsp://localhost:8554/live2

     • clear_database.sh - use this script to clear the VDMS vector store.

     • start_virtual_rtsp_cam0.sh - use this script to create a virtual camera stream. Expected input video format - resolution: 1920x1080, framerate: 15 fps.
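Since the utility script only publishes one stream, a copy for CAM1 could look like the sketch below. The ffmpeg flags are standard, but the script name, the command-builder helper, and the assumption that the original script works the same way are mine; adjust to match the actual contents of start_virtual_rtsp_cam0.sh:

```shell
#!/usr/bin/env bash
# Hypothetical start_virtual_rtsp_cam1.sh sketch: loop a local video file and
# publish it to the CAM1 RTSP endpoint.

# Build the ffmpeg command: -re reads at native framerate, -stream_loop -1 loops
# the file forever, -c copy avoids re-encoding, -f rtsp pushes to the RTSP server.
build_stream_cmd() {
    local video="$1" url="$2"
    printf 'ffmpeg -re -stream_loop -1 -i %s -c copy -f rtsp %s' "$video" "$url"
}

# usage (uncomment to run inside a detached tmux session):
# tmux new-session -d -s virtual_cam1 "$(build_stream_cmd /path/to/video.mp4 rtsp://localhost:8554/live1)"
```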

  6. Open http://127.0.0.1:8888 in a web browser to view the summarizer UI.

  7. Open http://127.0.0.1:9999 in a web browser to view the retriever UI.

  8. To stop the demo:

    docker compose down
    
  9. Other useful commands:

     tmux ls                               # check whether the virtual camera stream is running
     docker compose ls                     # check whether all demo services are running
     docker compose top                    # list the container names
     docker compose logs [container_name]  # retrieve the runtime logs of a specific container
    
  10. The docker-compose.yml uses environment variables to pass additional configuration parameters to the containers. Change them in docker-compose.yml as needed:

Environment Variables                               Containers                                                      Default Value   Sample Value
Proxy settings: HTTP_PROXY, HTTPS_PROXY, NO_PROXY   vlm_api_service, rag_api_service, summarizer_ui, retriever_ui   None            http://proxy.domain.com:8080
AI backend: DEVICE *                                vlm_api_service                                                 GPU             GPU.1, GPU.0, GPU, CPU, NPU

*Note:

  1. DEVICE=NPU is not supported yet.
  2. You may also use this command to list all AI accelerators available on your hardware: python -c "import openvino as ov; print(ov.Core().available_devices)"
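One way to change these values without editing docker-compose.yml directly is a docker-compose.override.yml, which Docker Compose merges automatically with the base file. This is only a sketch; it assumes the service is named vlm_api_service in the project's compose file, which you should verify first:

```yaml
# docker-compose.override.yml (sketch) - run the VLM backend on the second GPU
services:
  vlm_api_service:
    environment:
      - DEVICE=GPU.1
```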

Performance

Pre-requisites

  1. Install the build dependencies for qmassa:

     sudo apt install cargo pkg-config libudev-dev

  2. Install qmassa. Follow the instructions at https://github.com/ulissesf/qmassa.

  3. Run qmassa on the command line to view the GPU utilization:

     sudo $HOME/.cargo/bin/qmassa
    

    Note: If qmassa does not correctly show the GPU utilization for your Intel GPU device, pass the parameter "-d bus:device:func" to qmassa. You can look up the BDF (bus:device:func) of your Intel GPU card using the command 'lspci'.
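To find the BDF without scanning the full lspci output by eye, a small sketch that filters for Intel display devices. The helper name is hypothetical, and the parsing assumes lspci's default short format (e.g. "00:02.0 VGA compatible controller: Intel Corporation ..."):

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the bus:device.func of Intel GPU(s) from lspci output.
intel_gpu_bdf() {
    # match VGA/Display/3D controller lines mentioning Intel, print the leading BDF
    grep -Ei '(vga|display|3d) .*intel' | awk '{print $1}'
}

# usage: lspci | intel_gpu_bdf
```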