This application provides AI-powered video summarization to generate concise summaries of key events and enables real-time interaction and queries via a chatbot interface.
- AI Video Summarization: Automatically extract and summarize key events from video streams using OpenCV for frame analysis and Vision-Language Models (VLM) for semantic understanding. Supports generating concise textual summaries and highlights for efficient review.
- AI Chatbot: Engage in real-time conversations, ask questions about video content, and receive instant insights through an interactive Gradio interface.
- Embedding Storage with ChromaDB: Store and manage vector embeddings efficiently using ChromaDB, enabling fast semantic search and retrieval for downstream analytics and querying (a minimal pipeline sketch follows this list).
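The frame-analysis and embedding-storage features above can be pictured as one pipeline. The snippet below is a minimal sketch, not the application's actual implementation: the caption step is a placeholder for the VLM call, and the collection name, video path, and query text are illustrative.

```python
import cv2
import chromadb

def sample_frames(video_path, frames_per_second=1):
    """Sample roughly one frame per second from the video with OpenCV."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = int(fps // frames_per_second) or 1
    index, frames = 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index / fps, frame))  # (timestamp in seconds, image)
        index += 1
    cap.release()
    return frames

def describe_frame(frame):
    # Placeholder: the real application sends the frame to the
    # Qwen2.5-VL server and receives a textual description back.
    return "a textual description of the frame"

# Store frame descriptions in a ChromaDB collection so they can be
# retrieved later by semantic similarity.
client = chromadb.PersistentClient(path="./chroma_db")    # illustrative path
collection = client.get_or_create_collection("traffic")   # illustrative name

for i, (timestamp, frame) in enumerate(sample_frames("assets/video.mp4")):
    collection.add(
        ids=[f"frame-{i}"],
        documents=[describe_frame(frame)],
        metadatas=[{"timestamp_sec": timestamp}],
    )

# Semantic query over the stored descriptions.
results = collection.query(query_texts=["vehicles running a red light"], n_results=3)
print(results["documents"])
```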
- CPU: 13th Gen Intel(R) Core(TM) i9-13900K
- GPU: Intel® Arc™ Pro B-Series Graphics
- RAM: 32GB
- Disk: 256GB
| Service | Port | Use |
|---|---|---|
| Main Application | 5999 | Gradio web interface |
| Qwen2.5-VL-7B-Instruct | 5776 | Vision-Language Model server |
| Qwen3-8B | 5778 | Text generation model server |
| FastAPI | 5777 | API backend service and MCP server |
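To confirm that these services are listening after startup, a short Python check against the ports in the table can help (host assumed to be localhost):

```python
import socket

# Ports taken from the services table above.
SERVICES = {
    "Main Application": 5999,
    "Qwen2.5-VL-7B-Instruct": 5776,
    "FastAPI backend": 5777,
    "Qwen3-8B": 5778,
}

for name, port in SERVICES.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(2)
        status = "up" if sock.connect_ex(("localhost", port)) == 0 else "down"
    print(f"{name} (port {port}): {status}")
```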
Before proceeding with the installation, ensure the following system requirements are met:
- A compatible operating system (Ubuntu 24.04 or Windows 11) must be installed and running.
- Intel GPU driver must be installed and properly configured on the system.
Run the provided PowerShell script to start the servers and application:
```powershell
.\run_app.ps1
```

Alternatively, you can use the batch script:

```bat
.\run_app.bat
```

Once running, open http://localhost:5999 in your browser.
Before installing Python dependencies, ensure you have Python and FFmpeg installed:
```bash
sudo apt update
sudo apt install python3 python3-pip python3-venv ffmpeg
```

Run the provided bash script to start the servers and application:

```bash
./run_app.sh
```

Once running, open http://localhost:5999 in your browser.
Choose the appropriate setup method for your operating system:
Make sure you have Python 3.8 or higher installed. Then, install the required Python packages:
```powershell
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```

Download the pre-compiled Windows binaries for llama.cpp with Vulkan or SYCL support from the llama.cpp b7223 release page. Extract llama-b7223-bin-win-vulkan-x64.zip and place the extracted folder in your project directory.
Start the Qwen3-8B server (port 5778) as required. The Qwen2.5-VL server (port 5776) is optional if you have run it before and the embeddings database already exists.
Qwen3-8B (port 5778):
```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

Qwen2.5-VL-7B-Instruct (port 5776, optional):
```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```

Once the servers are running, start the application:

```powershell
python app.py
```

Make sure you have Python 3.8 or higher installed. Then, install the required Python packages:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

You can either compile llama.cpp with the SYCL backend or use the precompiled Vulkan binary:
Option A: Compile llama.cpp with SYCL backend
Follow the SYCL backend instructions:
- Install oneAPI Base Toolkit (download link):

  ```bash
  sudo apt update
  sudo apt install -y gpg-agent wget
  wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
    | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
  echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
  sudo apt update
  sudo apt install intel-deep-learning-essentials
  ```
- Set up the environment:

  ```bash
  source /opt/intel/oneapi/<oneapi-version>/oneapi-vars.sh
  ```

  Note: To verify the SYCL installation, run:

  ```bash
  sycl-ls
  ```
- Build llama.cpp:

  ```bash
  git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp
  sed -i 's/-DLLAMA_CURL=OFF/-DLLAMA_CURL=ON/g' ./examples/sycl/build.sh
  sudo apt install curl libcurl4-openssl-dev cmake build-essential
  ./examples/sycl/build.sh
  ```
Option B: Use Precompiled Vulkan Binary
Download the precompiled Vulkan binary for Linux from the llama.cpp b7223 release page. Extract and place the binary in your project directory for use with Vulkan.
Start the Qwen3-8B server (port 5778) as required. The Qwen2.5-VL server (port 5776) is optional if you have run it before and the embeddings database already exists.
Qwen3-8B (port 5778):
```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

Qwen2.5-VL-7B-Instruct (port 5776, optional):
```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```

Once dependencies and the server are ready, run the script:
```bash
python3 app.py
```

Once started, open http://localhost:5999 in your browser.
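To verify that the Qwen3-8B server is responding before launching the app, you can send a test request to its OpenAI-compatible endpoint. This is a minimal sketch using the openai Python package; the model name passed here is illustrative, since the llama.cpp server serves whichever model it was started with.

```python
from openai import OpenAI

# llama.cpp's llama-server exposes an OpenAI-compatible API under /v1.
client = OpenAI(base_url="http://localhost:5778/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen3-8B",  # illustrative; the server uses the model it loaded at startup
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```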
The application comes with pre-configured scenarios (Traffic, Retail, Manufacturing), but you can customize them by modifying the config.json file. Each scenario contains the following configurable parameters:
To change the video file, collection name, or system prompt for any scenario:
- Open the `config.json` file in your project directory
- Locate the scenario you want to modify (e.g., "Traffic", "Retail", "Manufacturing")
- Update the following fields as needed:
```json
{
"YourScenario": {
"video_path": "path/to/your/video.mp4",
"video_label": "Your Custom Video Label",
"collection_name": "your_collection_name",
"header": "Your Custom Header Title",
"description": "Your custom description for the scenario",
"system_prompt": "Your custom system prompt that defines the AI assistant's behavior and analysis focus."
}
}
```

- `video_path`: Path to your video file (relative to the project directory)
- `video_label`: Display name for the video in the interface
- `collection_name`: Name for the ChromaDB collection (used for storing embeddings)
- `header`: Title displayed at the top of the web interface
- `description`: Description text shown in the interface
- `system_prompt`: Instructions that define how the AI assistant should analyze and respond to video content
For example, a new "Security" scenario could be added like this:

```json
{
"Security": {
"video_path": "assets/security-footage.mp4",
"video_label": "Security Monitoring",
"collection_name": "security",
"header": "Smart Security Intelligence: AI Video Summarization + Interactive Chat",
"description": "Intelligent security monitoring system for detecting and analyzing suspicious activities.",
"system_prompt": "You are a security monitoring assistant. Analyze the video for any suspicious activities, unauthorized access, or security incidents. Provide detailed descriptions of people, their actions, and any potential security concerns."
}
}
```

- Save the file and restart the application for changes to take effect
- Place your video file in the specified path (typically in the `assets/` folder)
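As a sanity check after editing, the scenarios in config.json can be loaded and validated with a few lines of Python. This is a hedged sketch based only on the fields described above; the application itself may read the file differently.

```python
import json
from pathlib import Path

# Fields listed in the scenario documentation above.
REQUIRED_KEYS = {
    "video_path", "video_label", "collection_name",
    "header", "description", "system_prompt",
}

config = json.loads(Path("config.json").read_text())

for scenario, settings in config.items():
    missing = REQUIRED_KEYS - settings.keys()
    video_exists = Path(settings.get("video_path", "")).exists()
    print(f"{scenario}: missing keys = {sorted(missing) or 'none'}, "
          f"video file found = {video_exists}")
```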
The application uses three different AI models for various tasks:
- Qwen3-8B (Port 5778): Text generation and chatbot responses
- Qwen2.5-VL-7B-Instruct (Port 5776): Vision-language model for video frame analysis (a request sketch follows this list)
- BAAI/bge-small-en-v1.5: Embedding model for vector storage (used in both `video_summarization.py` and `ai_chatbot.py`)
- BAAI/bge-reranker-base: Reranker model for improving search results (used in `ai_chatbot.py`)
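The Qwen2.5-VL server listed above is reached over llama.cpp's OpenAI-compatible API. The snippet below is a hedged sketch of how a single frame could be submitted for analysis; it is not the code in video_summarization.py, and the prompt, file name, and model string are illustrative.

```python
import base64
from openai import OpenAI

# Vision-language server started on port 5776 (see the services table).
client = OpenAI(base_url="http://localhost:5776/v1", api_key="not-needed")

# Encode a frame (e.g., one saved by OpenCV) as a base64 data URI.
with open("frame_0001.jpg", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",  # illustrative; the server uses its loaded model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the key events visible in this frame."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```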
- Modify the model in `app.py` (a hypothetical launcher sketch follows these steps):
  - Open `app.py` and locate the `start_llamacpp_server()` function
  - Replace the model names in the `-hf` parameter:

    ```python
    # For text generation model (port 5778)
    "-hf", "your-organization/your-text-model-GGUF:quantization"

    # For vision-language model (port 5776)
    "-hf", "your-organization/your-vision-model-GGUF:quantization"
    ```
- Update API endpoints (if using different ports):
  - In `ai_chatbot.py`, modify the `api_base` URL (line 19) for the text model
  - In `video_summarization.py`, modify the `base_url` (line 70) for the vision model
- Modify the embedding model in both files:
  - In `video_summarization.py` (line 140): Replace the model name for video analysis embedding
  - In `ai_chatbot.py` (line 42): Replace the model name for chatbot query embedding

  ```python
  # In both files
  embed_model = HuggingFaceEmbedding(model_name="your-preferred-embedding-model")
  ```
- Modify the reranker model in `ai_chatbot.py`:
  - Open `ai_chatbot.py`
  - Locate line 56 and replace the reranker model:

  ```python
  rerank = SentenceTransformerRerank(top_n=1, model="your-preferred-reranker-model")
  ```
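For orientation, a `start_llamacpp_server()`-style launcher might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual code in `app.py`; the binary path and defaults are assumptions, while the flags mirror the server commands shown earlier.

```python
import subprocess

def start_llamacpp_server(hf_model: str, port: int,
                          binary: str = "./llama-server"):  # assumed binary path
    """Launch a llama.cpp server for the given Hugging Face GGUF model."""
    cmd = [
        binary,
        "-hf", hf_model,            # e.g. "unsloth/Qwen3-8B-GGUF:Q4_K_M"
        "-ngl", "99",               # offload all layers to the GPU
        "--reasoning-budget", "0",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--jinja",
    ]
    return subprocess.Popen(cmd)

# Illustrative usage mirroring the ports in the services table.
text_server = start_llamacpp_server("unsloth/Qwen3-8B-GGUF:Q4_K_M", 5778)
vision_server = start_llamacpp_server("unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M", 5776)
```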
- GGUF Format: LLM models must be in GGUF format for llama.cpp compatibility
- Hugging Face: Models should be available on Hugging Face Hub
- Quantization: Choose appropriate quantization (e.g., Q4_K_M, Q5_K_M, Q8_0)
- Hardware: Ensure your hardware can handle the model size and requirements
For example, to switch to different models:

```python
# In app.py - Replace the text and vision models (e.g., Llama 3.1 and MiniCPM-V)
"-hf", "unsloth/Llama-3.1-8B-Instruct-GGUF:Q4_K_M"  # Text model
"-hf", "openbmb/MiniCPM-V-4_5-gguf:Q8_0"            # Vision model

# In both video_summarization.py and ai_chatbot.py - Use a different embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# In ai_chatbot.py - Use a different reranker model
rerank = SentenceTransformerRerank(top_n=1, model="BAAI/bge-reranker-large")
```

Note: After changing models, restart the application and allow time for the new models to download on first use.
