This application provides AI-powered video summarization to generate concise summaries of key events and enables real-time interaction and queries via a chatbot interface.
- AI Video Summarization: Automatically extract and summarize key events from video streams using OpenCV for frame analysis and Vision-Language Models (VLM) for semantic understanding. Supports generating concise textual summaries and highlights for efficient review.
- AI Chatbot: Engage in real-time conversations, ask questions about video content, and receive instant insights through an interactive Gradio interface.
- Embedding Storage with ChromaDB: Store and manage vector embeddings efficiently using ChromaDB, enabling fast semantic search and retrieval for downstream analytics and querying (a minimal pipeline sketch follows this list).
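The frame-analysis and embedding-storage features above can be pictured as one pipeline. The snippet below is a minimal sketch, not the application's actual implementation: the caption step is a placeholder for the VLM call, and the collection name, video path, and query text are illustrative.

```python
import cv2
import chromadb

def sample_frames(video_path, frames_per_second=1):
    """Sample roughly one frame per second from the video with OpenCV."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = int(fps // frames_per_second) or 1
    index, frames = 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index / fps, frame))  # (timestamp in seconds, image)
        index += 1
    cap.release()
    return frames

def describe_frame(frame):
    # Placeholder: the real application sends the frame to the
    # Qwen2.5-VL server and receives a textual description back.
    return "a textual description of the frame"

# Store frame descriptions in a ChromaDB collection so they can be
# retrieved later by semantic similarity.
client = chromadb.PersistentClient(path="./chroma_db")    # illustrative path
collection = client.get_or_create_collection("traffic")   # illustrative name

for i, (timestamp, frame) in enumerate(sample_frames("assets/video.mp4")):
    collection.add(
        ids=[f"frame-{i}"],
        documents=[describe_frame(frame)],
        metadatas=[{"timestamp_sec": timestamp}],
    )

# Semantic query over the stored descriptions.
results = collection.query(query_texts=["vehicles running a red light"], n_results=3)
print(results["documents"])
```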
- CPU: 13th Gen Intel(R) Core(TM) i9-13900K
- GPU: Intel® Arc™ Pro B-Series Graphics
- RAM: 32GB
- Disk: 256GB
| Service | Port | Use |
|---|---|---|
| Main Application | 5999 | Gradio web interface |
| Qwen2.5-VL-7B-Instruct | 5776 | Vision-Language Model server |
| Qwen3-8B | 5778 | Text generation model server |
| FastAPI | 5777 | API backend service and MCP server |
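To confirm that these services are listening after startup, a short Python check against the ports in the table can help (host assumed to be localhost):

```python
import socket

# Ports taken from the services table above.
SERVICES = {
    "Main Application": 5999,
    "Qwen2.5-VL-7B-Instruct": 5776,
    "FastAPI backend": 5777,
    "Qwen3-8B": 5778,
}

for name, port in SERVICES.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(2)
        status = "up" if sock.connect_ex(("localhost", port)) == 0 else "down"
    print(f"{name} (port {port}): {status}")
```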
Before proceeding with the installation, ensure the following system requirements are met:
- A compatible operating system (Ubuntu 24.04 or Windows 11) must be installed and running.
- Intel GPU driver must be installed and properly configured on the system.
Run the provided PowerShell script to start the servers and application:
```powershell
.\run_app.ps1
```

Alternatively, you can use the batch script:

```bat
.\run_app.bat
```

Once running, open http://localhost:5999 in your browser.
Before installing Python dependencies, ensure you have Python and FFmpeg installed:
```bash
sudo apt update
sudo apt install python3 python3-pip python3-venv ffmpeg
```

Run the provided bash script to start the servers and application:

```bash
./run_app.sh
```

Once running, open http://localhost:5999 in your browser.
Choose the appropriate setup method for your operating system:
Make sure you have Python 3.8 or higher installed. Then, install the required Python packages:
```powershell
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```

Download the pre-compiled Windows binaries for llama.cpp with Vulkan or SYCL support from the llama.cpp b7223 release page. Extract llama-b7223-bin-win-vulkan-x64.zip and place the extracted folder in your project directory.
Start the Qwen3-8B server (port 5778) as required. The Qwen2.5-VL server (port 5776) is optional if you have run it before and the embeddings database already exists.
Qwen3-8B (port 5778):
```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

Qwen2.5-VL-7B-Instruct (port 5776, optional):
```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```

Once the servers are running, start the application:

```powershell
python app.py
```

Make sure you have Python 3.8 or higher installed. Then, install the required Python packages:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

You can either compile llama.cpp with the SYCL backend or use the precompiled Vulkan binary:
Option A: Compile llama.cpp with SYCL backend
Follow the SYCL backend instructions:
- Install oneAPI Base Toolkit (download link):

  ```bash
  sudo apt update
  sudo apt install -y gpg-agent wget
  wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
    | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
  echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
  sudo apt update
  sudo apt install intel-deep-learning-essentials
  ```
- Set up the environment:

  ```bash
  source /opt/intel/oneapi/<oneapi-version>/oneapi-vars.sh
  ```

  Note: To verify the SYCL installation, run:

  ```bash
  sycl-ls
  ```
- Build llama.cpp:

  ```bash
  git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp
  sed -i 's/-DLLAMA_CURL=OFF/-DLLAMA_CURL=ON/g' ./examples/sycl/build.sh
  sudo apt install curl libcurl4-openssl-dev cmake build-essential
  ./examples/sycl/build.sh
  ```
Option B: Use Precompiled Vulkan Binary
Download the precompiled Vulkan binary for Linux from the llama.cpp b7223 release page. Extract and place the binary in your project directory for use with Vulkan.
Start the Qwen3-8B server (port 5778) as required. The Qwen2.5-VL server (port 5776) is optional if you have run it before and the embeddings database already exists.
Qwen3-8B (port 5778):
```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

Qwen2.5-VL-7B-Instruct (port 5776, optional):
```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```

Once dependencies and the server are ready, run the script:
```bash
python3 app.py
```

Once started, open http://localhost:5999 in your browser.
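To verify that the Qwen3-8B server is responding before launching the app, you can send a test request to its OpenAI-compatible endpoint. This is a minimal sketch using the openai Python package; the model name passed here is illustrative, since the llama.cpp server serves whichever model it was started with.

```python
from openai import OpenAI

# llama.cpp's llama-server exposes an OpenAI-compatible API under /v1.
client = OpenAI(base_url="http://localhost:5778/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen3-8B",  # illustrative; the server uses the model it loaded at startup
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```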
The application comes with pre-configured scenarios (Traffic, Retail, Manufacturing), but you can customize them by modifying the config.json file. Each scenario contains the following configurable parameters:
To change the video file, collection name, or system prompt for any scenario:
- Open the `config.json` file in your project directory
- Locate the scenario you want to modify (e.g., "Traffic", "Retail", "Manufacturing")
- Update the following fields as needed:
```json
{
"YourScenario": {
"video_path": "path/to/your/video.mp4",
"video_label": "Your Custom Video Label",
"collection_name": "your_collection_name",
"header": "Your Custom Header Title",
"description": "Your custom description for the scenario",
"system_prompt": "Your custom system prompt that defines the AI assistant's behavior and analysis focus."
}
}
```

- `video_path`: Path to your video file (relative to the project directory)
- `video_label`: Display name for the video in the interface
- `collection_name`: Name for the ChromaDB collection (used for storing embeddings)
- `header`: Title displayed at the top of the web interface
- `description`: Description text shown in the interface
- `system_prompt`: Instructions that define how the AI assistant should analyze and respond to video content
For example, a new "Security" scenario could be added like this:

```json
{
"Security": {
"video_path": "assets/security-footage.mp4",
"video_label": "Security Monitoring",
"collection_name": "security",
"header": "Smart Security Intelligence: AI Video Summarization + Interactive Chat",
"description": "Intelligent security monitoring system for detecting and analyzing suspicious activities.",
"system_prompt": "You are a security monitoring assistant. Analyze the video for any suspicious activities, unauthorized access, or security incidents. Provide detailed descriptions of people, their actions, and any potential security concerns."
}
}
```

- Save the file and restart the application for changes to take effect
- Place your video file in the specified path (typically in the `assets/` folder)
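As a sanity check after editing, the scenarios in config.json can be loaded and validated with a few lines of Python. This is a hedged sketch based only on the fields described above; the application itself may read the file differently.

```python
import json
from pathlib import Path

# Fields listed in the scenario documentation above.
REQUIRED_KEYS = {
    "video_path", "video_label", "collection_name",
    "header", "description", "system_prompt",
}

config = json.loads(Path("config.json").read_text())

for scenario, settings in config.items():
    missing = REQUIRED_KEYS - settings.keys()
    video_exists = Path(settings.get("video_path", "")).exists()
    print(f"{scenario}: missing keys = {sorted(missing) or 'none'}, "
          f"video file found = {video_exists}")
```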
The application uses three different AI models for various tasks:
- Qwen3-8B (Port 5778): Text generation and chatbot responses
- Qwen2.5-VL-7B-Instruct (Port 5776): Vision-language model for video frame analysis (a request sketch follows this list)
- BAAI/bge-small-en-v1.5: Embedding model for vector storage (used in both `video_summarization.py` and `ai_chatbot.py`)
- BAAI/bge-reranker-base: Reranker model for improving search results (used in `ai_chatbot.py`)
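The Qwen2.5-VL server listed above is reached over llama.cpp's OpenAI-compatible API. The snippet below is a hedged sketch of how a single frame could be submitted for analysis; it is not the code in video_summarization.py, and the prompt, file name, and model string are illustrative.

```python
import base64
from openai import OpenAI

# Vision-language server started on port 5776 (see the services table).
client = OpenAI(base_url="http://localhost:5776/v1", api_key="not-needed")

# Encode a frame (e.g., one saved by OpenCV) as a base64 data URI.
with open("frame_0001.jpg", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",  # illustrative; the server uses its loaded model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the key events visible in this frame."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```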
- Modify the model in `app.py` (a hypothetical launcher sketch follows these steps):
  - Open `app.py` and locate the `start_llamacpp_server()` function
  - Replace the model names in the `-hf` parameter:

    ```python
    # For text generation model (port 5778)
    "-hf", "your-organization/your-text-model-GGUF:quantization"

    # For vision-language model (port 5776)
    "-hf", "your-organization/your-vision-model-GGUF:quantization"
    ```
- Update API endpoints (if using different ports):
  - In `ai_chatbot.py`, modify the `api_base` URL (line 19) for the text model
  - In `video_summarization.py`, modify the `base_url` (line 70) for the vision model
- Modify the embedding model in both files:
  - In `video_summarization.py` (line 140): Replace the model name for video analysis embedding
  - In `ai_chatbot.py` (line 42): Replace the model name for chatbot query embedding

  ```python
  # In both files
  embed_model = HuggingFaceEmbedding(model_name="your-preferred-embedding-model")
  ```
- Modify the reranker model in `ai_chatbot.py`:
  - Open `ai_chatbot.py`
  - Locate line 56 and replace the reranker model:

  ```python
  rerank = SentenceTransformerRerank(top_n=1, model="your-preferred-reranker-model")
  ```
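For orientation, a `start_llamacpp_server()`-style launcher might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual code in `app.py`; the binary path and defaults are assumptions, while the flags mirror the server commands shown earlier.

```python
import subprocess

def start_llamacpp_server(hf_model: str, port: int,
                          binary: str = "./llama-server"):  # assumed binary path
    """Launch a llama.cpp server for the given Hugging Face GGUF model."""
    cmd = [
        binary,
        "-hf", hf_model,            # e.g. "unsloth/Qwen3-8B-GGUF:Q4_K_M"
        "-ngl", "99",               # offload all layers to the GPU
        "--reasoning-budget", "0",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--jinja",
    ]
    return subprocess.Popen(cmd)

# Illustrative usage mirroring the ports in the services table.
text_server = start_llamacpp_server("unsloth/Qwen3-8B-GGUF:Q4_K_M", 5778)
vision_server = start_llamacpp_server("unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M", 5776)
```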
- GGUF Format: LLM models must be in GGUF format for llama.cpp compatibility
- Hugging Face: Models should be available on Hugging Face Hub
- Quantization: Choose appropriate quantization (e.g., Q4_K_M, Q5_K_M, Q8_0)
- Hardware: Ensure your hardware can handle the model size and requirements
For example, to switch to different models:

```python
# In app.py - Replace the text and vision models (e.g., Llama 3.1 and MiniCPM-V)
"-hf", "unsloth/Llama-3.1-8B-Instruct-GGUF:Q4_K_M"  # Text model
"-hf", "openbmb/MiniCPM-V-4_5-gguf:Q8_0"            # Vision model

# In both video_summarization.py and ai_chatbot.py - Use a different embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# In ai_chatbot.py - Use a different reranker model
rerank = SentenceTransformerRerank(top_n=1, model="BAAI/bge-reranker-large")
```

Note: After changing models, restart the application and allow time for the new models to download on first use.
