VLM Guard

This system demonstrates real-time video streaming with dangerous-action detection using a Vision-Language Model (such as LLaVA), combined with RS485 device control.

Project Structure

vlm_demo/
├── app.py                 # Main application entry point
├── web_ui.py              # Web interface implementation
├── pyproject.toml         # Project dependencies and metadata
├── start_demo.sh          # Startup script
├── README.md              # This file
├── models/                # Model implementations
│   ├── __init__.py        # Package initialization
│   ├── video_streamer.py  # Video streaming and analysis
│   ├── database.py        # Database models and initialization
│   ├── rs485_controller.py     # RS485 controller (integrated light control and sensor reading)
│   ├── rs485_sensor_data_sender.py  # RS485 sensor data sender
│   └── data_visualizer_receiver.py  # Data visualization receiver
├── services/              # Service layer implementations
│   ├── __init__.py        # Package initialization
│   ├── app_service.py     # Application service layer
│   └── config.py          # Configuration management
├── templates/             # Web UI templates
│   └── web_ui.html        # Main web interface
└── data/                  # Data directory (database and images)
    └── vlm_demo.db        # SQLite database for storing analysis records

Prerequisites

Python 3.9+
Ollama with required models
Required Python packages (see pyproject.toml)
A Jetson device or a GPU machine with more than 8 GB of RAM

Installation

Clone or download this repository

git clone https://github.com/Seeed-Projects/VLM-Guard.git

Enter the repository directory and install uv:

cd VLM-Guard
pip install uv

Install the required dependencies (uv sync creates the .venv virtual environment), then activate the environment:

uv sync
source .venv/bin/activate

Model Requirements

The system requires the following models to be available in Ollama:

gemma3:4b - For image analysis and dangerous behavior detection

To install this model:

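# Install Ollama from https://ollama.com/
# Start the server if it is not already running (leave it running in its own terminal)
ollama serve
# In another terminal, download the model
ollama pull gemma3:4b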
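
Before starting the demo, it can help to confirm that the Ollama server is reachable and that the model has been pulled. The sketch below is illustrative and not part of this repository. It assumes Ollama's default address http://localhost:11434 and its /api/tags endpoint, which lists locally installed models; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default Ollama address


def model_is_installed(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {entry.get("name", "") for entry in tags_response.get("models", [])}
    return model in names


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of installed models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except (OSError, ValueError) as exc:  # connection failure or non-JSON reply
        print(f"Ollama is not reachable: {exc}")
    else:
        print("gemma3:4b installed:", model_is_installed(tags, "gemma3:4b"))
```

If the script reports that Ollama is not reachable, start it with ollama serve; if the model is missing, run ollama pull gemma3:4b.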

Quick Start

Using the startup script (recommended)

# Start with default settings
./start_demo.sh

# Or with custom parameters
./start_demo.sh --port 5005 --web-port 5006

# Or using environment variables
PORT=5005 WEB_PORT=5006 ./start_demo.sh

# Without RS485 support
./start_demo.sh --no-rs485

# With video file as source
./start_demo.sh --video-source /path/to/video.mp4

# View all options
./start_demo.sh --help

Once started, open your browser and navigate to http://localhost:5001 to access the web interface.

Detailed Usage

Command Line Options

start_demo.sh Options

./start_demo.sh --help

Options:
  --port PORT              UDP port (default: 5000)
  --host HOST              Host address (default: localhost)
  --web-port WEB_PORT      Web interface port (default: 5001)
  --chart-port CHART_PORT  Chart data port (default: 5002)
  --rs485-port RS485_PORT  RS485 serial device (default: /dev/ttyTHS1)
  --rs485-baud RS485_BAUD  RS485 baud rate (default: 9600)
  --lux-sensor-addr ADDR   Light sensor address (default: 0x0B)
  --light-control-addr ADDR Light control address (default: 0x01)
  --description-interval SECONDS Analysis interval (default: 10)
  --model MODEL            Ollama model name (default: gemma3:4b)
  --video-source SOURCE    Video source (default: 0)
  --vllm-url URL           vLLM API URL (default: http://localhost:11434/v1/completions)
  --no-rs485               Disable RS485 device support
  --help                   Show help information

Environment variables can also be used for configuration:

PORT - UDP port for data transfer
HOST - Host address
WEB_PORT - Web interface port
CHART_PORT - Chart data port
RS485_PORT - RS485 serial device
RS485_BAUD - RS485 baud rate
LUX_SENSOR_ADDR - Light sensor address
LIGHT_CONTROL_ADDR - Light control device address
DESCRIPTION_INTERVAL - Analysis interval in seconds
MODEL - Ollama model name
VIDEO_SOURCE - Video source
VLLM_URL - vLLM API URL
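
One common way to combine the two mechanisms is to use each environment variable as the default for the matching command-line flag, so that an explicit flag wins. The sketch below shows that pattern in Python for three of the settings. The precedence shown is an assumption, not a copy of how start_demo.sh or app.py resolve options, and build_parser is a hypothetical name.

```python
import argparse
import os


def build_parser(env=None) -> argparse.ArgumentParser:
    """Create a parser whose defaults come from environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Illustrative option handling")
    parser.add_argument("--port", type=int, default=int(env.get("PORT", 5000)))
    parser.add_argument("--web-port", type=int, default=int(env.get("WEB_PORT", 5001)))
    parser.add_argument("--model", default=env.get("MODEL", "gemma3:4b"))
    return parser


# An environment value replaces the built-in default...
args = build_parser({"PORT": "5005"}).parse_args([])
print(args.port, args.web_port)  # 5005 5001

# ...and an explicit flag overrides the environment value.
args = build_parser({"PORT": "5005"}).parse_args(["--port", "6000"])
print(args.port)  # 6000
```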

Manual startup

# Terminal 1: Start the video streamer
python app.py --port 5000 --host localhost

# Terminal 2: Start the web interface
python web_ui.py --port 5000 --host localhost --web-port 5001

With direct RS485 device support:

# Terminal 1: Start the video streamer with direct RS485 support
python app.py --enable-rs485-direct --rs485-port /dev/ttyTHS1 --rs485-baud 9600

# Terminal 2: Start the web interface
python web_ui.py --port 5000 --host localhost --web-port 5001 --chart-port 5002

Web Interface

The web interface consists of three main sections:

Video Stream: Real-time video display from the camera or video file
Analysis Results: Current and historical analysis results with danger indicators
vLLM Chat: Interactive chat interface to communicate with the vLLM model
- The chat interface automatically includes the latest 20 analysis records as context
- Users can ask questions about the video analysis history
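
The way recent records become chat context can be sketched as follows. This is a minimal illustration using the standard sqlite3 module; the table name analysis_records and its columns are hypothetical, and the real schema lives in models/database.py.

```python
import sqlite3

CONTEXT_LIMIT = 20  # number of recent records included, per the description above


def build_chat_context(conn: sqlite3.Connection, limit: int = CONTEXT_LIMIT) -> str:
    """Format the newest `limit` records, oldest first, as prompt context."""
    rows = conn.execute(
        "SELECT created_at, description, is_dangerous "
        "FROM analysis_records ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    lines = [
        f"[{created_at}] {'DANGER' if is_dangerous else 'safe'}: {description}"
        for created_at, description, is_dangerous in reversed(rows)
    ]
    return "\n".join(lines)


# Demo with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analysis_records ("
    "id INTEGER PRIMARY KEY, created_at TEXT, description TEXT, is_dangerous INTEGER)"
)
conn.executemany(
    "INSERT INTO analysis_records (created_at, description, is_dangerous) VALUES (?, ?, ?)",
    [(f"2024-01-01 00:00:{i:02d}", f"frame {i}", int(i % 7 == 0)) for i in range(30)],
)
context = build_chat_context(conn)
print(context.splitlines()[0])  # [2024-01-01 00:00:10] safe: frame 10
```

Querying newest-first and then reversing keeps the context in chronological order, which tends to read more naturally when prepended to a user's question.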

Video Source Options

The system supports multiple video sources:

Default Camera: --video-source 0 (default)
Specific Camera: --video-source 1 (for second camera)
Video File: --video-source /path/to/video.mp4
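
Because command-line arguments arrive as strings, a numeric source has to be converted to an integer before it can be used as a camera index; OpenCV's cv2.VideoCapture accepts either an integer device index or a file path string. A minimal sketch of that conversion, with a hypothetical helper name, might look like this:

```python
from typing import Union


def parse_video_source(value: str) -> Union[int, str]:
    """Convert a --video-source value into a camera index or a file path."""
    text = value.strip()
    # Purely numeric values are treated as camera indices (0, 1, ...).
    if text.isdecimal():
        return int(text)
    # Anything else is passed through unchanged as a path.
    return text


print(parse_video_source("0"))                   # 0
print(parse_video_source("/path/to/video.mp4"))  # /path/to/video.mp4
```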

System Architecture

Data Flow

Video Capture: Video is captured from camera or video file
Frame Analysis: At a configurable interval (set with --description-interval; start_demo.sh defaults to 10 seconds), a frame is sent to the vision-language model for analysis
Danger Detection: The model detects dangerous behaviors, and the results are sent via UDP
vLLM Interaction: Analysis results are stored in a database for chat context
RS485 Data: Light sensor readings are processed via Modbus RTU protocol
RS485 Control: Light control commands are sent via Modbus RTU protocol based on:
- Sensor data (ambient light levels)
- vLLM analysis results (danger detection)
Data Display: All data is displayed in real-time on the web interface
User Interaction: Users can chat with vLLM through the web interface
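
The UDP hop in this flow can be illustrated with a short loopback example. The JSON message layout below is an assumption made for illustration, since the actual wire format used between app.py and web_ui.py is not described here, and the function names are hypothetical.

```python
import json
import socket
import time


def encode_result(description: str, is_dangerous: bool) -> bytes:
    """Serialize one analysis result as a UTF-8 JSON datagram."""
    payload = {
        "type": "analysis",  # hypothetical message type
        "timestamp": time.time(),
        "description": description,
        "is_dangerous": is_dangerous,
    }
    return json.dumps(payload).encode("utf-8")


def decode_result(datagram: bytes) -> dict:
    """Parse a datagram produced by encode_result."""
    return json.loads(datagram.decode("utf-8"))


# Loopback demo: port 0 asks the OS for any free UDP port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(encode_result("person climbing a ladder", True), address)

data, _ = receiver.recvfrom(65535)
message = decode_result(data)
print(message["is_dangerous"], message["description"])

sender.close()
receiver.close()
```

UDP gives no delivery guarantee, so a receiver built this way should tolerate missing or out-of-order messages.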

Architecture Layers

The system follows a layered architecture pattern with clear separation of concerns:

Application Layer (app.py): Entry point and main application logic
Service Layer (services/): Business logic and coordination between components
Model Layer (models/): Data models and core functionality implementations
Presentation Layer (web_ui.py, templates/): Web interface and user interaction

This architecture improves maintainability, testability, and scalability of the system.

RS485 Device Control

The system supports RS485 devices including:

Light Sensor: Reads ambient light levels in Lux
Light Control Device: Controls RGB lighting via Modbus commands
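
To show what a Modbus RTU request looks like on the wire, the sketch below builds a Read Holding Registers frame by hand using only the standard library. It is an illustration of the protocol framing rather than the project's implementation (which uses PyModbus). The function code 0x03 and register address 0x0000 are assumptions; the correct values depend on the sensor's datasheet.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """Compute the Modbus RTU CRC-16 (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_read_request(unit: int, register: int, count: int = 1) -> bytes:
    """Build a Read Holding Registers (function 0x03) request frame."""
    # Unit address, function code, start register, register count (big-endian).
    body = struct.pack(">BBHH", unit, 0x03, register, count)
    # The CRC is appended low byte first.
    return body + struct.pack("<H", crc16_modbus(body))


# Register 0x0000 is a placeholder; 0x0B is the default light sensor address.
frame = build_read_request(unit=0x0B, register=0x0000)
print(frame.hex(" "))

# Well-known reference frame for unit 1, register 0, count 1.
print(build_read_request(0x01, 0x0000).hex(" "))  # 01 03 00 00 00 01 84 0a
```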

Automatic Light Control Logic

When using the --enable-rs485-direct option, the system automatically controls the RS485 lights based on two conditions:

Sensor-based Control:
- When ambient light level is below 50 Lux, the light turns red
- When ambient light level is 50 Lux or above, the light turns green
vLLM Analysis-based Control:
- When vLLM detects dangerous behavior, the light turns yellow
- When vLLM determines the scene is safe, the light turns green

The vLLM-based control takes precedence over sensor-based control when both conditions are active.
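
The rules above can be condensed into one decision function. This is a sketch of the described behavior under one reading of the precedence rule: whenever an analysis result is available, it decides the color, and the Lux reading is used only when no analysis result exists yet. The function name and the use of None for "no data" are assumptions.

```python
from typing import Optional

LUX_THRESHOLD = 50  # below this the sensor rule selects red


def choose_light_color(lux: Optional[float], danger: Optional[bool]) -> Optional[str]:
    """Pick a light color; the analysis result outranks the sensor reading.

    `danger` is None when no analysis result is available yet, and `lux`
    is None when no sensor reading is available.
    """
    if danger is not None:
        return "yellow" if danger else "green"
    if lux is not None:
        return "red" if lux < LUX_THRESHOLD else "green"
    return None  # nothing to act on


print(choose_light_color(lux=30, danger=None))   # red
print(choose_light_color(lux=30, danger=True))   # yellow
print(choose_light_color(lux=30, danger=False))  # green
print(choose_light_color(lux=80, danger=None))   # green
```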

Ports

Port 5000: UDP data transfer (video frames, analysis results, and vLLM responses)
Port 5001: Web interface for viewing video and analysis
Port 5002: Data visualization

Troubleshooting

Common Issues

"无法打开摄像头" (Cannot open camera):
- Ensure you have a camera connected
- Check camera permissions
- Try different camera indices (0, 1, 2, etc.)
Connection errors:
- Make sure Ollama is running (ollama serve)
- Ensure required models are downloaded (ollama pull gemma3:4b)
- Check that vLLM URL is correct
Web interface doesn't load:
- Check that both components are running
- Verify ports 5000 and 5001 are not blocked by firewall
- Check browser console for JavaScript errors
RS485 device connection issues:
- Check serial port permissions: ls -l /dev/ttyTHS1
- Ensure correct device addresses are configured
- Verify baud rate settings match your devices
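
For the port-related checks above, a quick way to see whether something is already bound to a port is to try binding to it. The helper below is a hypothetical diagnostic, not part of the repository. Note that a port reported as busy is expected while the demo is running, and indicates a conflict only if the demo is stopped.

```python
import socket


def port_is_free(port: int, udp: bool = False, host: str = "127.0.0.1") -> bool:
    """Return True if a socket of the given kind can bind to (host, port)."""
    kind = socket.SOCK_DGRAM if udp else socket.SOCK_STREAM
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


# Port 5000 carries UDP data; port 5001 serves the web interface over TCP.
print("UDP 5000 free:", port_is_free(5000, udp=True))
print("TCP 5001 free:", port_is_free(5001))
```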

Debugging Tips

Enable detailed logging:

export PYTHON_LOG_LEVEL=DEBUG
python app.py --port 5000 --host localhost
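
For reference, an environment variable like this is typically turned into a logging level at startup along the following lines. This is a sketch of the general pattern; how app.py actually reads PYTHON_LOG_LEVEL is an assumption, and resolve_log_level is a hypothetical name.

```python
import logging
import os


def resolve_log_level(env=None, default: int = logging.INFO) -> int:
    """Map PYTHON_LOG_LEVEL (e.g. "DEBUG") to a logging level constant."""
    env = os.environ if env is None else env
    name = str(env.get("PYTHON_LOG_LEVEL", "")).strip().upper()
    level = getattr(logging, name, None) if name else None
    # Fall back to the default for missing or unrecognized names.
    return level if isinstance(level, int) else default


logging.basicConfig(
    level=resolve_log_level(),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("vlm_demo").debug("debug logging is enabled")
```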

Check Ollama status:

ollama list  # List installed models
ollama ps    # Show running models

Monitor UDP traffic:

sudo tcpdump -i lo udp port 5000  # On Linux

Performance Optimization

Video Quality:
- Adjust JPEG compression quality in video_streamer.py
- Consider using lower resolution for faster processing
Model Performance:
- Use GPU acceleration with Ollama if available
- Consider using smaller models for faster inference
Network Optimization:
- Use localhost for minimal latency
- Ensure sufficient bandwidth for video streaming
RS485 Communication:
- Optimize polling intervals for sensor readings
- Use appropriate baud rates for your hardware

Result

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Ollama for providing the LLM infrastructure
LLaVA for the vision-language model
OpenCV for computer vision capabilities
Flask for web interface framework
PyModbus for Modbus communication
SQLAlchemy for database ORM
