VLM Guard

This system demonstrates real-time video streaming with dangerous-action detection using a vision-language model (VLM) such as LLaVA, plus RS485 device control.

Project Structure

vlm_demo/
├── app.py                 # Main application entry point
├── web_ui.py              # Web interface implementation
├── pyproject.toml         # Project dependencies and metadata
├── start_demo.sh          # Startup script
├── README.md              # This file
├── models/                # Model implementations
│   ├── __init__.py        # Package initialization
│   ├── video_streamer.py  # Video streaming and analysis
│   ├── database.py        # Database models and initialization
│   ├── rs485_controller.py     # RS485 controller (integrated light control and sensor reading)
│   ├── rs485_sensor_data_sender.py  # RS485 sensor data sender
│   └── data_visualizer_receiver.py  # Data visualization receiver
├── services/              # Service layer implementations
│   ├── __init__.py        # Package initialization
│   ├── app_service.py     # Application service layer
│   └── config.py          # Configuration management
├── templates/             # Web UI templates
│   └── web_ui.html        # Main web interface
└── data/                  # Data directory (database and images)
    └── vlm_demo.db        # SQLite database for storing analysis records

Prerequisites

  1. Python 3.9+
  2. Ollama with required models
  3. Required Python packages (see pyproject.toml)
  4. A Jetson or GPU device with more than 8 GB of RAM

Installation

  1. Clone or download this repository:
git clone https://github.com/Seeed-Projects/VLM-Guard.git
  2. Enter the project directory and install uv:
cd VLM-Guard
pip install uv
  3. Install the required dependencies (uv sync creates the .venv virtual environment), then activate it:
uv sync
source .venv/bin/activate

Model Requirements

The system requires the following models to be available in Ollama:

  • gemma3:4b - For image analysis and dangerous behavior detection

To install this model:

# Install Ollama from https://ollama.com/
ollama pull gemma3:4b
ollama serve
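
To confirm that the model is actually available before starting the demo, the running Ollama server can be queried over HTTP. The following is a minimal sketch, assuming Ollama listens on its default address http://localhost:11434 and exposes its model list at /api/tags; the helper names installed_models and fetch_tags are illustrative and not part of this repository.

```python
import json
import urllib.request


def installed_models(tags_response: dict) -> set[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return {model["name"] for model in tags_response.get("models", [])}


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server for its installed models (assumed default URL)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `"gemma3:4b" in installed_models(fetch_tags())` should be true after a successful pull.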

Quick Start

Using the startup script (recommended)

# Start with default settings
./start_demo.sh

# Or with custom parameters
./start_demo.sh --port 5005 --web-port 5006

# Or using environment variables
PORT=5005 WEB_PORT=5006 ./start_demo.sh

# Without RS485 support
./start_demo.sh --no-rs485

# With video file as source
./start_demo.sh --video-source /path/to/video.mp4

# View all options
./start_demo.sh --help

Once started, open your browser and navigate to http://localhost:5001 to access the web interface.

Detailed Usage

Command Line Options

start_demo.sh Options

./start_demo.sh --help

Options:
  --port PORT              UDP port (default: 5000)
  --host HOST              Host address (default: localhost)
  --web-port WEB_PORT      Web interface port (default: 5001)
  --chart-port CHART_PORT  Chart data port (default: 5002)
  --rs485-port RS485_PORT  RS485 serial device (default: /dev/ttyTHS1)
  --rs485-baud RS485_BAUD  RS485 baud rate (default: 9600)
  --lux-sensor-addr ADDR   Light sensor address (default: 0x0B)
  --light-control-addr ADDR Light control address (default: 0x01)
  --description-interval SECONDS Analysis interval (default: 10)
  --model MODEL            Ollama model name (default: gemma3:4b)
  --video-source SOURCE    Video source (default: 0)
  --vllm-url URL           vLLM API URL (default: http://localhost:11434/v1/completions)
  --no-rs485               Disable RS485 device support
  --help                   Show this help message

Environment variables can also be used for configuration:

  • PORT - UDP port for data transfer
  • HOST - Host address
  • WEB_PORT - Web interface port
  • CHART_PORT - Chart data port
  • RS485_PORT - RS485 serial device
  • RS485_BAUD - RS485 baud rate
  • LUX_SENSOR_ADDR - Light sensor address
  • LIGHT_CONTROL_ADDR - Light control device address
  • DESCRIPTION_INTERVAL - Analysis interval in seconds
  • MODEL - Ollama model name
  • VIDEO_SOURCE - Video source
  • VLLM_URL - vLLM API URL
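
One common way to combine flags and environment variables is to use the environment value as the default for the matching flag, so that an explicit flag wins. The sketch below illustrates that pattern in Python for a few of the settings listed above. The precedence order (flag, then environment variable, then built-in default) is an assumption, and build_parser is a hypothetical helper rather than the logic in start_demo.sh.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from environment variables.

    Assumed precedence: command-line flag > environment variable > default.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="VLM Guard settings (sketch)")
    parser.add_argument("--port", type=int, default=int(env.get("PORT", 5000)))
    parser.add_argument("--host", default=env.get("HOST", "localhost"))
    parser.add_argument("--web-port", type=int, default=int(env.get("WEB_PORT", 5001)))
    parser.add_argument("--model", default=env.get("MODEL", "gemma3:4b"))
    return parser
```

For example, `build_parser({"PORT": "5005"}).parse_args(["--web-port", "5006"])` takes the UDP port from the environment and the web port from the flag.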

Manual startup

# Terminal 1: Start the video streamer
python app.py --port 5000 --host localhost

# Terminal 2: Start the web interface
python web_ui.py --port 5000 --host localhost --web-port 5001

With direct RS485 device support:

# Terminal 1: Start the video streamer with direct RS485 support
python app.py --enable-rs485-direct --rs485-port /dev/ttyTHS1 --rs485-baud 9600

# Terminal 2: Start the web interface
python web_ui.py --port 5000 --host localhost --web-port 5001 --chart-port 5002

Web Interface

The web interface consists of three main sections:

  1. Video Stream: Real-time video display from the camera or video file
  2. Analysis Results: Current and historical analysis results with danger indicators
  3. vLLM Chat: Interactive chat interface to communicate with the vLLM model
    • The chat interface automatically includes the latest 20 analysis records as context
    • Users can ask questions about the video analysis history
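
The "latest 20 records as context" step can be pictured as a query followed by some string formatting. This is a rough sketch only: the table name analysis_records and its columns (id, created_at, description, is_dangerous) are hypothetical, and the project itself lists SQLAlchemy rather than raw sqlite3.

```python
import sqlite3


def latest_records(conn: sqlite3.Connection, limit: int = 20) -> list:
    """Return the newest `limit` records, ordered oldest first for readability."""
    rows = conn.execute(
        "SELECT created_at, description, is_dangerous "
        "FROM analysis_records ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]


def build_context(rows: list) -> str:
    """Format records as one line each, to prepend to the user's chat question."""
    lines = []
    for created_at, description, is_dangerous in rows:
        tag = "DANGER" if is_dangerous else "safe"
        lines.append(f"[{created_at}] ({tag}) {description}")
    return "\n".join(lines)
```

The resulting text would be placed ahead of the user's question in the prompt so the model can answer questions about recent events.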

Video Source Options

The system supports multiple video sources:

  • Default Camera: --video-source 0 (default)
  • Specific Camera: --video-source 1 (for second camera)
  • Video File: --video-source /path/to/video.mp4
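
Because the same option carries either a camera index or a file path, the value has to be interpreted before it is opened. A minimal sketch of that step, assuming purely numeric values mean camera indices (parse_video_source is an illustrative name, not a function from this repository):

```python
from typing import Union


def parse_video_source(value: str) -> Union[int, str]:
    """Return an int for camera indices such as "0", else the path unchanged."""
    text = value.strip()
    return int(text) if text.isdigit() else text
```

The result can be passed to OpenCV's cv2.VideoCapture, which accepts either a device index or a filename.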

System Architecture

Data Flow

  1. Video Capture: Video is captured from camera or video file
  2. Frame Analysis: At a configurable interval (--description-interval, 10 seconds by default in start_demo.sh), a frame is sent to the vision-language model for analysis
  3. Danger Detection: The model detects dangerous behaviors, and the results are sent via UDP
  4. vLLM Interaction: Analysis results are stored in a database for chat context
  5. RS485 Data: Light sensor readings are processed via Modbus RTU protocol
  6. RS485 Control: Light control commands are sent via Modbus RTU protocol based on:
    • Sensor data (ambient light levels)
    • vLLM analysis results (danger detection)
  7. Data Display: All data is displayed in real-time on the web interface
  8. User Interaction: Users can chat with vLLM through the web interface
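
To make the UDP hand-off in step 3 concrete, here is a small sketch of sending one analysis result as a JSON datagram. The message fields (type, description, dangerous) are assumptions chosen for illustration; the actual wire format used by video_streamer.py may differ.

```python
import json
import socket


def encode_result(description: str, dangerous: bool) -> bytes:
    """Serialize one analysis result as UTF-8 JSON (hypothetical message layout)."""
    message = {"type": "analysis", "description": description, "dangerous": dangerous}
    return json.dumps(message).encode("utf-8")


def send_result(payload: bytes, host: str = "localhost", port: int = 5000) -> None:
    """Send a single datagram to the receiver (port 5000 is the documented default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def decode_result(datagram: bytes) -> dict:
    """Parse a received datagram back into a dictionary."""
    return json.loads(datagram.decode("utf-8"))
```

UDP gives no delivery guarantee, so a receiver written this way should tolerate missing messages.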

Architecture Layers

The system follows a layered architecture pattern with clear separation of concerns:

  1. Application Layer (app.py): Entry point and main application logic
  2. Service Layer (services/): Business logic and coordination between components
  3. Model Layer (models/): Data models and core functionality implementations
  4. Presentation Layer (web_ui.py, templates/): Web interface and user interaction

This architecture improves maintainability, testability, and scalability of the system.

RS485 Device Control

The system supports RS485 devices including:

  1. Light Sensor: Reads ambient light levels in Lux
  2. Light Control Device: Controls RGB lighting via Modbus commands
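
Both devices are addressed with Modbus RTU frames, which end in a CRC-16 checksum. The sketch below builds a "read holding registers" request by hand to show the frame layout. The function code 0x03 and register address 0x0000 are assumptions, since the real register map depends on the sensor; the project itself uses PyModbus rather than hand-built frames.

```python
import struct


def modbus_crc16(data: bytes) -> int:
    """Compute the Modbus RTU CRC-16 (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_frame(slave: int, register: int, count: int) -> bytes:
    """Build a function-0x03 request: address, function, register, count, CRC."""
    body = struct.pack(">BBHH", slave, 0x03, register, count)
    # The CRC is appended low byte first, unlike the big-endian fields before it.
    return body + struct.pack("<H", modbus_crc16(body))
```

For slave address 0x01, register 0x0000 and a count of 1, this produces the bytes 01 03 00 00 00 01 84 0A.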

Automatic Light Control Logic

When using the --enable-rs485-direct option, the system automatically controls the RS485 lights based on two conditions:

  1. Sensor-based Control:

    • When ambient light level is below 50 Lux, the light turns red
    • When ambient light level is 50 Lux or above, the light turns green
  2. vLLM Analysis-based Control:

    • When vLLM detects dangerous behavior, the light turns yellow
    • When vLLM determines the scene is safe, the light turns green

The vLLM-based control takes precedence over sensor-based control when both conditions are active.
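
The precedence rule can be expressed as a single decision function. This is one possible reading of the rules above, assuming that "active" means a value is currently available: when an analysis result exists it decides the color, and the lux reading is used only otherwise. The name choose_light_color and the use of None for "no value yet" are illustrative.

```python
from typing import Optional

LUX_THRESHOLD = 50  # below this ambient level the sensor rule selects red


def choose_light_color(lux: Optional[float], danger: Optional[bool]) -> Optional[str]:
    """Pick a light color; the analysis result takes precedence over the sensor."""
    if danger is not None:
        return "yellow" if danger else "green"
    if lux is not None:
        return "red" if lux < LUX_THRESHOLD else "green"
    return None  # nothing known yet, leave the light unchanged
```

Under this reading, a dark scene (30 Lux) with a "safe" analysis result yields green, because the analysis overrides the sensor.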

Ports

  • Port 5000: UDP data transfer (video frames, analysis results, and vLLM responses)
  • Port 5001: Web interface for viewing video and analysis
  • Port 5002: Data visualization

Troubleshooting

Common Issues

  1. "无法打开摄像头" (Cannot open camera):

    • Ensure you have a camera connected
    • Check camera permissions
    • Try different camera indices (0, 1, 2, etc.)
  2. Connection errors:

    • Make sure Ollama is running (ollama serve)
    • Ensure required models are downloaded (ollama pull gemma3:4b)
    • Check that vLLM URL is correct
  3. Web interface doesn't load:

    • Check that both components are running
    • Verify ports 5000 and 5001 are not blocked by firewall
    • Check browser console for JavaScript errors
  4. RS485 device connection issues:

    • Check serial port permissions: ls -l /dev/ttyTHS1
    • Ensure correct device addresses are configured
    • Verify baud rate settings match your devices
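
When the web interface does not load, a quick first check is whether anything is listening on the web port. A minimal sketch, assuming the default web port 5001; it applies only to TCP, so it cannot probe the UDP data port 5000. The helper name tcp_port_open is illustrative.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_port_open("localhost", 5001)` returning False suggests web_ui.py is not running or is bound to a different port.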

Debugging Tips

  1. Enable detailed logging:

    export PYTHON_LOG_LEVEL=DEBUG
    python app.py --port 5000 --host localhost
  2. Check Ollama status:

    ollama list  # List installed models
    ollama ps    # Show running models
  3. Monitor UDP traffic:

    sudo tcpdump -i lo udp port 5000  # On Linux

Performance Optimization

  1. Video Quality:

    • Adjust JPEG compression quality in video_streamer.py
    • Consider using lower resolution for faster processing
  2. Model Performance:

    • Use GPU acceleration with Ollama if available
    • Consider using smaller models for faster inference
  3. Network Optimization:

    • Use localhost for minimal latency
    • Ensure sufficient bandwidth for video streaming
  4. RS485 Communication:

    • Optimize polling intervals for sensor readings
    • Use appropriate baud rates for your hardware

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Ollama for providing the LLM infrastructure
  • LLaVA for the vision-language model
  • OpenCV for computer vision capabilities
  • Flask for web interface framework
  • PyModbus for Modbus communication
  • SQLAlchemy for database ORM