🚀 Try it now: https://real-time-web-rtc-vlm-multi-object-pi.vercel.app/
An intelligent real-time video streaming and object detection system that combines WebRTC technology with Vision Language Models (VLM) for advanced multi-object detection and analysis.
- Laptop (Viewer) creates a session and displays a QR code
- Phone scans the QR code and grants camera access
- WebRTC establishes peer-to-peer connection for real-time streaming
- AI Detection Service analyzes video frames for object detection
- Real-time overlay displays detected objects with bounding boxes and labels
- Live analytics provides detection statistics and performance metrics
- Frontend: Next.js (TypeScript) - Real-time video streaming interface
- Backend: Node.js with Socket.IO - WebRTC signaling server
- Detection Service: Python FastAPI - YOLOv5 object detection engine
- WebRTC: Peer-to-peer video streaming with frame extraction
- AI Models: YOLOv5 for real-time multi-object detection
cd backend
npm install
npm run devcd detection-service
pip install -r requirements.txt
python detection_server.pycd frontend
npm install
npm run devVisit http://localhost:3000 to start intelligent video streaming with object detection!
Real-time-WebRTC-VLM-Multi-Object-Detection/
├── backend/ # Node.js WebRTC signaling server
│ ├── server.js # Main signaling server
│ ├── package.json # Backend dependencies
│ └── test-backend.js # Server testing utilities
├── frontend/ # Next.js TypeScript application
│ ├── src/
│ │ ├── app/ # Next.js App Router pages
│ │ ├── components/ # React components for detection overlay
│ │ └── utils/ # WebRTC and detection client utilities
│ ├── package.json # Frontend dependencies
│ └── next.config.js # Next.js configuration
├── detection-service/ # Python AI detection service
│ ├── detection_server.py # FastAPI detection server
│ ├── yolo_detector.py # YOLOv5 detection engine
│ ├── requirements.txt # Python dependencies
│ ├── yolov5n.pt # Pre-trained YOLO model
│ └── test_detector.py # Detection testing utilities
└── README.md # Project documentation
- Connect your GitHub repo to your preferred platform (Render/Heroku)
- Create a new Web Service
- Set build command:
npm install - Set start command:
npm start - Add environment variable:
FRONTEND_URL=https://your-deployment-url.com
- Connect your GitHub repo to Vercel/Netlify
- Set root directory to
frontend/ - Add environment variables:
NEXT_PUBLIC_SIGNALING_SERVER_URL=https://your-backend-url.comNEXT_PUBLIC_DETECTION_SERVER_URL=https://your-detection-service-url.com
- Deploy!
- Deploy to cloud platform supporting Python (Google Cloud Run/AWS Lambda/Heroku)
- Install dependencies:
pip install -r requirements.txt - Start service:
python detection_server.py - Ensure service is accessible via HTTP/HTTPS
NEXT_PUBLIC_SIGNALING_SERVER_URL=http://localhost:3001
NEXT_PUBLIC_DETECTION_SERVER_URL=http://localhost:5000PORT=3001
FRONTEND_URL=http://localhost:3000
NODE_ENV=developmentPORT=5000
MODEL_PATH=./yolov5n.pt
DETECTION_THRESHOLD=0.5
MAX_DETECTIONS=100- Create Session: Visit the web app and click "Start New Detection Session"
- Scan QR Code: Use your phone to scan the displayed QR code
- Grant Permissions: Allow camera and microphone access on your phone
- Enable AI Detection: Toggle object detection to start AI analysis
- Start Streaming: Watch live video with real-time object detection overlays
- Analyze Results: View detection statistics, confidence scores, and FPS metrics
- ✅ Real-time Object Detection - YOLOv5-powered multi-object recognition
- ✅ Live Video Streaming - WebRTC peer-to-peer video transmission
- ✅ Detection Overlays - Bounding boxes with confidence scores
- ✅ QR Code Session Joining - Easy mobile device connection
- ✅ Performance Metrics - Real-time FPS and detection statistics
- ✅ Mobile-Optimized Interface - Responsive design for all devices
- ✅ Camera Switching - Front/back camera toggle support
- ✅ Automatic Reconnection - Robust connection handling
- ✅ Session Management - Secure temporary session handling
- ✅ Multi-Object Support - Detect multiple objects simultaneously
- ✅ Configurable Thresholds - Adjustable detection confidence levels
- ✅ Export Detection Results - Save detection data and statistics
- HTTPS Required: Camera access requires secure connection (except localhost)
- Peer-to-Peer: Video streams directly between devices (not through server)
- AI Processing: Detection runs on dedicated service, no data retention
- Temporary Sessions: Sessions are automatically cleaned up
- No Recording: No video data is stored on servers
- Secure Detection: Object detection data is processed in real-time only
- Node.js 18+
- Python 3.8+
- Modern browser with WebRTC support
- HTTPS for production (camera access requirement)
- Start detection service:
cd detection-service && python detection_server.py - Start backend:
cd backend && npm run dev - Start frontend:
cd frontend && npm run dev - Visit
http://localhost:3000
- ✅ Chrome (Desktop & Mobile) - Full WebRTC + Detection support
- ✅ Firefox (Desktop & Mobile) - Full WebRTC + Detection support
- ✅ Safari (Desktop & Mobile) - WebRTC support with detection
- ✅ Edge (Desktop) - Full feature support
- ❌ Internet Explorer (not supported)
Object Detection not working?
- Ensure detection service is running on port 5000
- Check detection service health endpoint
- Verify model file (yolov5n.pt) is present
- Check detection service logs for errors
Camera not working?
- Ensure HTTPS connection in production
- Check browser permissions
- Try refreshing the page
Connection issues?
- Check network connectivity
- Verify environment variables are set correctly
- Check browser console for WebRTC errors
QR code not scanning?
- Ensure good lighting conditions
- Try manual URL entry
- Check if QR scanner app is working properly
Poor detection performance?
- Adjust detection threshold settings
- Check lighting conditions
- Ensure stable network connection
- Monitor detection service CPU/memory usage
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.