🚌 Crowd Detection & Dynamic Bus Dispatch

AI-Powered Smart City Crowd Management

Deep Learning system that counts crowds at transit stops and automatically dispatches extra buses when capacity is exceeded.

Problem • Solution • Screenshots • Why Density Maps • Architecture • How It Works • Get Started

🚨 The Problem

Urban public transport stops face a silent crisis: unmonitored overcrowding.

When 80 people pile up at a bus stop designed for 50, there's no automated system to detect the overload. Dispatchers rely on manual reports or complaints — by the time they react, citizens have already waited 30+ minutes in frustration.

Traditional approaches fail for two key reasons:

Object detection models (YOLO, Faster R-CNN) collapse in dense crowds. When dozens of people overlap, only partial features — a head, a shoulder — are visible. Bounding-box algorithms can't draw boxes around what they can't isolate.

Scale variation breaks detectors. People close to the camera appear 5x larger than those in the back. A single frame contains massive scale differences that standard detectors weren't designed to handle.

The problem isn't a lack of cameras — it's the lack of intelligent counting behind them.

💡 Our Solution

Crowd Detection doesn't try to draw boxes around people — it predicts density.

We built a Density Map Regression system powered by CSRNet (Congested Scene Recognition Network). Instead of isolating individuals, the model generates a continuous heatmap where each pixel represents crowd density. The total count is simply the integral (sum) of the entire map.

This approach is fundamentally more robust against occlusion, scale variation, and real-world noise.

┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│   Object Detection (YOLO)          Density Map (CSRNet)          │
│   ───────────────────────          ────────────────────          │
│                                                                  │
│   ┌─────────────────────┐          ┌─────────────────────┐      │
│   │ ┌──┐  ┌──┐          │          │ ░░▒▒▓▓██████▓▓▒▒░░  │      │
│   │ │??│  │??│ ┌──┐     │          │ ░▒▒▓▓████████▓▓▒░░  │      │
│   │ └──┘  └──┘ │??│     │          │ ░░▒▓▓██████████▓▒░  │      │
│   │   ┌──┐     └──┘     │          │ ░░▒▒▓▓████████▓▒░░  │      │
│   │   │??│  Missed: 60% │          │ ░░░▒▒▓▓██████▓▒░░░  │      │
│   │   └──┘               │          │                     │      │
│   └─────────────────────┘          │  Σ pixels = 73 👥    │      │
│                                    └─────────────────────┘      │
│   ❌ Detected: 12/73               ✅ Estimated: 73 (±3)        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

The Pipeline

Upload → Citizen or camera feed sends a crowd image
Preprocess → Image is normalized using ImageNet standards
Predict → CSRNet generates a 2D density heatmap
Count → Sum all pixel values = estimated crowd size
Decide → If count > 50 → 🚌 Dispatch Extra Bus alert triggered
Visualize → Original image + heatmap displayed side-by-side

📸 Screenshots

🧠 Why Density Map Regression?

Challenge	Object Detection	Density Map Regression
Severe Occlusion	❌ Fails — can't draw boxes around overlapping people	✅ Robust — predicts density per pixel, no isolation needed
Scale Variation	❌ Struggles — same frame has tiny & large people	✅ Handles natively — dilated convolutions capture multi-scale context
Counting Accuracy	❌ Misses 40-60% in dense scenes	✅ ±5% error in ShanghaiTech benchmarks
Speed	⚡ Fast, but inaccurate	⚡ Fast AND accurate
Real-world Noise	❌ False positives from bags, signs, shadows	✅ Learned density patterns from 400+ annotated scenes

Bottom line: In a crowd of 73 people, YOLO might detect 12. CSRNet estimates 73 ± 3.

🏗️ Technical Architecture

┌────────────────────────────────────────────────────────────────┐
│                   Crowd Detection Architecture                  │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   STREAMLIT FRONTEND                      │  │
│  │                                                          │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │  │
│  │  │   Image     │  │   Density    │  │   Dispatch     │  │  │
│  │  │  Uploader   │  │   Heatmap    │  │   Alert UI     │  │  │
│  │  └──────┬──────┘  └──────┬───────┘  └───────┬────────┘  │  │
│  │         │                │                   │           │  │
│  │  ┌──────┴────────────────┴───────────────────┴────────┐  │  │
│  │  │              Analysis Engine                        │  │  │
│  │  │  ┌───────────────────────────────────────────────┐  │  │  │
│  │  │  │  1. ImageNet Normalization                    │  │  │  │
│  │  │  │  2. CSRNet Inference (CPU)                    │  │  │  │
│  │  │  │  3. Density Map → Crowd Count                 │  │  │  │
│  │  │  │  4. Threshold Decision (> 50 = Alert)         │  │  │  │
│  │  │  └───────────────────────────────────────────────┘  │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    CSRNet MODEL                           │  │
│  │                                                          │  │
│  │  Input Image (RGB)                                       │  │
│  │       │                                                  │  │
│  │  ┌────▼─────────────────────────────┐                    │  │
│  │  │  FRONTEND — VGG-16 (Layer 0-23)  │                    │  │
│  │  │  Pre-trained ImageNet weights    │                    │  │
│  │  │  Output: 512-ch feature maps     │                    │  │
│  │  └────┬─────────────────────────────┘                    │  │
│  │       │                                                  │  │
│  │  ┌────▼─────────────────────────────┐                    │  │
│  │  │  BACKEND — Dilated Convolutions  │                    │  │
│  │  │  6× Conv2d (dilation=2)          │                    │  │
│  │  │  512 → 512 → 512 → 256 → 128 → 64│                   │  │
│  │  │  Preserves spatial resolution    │                    │  │
│  │  └────┬─────────────────────────────┘                    │  │
│  │       │                                                  │  │
│  │  ┌────▼─────────────────────────────┐                    │  │
│  │  │  OUTPUT — Conv2d(64, 1, 1×1)     │                    │  │
│  │  │  → 2D Density Map (Heatmap)      │                    │  │
│  │  │  → Σ(pixels) = Crowd Count       │                    │  │
│  │  └──────────────────────────────────┘                    │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                  DEPLOYMENT TARGET                        │  │
│  │          Hugging Face Spaces (Streamlit SDK)              │  │
│  │          CPU-only inference via map_location              │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Model — CSRNet (Congested Scene Recognition Network)

Component	Details	Purpose
Frontend	VGG-16, layers 0–23, pre-trained on ImageNet	High-level feature extraction from crowd images
Backend	6× Dilated Conv2d (`dilation=2`, `padding=2`)	Expands receptive field without losing spatial resolution
Output	Conv2d(64, 1, kernel_size=1)	Single-channel density map — sum = crowd count
Training Data	ShanghaiTech Part B	400+ annotated crowd scenes with dot-labeled heads
Inference	CPU-optimized (`map_location='cpu'`)	Runs on free-tier Hugging Face Spaces

Application — Streamlit

Component	Technology	Purpose
Framework	Streamlit 1.57	Interactive web interface for image upload & visualization
Deep Learning	PyTorch 2.0+	Model inference engine
Backbone	torchvision (VGG-16)	Pre-trained feature extractor weights
Visualization	Matplotlib (`jet` colormap)	Heatmap rendering of density maps
Image Processing	Pillow, NumPy	Image loading, tensor operations
Deployment	Hugging Face Spaces	Free CPU-tier cloud hosting

⚡ How It Works

Step 1: Image Upload

The user uploads a JPG/PNG photo of a transit stop through the Streamlit interface.

Step 2: Preprocessing

The image is converted to RGB and normalized using ImageNet standards:

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

Step 3: Density Map Prediction

CSRNet processes the image through VGG-16 frontend → Dilated Conv backend → 1×1 output layer, producing a 2D density heatmap.

Step 4: Crowd Counting

crowd_count = max(0, int(np.sum(density_map)))

Step 5: Dispatch Decision

┌─────────────────────────────────────────────────┐
│                                                 │
│   crowd_count ≤ 50    →  ✅ Normal level        │
│   crowd_count > 50    →  🚨 EXTRA BUS ALERT     │
│                                                 │
└─────────────────────────────────────────────────┘

The system displays the original image alongside the density heatmap and shows a clear dispatch decision.

🚀 Getting Started

Prerequisites

Python 3.10+ (Install Guide)
Git (Install Guide)

1. Clone the Repository

git clone https://github.com/ErenBalkis/Smart_Transit_AI.git
cd Smart_Transit_AI

2. Create a Virtual Environment (Recommended)

python -m venv venv

# Linux / macOS
source venv/bin/activate

# Windows
venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Ensure Model Weights

Make sure the trained best_model.pth file is in the project root directory.

5. Run the Application

streamlit run app.py

The app will open at http://localhost:8501.

📁 Project Structure

crowd_detection/
├── app.py                 # Streamlit application + CSRNet model definition
│                          # ├─ CSRNet class (VGG-16 frontend + dilated backend)
│                          # ├─ Model loading with CPU mapping
│                          # ├─ Image preprocessing pipeline
│                          # └─ UI: upload, heatmap, metrics, dispatch alert
├── best_model.pth         # Trained CSRNet weights (ShanghaiTech Part B)
├── requirements.txt       # Python dependencies
├── info.md                # Technical specification document
└── README.md              # This file

🔬 Technical Deep Dive: Dilated Convolutions

The secret sauce behind CSRNet's accuracy is dilated (atrous) convolutions. Standard pooling layers downsample the feature map, destroying spatial information critical for density estimation. Dilated convolutions solve this by expanding the receptive field without reducing resolution:

Standard Conv (3×3, dilation=1)       Dilated Conv (3×3, dilation=2)
┌───┬───┬───┬───┬───┐                ┌───┬───┬───┬───┬───┐
│ ■ │ ■ │ ■ │   │   │                │ ■ │   │ ■ │   │ ■ │
├───┼───┼───┼───┼───┤                ├───┼───┼───┼───┼───┤
│ ■ │ ■ │ ■ │   │   │                │   │   │   │   │   │
├───┼───┼───┼───┼───┤                ├───┼───┼───┼───┼───┤
│ ■ │ ■ │ ■ │   │   │                │ ■ │   │ ■ │   │ ■ │
├───┼───┼───┼───┼───┤                ├───┼───┼───┼───┼───┤
│   │   │   │   │   │                │   │   │   │   │   │
├───┼───┼───┼───┼───┤                ├───┼───┼───┼───┼───┤
│   │   │   │   │   │                │ ■ │   │ ■ │   │ ■ │
└───┴───┴───┴───┴───┘                └───┴───┴───┴───┴───┘

Receptive field: 3×3               Receptive field: 5×5
Parameters: 9                      Parameters: 9 (same!)

Same computational cost, wider field of view — allowing the model to understand both local details and crowd-level patterns simultaneously.

🌐 Live Demo & Deployment

This project is deployed and running on Hugging Face Spaces:

Uses Streamlit SDK with map_location=torch.device('cpu') for seamless CPU-tier inference
Zero-config deployment — push to HF repo and it builds automatically
The YAML frontmatter in this README configures the Space (SDK version, app file, etc.)

Built with 🚌 by Eren Balkış
Smart cities start with smarter infrastructure.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
screenshots		screenshots
LICENSE		LICENSE
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🚌 Crowd Detection & Dynamic Bus Dispatch

AI-Powered Smart City Crowd Management

🚨 The Problem

💡 Our Solution

The Pipeline

📸 Screenshots

🧠 Why Density Map Regression?

🏗️ Technical Architecture

Model — CSRNet (Congested Scene Recognition Network)

Application — Streamlit

⚡ How It Works

Step 1: Image Upload

Step 2: Preprocessing

Step 3: Density Map Prediction

Step 4: Crowd Counting

Step 5: Dispatch Decision

🚀 Getting Started

Prerequisites

1. Clone the Repository

2. Create a Virtual Environment (Recommended)

3. Install Dependencies

4. Ensure Model Weights

5. Run the Application

📁 Project Structure

🔬 Technical Deep Dive: Dilated Convolutions

🌐 Live Demo & Deployment

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🚌 Crowd Detection & Dynamic Bus Dispatch

AI-Powered Smart City Crowd Management

🚨 The Problem

💡 Our Solution

The Pipeline

📸 Screenshots

🧠 Why Density Map Regression?

🏗️ Technical Architecture

Model — CSRNet (Congested Scene Recognition Network)

Application — Streamlit

⚡ How It Works

Step 1: Image Upload

Step 2: Preprocessing

Step 3: Density Map Prediction

Step 4: Crowd Counting

Step 5: Dispatch Decision

🚀 Getting Started

Prerequisites

1. Clone the Repository

2. Create a Virtual Environment (Recommended)

3. Install Dependencies

4. Ensure Model Weights

5. Run the Application

📁 Project Structure

🔬 Technical Deep Dive: Dilated Convolutions

🌐 Live Demo & Deployment

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages