ResoMap implements multiple CNN architectures in src/models.py (443 lines), each targeting a different performance/accuracy tradeoff. All models accept variable input resolutions through adaptive pooling.
Training Status:
- ✅ Trained: simple_cnn, tiny_cnn (10 experiments completed over 2 days)
- 📋 Available: VGG, ResNet, MobileNet families (ready for future experiments)
View Results: https://dagshub.com/Y-R-A-V-R-5/ResoMap/experiments
VGG models use stacked convolutional layers organized into stages, followed by adaptive global average pooling and fully connected layers.
Input (variable resolution)
↓
Stages (repeating Conv+ReLU blocks)
├─ Stage 1: 3→64 channels
├─ Stage 2: 64→128 channels
├─ Stage 3: 128→256 channels
├─ Stage 4: 256→512 channels
└─ Stage 5: 512→512 channels
↓
AdaptiveAvgPool2d (7×7) ← Handles any resolution!
↓
Classifier (FC layers + Dropout)
↓
Output (num_classes)
Class: VGG(nn.Module) in src/models.py
Key Features:
- Modular stage-based construction from config
- Adaptive average pooling for any resolution
- Configurable FC layer sizes (default: 4096→4096→num_classes)
- Dropout regularization (default: 0.5)
- ReLU activations throughout
Variants:
vgg11: [1, 1, 2, 2, 2] # 8 conv + 3 FC = 11 weight layers
vgg13: [2, 2, 2, 2, 2] # 10 conv + 3 FC = 13 weight layers
vgg16: [2, 2, 3, 3, 3] # 13 conv + 3 FC = 16 weight layers (ImageNet standard)
vgg19: [2, 2, 3, 4, 4] # 15 conv + 3 FC as listed; canonical VGG-19 uses [2, 2, 4, 4, 4] (16 conv)
Numbers represent how many Conv layers per stage.
Parameters (1000-class head, default 4096→4096 classifier):
- vgg11: ~133M (~129M with a 7-class head)
- vgg13: ~133M
- vgg16: ~138M (most commonly trained)
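The totals above can be reproduced by hand. The sketch below assumes plain 3×3 convolutions with bias, no BatchNorm, the stage channels from the diagram, and the default 4096→4096 classifier on a 7×7 pooled map; `vgg_param_count` is an illustrative helper, not a function in src/models.py. The total depends on `num_classes`, which is why a 7-class head comes out about 4M smaller than a 1000-class one.

```python
# Stage output channels as listed in the VGG architecture diagram.
STAGE_CHANNELS = [64, 128, 256, 512, 512]


def vgg_param_count(convs_per_stage, num_classes=1000, fc_size=4096, pool=7):
    """Count weights and biases of a plain VGG (3x3 convs, no BatchNorm)."""
    total = 0
    in_ch = 3
    for n_convs, out_ch in zip(convs_per_stage, STAGE_CHANNELS):
        for _ in range(n_convs):
            total += in_ch * out_ch * 3 * 3 + out_ch  # kernel weights + bias
            in_ch = out_ch
    flat = in_ch * pool * pool  # 512 * 7 * 7 = 25,088 features
    for fan_in, fan_out in [(flat, fc_size), (fc_size, fc_size), (fc_size, num_classes)]:
        total += fan_in * fan_out + fan_out
    return total


VARIANTS = {
    "vgg11": [1, 1, 2, 2, 2],
    "vgg13": [2, 2, 2, 2, 2],
    "vgg16": [2, 2, 3, 3, 3],
}

for name, cfg in VARIANTS.items():
    print(f"{name}: {vgg_param_count(cfg) / 1e6:.1f}M (1000 classes), "
          f"{vgg_param_count(cfg, num_classes=7) / 1e6:.1f}M (7 classes)")
```

Almost all of the parameters sit in the first FC layer (25,088 × 4096), so the conv configuration changes the total far less than the classifier size does.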
Code Reference:
def forward(self, x):
    # x shape: (batch, 3, H, W) - H and W can be any size
    for stage in self.stages.values():
        x = stage(x)  # Each stage ends with MaxPool
    x = self.avgpool(x)  # (batch, 512, 7, 7) for any input
    x = x.view(x.size(0), -1)  # Flatten
    x = self.classifier(x)  # FC layers
    return x

Why Adaptive Pooling Works:
- After all MaxPool operations, spatial dimensions are reduced
- AdaptiveAvgPool2d((7,7)) guarantees 7×7 output regardless of input
- This enables fixed FC layer input (512×7×7 = 25,088 features)
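To make the resolution independence concrete, the sketch below traces the spatial size through five 2×2 MaxPool stages for a few input sizes. It assumes the 3×3 convolutions use padding 1 (so only the pools change the size) and that each pool uses stride 2 with floor rounding; `vgg_feature_size` is a hypothetical helper for illustration only.

```python
def vgg_feature_size(h, w, num_stages=5):
    """Spatial size after num_stages 2x2 MaxPools (stride 2, floor rounding)."""
    for _ in range(num_stages):
        h, w = h // 2, w // 2
    return h, w


POOL_OUT = (7, 7)  # AdaptiveAvgPool2d output size
CHANNELS = 512     # channels leaving the last stage

for side in (64, 224, 512):
    fh, fw = vgg_feature_size(side, side)
    flat = CHANNELS * POOL_OUT[0] * POOL_OUT[1]
    print(f"{side}x{side} input -> {fh}x{fw} feature map "
          f"-> {POOL_OUT[0]}x{POOL_OUT[1]} pooled -> {flat} features")
```

The feature map entering the pooling layer varies (2×2, 7×7, 16×16), but the flattened size handed to the classifier is the same in every case.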
Best For:
- ✅ Explainability research (simple, interpretable)
- ✅ Understanding layer-wise behavior
- ✅ GPU comparison (baseline architecture)
- ❌ Mobile inference (too many parameters)
ResNet uses residual connections (skip connections) to train very deep networks without gradient degradation.
Input (variable resolution)
↓
Initial Conv (7×7, stride=2)
↓
Layer1: Multiple Blocks with residuals
Layer2: Multiple Blocks with residuals
Layer3: Multiple Blocks with residuals
Layer4: Multiple Blocks with residuals
↓
AdaptiveAvgPool2d (1×1)
↓
FC layer (num_features → num_classes)
↓
Output
BasicBlock (ResNet18, ResNet34):
Input
↓
Conv 3×3 → BatchNorm → ReLU
↓
Conv 3×3 → BatchNorm
↓
Add with skip connection
↓
ReLU
↓
Output
Bottleneck (ResNet50, ResNet101):
Input
↓
Conv 1×1 (reduce) → BatchNorm → ReLU
↓
Conv 3×3 (main) → BatchNorm → ReLU
↓
Conv 1×1 (expand) → BatchNorm
↓
Add with skip connection → ReLU
↓
Output
Class: ResNet(nn.Module) in src/models.py
Key Features:
- Block repetition counts configurable per layer
- Bottleneck blocks for depth (ResNet50+)
- Batch normalization throughout
- Identity skip connections (straight path)
- Projected skip connections when spatial dims change
- Stride 2 in the first block of layers 2-4 (downsampling)
Variants:
resnet18: [2, 2, 2, 2] blocks per layer + BasicBlock
resnet34: [3, 4, 6, 3] blocks per layer + BasicBlock
resnet50: [3, 4, 6, 3] blocks per layer + Bottleneck
resnet101: [3, 4, 23, 3] blocks per layer + Bottleneck
Parameters:
- resnet18: ~11M (lightweight!)
- resnet34: ~21M
- resnet50: ~25M
- resnet101: ~44M
Code Reference:
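import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    def forward(self, x):
        # Identity path; when the block changes stride or channel count,
        # the projected shortcut listed under Key Features takes its place
        identity = x
        out = self.conv1(x)  # 1×1 reduce
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)  # 3×3 main
        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv3(out)  # 1×1 expand
        out = self.bn3(out)
        out += identity  # Skip connection!
        out = F.relu(out)
        return out

Why Skip Connections Matter: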
- Gradients can flow directly through the skip path
- Allows training very deep networks (101+ layers)
- Each block learns a residual (the difference from its input) rather than the full mapping
- Identity path: early in training, a block whose residual branch is still small behaves close to an identity function
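The gradient-flow argument can be illustrated with a toy scalar model. This is only a sketch under strong assumptions: each block is a linear function with the same small slope, which real networks are not. It shows why the "+1" contributed by the skip path keeps the chained derivative from collapsing. `chain_gradient` is a made-up helper for this illustration.

```python
def chain_gradient(local_slope, depth, residual):
    """Derivative of the output w.r.t. the input for `depth` stacked scalar blocks.

    Plain block:    y = f(x)      -> dy/dx = f'(x)
    Residual block: y = x + f(x)  -> dy/dx = 1 + f'(x)
    f is assumed linear with slope `local_slope` (toy assumption).
    """
    per_block = (1.0 + local_slope) if residual else local_slope
    return per_block ** depth


plain = chain_gradient(0.01, depth=50, residual=False)
skip = chain_gradient(0.01, depth=50, residual=True)
print(f"plain stack gradient:    {plain:.3e}")
print(f"residual stack gradient: {skip:.3f}")
```

With a slope of 0.01 per block, the plain stack's gradient is vanishingly small after 50 blocks, while the residual stack's stays on the order of 1.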
Best For:
- ✅ Accuracy vs depth analysis
- ✅ Computational efficiency comparison
- ✅ Transfer learning (great pre-trained models available)
- ✅ Mobile deployment (resnet18 is very efficient)
- ✅ Balanced accuracy/speed tradeoff
MobileNet uses depthwise separable convolutions to achieve high accuracy with minimal parameters - designed for mobile/edge devices.
Input (variable resolution)
↓
Conv 3×3 (32 filters)
↓
MobileBlock 1: Depthwise + Pointwise (expansion=1)
MobileBlock 2: Depthwise + Pointwise (expansion=6)
... (multiple blocks with different configs)
↓
AdaptiveAvgPool (1×1)
↓
FC (1000 → num_classes)
↓
Output
Standard Convolution:
- Input: (batch, in_channels, H, W)
- Kernel: (out_channels, in_channels, 3, 3)
- Computation: in_channels × H × W × out_channels × 9 operations
Depthwise Separable:
- Depthwise: (in_channels, 1, 3, 3) - one filter per channel
- Pointwise: (out_channels, in_channels, 1, 1) - cross-channel mixing
Benefit: ~8-9x fewer operations!
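The savings follow directly from the operation counts listed above. The sketch below counts multiply-accumulates for both variants, assuming the spatial size is unchanged (stride 1, same padding) and ignoring bias and BatchNorm; the function names are illustrative.

```python
def standard_conv_ops(in_ch, out_ch, h, w, k=3):
    """Multiply-accumulates for a standard k x k convolution."""
    return in_ch * out_ch * h * w * k * k


def depthwise_separable_ops(in_ch, out_ch, h, w, k=3):
    """Multiply-accumulates for a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = in_ch * h * w * k * k   # one k x k filter per input channel
    pointwise = in_ch * out_ch * h * w  # 1x1 cross-channel mixing
    return depthwise + pointwise


def reduction_ratio(in_ch, out_ch, h, w, k=3):
    return standard_conv_ops(in_ch, out_ch, h, w, k) / depthwise_separable_ops(in_ch, out_ch, h, w, k)


for out_ch in (64, 256):
    print(f"32 -> {out_ch} channels at 56x56: "
          f"{reduction_ratio(32, out_ch, 56, 56):.2f}x fewer operations")
```

Algebraically the ratio is 1 / (1/out_channels + 1/9) for 3×3 kernels, so it approaches 9× as the number of output channels grows and is a little lower for narrow layers.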
Input (expansion=6 for middle blocks)
↓
1×1 Conv (expand by 6x)
↓
Depthwise Conv 3×3 (ReLU6)
↓
1×1 Conv (project back)
↓
Skip connection (only if stride=1)
↓
Output
Why "inverted"? Traditional ResNet: wide→narrow→wide. MobileNet: narrow→wide→narrow.
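The narrow→wide→narrow shape is easy to see by tracing channel counts through one block. The sketch below is an assumption-laden illustration rather than the code in src/models.py: it counts convolution weights only (no bias, no BatchNorm) and follows the MobileNetV2 design, in which the skip connection is used only when stride is 1 and the input and output channel counts match.

```python
def inverted_residual(in_ch, out_ch, stride=1, expansion=6):
    """Describe one inverted residual block: channel trace, conv weights, skip usage."""
    hidden = in_ch * expansion
    params = {
        "expand_1x1": in_ch * hidden,     # narrow -> wide
        "depthwise_3x3": hidden * 3 * 3,  # one 3x3 filter per hidden channel
        "project_1x1": hidden * out_ch,   # wide -> narrow
    }
    return {
        "channels": [in_ch, hidden, hidden, out_ch],
        "params": params,
        "use_skip": stride == 1 and in_ch == out_ch,
    }


block = inverted_residual(24, 24)
print("channels:", " -> ".join(str(c) for c in block["channels"]))
print("conv weights:", sum(block["params"].values()))
print("skip connection:", block["use_skip"])
```

The depthwise layer operates on the widest tensor but holds the fewest weights, which is what makes the 6× expansion affordable.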
Class: MobileNetV2(nn.Module) in src/models.py
Key Features:
- Configurable expansion factor (default=6)
- Width multiplier (default=1.0, can reduce to 0.75 for smaller models)
- ReLU6 activations
- Batch normalization throughout
- Stride control for spatial downsampling
MobileNetV3 Additions:
- Squeeze-and-Excitation (SE) blocks for channel attention
- Hard Swish activation (more efficient)
- More efficient block design
Parameters:
- mobilenet_v2 (width=1.0): ~3.5M
- mobilenet_v2_small (width=0.75): ~2.2M
- mobilenet_v3_small: ~2.5M
- mobilenet_v3_large: ~5.4M
Width Multiplier Effect:
width=1.0: all_channels × 1.0 → full model
width=0.75: all_channels × 0.75 → ~56% of conv parameters (0.75²)
width=0.5: all_channels × 0.5 → ~25% of conv parameters (0.5²)
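A layer's weight count depends on both its input and output channels, so scaling every channel count by `width` scales convolution weights by roughly `width` squared. The sketch below checks this for a single 1×1 convolution. It uses plain rounding, whereas real implementations often round channel counts to a multiple of 8, and the final classifier does not shrink quadratically, so whole-model ratios will differ somewhat. The helper names are hypothetical.

```python
def scaled(channels, width):
    """Channel count after applying the width multiplier (plain rounding)."""
    return max(1, int(round(channels * width)))


def pointwise_params(in_ch, out_ch, width=1.0):
    """Weights in a 1x1 convolution after scaling both channel counts."""
    return scaled(in_ch, width) * scaled(out_ch, width)


base = pointwise_params(64, 128)
for width in (1.0, 0.75, 0.5):
    fraction = pointwise_params(64, 128, width) / base
    print(f"width={width}: {fraction:.1%} of the full layer's weights")
```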
Best For:
- ✅ Mobile/edge device deployment
- ✅ Efficiency vs accuracy analysis
- ✅ Finding smallest model for target accuracy
- ✅ Latency-critical applications
- ✅ Memory-constrained scenarios
Minimal architecture for quick experimentation and debugging:
Input (variable resolution)
↓
Conv 3×3 (3→32) + ReLU + MaxPool 2×2
↓
Conv 3×3 (32→64) + ReLU + MaxPool 2×2
↓
Conv 3×3 (64→128) + ReLU + MaxPool 2×2
↓
AdaptiveAvgPool (4×4)
↓
FC (128×16 → 128) + ReLU + Dropout
↓
FC (128 → num_classes)
↓
Output
Parameters: <1M
Use Cases:
- ✅ Quick debugging
- ✅ Testing pipeline functionality
- ✅ Small dataset experiments
Even smaller baseline:
Input
↓
Conv 3×3 (3→16) + ReLU
↓
MaxPool 2×2
↓
Conv 3×3 (16→32) + ReLU
↓
AdaptiveAvgPool (2×2)
↓
FC (32×4 → 10)
↓
Output
Parameters: <0.5M
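The parameter bounds for the two small networks can be checked from their diagrams. The sketch below assumes every Conv and FC layer has a bias and that the layer sizes are exactly those shown; the class counts (7 for simple_cnn, 10 for tiny_cnn) are example values, and the helper functions are illustrative rather than part of src/models.py.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights + biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch


def linear_params(fan_in, fan_out):
    """Weights + biases of a fully connected layer."""
    return fan_in * fan_out + fan_out


def simple_cnn_params(num_classes):
    convs = conv_params(3, 32) + conv_params(32, 64) + conv_params(64, 128)
    fcs = linear_params(128 * 4 * 4, 128) + linear_params(128, num_classes)
    return convs + fcs


def tiny_cnn_params(num_classes=10):
    convs = conv_params(3, 16) + conv_params(16, 32)
    return convs + linear_params(32 * 2 * 2, num_classes)


print(f"simple_cnn (7 classes): {simple_cnn_params(7):,} parameters")
print(f"tiny_cnn (10 classes):  {tiny_cnn_params(10):,} parameters")
```

Under these assumptions both networks land well inside the stated bounds, with most of simple_cnn's weights in its first FC layer.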
from src.models import build_model
from src.utils import load_config

config = load_config('configs/models.yaml')
model_cfg = config['models']['vgg11']
model = build_model(model_cfg, num_classes=7)
# Returns: VGG model for skin lesion classification (7 classes)

from src.models import load_model_from_config

model = load_model_from_config('resnet50', num_classes=10)
# Automatically loads architecture from configs/models.yaml

from src.models import ResNet, Bottleneck

model = ResNet(
    block=Bottleneck,
    block_counts=[3, 4, 6, 3],  # ResNet50 config
    num_classes=7
)

| Model | Params | Size | Latency | Accuracy* | Explainability | Use Case |
|---|---|---|---|---|---|---|
| simple_cnn | <1M | <5MB | ⚡⚡⚡⚡⚡ | ⭐⭐ | ⭐⭐⭐⭐⭐ | Debugging |
| tiny_cnn | <0.5M | <2MB | ⚡⚡⚡⚡⚡ | ⭐⭐ | ⭐⭐⭐⭐⭐ | Baseline |
| mobilenet_v2_small | 2.2M | 10MB | ⚡⚡⚡⚡ | ⭐⭐⭐ | ⭐⭐⭐ | Mobile |
| mobilenet_v2 | 3.5M | 14MB | ⚡⚡⚡⚡ | ⭐⭐⭐ | ⭐⭐⭐ | Mobile |
| resnet18 | 11M | 45MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Balanced |
| resnet50 | 25M | 100MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Production |
| vgg16 | 138M | 500MB | ⚡ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Research |
*Approximate accuracy on ImageNet (100-class subset). In the Latency column, more ⚡ means faster inference.
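The Size column tracks the Params column closely because an uncompressed float32 checkpoint needs about 4 bytes per parameter. The sketch below is a rough estimate under that assumption; it ignores optimizer state, BatchNorm buffers, and file-format overhead, and `checkpoint_size_mb` is a hypothetical helper.

```python
def checkpoint_size_mb(num_params, bytes_per_param=4):
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


TABLE_PARAMS = {
    "mobilenet_v2": 3_500_000,
    "resnet18": 11_000_000,
    "resnet50": 25_000_000,
    "vgg16": 138_000_000,
}

for name, params in TABLE_PARAMS.items():
    print(f"{name}: ~{checkpoint_size_mb(params):.0f} MB in float32")
```

Storing weights in float16 (`bytes_per_param=2`) would halve these estimates.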
All models use AdaptiveAvgPool2d() or AdaptiveMaxPool2d() instead of fixed pooling:
# Standard approach (fixed size input)
self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1)
# Works only for 224×224, breaks for other sizes
# Adaptive approach (any size input)
self.avgpool = nn.AdaptiveAvgPool2d(output_size=(7, 7))
# Works for 64×64, 224×224, 512×512, anything!

How it works:
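- Calculates each pooling window from the feature map that reaches the layer, not from the raw image: output cell i covers input cells floor(i × in / out) through ceil((i + 1) × in / out) - 1
- For 224×224 input → VGG's five MaxPools leave a 7×7 feature map, so the 7×7 adaptive pool passes it through unchanged (overall image-to-output ratio ≈ 32)
- For 512×512 input → the feature map is 16×16 and each window spans 3-4 cells (overall image-to-output ratio ≈ 73)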
- Always outputs exactly 7×7 (or specified size)
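The window boundaries can be computed explicitly. The sketch below follows the index rule PyTorch's adaptive pooling uses along one dimension (start = floor(i·in/out), end = ceil((i+1)·in/out)), applied to the feature map entering the pooling layer rather than to the raw image. The function name is illustrative, and the example sizes assume the VGG path with five 2×2 MaxPools.

```python
def adaptive_pool_windows(in_size, out_size):
    """Half-open [start, end) input ranges averaged into each output cell."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                         # floor
        end = ((i + 1) * in_size + out_size - 1) // out_size      # ceil
        windows.append((start, end))
    return windows


# Feature-map sizes after five 2x2 MaxPools for 64, 224 and 512 pixel inputs.
for image_side, feature_side in ((64, 2), (224, 7), (512, 16)):
    windows = adaptive_pool_windows(feature_side, 7)
    widths = sorted({end - start for start, end in windows})
    print(f"{image_side}px image -> {feature_side} cells -> 7 outputs, "
          f"window widths {widths}: {windows}")
```

When the feature map is smaller than the target (the 64-pixel case), windows overlap heavily and input cells are effectively repeated. When it is larger, neighbouring windows may overlap by a cell so that every input cell is covered.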
All models use proper weight initialization:
import torch.nn as nn

for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, 0, 0.01)
        nn.init.constant_(m.bias, 0)

- Kaiming (He) for convolutional layers
- Normal distribution for FC layers
- BatchNorm initialized to identity
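For intuition about what Kaiming initialization does to a convolution, the sketch below computes the standard deviation it would draw weights from in `fan_out` mode with a ReLU gain of √2, where `fan_out = out_channels × kernel_height × kernel_width`. It is a pure-Python restatement of the formula, not a call into PyTorch, and `kaiming_normal_std` is a made-up helper name.

```python
import math


def kaiming_normal_std(out_channels, kernel_size, gain=math.sqrt(2.0)):
    """Std of the zero-mean normal used by Kaiming init in fan_out mode (ReLU gain)."""
    fan_out = out_channels * kernel_size * kernel_size
    return gain / math.sqrt(fan_out)


for out_channels in (64, 128, 512):
    std = kaiming_normal_std(out_channels, kernel_size=3)
    print(f"3x3 conv with {out_channels} output channels: std = {std:.4f}")
```

Wider layers get a smaller standard deviation, which keeps the scale of back-propagated gradients roughly constant from layer to layer. The FC layers use a fixed 0.01 instead.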
Choose VGG if:
- Studying layer-wise behavior
- Need maximum interpretability
- Have sufficient GPU memory
- Analyzing Grad-CAM visualizations
Choose ResNet if:
- Want best accuracy/parameter tradeoff
- Training on limited GPU memory
- Need transfer learning models
- Production deployment planned
Choose MobileNet if:
- Deploying to mobile/edge devices
- Optimizing for inference speed
- Memory is critical constraint
- Need real-time performance
Choose Custom CNNs if:
- Debugging the pipeline
- Quick experimentation
- Establishing baseline
- Research into architecture basics
Next: TRAINING_EXECUTION.md - How to train these models
Back: PROJECT_SUMMARY.md - Project overview