
BehrozKarim/gpu_accelerated_image_blurring


GPU Accelerated Image Blurring

A CUDA-based Gaussian image blur implementation with multiple optimization strategies, batch processing, and performance benchmarking.

  • Multiple GPU Implementations:

      • Naive (Global Memory Only)

      • Tiled (Shared Memory with Halo)

      • Separable (Two 1D Passes)

      • Separable + Constant Memory

      • Separable + Shared Memory (Best Performance)

  • CPU Baseline: Multi-threaded OpenMP implementation for comparison

  • Batch Processing: CUDA Streams for overlapping computation and memory transfers

  • Image Format Support: PNG, JPG, BMP, PPM (via stb_image)

  • Comprehensive Benchmarking: Performance metrics, GFLOPS, bandwidth utilization

Project Structure

/
├── src/
│   ├── main.cu              # Main program and benchmarking
│   ├── kernels.cu           # All GPU kernel implementations
│   ├── support.cu           # Utility functions and timing
│   ├── image_io.cpp         # Image loading/saving
│   └── cpu_blur.cpp         # CPU baseline implementation
├── include/
│   ├── support.h            # Support function headers
│   ├── image_io.h           # Image I/O headers
│   ├── kernels.cuh          # Kernel declarations
│   ├── stb_image.h          # Image loading library
│   └── stb_image_write.h    # Image saving library
├── blur_real_images_5674317.out            # Sample output file
├── Makefile                 # Build configuration
├── run_mahti.sh             # SLURM batch script for CSC Mahti
└── README.md                # This file

Running the Code

Compilation

make clean
make

Run Default Benchmark

./blur_benchmark
# Default: 1024x1024 synthetic image, 7x7 kernel, sigma=2.0, 10 images

Process Your Own Image

# Process a real image (PNG, JPG, BMP, etc.)
./blur_benchmark path/to/image.png [kernel_size] [sigma] [batch_size]

# Examples:
./blur_benchmark photo.jpg 7 2.0 10           # Blur photo.jpg
./blur_benchmark input.png 11 3.0 5           # Strong blur

# Output saved to: output_images/output_blurred.png

Benchmark with Synthetic Images

./blur_benchmark [width] [height] [kernel_size] [sigma] [batch_size]

# Examples:
./blur_benchmark 512 512 5 1.5 5          # Small test
./blur_benchmark 2048 2048 9 2.5 20       # Large test
./blur_benchmark 3840 2160 7 2.0 10       # 4K test

Run Test Suites

make test_small    # 512x512, kernel=5, batch=5
make test_large    # 2048x2048, kernel=9, batch=20
make test_4k       # 3840x2160, kernel=7, batch=10

Implementation Details

Gaussian Blur Algorithm

Separable Gaussian blur is implemented using two 1D convolution passes:

  1. Horizontal Pass: Apply 1D Gaussian kernel along rows
  2. Vertical Pass: Apply 1D Gaussian kernel along columns

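Complexity: for a kernel of width n, the two 1D passes cost O(n) operations per pixel (2n multiply-adds in total) instead of the O(n²) required by a full n×n 2D convolution.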
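The two passes above can be sketched on the CPU as follows. This is a minimal illustration, not the code in `cpu_blur.cpp` or `kernels.cu`: it assumes a single-channel, row-major float image, and the function names (`gaussian_kernel_1d`, `blur_pass`, `separable_blur`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Build a normalized 1D Gaussian kernel with an odd number of taps.
std::vector<float> gaussian_kernel_1d(int size, float sigma) {
    assert(size > 0 && size % 2 == 1 && sigma > 0.0f);
    const int radius = size / 2;
    std::vector<float> k(size);
    float sum = 0.0f;
    for (int i = 0; i < size; ++i) {
        const float x = static_cast<float>(i - radius);
        k[i] = std::exp(-(x * x) / (2.0f * sigma * sigma));
        sum += k[i];
    }
    for (float& w : k) w /= sum;  // weights sum to 1, so overall brightness is preserved
    return k;
}

// Clamp an index into [0, n - 1].
inline int clamp_index(int v, int n) { return std::min(std::max(v, 0), n - 1); }

// One 1D pass over a single-channel, row-major image (assumed layout).
// horizontal == true filters along rows; false filters along columns.
// Taps that fall outside the image reuse the nearest edge pixel.
std::vector<float> blur_pass(const std::vector<float>& in, int width, int height,
                             const std::vector<float>& k, bool horizontal) {
    assert(static_cast<int>(in.size()) == width * height);
    const int radius = static_cast<int>(k.size()) / 2;
    std::vector<float> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int t = -radius; t <= radius; ++t) {
                const int sx = horizontal ? clamp_index(x + t, width) : x;
                const int sy = horizontal ? y : clamp_index(y + t, height);
                acc += k[t + radius] * in[sy * width + sx];
            }
            out[y * width + x] = acc;
        }
    }
    return out;
}

// Horizontal pass followed by vertical pass: 2n taps per pixel instead of n * n.
std::vector<float> separable_blur(const std::vector<float>& in, int width, int height,
                                  int kernel_size, float sigma) {
    const std::vector<float> k = gaussian_kernel_1d(kernel_size, sigma);
    return blur_pass(blur_pass(in, width, height, k, true), width, height, k, false);
}
```

Calling `separable_blur(image, 1024, 1024, 7, 2.0f)` would mirror the default benchmark parameters. The separation works because the 2D Gaussian is the outer product of two identical 1D Gaussians.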

Memory Optimizations

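  1. Shared Memory Tiling: Stages each tile (plus its halo) in shared memory so neighboring threads reuse loaded pixels, reducing global memory accesses
  2. Constant Memory: Stores the Gaussian coefficients in constant memory, which is cached and broadcast when all threads in a warp read the same address
  3. Boundary Clamping: Clamps out-of-range coordinates to the nearest edge pixel
  4. Coalesced Access: Adjacent threads read adjacent addresses so global loads combine into fewer memory transactions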
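To see why tiling with a halo helps, the arithmetic below compares how many global-memory reads one thread block issues with and without a shared-memory tile for the 2D (non-separable) kernel. It is a rough sketch under stated assumptions: the square tile size is hypothetical (the block dimensions used in `kernels.cu` are not stated here), pixels are single floats, and hardware caches are ignored, so the naive count is an upper bound.

```cpp
#include <cassert>

struct TileLoads {
    long long naive;  // every thread reads its full kernel_size x kernel_size window
    long long tiled;  // the tile plus halo is read from global memory once
};

// tile: side length of a square thread block (hypothetical, e.g. 16).
// kernel_size: odd width of the 2D blur kernel.
TileLoads loads_per_block(int tile, int kernel_size) {
    assert(tile > 0 && kernel_size > 0 && kernel_size % 2 == 1);
    const int radius = kernel_size / 2;
    const long long side = tile + 2LL * radius;  // tile plus halo on both sides
    TileLoads r;
    r.naive = 1LL * tile * tile * kernel_size * kernel_size;
    r.tiled = side * side;
    return r;
}

// Shared memory needed for one tile of float pixels, in bytes.
long long shared_tile_bytes(int tile, int kernel_size) {
    assert(tile > 0 && kernel_size > 0 && kernel_size % 2 == 1);
    const long long side = tile + 2LL * (kernel_size / 2);
    return side * side * static_cast<long long>(sizeof(float));
}
```

For a 16x16 tile and a 7x7 kernel, `loads_per_block(16, 7)` gives 12544 naive reads against 484 tiled reads, roughly a 26x reduction before caching effects are taken into account.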

CUDA Streams

Batch processing uses multiple streams to overlap:

  • Host-to-Device memory transfers
  • Kernel execution
  • Device-to-Host memory transfers
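The benefit of overlapping these three stages can be estimated with a simple pipeline model. The sketch below is an idealization, not a measurement: it assumes pinned host memory, separate copy engines for each transfer direction, enough streams to keep every stage busy, and identical per-image stage times. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Per-image time of each stage in milliseconds (hypothetical inputs).
struct StageTimes {
    double h2d;     // host-to-device copy
    double kernel;  // kernel execution
    double d2h;     // device-to-host copy
};

// No overlap: every image runs all three stages back to back.
double serial_time_ms(const StageTimes& t, int batch) {
    assert(batch > 0);
    return batch * (t.h2d + t.kernel + t.d2h);
}

// Ideal overlap: after the first image fills the pipeline,
// one image completes every "slowest stage" interval.
double overlapped_time_ms(const StageTimes& t, int batch) {
    assert(batch > 0);
    const double slowest = std::max(t.h2d, std::max(t.kernel, t.d2h));
    return (t.h2d + t.kernel + t.d2h) + (batch - 1) * slowest;
}
```

With stage times of 1 ms, 2 ms, and 1 ms and a batch of 10 images, the model gives 40 ms serial against 22 ms overlapped. Real speedups depend on the hardware and are usually lower than this bound.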

Performance Metrics

The benchmark reports:

  • Kernel Execution Time: Pure GPU computation time
  • Total Time: Including memory transfers
  • Speedup vs CPU: GPU performance relative to multi-threaded CPU
  • GFLOPS: Billions of floating-point operations per second
  • Throughput: Images processed per second
  • Verification: Correctness check against CPU baseline
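As a rough guide to how such figures can be derived from measured times, the sketch below computes them for the separable blur. The FLOP-counting convention (one multiply and one add per tap, two passes) and the helper names are assumptions; the benchmark in `main.cu` may count operations or check correctness differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Floating-point operations for a two-pass separable blur,
// counting one multiply and one add per kernel tap (assumed convention).
double separable_flops(int width, int height, int channels, int kernel_size) {
    const double passes = 2.0;
    const double ops_per_tap = 2.0;
    return passes * kernel_size * ops_per_tap * width * height * channels;
}

// Billions of floating-point operations per second.
double gflops(double flops, double time_ms) {
    assert(time_ms > 0.0);
    return flops / (time_ms * 1e-3) / 1e9;
}

// Throughput over a whole batch, including transfers if total_ms includes them.
double images_per_second(int batch, double total_ms) {
    assert(total_ms > 0.0);
    return batch / (total_ms * 1e-3);
}

// Speedup of the GPU over the CPU baseline for the same workload.
double speedup(double cpu_ms, double gpu_ms) {
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}

// Largest per-pixel difference between the GPU result and the CPU reference.
float max_abs_diff(const std::vector<float>& gpu, const std::vector<float>& cpu) {
    assert(gpu.size() == cpu.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        worst = std::max(worst, std::fabs(gpu[i] - cpu[i]));
    }
    return worst;
}
```

A verification step would then compare `max_abs_diff(gpu_out, cpu_out)` against a small tolerance chosen for the pixel format, since GPU and CPU floating-point results rarely match bit for bit.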

Note

  • The first run may be slower due to CUDA initialization
  • Speedup generally improves with image size, since larger images keep the GPU better utilized
  • Larger kernels benefit more from the separable approach
  • Batch processing benefits from streams, which overlap memory transfers with computation

Authors

Course: GPU Programming

Project: GPU Accelerated Image Blurring

Group Members: Behroz Karim, Bhawish Raj, Hasnain Ajmal, Talha Rizwan, Zafeer ul Haq.

Acknowledgement

The stb_image and stb_image_write libraries were sourced from nothings/stb. AI tools were utilized to research optimization techniques for image blurring and to assist in developing the benchmarking infrastructure.
