Mofazzal874/gpu_share

GPUShare: Campus GPU Pooling & Leasing

An OMNeT++ 6.2.0 simulation project modeling GPU resource sharing across VLAN-like networks. This project implements a control plane for GPU sharing without using the INET framework, building L2/L3 emulation from scratch using cSimpleModule.

Project Overview

GPUShare simulates a distributed GPU scheduling system where:

  • GPU hosts periodically advertise availability via beacon messages
  • A central scheduler grants short-term leases to incoming jobs
  • Job clients generate requests and measure Job Completion Time (JCT)
  • Two VLAN-like LANs bridged by a simple Router enable cross-VLAN resource sharing
  • Background traffic flows simulate realistic network congestion
  • Metrics track GPU utilization, JCT, and system performance
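The beacon, lease, and JCT steps listed above can be sketched in plain C++ without any OMNeT++ dependency. This is a minimal illustration only: the names `LeaseScheduler`, `Lease`, and `jobCompletionTime`, the "most free GPUs wins" policy, and the fixed lease duration are all assumptions and are not taken from the project's Scheduler module.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lease record; 'granted' is false when no GPU was available.
struct Lease {
    bool granted;
    std::string host;   // GPU host chosen for the job
    double expiresAt;   // simulation time at which the lease ends
};

// Hypothetical scheduler model: tracks free GPUs per host from beacons.
class LeaseScheduler {
public:
    // Record the free-GPU count advertised in a host's latest beacon.
    void onBeacon(const std::string& host, int freeGpus) {
        assert(freeGpus >= 0);
        freeGpus_[host] = freeGpus;
    }

    // Grant a lease on the host with the most free GPUs (assumed policy).
    Lease onJobRequest(double now, double leaseDuration) {
        std::map<std::string, int>::iterator best = freeGpus_.end();
        for (std::map<std::string, int>::iterator it = freeGpus_.begin();
             it != freeGpus_.end(); ++it) {
            if (it->second > 0 &&
                (best == freeGpus_.end() || it->second > best->second))
                best = it;
        }
        if (best == freeGpus_.end())
            return Lease{false, "", now};
        --best->second;  // reserve one GPU until JobDone or the next beacon
        return Lease{true, best->first, now + leaseDuration};
    }

    // Return a GPU to the pool when a job reports completion.
    void onJobDone(const std::string& host) { ++freeGpus_[host]; }

private:
    std::map<std::string, int> freeGpus_;
};

// Job Completion Time as a client would measure it: done minus submitted.
inline double jobCompletionTime(double submittedAt, double doneAt) {
    return doneAt - submittedAt;
}
```

In the simulation itself these steps would be driven by Beacon, JobRequest, LeaseGrant, and JobDone frames rather than direct method calls.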

Key Features:

  • Pure OMNeT++ implementation (no INET framework)
  • VLAN-based network segmentation with custom routing
  • Lease-based GPU scheduling with configurable policies
  • Background traffic simulation for network realism
  • Comprehensive signal-based instrumentation

Quick Start

Build

Navigate to the src directory and build with parallel jobs:

cd d:\omnetpp-6.2.0\samples\gpu_share\src
make clean
opp_makemake -f --deep -I.
make -j16

From OMNeT++ IDE:

  1. Right-click project → Refresh (F5)
  2. Project → Clean → Clean all projects
  3. Project → Build All (Ctrl+B)

Run

Latest Test (Phase 6 - With Background Traffic):

cd d:\omnetpp-6.2.0\samples\gpu_share\simulations\gpu_share_background
..\..\src\gpu_share.exe -f omnetpp.ini -c NoBackground -u Qtenv

From OMNeT++ IDE:

  1. Navigate to simulations/gpu_share_background/omnetpp.ini
  2. Right-click → Run As → OMNeT++ Simulation
  3. Select configuration (NoBackground, LightBackground, MediumBackground, HeavyBackground)
  4. Choose Qtenv (graphical) or Cmdenv (batch)
  5. Click Run

Clean

cd d:\omnetpp-6.2.0\samples\gpu_share\src
make clean

Project Status

| Phase | Feature | Status |
|---|---|---|
| Phase 0 | Project scaffold & build system | ✅ Complete |
| Phase 1 | VLAN bus implementation with serialization delays | ✅ Complete |
| Phase 2 | GPUHost with periodic beacons | ✅ Complete |
| Phase 3 | Central Scheduler & JobClient (end-to-end job lifecycle) | ✅ Complete |
| Phase 4 | Two VLANs + Router (cross-VLAN routing) | ✅ Complete |
| Phase 5 | Background TCP-like flows (network congestion) | ✅ Complete |
| Phase 6 | Instrumentation & metrics analysis | ✅ Complete |
| Phase 7 | No-sharing baseline comparison | ⏳ Planned |
| Phase 8 | Optional DHCP/DNS/RIP/OSPF/NAT stubs | ⏳ Planned |

Current Implementation: Full multi-VLAN GPU scheduling system with background traffic simulation and comprehensive metrics collection.


Architecture

Package Structure

gpu_share/                          # Root package
├── src/
│   ├── gpu/                       # Core GPU sharing subsystem
│   │   ├── messages/              # Protocol message definitions (.msg)
│   │   │   └── Lan.msg           # LanFrame, Beacon, JobRequest, LeaseGrant, etc.
│   │   └── modules/              # Module implementations (.ned, .cc)
│   │       ├── VlanBus.ned/cc    # VLAN broadcast bus with serialization
│   │       ├── GPUHost.ned/cc    # GPU resource provider
│   │       ├── Scheduler.ned/cc  # Central job scheduler
│   │       ├── JobClient.ned/cc  # Job request generator
│   │       ├── Router.ned/cc     # Cross-VLAN routing
│   │       └── BackgroundFlow.ned/cc  # Network traffic generator
│   └── package.ned               # Root package declaration
└── simulations/                   # Test scenarios
    ├── scaffold_test/             # Phase 0: Basic connectivity
    ├── vlan_smoke/                # Phase 1: VLAN broadcast test
    ├── gpu_host_test/             # Phase 2: Beacon generation
    ├── gpu_share_min/             # Phase 3: End-to-end job scheduling
    ├── gpu_share_two_vlan/        # Phase 4: Multi-VLAN routing
    └── gpu_share_background/      # Phase 5+6: With background traffic

Message Protocol

All network communication uses the LanFrame base class (defined in Lan.msg):

| Message Type | Type ID | Size | Purpose |
|---|---|---|---|
| Beacon | 1 | ~64 bytes | GPU host availability advertisements |
| JobRequest | 2 | ~96 bytes | Client job submissions |
| LeaseGrant | 3 | ~80 bytes | Scheduler lease approvals |
| JobStart | 4 | ~72 bytes | Job execution start notifications |
| JobDone | 5 | ~72 bytes | Job completion notifications |
| DataPkt | 6 | Variable | Background traffic packets |
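For readers who want the type IDs and approximate sizes in code form, the table can be mirrored as a small C++ enum. This is a sketch only: `FrameType`, `nominalBytes`, and `isControlFrame` are hypothetical names, and the project's actual definitions are generated from Lan.msg.

```cpp
#include <cassert>
#include <cstdint>

// Type IDs copied from the message table; the enum name is hypothetical.
enum class FrameType : std::uint8_t {
    Beacon = 1,
    JobRequest = 2,
    LeaseGrant = 3,
    JobStart = 4,
    JobDone = 5,
    DataPkt = 6
};

// Approximate frame size in bytes, per the table. DataPkt is variable,
// so the caller supplies its size through dataPktBytes.
inline int nominalBytes(FrameType t, int dataPktBytes = 0) {
    assert(dataPktBytes >= 0);
    switch (t) {
        case FrameType::Beacon:     return 64;
        case FrameType::JobRequest: return 96;
        case FrameType::LeaseGrant: return 80;
        case FrameType::JobStart:   return 72;
        case FrameType::JobDone:    return 72;
        case FrameType::DataPkt:    return dataPktBytes;
    }
    return 0;  // unreachable for valid enum values
}

// Everything except background data belongs to the GPU-sharing control plane.
inline bool isControlFrame(FrameType t) {
    return t != FrameType::DataPkt;
}
```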

Key Modules

  • VlanBus: Hub-like broadcast bus with serialization delays
  • GPUHost: GPU resource provider with beacon generation
  • Scheduler: Central job scheduler with lease management
  • JobClient: Job request generator with JCT tracking
  • Router: Cross-VLAN packet forwarding
  • BackgroundFlow: Network traffic generator
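The forwarding behavior of the VlanBus and Router can be illustrated with two small helper functions. This is a sketch under assumptions: the function names are hypothetical, and it assumes each frame carries source and destination VLAN identifiers, which the module list does not confirm.

```cpp
#include <cassert>
#include <vector>

// Hub-like broadcast: every attached port receives a copy of the frame
// except the port it arrived on.
std::vector<int> broadcastTargets(int numPorts, int ingressPort) {
    assert(ingressPort >= 0 && ingressPort < numPorts);
    std::vector<int> targets;
    for (int port = 0; port < numPorts; ++port) {
        if (port != ingressPort)
            targets.push_back(port);
    }
    return targets;
}

// Router decision (assumed): a frame crosses the router only when its
// source and destination VLANs differ; otherwise the local bus has
// already delivered it.
bool needsRouting(int srcVlan, int dstVlan) {
    return srcVlan != dstVlan;
}
```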

Documentation

Each phase has detailed implementation documentation. Documentation areas:

  • Phase documentation
  • Testing & validation
  • Build & configuration
  • Phase summaries
  • Bug fixes & troubleshooting
  • Analysis

Available Simulations

Phase 6: Background Traffic (Latest)

cd simulations\gpu_share_background

# Baseline (no background traffic)
..\..\src\gpu_share.exe -f omnetpp.ini -c NoBackground -u Qtenv

# Light background traffic
..\..\src\gpu_share.exe -f omnetpp.ini -c LightBackground -u Qtenv

# Medium background traffic
..\..\src\gpu_share.exe -f omnetpp.ini -c MediumBackground -u Qtenv

# Heavy background traffic (near saturation)
..\..\src\gpu_share.exe -f omnetpp.ini -c HeavyBackground -u Qtenv

# Batch mode (Cmdenv; runs every repetition of the selected configuration, which is General unless -c <config> is given)
..\..\src\gpu_share.exe -f omnetpp.ini -u Cmdenv

Phase 5: Background Traffic Tests

cd simulations\gpu_share_background
..\..\src\gpu_share.exe -f omnetpp.ini -u Qtenv

Phase 4: Two-VLAN Tests

cd simulations\gpu_share_two_vlan
..\..\src\gpu_share.exe -f omnetpp.ini -c TwoVlan_Basic -u Qtenv

Phase 3: GPU Share Min (Basic GPU sharing)

cd simulations\gpu_share_min
..\..\src\gpu_share.exe -f omnetpp.ini -c GPUShareMin_Basic -u Qtenv

Phase 2: GPU Host Test

cd simulations\gpu_host_test
..\..\src\gpu_share.exe -f omnetpp.ini -c GPUHost_Basic -u Qtenv

Phase 1: VLAN Smoke Test

cd simulations\vlan_smoke
..\..\src\gpu_share.exe -f omnetpp.ini -c VlanSmoke_Basic -u Qtenv

Development Environment

Requirements:

  • OMNeT++ 6.2.0
  • Windows 11 (primary development platform)
  • C++ compiler (MinGW/MSVC via OMNeT++ installation)
  • Git (for version control)

Project Configuration:

  • No INET Framework - Pure OMNeT++ implementation
  • Signal-based metrics - All statistics use OMNeT++ signals
  • Package naming - Uses underscores (gpu_share), not hyphens
  • Build system - Standard OMNeT++ makefile generation

Contributing

This project follows a phased development approach:

  1. Each phase must be fully verified before proceeding to the next
  2. All code must compile without warnings or errors
  3. Verification checklist (per phase):
    • ✅ Build succeeds without errors
    • ✅ All .msg files generate _m.h/_m.cc correctly
    • ✅ Simulation runs for specified duration
    • ✅ Expected event log messages appear
    • ✅ Statistics signals emit correct values
    • ✅ Result files (.vec, .sca) contain expected data
    • ✅ Numbers match expectations (e.g., frame counts, delays)

Code Conventions:

  • Module registration: Define_Module(ClassName) at global scope
  • Debug logging: EV << "message" << endl
  • Parameter access: par("paramName") in initialize()
  • Memory management: Delete each message once processing is finished, unless it is forwarded or rescheduled
  • Serialization delay: Calculate before sendDelayed()
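The serialization-delay convention comes down to one formula: frame size in bits divided by the link bit rate. The sketch below shows it in plain C++. The function names and the "wait until the link is free" rule are assumptions for illustration. In an OMNeT++ module the resulting value would be passed as the delay argument to sendDelayed().

```cpp
#include <cassert>

// Seconds needed to clock a frame onto the link: bits / (bits per second).
double serializationDelay(int frameBytes, double bitrateBps) {
    assert(frameBytes >= 0 && bitrateBps > 0.0);
    return frameBytes * 8.0 / bitrateBps;
}

// Time at which the frame has fully left the sender, assuming (hypothetically)
// that a frame waits until the link is free before transmission starts.
double departureTime(double now, double busyUntil,
                     int frameBytes, double bitrateBps) {
    double start = now > busyUntil ? now : busyUntil;
    return start + serializationDelay(frameBytes, bitrateBps);
}
```

For example, a 64-byte Beacon on an assumed 100 Mbps link takes 512 bits / 100,000,000 bps = 5.12 microseconds to serialize.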

License

This project is an educational simulation for Computer Networks coursework.


Acknowledgments

Built with OMNeT++ 6.2.0 discrete event simulation framework.

About

Implementation of GPU sharing in a campus environment with OMNeT++. Part of CSE4106: Computer Networks Laboratory.
