Phase 3 Fix - Address Filtering

Problem Identified

The simulation was showing these issues:

  1. ❌ Wrong clients receiving LeaseGrants (Client2 getting grants for Client1's jobs)
  2. ❌ Multiple hosts receiving same lease (both Host1 and Host2 starting same job)
  3. ❌ Duplicate warnings about unknown messages
  4. ❌ Host slots being over-allocated

Root Cause

The VlanBus broadcasts every message to all connected nodes (hub-like behavior). This is correct for a VLAN/LAN emulation, but it means each module must process only the messages addressed to it and ignore the rest.

The original implementation did NOT check the destAddr field, so:

  • All hosts processed all LeaseGrants
  • All clients processed all LeaseGrants
  • Everyone logged "unknown message" for messages not meant for them
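The failure mode can be reproduced outside the simulator. The following is a minimal sketch in plain C++, not the project's code: `LeaseGrant` and `Host` here are hypothetical stand-ins, and `handleUnfiltered` and `countHostsStartingJob` are names invented for illustration. It shows how a lease intended for one host ends up consuming a slot on every host when destAddr is never checked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the simulation's message and host types.
struct LeaseGrant {
    int destAddr;  // intended recipient; -1 would mean broadcast
    int jobId;
};

struct Host {
    int hostId;
    int freeSlots;
    std::vector<int> runningJobs;

    // Pre-fix behavior: the lease is accepted without looking at destAddr.
    void handleUnfiltered(const LeaseGrant &lease) {
        if (freeSlots > 0) {
            --freeSlots;
            runningJobs.push_back(lease.jobId);
        }
    }
};

// Delivers one lease to every host, as a hub-like bus would, and returns
// how many hosts started the job.
int countHostsStartingJob(std::vector<Host> &hosts, const LeaseGrant &lease) {
    int started = 0;
    for (Host &h : hosts) {
        const std::size_t before = h.runningJobs.size();
        h.handleUnfiltered(lease);
        if (h.runningJobs.size() > before) {
            ++started;
        }
    }
    return started;
}
```

With two hosts that each have free slots, a lease carrying destAddr 1 is started by both, which matches the duplicate-start and over-allocation symptoms listed under Problem Identified.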

Solution Applied

Added address filtering to all modules to check destAddr before processing messages.

1. GPUHost.cc (Lines 88-94)

else if (LeaseGrant *lease = dynamic_cast<LeaseGrant*>(msg)) {
    // Check if this lease is addressed to this host
    if (lease->getDestAddr() == hostId || lease->getDestAddr() == -1) {
        // Received a lease grant from scheduler
        handleLeaseGrant(lease);
    }
    delete msg;
}

Change: Only process LeaseGrant if destAddr == hostId or destAddr == -1 (broadcast)

2. JobClient.cc (Lines 108-114)

else if (LeaseGrant *lease = dynamic_cast<LeaseGrant*>(msg)) {
    // Check if this lease is addressed to this client
    if (lease->getDestAddr() == clientId || lease->getDestAddr() == -1) {
        // Received a lease grant from scheduler
        handleLeaseGrant(lease);
    }
    delete msg;
}

Change: Only process LeaseGrant if destAddr == clientId or destAddr == -1 (broadcast)

3. Scheduler.cc (Lines 106-118)

if (Beacon *beacon = dynamic_cast<Beacon*>(msg)) {
    // Beacons are broadcast (-1), scheduler should process them
    if (beacon->getDestAddr() == -1 || beacon->getDestAddr() == schedulerId) {
        handleBeacon(beacon);
    }
    delete msg;
}
else if (JobRequest *request = dynamic_cast<JobRequest*>(msg)) {
    // JobRequests are broadcast (-1), scheduler should process them
    if (request->getDestAddr() == -1 || request->getDestAddr() == schedulerId) {
        handleJobRequest(request);
    }
    delete msg;
}

Change: Only process Beacon/JobRequest if addressed to scheduler or broadcast

4. All Modules

Removed the debug logging for "unknown messages". Modules now silently discard messages that are not addressed to them.

How Address Filtering Works

Message Types and Addressing:

| Message Type | Sender | destAddr | Who Should Process |
|---|---|---|---|
| Beacon | GPUHost | -1 (broadcast) | Scheduler only |
| JobRequest | JobClient | -1 (broadcast) | Scheduler only |
| LeaseGrant (to client) | Scheduler | clientId | Specific client only |
| LeaseGrant (to host) | Scheduler | hostId | Specific host only |
| JobStart | GPUHost | -1 (broadcast) | All (informational) |
| JobDone | GPUHost | -1 (broadcast) | All clients (filtered by jobId) |
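The addressing rules in this table reduce to two checks: an address test shared by every module, and a jobId test for broadcast JobDone messages. The sketch below is one way to express them. `kBroadcastAddr`, `isAddressedTo`, `clientShouldHandleJobDone`, and the `submittedJobs` set are hypothetical names introduced here, not identifiers from the project.

```cpp
#include <set>

constexpr int kBroadcastAddr = -1;  // the -1 broadcast value used in the table

// True when a message with this destAddr should be processed by node selfId.
inline bool isAddressedTo(int destAddr, int selfId) {
    return destAddr == selfId || destAddr == kBroadcastAddr;
}

// Hypothetical client-side check for JobDone. The message is broadcast, so the
// address test passes for every client and the jobId decides relevance.
inline bool clientShouldHandleJobDone(int destAddr, int clientId, int jobId,
                                      const std::set<int> &submittedJobs) {
    return isAddressedTo(destAddr, clientId) && submittedJobs.count(jobId) > 0;
}
```

Pulling the comparison into one helper would also replace the literal -1 repeated in the three handleMessage() excerpts above, assuming the modules can share a common header.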

Broadcast Bus Behavior:

  1. VlanBus receives message on port X
  2. VlanBus duplicates and sends to all other ports (Y, Z, W...)
  3. Each recipient checks: "Is this for me?"
    • If destAddr == myId OR destAddr == -1: Process it
    • Otherwise: Discard silently

This emulates a shared-medium (hub-based) Ethernet segment, where every node sees every frame but processes only the frames addressed to it.
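The three steps above can be sketched in plain C++ without OMNeT++. This is an illustrative model only: `Message`, `Node`, and `busForward` are hypothetical names, and the real VlanBus works with simulator gates and message objects rather than a vector of structs.

```cpp
#include <cstddef>
#include <vector>

struct Message {
    int destAddr;  // -1 means broadcast
    int payload;   // stands in for the real message content
};

struct Node {
    int id;
    std::vector<int> processed;  // payloads this node accepted

    // Step 3: "Is this for me?"
    void receive(const Message &m) {
        if (m.destAddr == id || m.destAddr == -1) {
            processed.push_back(m.payload);
        }
        // Otherwise the message is discarded silently.
    }
};

// Steps 1 and 2: a message arrives on inPort and a copy goes to every other port.
void busForward(std::vector<Node> &ports, std::size_t inPort, const Message &m) {
    for (std::size_t i = 0; i < ports.size(); ++i) {
        if (i != inPort) {
            ports[i].receive(m);
        }
    }
}
```

With a scheduler (id 100) on port 0 and hosts 1 and 2 on ports 1 and 2, forwarding a message with destAddr 1 from port 0 reaches both hosts but is recorded only by host 1. The bus itself never inspects destAddr, which keeps the filtering decision in the receiving module.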

Expected Behavior After Fix

No More Duplicate Processing:

  • ✅ Only assigned host processes its LeaseGrant
  • ✅ Only requesting client processes its LeaseGrant
  • ✅ Only scheduler processes Beacons and JobRequests
  • ✅ No "unknown message" warnings in logs

Clean Event Log:

✓ JobClient1 submitted job #1000
✓ Scheduler100 received JobRequest #1000 from client 1
✓ GPUHost1 sending beacon, freeSlots=2/2
✓ Scheduler100 received beacon from host 1
✓ Scheduler100 granted lease for job #1000 to host 1
✓ JobClient1 received LeaseGrant for job #1000, assignedHost=1  ← Only Client1
✓ GPUHost1 received LeaseGrant for job 1000                     ← Only Host1
✓ GPUHost1 started job 1000, freeSlots now=1/2
✓ GPUHost1 completed job 1000
✓ JobClient1 job #1000 completed, JCT=3.6s

Correct Resource Allocation:

  • ✅ Each job assigned to ONE host (no duplicates)
  • ✅ Host free slots correctly tracked
  • ✅ No over-allocation warnings

Build Instructions

After applying these fixes:

cd d:\omnetpp-6.2.0\samples\gpu_share\src
make clean
make -j16

Then re-run the simulation:

cd ..\simulations\gpu_share_min
..\..\src\gpu_share.exe -f omnetpp.ini -u Qtenv -c GPUShareMin_Basic

Verification Checklist

After re-running, verify:

  • No warnings about "unknown message"
  • No warnings about "no free slots"
  • No warnings about "unknown job" from wrong client
  • Each job assigned to exactly ONE host
  • Each client only receives grants for its own jobs
  • Host free slots decrease/increase correctly
  • Queue length goes to 0 after grants
  • JCT values are reasonable (3-7 seconds)

Architecture Note

This fix maintains the broadcast bus architecture (Phase 1 requirement) while adding proper message filtering at the application layer. The layering mirrors a hub-based Ethernet LAN:

  • Physical Layer: A hub repeats every frame to all ports (a switch floods only broadcast and unknown-destination frames)
  • Data Link Layer: NICs filter frames by MAC address
  • Application Layer: Processes filter by higher-level addressing

Our implementation:

  • VlanBus: Broadcasts (hub-like, no filtering)
  • Module handleMessage(): Filters by destAddr (NIC-like)
  • Module handlers: Process message content

This emulates a shared broadcast domain, as on a hub-based VLAN segment, without depending on the INET framework.