You are an expert OMNeT++ mentor and C++/NED engineer. I’m building a Computer Networks lab project called GPUShare: Campus GPU Pooling & Leasing. You must drive a rigorous, step-by-step implementation that I can run on Windows 11 in the OMNeT++ IDE. I am new to OMNeT++, so be ultra explicit with file names, where to put them, exact NED/C++/MSG contents, and how to run & verify each step before moving on.

############################

PROJECT CONTEXT & GOALS

############################

  • Scenario (not full campus): simulate a GPU-sharing control plane across simple VLAN-like LANs.
  • Components:
    1. GPU hosts periodically advertise availability via UDP-like beacons.
    2. A central scheduler grants short leases to incoming jobs and tracks utilization.
    3. Job clients generate job requests and measure Job Completion Time (JCT).
    4. Two VLAN-like LANs bridged by a simple Router (later we can emulate NAT/DHCP/DNS/RIP/OSPF as light stubs).
    5. Background telnet/ssh-like flows add network noise.
  • Evaluate: GPU utilization & JCT improvements versus a no-sharing baseline.
  • Constraint: Avoid INET unless absolutely required. Prefer pure OMNeT++ cSimpleModule messages & channels. It is fine to “emulate” L2/L3/UDP/TCP behaviors.

#################################

HOW YOU MUST DELIVER OUTPUT

#################################

For each phase below:

  • Provide:
    • File tree (paths under the project root).
    • Every new/edited file with full contents in fenced code blocks.
    • Build & Run steps in the OMNeT++ IDE.
    • Concrete verification checks (what I should see: event log messages, statistics names, vectors/scalars, plots).
  • Use these folders:
    • src/gpu/messages/ for .msg
    • src/gpu/modules/ for NED and C++ (.ned/.cc)
    • src/ top-level package.ned
    • simulations/<scenario-name>/ for test NED + omnetpp.ini
  • Always include the exact network = ... line for omnetpp.ini.
  • All code must compile in OMNeT++ (no placeholders).
  • Use signals for metrics; record scalars/vectors; suggest timeavg where relevant.

#################################

PHASE PLAN (implement in order)

#################################

PHASE 0 — Project scaffold

  • Create project GPUShare, package gpu, folders, and confirm a clean build.

PHASE 1 — Minimal VLAN-like LAN bus (no INET)

  • Define messages for Beacon, JobRequest, LeaseGrant, JobStart, JobDone, DataPkt in Lan.msg.
  • Implement a Lan channel module and a VlanBus module that broadcasts frames within the VLAN and applies serialization delay based on DataPkt.bytes.
  • Provide a tiny DummyNode and a GPUShareVlanSmoke network to sanity-check the bus.

PHASE 2 — GPUHost with periodic beacons

  • GPUHost parameters: vlanId, hostId, gpuSlots, beaconInterval.
  • Sends Beacon periodically; consumes LeaseGrant to start a timed job; emits gpuUtilization and sends JobDone on completion.
  • Provide a smoke test network to see periodic beacons.
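
The slot bookkeeping behind gpuUtilization can be sketched as follows. This is a standalone C++ illustration, not the GPUHost module itself; the class name GpuSlotTracker and its methods are hypothetical, and it assumes utilization is defined as busy slots divided by gpuSlots.

```cpp
#include <stdexcept>

// Hypothetical bookkeeping for one GPUHost. Only gpuSlots comes from the
// parameter list above; everything else is an illustrative assumption.
class GpuSlotTracker {
  public:
    explicit GpuSlotTracker(int gpuSlots) : total_(gpuSlots), busy_(0) {
        if (gpuSlots <= 0) throw std::invalid_argument("gpuSlots must be > 0");
    }

    // Call when a LeaseGrant arrives; returns false if every slot is busy.
    bool startJob() {
        if (busy_ == total_) return false;
        ++busy_;
        return true;
    }

    // Call when the job timer fires, just before JobDone is sent.
    void finishJob() {
        if (busy_ == 0) throw std::logic_error("no job is running");
        --busy_;
    }

    // Free-slot count a Beacon would advertise.
    int freeSlots() const { return total_ - busy_; }

    // Value to emit on the gpuUtilization signal after each change.
    double utilization() const { return static_cast<double>(busy_) / total_; }

  private:
    int total_;
    int busy_;
};
```

With gpuSlots = 1 the emitted value only ever takes the values 0 and 1, which is the oscillation the checklist should expect.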

PHASE 3 — Central Scheduler & JobClient

  • Scheduler: maintains host free-slots from beacons; queues JobRequest; on capacity, grants LeaseGrant (policy: leastLoaded or roundRobin), emits queueLen and leaseGranted.
  • JobClient: generates jobs (Poisson arrivals), sends JobRequest, listens for LeaseGrant, observes JobDone, emits jobCompletionTime.
  • Provide GPUShareMin network and omnetpp.ini to show end-to-end: Beacon → Grant → JobDone and stats vectors.
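
The two grant policies named above could look roughly like this. It is a sketch under assumptions: leastLoaded is read as "the host advertising the most free slots", ties go to the earlier entry, and HostState, pickLeastLoaded and pickRoundRobin are hypothetical names. Both functions return an index into the host table, or -1 when no host has capacity (the JobRequest then stays queued).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-host record the Scheduler rebuilds from Beacons.
struct HostState {
    int hostId;
    int freeSlots;
};

// leastLoaded (assumed meaning): host with the most free slots.
int pickLeastLoaded(const std::vector<HostState>& hosts) {
    int best = -1;
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        if (hosts[i].freeSlots <= 0) continue;  // no capacity, skip
        if (best < 0 || hosts[i].freeSlots > hosts[best].freeSlots)
            best = static_cast<int>(i);
    }
    return best;
}

// roundRobin: first host with capacity, scanning from the entry after
// `lastIndex` and wrapping around. Pass -1 before the first grant.
int pickRoundRobin(const std::vector<HostState>& hosts, int lastIndex) {
    const int n = static_cast<int>(hosts.size());
    for (int step = 1; step <= n; ++step) {
        int i = ((lastIndex + step) % n + n) % n;
        if (hosts[i].freeSlots > 0) return i;
    }
    return -1;
}
```

The caller is expected to decrement freeSlots on the chosen host immediately after granting, so that a second JobRequest arriving before the next Beacon does not over-commit the same slot.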

PHASE 4 — Two VLANs + Router (no OSPF yet)

  • Add second VLAN with another VlanBus.
  • Router forwards cross-VLAN traffic (simple multicast-like forwarding across VLANs).
  • Test a client on VLAN 20 receiving leases from a host/scheduler on VLAN 10 via the router.
  • Verify higher utilization with more clients.
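
One way to read "simple multicast-like forwarding" is: copy each frame to every attached VLAN except the one it arrived on. The standalone sketch below shows only that decision; egressVlans is a hypothetical name, and dropping frames from an unknown VLAN is an assumption, not something the plan specifies.

```cpp
#include <algorithm>
#include <vector>

// Returns the VLAN ids a frame arriving from `ingressVlan` should be
// copied to. Assumption: frames from a VLAN the router is not attached
// to are dropped (empty result).
std::vector<int> egressVlans(const std::vector<int>& attachedVlans, int ingressVlan) {
    std::vector<int> out;
    if (std::find(attachedVlans.begin(), attachedVlans.end(), ingressVlan) ==
        attachedVlans.end())
        return out;
    for (int vlan : attachedVlans)
        if (vlan != ingressVlan) out.push_back(vlan);  // never echo back to the source VLAN
    return out;
}
```

Never sending a frame back to its ingress VLAN matters here: each VlanBus already broadcasts locally, so echoing would duplicate every Beacon and LeaseGrant.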

PHASE 5 — Background TCP-like flows

  • BackgroundFlow emits DataPkt bursts to create serialization delays on Lan.
  • Show how higher background rate impacts effective lease timing/JCT.

PHASE 6 — Instrumentation & Reports

  • Ensure signals recorded:
    • GPUHost: gpuUtilization (record timeavg)
    • JobClient: jobCompletionTime
    • Scheduler: queueLen, leaseGranted
  • Provide scavetool/IDE Result Analysis steps to export plots (CDF/boxplot for JCT, utilization time-average).

PHASE 7 — No-sharing baseline comparison

  • Create new omnetpp.ini configurations that disable cross-VLAN GPU sharing to establish a baseline.
  • Implementation approaches:
    • Option A (Recommended): Create isolated single-VLAN networks (e.g., GPUShareSingleVlan10, GPUShareSingleVlan20) where each VLAN has its own scheduler and only accesses local GPU hosts (no router).
    • Option B: Modify Router to block cross-VLAN job control traffic (filter LeaseGrant, JobRequest, etc.) while still allowing beacons for instrumentation.
    • Option C: Modify Scheduler to only grant leases to hosts on the same VLAN (filter by vlanId when selecting hosts).
  • Create two test scenarios:
    • [Config Sharing]: Full Phase 5 setup (two VLANs + Router + cross-VLAN sharing)
    • [Config NoSharing]: Isolated VLANs (no cross-VLAN job assignment)
  • Run both configs with identical parameters: same job arrival rates, job durations, number of clients, background traffic, seeds, and simulation duration.
  • Collect and compare metrics:
    • GPU Utilization (per host, per VLAN average): Expect the sharing scenario to show higher utilization due to load balancing across VLANs.
    • Job Completion Time (JCT): Expect the sharing scenario to show lower mean and 95th-percentile JCT due to reduced queueing.
    • Scheduler Queue Length: Expect the sharing scenario to show a lower average queue length.
    • Jobs Completed: Expect the sharing scenario to complete more jobs in the same time window.
  • Use scavetool or IDE Result Analysis to generate comparison tables and plots:
    • JCT CDF comparison (sharing vs no-sharing)
    • GPU utilization time-series (show better load distribution with sharing)
    • Queue length histograms
  • Expected hypothesis validation: "Cross-VLAN GPU sharing improves utilization by X% and reduces mean JCT by Y%"
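
Once the jobCompletionTime vectors of both configs are exported, the X and Y in the hypothesis can be computed with a few lines. The sketch below is one possible way to do it; the function names are hypothetical, and the nearest-rank definition of the percentile is an assumption (other tools may interpolate and give slightly different values).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

double meanOf(const std::vector<double>& xs) {
    if (xs.empty()) throw std::invalid_argument("empty sample");
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Nearest-rank percentile, p in (0, 100]: the smallest sample such that
// at least p percent of the samples are less than or equal to it.
double percentileNearestRank(std::vector<double> xs, double p) {
    if (xs.empty() || p <= 0.0 || p > 100.0)
        throw std::invalid_argument("need samples and 0 < p <= 100");
    std::sort(xs.begin(), xs.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * xs.size() / 100.0));
    return xs[rank - 1];
}

// Relative reduction of `candidate` versus `baseline`, in percent.
// Positive means the candidate (sharing) value is lower than the baseline.
double percentReduction(double baseline, double candidate) {
    if (baseline == 0.0) throw std::invalid_argument("baseline must be non-zero");
    return 100.0 * (baseline - candidate) / baseline;
}
```

For example, percentReduction(meanOf(noSharingJct), meanOf(sharingJct)) yields Y; the same formula with the operands swapped in sign convention (an increase rather than a reduction) applies to utilization for X.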

PHASE 8 (Optional) — Network Protocol Emulation Stubs

  • Emulate lightweight versions of standard network protocols without using the INET framework.

  • Focus on adding realistic processing delays and state management, clearly labeled as "emulation stubs".

  • Implementation suggestions:

    A) NAT (Network Address Translation) Emulation in Router:

    • Add natEnabled boolean parameter to Router module.
    • Maintain a simple NAT translation table: map<srcAddr, translatedAddr>.
    • Add natProcessingDelay parameter (e.g., 20-50μs per packet) to simulate NAT lookup/translation overhead.
    • When forwarding frames between VLANs, apply sendDelayed(msg, natProcessingDelay + forwardingDelay, gate).
    • Optionally log NAT translations: "Router NAT: translating srcAddr=10 → natAddr=200".
    • Purpose: Show impact of NAT processing on cross-VLAN job control latency.

    B) DHCP (Dynamic Host Configuration) Emulation:

    • Create a simple DHCPServer module in one VLAN.
    • At initialization, nodes send DHCPRequest messages to obtain addresses.
    • DHCPServer responds with DHCPAck containing assigned address from a pool.
    • Add dhcpDelay parameter (e.g., 50-100ms) to simulate DHCP handshake latency.
    • Nodes wait for DHCP response before starting normal operation (beacon/job generation).
    • Purpose: Demonstrate network bootstrapping delay impact on system startup time.

    C) DNS (Domain Name Service) Emulation:

    • Create a DNSServer module that maintains a simple map<hostName, hostId> registry.
    • Nodes send DNSQuery messages to resolve symbolic names (e.g., "gpu-host-10") to numeric hostId.
    • DNSServer responds with DNSResponse containing resolved hostId.
    • Add dnsLookupDelay parameter (e.g., 10-30ms) to simulate DNS query latency.
    • Cache DNS responses at clients to avoid repeated lookups.
    • Purpose: Show overhead of name resolution in distributed GPU scheduling.

    D) RIP/OSPF (Routing Protocol) Emulation in Router:

    • Add periodic RoutingUpdate message exchange between routers (if multiple routers exist).
    • Maintain a simple routing table: map<vlanId, outputGate>.
    • Add routingUpdateInterval parameter (e.g., 30s for RIP-like behavior).
    • Add routingProcessingDelay parameter (e.g., 100-500μs) for routing table lookup.
    • Log routing updates: "Router learned route to VLAN 30 via port 2".
    • Purpose: Demonstrate routing protocol overhead in multi-router topologies.
  • Implementation guidelines:

    • Keep protocols simple and lightweight (e.g., no full TCP handshake simulation).
    • Use explicit delays to model processing overhead (e.g., natProcessingDelay, dnsLookupDelay).
    • Add debug logging to show protocol operations (e.g., "NAT translating...", "DNS resolving...").
    • Create separate NED modules for each protocol service (DHCPServer, DNSServer).
    • Add protocol messages to Lan.msg (e.g., DHCPRequest, DNSQuery, RoutingUpdate).
    • Provide dedicated test scenarios (e.g., [Config WithNAT], [Config WithDNS]).
    • Label all emulation code with comments: // EMULATION: Simplified DHCP, not full protocol.
  • Testing approach:

    • Create [Config Baseline]: No protocol emulation (Phase 5 setup).
    • Create [Config WithNAT]: Enable NAT processing delay in Router.
    • Create [Config WithDHCP]: Enable DHCP bootstrapping delay.
    • Create [Config AllProtocols]: Enable all protocol emulations (NAT+DHCP+DNS).
    • Compare JCT and system startup time across configs to quantify protocol overhead.
    • Expected result (a hypothesis to verify, not a given): protocol overhead adds roughly 5-15% to JCT, depending on cumulative delays.
  • Documentation:

    • Clearly state in code and docs: "This is a lightweight emulation for educational purposes, not a full protocol implementation".
    • Provide a table mapping emulated features to real protocol behaviors:
      | Emulated Feature      | Real Protocol Equivalent       | Accuracy Level |
      |-----------------------|--------------------------------|----------------|
      | NAT address mapping   | RFC 3022 NAT translation       | Delay model    |
      | DHCP request/ack      | RFC 2131 DHCP handshake        | Delay model    |
      | DNS query/response    | RFC 1035 DNS lookup            | Delay model    |
      | Routing table updates | RIP/OSPF routing updates       | Delay model    |
      
  • Key insight from Phase 8: Even "lightweight" network protocols introduce measurable latency overhead in distributed GPU scheduling systems. Quantifying this overhead helps inform real-world deployment decisions (e.g., using static IP addressing to avoid DHCP delay).

###############################

STYLE / RIGOR REQUIREMENTS

###############################

  • Every phase must be runnable as-is. Include full code (no “…”).
  • After each phase, include a CHECKLIST:
    • Build succeeds
    • Simulation runs
    • Expected messages appear in event log
    • Named statistics appear (list exact vector/scalar names)
    • What numbers/trends I should roughly see (e.g., utilization oscillating 0↔1)
  • Keep names consistent across files and modules.
  • Prefer simple, deterministic parameters in omnetpp.ini plus explicit random distributions where stated (e.g., exponential(iaMean)).
  • All gates and connections must be valid; show exact gate indices in the test networks.
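
For cross-checking the JobClient arrival process outside the simulator, exponential inter-arrival gaps with a fixed seed can be generated in plain C++. This is a sketch with a hypothetical function name; note that std::exponential_distribution takes the rate (1 / mean), whereas OMNeT++'s exponential(iaMean) takes the mean, and the two generators will not produce the same number stream.

```cpp
#include <random>
#include <stdexcept>
#include <vector>

// Arrival times of a Poisson process with mean inter-arrival time `iaMean`,
// truncated at `endTime`. The same seed always gives the same sequence.
std::vector<double> poissonArrivalTimes(double iaMean, double endTime, unsigned seed) {
    if (iaMean <= 0.0) throw std::invalid_argument("iaMean must be > 0");
    std::mt19937 rng(seed);
    std::exponential_distribution<double> gap(1.0 / iaMean);  // rate = 1 / mean

    std::vector<double> times;
    double t = gap(rng);
    while (t < endTime) {
        times.push_back(t);
        t += gap(rng);
    }
    return times;
}
```

Over a long run the number of arrivals should be close to endTime / iaMean, which gives a quick plausibility check on the job count reported by the simulation.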

######################################

INTERACTION MODEL (VERY IMPORTANT)

######################################

  • Start with PHASE 0 only.
  • Wait for me to say “Next phase” (or “Phase 1”, “Phase 2”, etc.) before proceeding.
  • If I say “rerun” or “fix error ”, diagnose and deliver corrected code.
  • If I say “package and ship”, provide a recap of the whole file tree and any .ini config variants for experiments (sharing vs baseline) and how to batch-run them.

Now begin with PHASE 0 exactly as specified.