CSCI 599: Network Systems for Cloud Computing — Spring 2026
Advisor: Prof. Ramesh Govindan · USC
Edge inference nodes cannot afford MicroVMs like Firecracker; process
pools backed by Copy-on-Write fork are the more realistic isolation
choice. This project implements a lightweight FaaS gateway in C++ and
Python that cuts cold-start counts by roughly 30% against reactive and
ARIMA baselines, at sub-millisecond control-plane overhead.
| Metric | Value (n=5, 95% CI) |
|---|---|
| Cold-start reduction (vs Reactive baseline) | −31% (48 ± 19 vs 70 ± 1) |
| Cold-start reduction (vs ARIMA baseline) | −26% (48 ± 19 vs 64 ± 12) |
| Worker spin-up speed-up (CoW template) | 9× (~900 ms → ~100 ms) |
| Predictor inference latency p99 | 191 μs (more than two orders of magnitude below ARIMA's 72 ms) |
| Inter-burst sweet spot | W = 20–60 s (both fixed and adaptive predictors see near-zero cold starts) |
At gateway startup we fork a single Python "template" process that
imports heavyweight dependencies (Pillow, OpenCV, …) once. Every
subsequent worker is created via os.fork() from the template, inheriting
its address space through Linux Copy-on-Write — the fork itself is
nearly free, and only pages that are actually written get copied.
→ No container runtime, no image registry, no VM snapshot.
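The pattern, as a minimal self-contained sketch (the handler name and Pillow workload here are illustrative, not the gateway's actual dispatch path):

```python
import os

# Heavyweight imports happen once, in the template process. Pillow is
# illustrative here; the real template pre-imports whatever handlers need.
from PIL import Image

def spawn_worker(handler) -> int:
    """Fork a worker off the warm template.

    The child inherits the template's address space via Linux Copy-on-Write,
    so the fork itself is nearly free; pages are copied only on first write.
    """
    pid = os.fork()
    if pid == 0:          # child: run the function, then exit immediately
        handler()
        os._exit(0)       # skip atexit/stdio teardown inherited from parent
    return pid            # parent (template): keeps serving future forks

def thumbnail_handler():
    # The PIL import above is already resolved, so this invocation pays
    # no per-worker import cost.
    Image.new("RGB", (256, 256)).resize((64, 64))

if __name__ == "__main__":
    pids = [spawn_worker(thumbnail_handler) for _ in range(4)]
    for pid in pids:
        os.waitpid(pid, 0)
```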
- EWMA tracks the periodic baseline rate (α = 0.2, τ ≈ 10 s)
- CUSUM detects sustained deviation: it fires during the pre-spike traffic ramp instead of waiting for the spike to land
- Little's Law (N = ⌈λ × T⌉ + 1) maps the predicted arrival rate to a target worker count

The full predictor is O(1) in time and space, with sub-millisecond per-tick inference cost; a minimal sketch follows.
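Here is that pipeline per tick (α matches the text; the CUSUM slack K, threshold H, and service time T are illustrative assumptions, not the gateway's tuned constants):

```python
import math

ALPHA = 0.2   # EWMA smoothing factor (α from above)
K = 0.5       # CUSUM slack, req/s: drift smaller than this is ignored (assumed)
H = 5.0       # CUSUM firing threshold (assumed)
T = 0.5       # per-request service time in seconds (assumed)

class Predictor:
    """EWMA baseline + one-sided CUSUM + Little's Law; O(1) per tick."""

    def __init__(self):
        self.baseline = 0.0  # EWMA estimate of the arrival rate λ (req/s)
        self.cusum = 0.0     # accumulated positive deviation from baseline

    def tick(self, rps: float) -> int:
        # EWMA follows the slow periodic baseline.
        self.baseline = ALPHA * rps + (1 - ALPHA) * self.baseline
        # One-sided CUSUM integrates sustained deviation above the baseline,
        # so it fires during the pre-spike ramp, not after the spike lands.
        self.cusum = max(0.0, self.cusum + (rps - self.baseline) - K)
        # Once CUSUM fires, trust the instantaneous rate over the baseline.
        lam = rps if self.cusum > H else self.baseline
        # Little's Law: N = ceil(λ × T) + 1 maps rate to a worker target.
        return math.ceil(lam * T) + 1
```

For example, at λ = 90 req/s and T = 0.5 s this yields ⌈45⌉ + 1 = 46 target workers; the real gateway's escalation policy once CUSUM fires may differ from the simple switch shown here.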
The four figures below come from the paper experiments (n=5 trials, 95% CI). See the Final Report (PDF) for the full analysis.
Dependencies:

```bash
sudo apt install build-essential
pip install Pillow statsmodels psutil scipy
```

Build:

```bash
make clean && make
```

Launch the server (pick one of five modes):

```bash
./server ewma # EWMA + Fixed CUSUM — main method (default)
./server ewma_adaptive # Standardized CUSUM — z-score variant
./server reactive # Reactive baseline (no prediction)
./server static 15 # Static-15 baseline (15 pinned workers)
./server arima # ARIMA(2,1,2) baseline
# Disable the CoW template (for ablation experiments)
./server ewma --no-cow
```

Generate load (defaults to a 4-cycle Bursty-Ramp workload):

```bash
python3 load_tester.py
```

Optional flags:

```bash
python3 load_tester.py --warmup-c234 60 # tune inter-burst interval
python3 load_tester.py --spike-rps 100 # tune spike RPS magnitude
python3 load_tester.py --no-ramp       # step workload (no ramp signal)
```

Reproduce the paper experiments and figures:

```bash
# (1) Main experiment: 5 modes × n=5 trials × CoW={ON,OFF} = 50 runs, ~3 h
./run_multi_trial.sh 5 --ablation
# (2) Warmup sweep: 6 W × 2 modes × n=5 = 60 runs, ~3 h
./run_sweep.sh --trials 5
# (3) Step workload (no ramp): 5 modes × n=3 = 15 runs, ~50 min
./run_multi_trial.sh 3 --no-ramp --tag step
# (4) Aggregate with 95% CI (emits markdown table + summary.csv + per_cycle.csv)
python3 analyze_trials.py logs/<campaign_dir>/
# (5) Render the paper figures (reads summary.csv)
python3 figures/plot_main_n5.py logs/<main_campaign>/
python3 figures/plot_ablation.py logs/<main_campaign>/
python3 figures/plot_pareto_pss.py logs/<main_campaign>/
python3 figures/plot_sweep_n5.py logs/<sweep_campaign>/
```

For the full development trace (proposal, Check-in #1, Check-in #2, class presentation), including design decisions and lessons learned: