AsyncFlow-Sim
diff --git a/docs/internals/server-events-injection.md b/docs/internals/server-events-injection.md
Lines changed: 203 additions & 0 deletions
diff --git a/src/asyncflow/runtime/actors/edge.py b/src/asyncflow/runtime/actors/edge.py
Lines changed: 3 additions & 3 deletions
diff --git a/src/asyncflow/runtime/actors/load_balancer.py b/src/asyncflow/runtime/actors/load_balancer.py
Lines changed: 27 additions & 24 deletions
diff --git a/src/asyncflow/runtime/actors/routing/lb_algorithms.py b/src/asyncflow/runtime/actors/routing/lb_algorithms.py
Lines changed: 32 additions & 17 deletions
@@ -0,0 +1,203 @@
+# Server Event Injection — End-to-End Design & Rationale
+
+This document explains how **server-level events** (planned outages) are modeled and executed across all layers of the simulation stack. It complements the Edge Event Injection design.
+
+---
+
+## 1) Goals
+
+* Hide outage semantics from the load balancer algorithms: **they see only the current set of edges**.
+* Keep **runtime cost O(1)** per transition (down/up).
+* Preserve determinism and fairness when servers rejoin.
+* Centralize event logic; avoid per-server coroutines and ad-hoc flags.
+
+---
+
+## 2) Participants (layers)
+
+* **Schema / Validation (Pydantic)**: validates `EventInjection` objects (pairing, order, target existence).
+* **SimulationRunner**: builds runtimes; owns the **single shared** `OrderedDict[str, EdgeRuntime]` used by the LB (`_lb_out_edges`).
+* **EventInjectionRuntime**: central event engine; builds the **server timeline** and a **reverse index** `server_id → (edge_id, EdgeRuntime)`; mutates `_lb_out_edges` at runtime.
+* **LoadBalancerRuntime**: reads `_lb_out_edges` to select the next edge (RR / least-connections). **No outage logic inside.**
+* **EdgeRuntime (LB→Server edges)**: unaffected by server outages; while the server is down, its edge is simply absent from the LB’s choice set.
+* **ServerRuntime**: unaffected structurally; no extra checks for “am I down?”.
+* **SimPy Environment**: schedules the central outage coroutine.
+* **Metric Collector**: optional; observes effects but is not part of the mechanism.
+
+---
+
+## 3) Data & Structures
+
+* **`_lb_out_edges: OrderedDict[str, EdgeRuntime]`**
+  Single shared map of **currently routable** LB→server edges.
+
+  * Removal/Insertion/Move are **O(1)**.
+  * Aliased into both `LoadBalancerRuntime` and `EventInjectionRuntime`.
+
+* **`_servers_timeline: list[tuple[time, event_id, server_id, mark]]`**
+  Absolute timestamps, sorted by `(time, mark == start, event_id, server_id)` so **END precedes START** when equal.
+
+* **`_edge_by_server: dict[str, tuple[str, EdgeRuntime]]`**
+  Reverse index built from `_lb_out_edges` at initialization.
+
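Here is a minimal sketch of the timeline ordering. It assumes entries are plain tuples `(time, event_id, server_id, mark)` and that `mark` is one of the hypothetical strings `"start"`/`"end"`. The real runtime may use enum members instead.

```python
from typing import Literal

# Hypothetical entry layout, mirroring the doc: (time, event_id, server_id, mark)
Mark = Literal["start", "end"]
Entry = tuple[float, str, str, Mark]


def sort_timeline(entries: list[Entry]) -> list[Entry]:
    """Sort by (time, mark == "start", event_id, server_id).

    False < True, so at equal times an END entry sorts before a START entry.
    """
    return sorted(entries, key=lambda e: (e[0], e[3] == "start", e[1], e[2]))


timeline = sort_timeline([
    (10.0, "ev-2", "S1", "start"),
    (5.0, "ev-1", "S1", "start"),
    (15.0, "ev-2", "S1", "end"),
    (10.0, "ev-1", "S1", "end"),
])
for entry in timeline:
    print(entry)
```

At `t=10`, the END of `ev-1` comes out ahead of the START of `ev-2`, even though `ev-2`'s START was listed first in the input.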
+---
+
+## 4) Build-time Responsibilities
+
+* **SimulationRunner**
+
+  1. Build LB and pass it `_lb_out_edges` (empty at first).
+  2. Build edges; when wiring LB→Server, insert that edge into `_lb_out_edges`.
+  3. Build `EventInjectionRuntime`, passing:
+
+     * validated `events`
+     * `servers` and `edges` (IDs for sanity checks)
+     * aliased `_lb_out_edges`
+
+* **`EventInjectionRuntime.__init__`**
+
+  * Partition events; construct `_servers_timeline`.
+  * Sort timeline (END before START at equal `time`).
+  * Build `_edge_by_server` by scanning `_lb_out_edges` (edge target → `server_id`).
+
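Below is a sketch of the reverse-index build. It assumes each edge runtime exposes `edge_config.target` holding the server id. The `target` field name and the stub classes are assumptions for illustration, not the project's schema.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class EdgeConfig:
    """Hypothetical stand-in for the edge schema (only id and target)."""
    id: str
    target: str  # assumed: id of the node the edge points to


@dataclass
class StubEdgeRuntime:
    """Hypothetical stand-in for EdgeRuntime."""
    edge_config: EdgeConfig


def build_edge_by_server(
    lb_out_edges: OrderedDict[str, StubEdgeRuntime],
) -> dict[str, tuple[str, StubEdgeRuntime]]:
    """Reverse index: server_id -> (edge_id, edge runtime)."""
    return {
        edge.edge_config.target: (edge_id, edge)
        for edge_id, edge in lb_out_edges.items()
    }


lb_out_edges = OrderedDict(
    (edge_id, StubEdgeRuntime(EdgeConfig(edge_id, server_id)))
    for edge_id, server_id in [("edge-S1", "S1"), ("edge-S2", "S2")]
)
edge_by_server = build_edge_by_server(lb_out_edges)
print(edge_by_server["S2"][0])
```

The index stores the runtime object itself. A later `SERVER_UP` can therefore re-insert exactly the edge that was removed.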
+---
+
+## 5) Run-time Responsibilities
+
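+* **`EventInjectionRuntime._assign_server_state()`**
+
+  * Iterate the server timeline, turning absolute timestamps into relative waits: `dt = t_event - last_t`, then `yield env.timeout(dt)` and set `last_t = t_event`. Each event’s `server_id` is resolved to `(edge_id, edge_runtime)` through `_edge_by_server`.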
+  * On `SERVER_DOWN` (START):
+    `lb_out_edges.pop(edge_id, None)`
+  * On `SERVER_UP` (END):
+
+    ```python
+    lb_out_edges[edge_id] = edge_runtime
+    lb_out_edges.move_to_end(edge_id)  # fairness on rejoin
+    ```
+
+* **LoadBalancerRuntime**
+
+  * For each request, read `_lb_out_edges` and apply the chosen algorithm. If a server is down, its edge simply **isn’t there**.
+
+* **EdgeRuntime & ServerRuntime**
+
+  * No additional work: outage is reflected entirely by presence/absence of the LB→server edge.
+
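The run-time loop can be sketched without SimPy. The coroutine yields each relative wait `dt` at the point where the real process would `yield env.timeout(dt)`, and a hand-rolled clock drives it. `assign_server_state`, the `"start"`/`"end"` marks, and the driver loop are all hypothetical.

```python
from collections import OrderedDict
from collections.abc import Generator

# Hypothetical entry shape, mirroring the doc: (time, event_id, server_id, mark)
Entry = tuple[float, str, str, str]


def assign_server_state(
    timeline: list[Entry],
    edge_by_server: dict[str, tuple[str, object]],
    lb_out_edges: OrderedDict,
) -> Generator[float, None, None]:
    """Walk a sorted server timeline, yielding relative waits.

    Each yielded value stands in for SimPy's `yield env.timeout(dt)`.
    """
    last_t = 0.0
    for t_event, _event_id, server_id, mark in timeline:
        # Absolute -> relative wait, then advance the reference time.
        yield t_event - last_t
        last_t = t_event
        target = edge_by_server.get(server_id)
        if target is None:
            continue  # no LB edge for this server: safe no-op
        edge_id, edge_runtime = target
        if mark == "start":  # SERVER_DOWN
            lb_out_edges.pop(edge_id, None)
        else:  # SERVER_UP
            lb_out_edges[edge_id] = edge_runtime
            lb_out_edges.move_to_end(edge_id)  # rejoin at the back of RR


# Tiny driver standing in for env.run(): advance a clock by each wait and
# record which edges were routable during the interval that just ended.
lb_out_edges = OrderedDict([("edge-S1", "rt-S1"), ("edge-S2", "rt-S2")])
edge_by_server = {"S1": ("edge-S1", "rt-S1"), "S2": ("edge-S2", "rt-S2")}
timeline = [(5.0, "ev-1", "S1", "start"), (10.0, "ev-1", "S1", "end")]

now = 0.0
snapshots = []
for dt in assign_server_state(timeline, edge_by_server, lb_out_edges):
    now += dt
    snapshots.append((now, list(lb_out_edges)))
print(snapshots)
print(list(lb_out_edges))
```

Each transition is a single `pop`, set, or `move_to_end` on the shared `OrderedDict`. The per-event cost therefore stays O(1) no matter how many servers exist.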
+---
+
+## 6) Sequence Overview (all layers)
+
+```
+User YAML ──► Schema/Validation
+                 │  (pairing, ordering, target checks)
+                 ▼
+           SimulationRunner
+                 │  _lb_out_edges: OrderedDict[...]  (shared object)
+                 │  build LB, edges (LB→S inserted into _lb_out_edges)
+                 │  build EventInjectionRuntime(..., lb_out_edges=alias)
+                 │
+                 ├─ _start_events()
+                 │     └─ EventInjectionRuntime.start()
+                 │           └─ start _assign_server_state()  (SimPy proc)
+                 │
+                 ├─ _start_all_processes()
+                 │     ├─ LoadBalancerRuntime.start()
+                 │     ├─ EdgeRuntime.start()     (if any process)
+                 │     └─ ServerRuntime.start()
+                 │
+                 └─ env.run(until=T)
+
+Runtime progression (example):
+t=5s   EventInjectionRuntime: SERVER_DOWN(S1)
+       └─ _edge_by_server[S1] -> (edge-S1, edge_rt)
+       └─ _lb_out_edges.pop("edge-S1")           # O(1)
+
+t=7s   LoadBalancerRuntime picks next edge
+       └─ "edge-S1" not present → never selected
+
+t=10s  EventInjectionRuntime: SERVER_UP(S1)
+       └─ _lb_out_edges["edge-S1"] = edge_rt     # O(1)
+       └─ _lb_out_edges.move_to_end("edge-S1")   # fairness
+
+t>10s  LoadBalancerRuntime now sees edge-S1 again
+       └─ RR/LC proceeds as usual
+```
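The example progression can be replayed in a few lines against an `OrderedDict`. The sketch uses the same rotation as the PR's `round_robin`, but returns keys instead of `EdgeRuntime` objects for readability. The `rt-*` values are placeholders.

```python
from collections import OrderedDict


def round_robin(edges: OrderedDict[str, str]) -> str:
    """Same rotation as the PR's round_robin, but returns the key."""
    key = next(iter(edges))
    edges.move_to_end(key)
    return key


lb_out_edges = OrderedDict((e, f"rt-{e}") for e in ["edge-S1", "edge-S2", "edge-S3"])
picks = [round_robin(lb_out_edges) for _ in range(2)]

# t=5s: SERVER_DOWN(S1) -> drop the edge, keep the runtime for later
saved = lb_out_edges.pop("edge-S1")
picks += [round_robin(lb_out_edges) for _ in range(2)]

# t=10s: SERVER_UP(S1) -> re-insert at the back of the rotation
lb_out_edges["edge-S1"] = saved
lb_out_edges.move_to_end("edge-S1")
picks += [round_robin(lb_out_edges) for _ in range(3)]
print(picks)
```

While `edge-S1` is out, only `edge-S2` and `edge-S3` alternate. After `SERVER_UP`, `edge-S1` is selected only once `edge-S3` and `edge-S2` have each had a turn.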
+
+---
+
+## 7) Correctness & Determinism
+
+* **Exact timing**: absolute→relative conversion ensures transitions happen at precise timestamps.
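+* **END before START** at identical times keeps back-to-back outages correct: the ending outage re-inserts the edge and the starting one removes it again. In the opposite order the `pop` would be a no-op and the re-insert would leave the server routable during the second outage.
+* **Fair rejoin**: re-inserting the key and calling `move_to_end` places the edge at the back of the RR rotation, so it is treated as the most recently used edge and is picked only after every other routable edge has had a turn.
+  (Least-connections remains deterministic because the edge reappears with its current connection count; `min()` breaks ties by key order, so a rejoined edge loses ties.)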
+* **Availability constraint**: schema can enforce “at least one server up,” avoiding degenerate LB states.
+
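The ordering rule for back-to-back outages can be checked directly with the `pop`/set operations used on `SERVER_DOWN`/`SERVER_UP`. This sketch replays one server's events in both orders. The helper `routable_after` is hypothetical.

```python
from collections import OrderedDict


def routable_after(events: list[tuple[float, str]]) -> bool:
    """Replay DOWN/UP events for one server; report whether its edge is routable."""
    edges = OrderedDict([("edge-S1", "rt-S1")])
    for _t, mark in events:
        if mark == "start":  # SERVER_DOWN
            edges.pop("edge-S1", None)
        else:  # SERVER_UP
            edges["edge-S1"] = "rt-S1"
            edges.move_to_end("edge-S1")
    return "edge-S1" in edges


# Outage A covers [5, 10), outage B covers [10, 15): replay up to t=10.
end_first = [(5.0, "start"), (10.0, "end"), (10.0, "start")]
start_first = [(5.0, "start"), (10.0, "start"), (10.0, "end")]
print(routable_after(end_first))
print(routable_after(start_first))
```

With END first, the edge is absent during outage B, as intended. With START first, the second `pop` finds nothing to remove, and the late re-insert leaves the server wrongly routable.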
+---
+
+## 8) Design Choices & Rationale
+
+* **Mutate the edge set, not the algorithm**
+  Removing/adding the LB→server edge keeps LB code **pure** and reusable; no conditional branches for “down servers”.
+* **Single shared `OrderedDict`**
+
+  * O(1) for remove/insert/rotate.
+  * Aliasing between LB and injector removes the need for signaling or copies.
+* **Centralized coroutine**
+  One SimPy process for server outages scales better than per-server processes; simpler mental model.
+* **Reverse index `server_id → edge`**
+  Constant-time resolution; avoids coupling servers to the LB or vice versa.
+
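This toy example shows why aliasing removes the need for signaling. Both stand-in classes (hypothetical, not the real runtimes) hold a reference to the same `OrderedDict`, so a removal by the injector is immediately visible to the LB. A copy, by contrast, goes stale.

```python
from collections import OrderedDict


class ToyLoadBalancer:
    """Hypothetical stand-in for LoadBalancerRuntime: only reads the edges."""

    def __init__(self, lb_out_edges: OrderedDict[str, str]) -> None:
        self.lb_out_edges = lb_out_edges

    def routable(self) -> list[str]:
        return list(self.lb_out_edges)


class ToyInjector:
    """Hypothetical stand-in for EventInjectionRuntime: only mutates the edges."""

    def __init__(self, lb_out_edges: OrderedDict[str, str]) -> None:
        self._lb_out_edges = lb_out_edges

    def server_down(self, edge_id: str) -> None:
        self._lb_out_edges.pop(edge_id, None)


shared = OrderedDict([("edge-S1", "rt-S1"), ("edge-S2", "rt-S2")])
lb = ToyLoadBalancer(shared)
injector = ToyInjector(shared)  # same object, no copy
copied_lb = ToyLoadBalancer(OrderedDict(shared))  # a copy would go stale

injector.server_down("edge-S1")
print(lb.routable())
print(copied_lb.routable())
```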
+---
+
+## 9) Performance
+
+* **Build**:
+
+  * Timeline construction: O(#server-events)
+  * Sort: O(#server-events · log #server-events)
+* **Run**:
+
+  * Each transition: O(1) (pop/set/move)
+  * LB pick: unchanged (RR O(1), LC O(n), where n is the number of routable edges)
+* **Space**:
+
+  * Reverse index: O(#servers with LB edges)
+  * Timeline: O(#server-events)
+
+---
+
+## 10) Failure Modes & Guards
+
+* Unknown server in an event → rejected by schema (or ignored with a log if you prefer leniency).
+* Concurrent DOWN/UP at same timestamp → resolved by timeline ordering (END first).
+* All servers down → disallowed by schema (or handled by LB guard if you opt in later).
+* Missing reverse mapping (no LB) → injector safely no-ops.
+
+---
+
+## 11) Extensibility
+
+* **Multiple LB instances**: make the reverse index `(lb_id, server_id) → edge_id`, or pass per-LB `lb_out_edges`.
+* **Partial capacity**: instead of removing edges, attach capacity/weight and have the LB respect it (requires extending LB policy).
+* **Dynamic scale-out**: adding new servers at runtime is the same operation as “UP” with a previously unseen edge.
+
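For the multi-LB extension, here is a sketch of the `(lb_id, server_id) → edge_id` index. It assumes one routable-edge map per LB, keyed by a hypothetical `lb_id`. `build_index`, `server_down`, and `edge_target` are illustrative names.

```python
from collections import OrderedDict

# Hypothetical: one routable-edge map per LB, keyed by lb_id.
lb_out_edges_by_lb: dict[str, OrderedDict[str, str]] = {
    "lb-A": OrderedDict([("A-S1", "rt-A-S1"), ("A-S2", "rt-A-S2")]),
    "lb-B": OrderedDict([("B-S1", "rt-B-S1")]),
}
# Hypothetical: edge_id -> server_id, as read from each edge's target.
edge_target = {"A-S1": "S1", "A-S2": "S2", "B-S1": "S1"}


def build_index(
    by_lb: dict[str, OrderedDict[str, str]],
    targets: dict[str, str],
) -> dict[tuple[str, str], str]:
    """(lb_id, server_id) -> edge_id."""
    return {
        (lb_id, targets[edge_id]): edge_id
        for lb_id, edges in by_lb.items()
        for edge_id in edges
    }


def server_down(
    server_id: str,
    index: dict[tuple[str, str], str],
    by_lb: dict[str, OrderedDict[str, str]],
) -> None:
    """Remove the server's edge from every LB that routes to it."""
    for lb_id, edges in by_lb.items():
        edge_id = index.get((lb_id, server_id))
        if edge_id is not None:
            edges.pop(edge_id, None)


index = build_index(lb_out_edges_by_lb, edge_target)
server_down("S1", index, lb_out_edges_by_lb)
print({lb_id: list(edges) for lb_id, edges in lb_out_edges_by_lb.items()})
```

With this layout a transition costs O(#LBs) rather than O(1), because every LB's map has to be checked.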
+---
+
+## 12) Operational Notes
+
+* Start the **event coroutine** before LB to avoid off-by-one delivery at `t_start`.
+* Keep `_lb_out_edges` the **only source of truth** for routable edges.
+* If you also use edge-level spikes, both coroutines can run concurrently; they are independent.
+
+---
+
+## 13) Summary
+
+We model server outages by **mutating the LB’s live edge set** via a centralized event runtime:
+
+* **O(1)** down/up transitions by `pop`/`set` on a shared `OrderedDict`.
+* LB algorithms remain untouched and deterministic.
+* A single SimPy coroutine drives the timeline; a reverse index resolves targets in constant time.
+* The design is minimal, performant, and easy to extend to richer failure models.
@@ -56,7 +56,7 @@ def __init__( # Noqa: PLR0913
         self.edges_affected = edges_affected
         self.target_box = target_box
         self.rng = rng or np.random.default_rng()
-        self.setting = settings
+        self.settings = settings
         self._edge_enabled_metrics = build_edge_metrics(
             settings.enabled_sample_metrics,
         )
@@ -93,8 +93,8 @@ def _deliver(self, state: RequestState) -> Generator[simpy.Event, None, None]:
         # Logic to add if exists the event injection for the given edge
         spike = 0.0
         if (
-            self.edges_affected
-            and self.edges_spike
+            self.edges_spike
+            and self.edges_affected
             and self.edge_config.id in self.edges_affected
         ):
             spike = self.edges_spike.get(self.edge_config.id, 0.0)
 
@@ -1,16 +1,17 @@
 """Definition of the node represented by the LB in the simulation"""
 
+
+from collections import OrderedDict
 from collections.abc import Generator
-from typing import TYPE_CHECKING
+from typing import (
+    TYPE_CHECKING,
+)
 
 import simpy
 
-from asyncflow.config.constants import LbAlgorithmsName, SystemNodes
+from asyncflow.config.constants import SystemNodes
 from asyncflow.runtime.actors.edge import EdgeRuntime
-from asyncflow.runtime.actors.routing.lb_algorithms import (
-    least_connections,
-    round_robin,
-)
+from asyncflow.runtime.actors.routing.lb_algorithms import LB_TABLE
 from asyncflow.schemas.topology.nodes import LoadBalancer
 
 if TYPE_CHECKING:
@@ -26,29 +27,38 @@ def __init__(
         *,
         env: simpy.Environment,
         lb_config: LoadBalancer,
-        out_edges: list[EdgeRuntime] | None,
+
+        # We use an OrderedDict because, for the RR algorithm,
+        # we rotate elements in O(1) by moving the selected key to the end.
+        # An OrderedDict also lets us remove an element by key in O(1)
+        # without implementing a custom doubly linked list + hashmap.
+        # Keys are the unique edge IDs that connect the LB to the servers.
+        # If multiple LBs are present, the SimulationRunner assigns
+        # the correct dict to each LB. Removals/insertions are performed
+        # by the EventInjectionRuntime.
+
+        lb_out_edges: OrderedDict[str, EdgeRuntime],
         lb_box: simpy.Store,
     ) -> None:
         """
         Descriprion of the instance attributes for the class
         Args:
-            env (simpy.Environment): env of the simulation
-            lb_config (LoadBalancer): input to define the lb in the runtime
-            rqs_state (RequestState): state of the simulation
-            out_edges (list[EdgeRuntime]): list of edges that connects lb with servers
-            lb_box (simpy.Store): store to add the state
-
+            env (simpy.Environment): Simulation environment.
+            lb_config (LoadBalancer): LB configuration for the runtime.
+            lb_out_edges (OrderedDict[str, EdgeRuntime]): Shared map of
+                routable edges connecting the LB to servers.
+            lb_box (simpy.Store): Queue (mailbox) from which the LB
+            consumes request states.
         """
         self.env = env
         self.lb_config = lb_config
-        self.out_edges = out_edges
+        self.lb_out_edges = lb_out_edges
         self.lb_box = lb_box
-        self._round_robin_index: int = 0
+
 
 
     def _forwarder(self) -> Generator[simpy.Event, None, None]:
         """Updtate the state before passing it to another node"""
-        assert self.out_edges is not None
         while True:
             state: RequestState = yield self.lb_box.get()  # type: ignore[assignment]
 
@@ -58,14 +68,7 @@ def _forwarder(self) -> Generator[simpy.Event, None, None]:
                     self.env.now,
                 )
 
-            if self.lb_config.algorithms == LbAlgorithmsName.ROUND_ROBIN:
-                out_edge, self._round_robin_index = round_robin(
-                    self.out_edges,
-                    self._round_robin_index,
-                )
-            else:
-                out_edge = least_connections(self.out_edges)
-
+            out_edge = LB_TABLE[self.lb_config.algorithms](self.lb_out_edges)
             out_edge.transport(state)
 
     def start(self) -> simpy.Process:
 
@@ -1,30 +1,45 @@
 """algorithms to simulate the load balancer during the simulation"""
 
+from collections import OrderedDict
+from collections.abc import Callable
 
-
+from asyncflow.config.constants import LbAlgorithmsName
 from asyncflow.runtime.actors.edge import EdgeRuntime
 
 
-def least_connections(list_edges: list[EdgeRuntime]) -> EdgeRuntime:
-    """We send the state to the edge with less concurrent connections"""
-    concurrent_connections = [edge.concurrent_connections for edge in list_edges]
-
-    idx_min = concurrent_connections.index(min(concurrent_connections))
-
-    return list_edges[idx_min]
-
-def round_robin(edges: list[EdgeRuntime], idx: int) -> tuple[EdgeRuntime, int]:
+def least_connections(
+    edges: OrderedDict[str, EdgeRuntime],
+) -> EdgeRuntime:
+    """Return the edge with the fewest concurrent connections."""
+    # O(n) scan over the routable edges. Edge counts in a typical
+    # simulation are small, so this should be acceptable; if the Monte
+    # Carlo analysis reveals poor performance, a heap-based structure
+    # could reduce the cost per pick. Ties go to the earliest key in
+    # insertion order.
+    name = min(edges, key=lambda k: edges[k].concurrent_connections)
+    return edges[name]
+
+def round_robin(
+    edges: OrderedDict[str, EdgeRuntime],
+) -> EdgeRuntime:
     """
     We send states to different server in uniform way by
-    rotating the list of edges that should transport the state
-    to the correct server, we rotate the index and not the list
-    to avoid aliasing since the list is shared by many components
+    rotating the OrderedDict: the first key is the least recently used
+    edge and is moved to the end once selected. Pydantic validation is
+    relied on to keep the dict non-empty, so that case is not handled.
     """
-    idx %= len(edges)
-    chosen = edges[idx]
-    idx = (idx + 1) % len(edges)
-    return chosen, idx
+    # Build a fresh iterator on every call so we always take the current
+    # first (least recently used) key, then rotate it to the end.
+    key, value = next(iter(edges.items()))
+    edges.move_to_end(key)
+
+    return value
 
 
+LB_TABLE: dict[LbAlgorithmsName,
+               Callable[[OrderedDict[str, EdgeRuntime]], EdgeRuntime]] = {
+    LbAlgorithmsName.LEAST_CONNECTIONS: least_connections,
+    LbAlgorithmsName.ROUND_ROBIN: round_robin,
+}
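The following standalone sketch exercises the two algorithms with a stub in place of `EdgeRuntime`. `StubEdge` and the string-keyed `table` are simplifications; the PR keys `LB_TABLE` by `LbAlgorithmsName` members.

```python
from collections import OrderedDict
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class StubEdge:
    """Hypothetical stand-in for EdgeRuntime (name added for readability)."""
    name: str
    concurrent_connections: int


def least_connections(edges: OrderedDict[str, StubEdge]) -> StubEdge:
    # O(n) scan; min() keeps the first key among ties.
    name = min(edges, key=lambda k: edges[k].concurrent_connections)
    return edges[name]


def round_robin(edges: OrderedDict[str, StubEdge]) -> StubEdge:
    # Take the first (least recently used) key, then rotate it to the end.
    key, value = next(iter(edges.items()))
    edges.move_to_end(key)
    return value


# Simplified dispatch table keyed by strings instead of LbAlgorithmsName.
table: dict[str, Callable[[OrderedDict[str, StubEdge]], StubEdge]] = {
    "least_connections": least_connections,
    "round_robin": round_robin,
}

edges = OrderedDict(
    (name, StubEdge(name, conns)) for name, conns in [("e1", 2), ("e2", 0), ("e3", 0)]
)
lc_pick = table["least_connections"](edges)  # e2 and e3 tie at 0
rr_picks = [table["round_robin"](edges).name for _ in range(4)]
print(lc_pick.name, rr_picks, list(edges))
```

Both functions assume a non-empty dict. In an all-servers-down state, least-connections would raise `ValueError` from `min()`, and round-robin would raise `StopIteration` from `next()`. When `round_robin` is called inside the LB's `_forwarder` generator, Python 3.7+ turns the `StopIteration` into a `RuntimeError` (PEP 479). This is why the "at least one server up" constraint matters.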