Commit 1637ac8

docs: fix service name example, response payload, method count, and GC coordinator lookup
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
1 parent a1fce1b commit 1637ac8

1 file changed

Lines changed: 14 additions & 13 deletions

docs/design/multi-agent-runtime-proposal.md

@@ -236,12 +236,12 @@ The environment variables injected into the dependent pod's containers point to
 **Injection Scope:**
 * The dependency endpoints are injected into the `Env` list of **all containers** (including primary, sidecar, and init-containers) defined in the pod spec. This ensures that any multi-container runtime configuration can reliably resolve the endpoints.

-For a role with `dependencies: [my-planner]` in namespace `default` (where `my-planner` maps to service name `grp-xyz-my-planner` and exposes a port named `api` at `8080` and `metrics` at `9090`), the dependent pod's containers receive:
+For a role with `dependencies: [my-planner]` in namespace `default` (where `my-planner` maps to service name `mar-abcdef12-my-planner` and exposes a port named `api` at `8080` and `metrics` at `9090`), the dependent pod's containers receive:

 ```
-AGENTCUBE_DEP_MY_PLANNER_ENDPOINT = grp-xyz-my-planner.default.svc.cluster.local:8080
-AGENTCUBE_DEP_MY_PLANNER_PORT_API_ENDPOINT = grp-xyz-my-planner.default.svc.cluster.local:8080
-AGENTCUBE_DEP_MY_PLANNER_PORT_METRICS_ENDPOINT = grp-xyz-my-planner.default.svc.cluster.local:9090
+AGENTCUBE_DEP_MY_PLANNER_ENDPOINT = mar-abcdef12-my-planner.default.svc.cluster.local:8080
+AGENTCUBE_DEP_MY_PLANNER_PORT_API_ENDPOINT = mar-abcdef12-my-planner.default.svc.cluster.local:8080
+AGENTCUBE_DEP_MY_PLANNER_PORT_METRICS_ENDPOINT = mar-abcdef12-my-planner.default.svc.cluster.local:9090
 ```

 Injection happens in-memory inside `createSandboxGroup()` by mutating the pod template before it is passed to `buildSandboxByAgentRuntime()`. The referenced `AgentRuntime` CRD object in the informer cache is never written.
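
The naming scheme in this hunk's example can be sketched in Go as follows. This is a minimal illustration, not the proposal's implementation: `Port`, `envKey`, and `dependencyEnv` are hypothetical names, and the sketch assumes that the un-suffixed `_ENDPOINT` variable points at the first listed port, which matches the example but is not stated in the text.

```go
package main

import (
	"fmt"
	"strings"
)

// Port is a hypothetical stand-in for a named service port.
type Port struct {
	Name   string
	Number int
}

// envKey upper-cases a name and replaces hyphens with underscores,
// mirroring the my-planner -> MY_PLANNER mapping in the example.
func envKey(name string) string {
	return strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
}

// dependencyEnv returns "NAME=value" pairs for one dependency.
// Assumption: the un-suffixed ENDPOINT variable uses the first listed port.
func dependencyEnv(role, service, namespace string, ports []Port) []string {
	host := fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
	prefix := "AGENTCUBE_DEP_" + envKey(role)

	var env []string
	if len(ports) > 0 {
		env = append(env, fmt.Sprintf("%s_ENDPOINT=%s:%d", prefix, host, ports[0].Number))
	}
	for _, p := range ports {
		env = append(env, fmt.Sprintf("%s_PORT_%s_ENDPOINT=%s:%d", prefix, envKey(p.Name), host, p.Number))
	}
	return env
}
```

Calling `dependencyEnv("my-planner", "mar-abcdef12-my-planner", "default", []Port{{"api", 8080}, {"metrics", 9090}})` yields the three variables shown in the updated example, in the same order.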
@@ -460,7 +460,7 @@ sequenceDiagram

 WM->>Store: SaveAgentGroup(manifest)
 WM-->>Router: CreateAgentGroupResponse
-Router-->>Client: 200 OK + groupSessionId
+Router-->>Client: 200 OK + CreateAgentGroupResponse
 ```

 ### Topological Sort and Cycle Detection
@@ -594,7 +594,7 @@ type AgentGroupRole struct {

 ### Store Interface Additions

-Four new methods are added to the `Store` interface in `pkg/store/interface.go`. All existing methods are unchanged.
+Five new methods are added to the `Store` interface in `pkg/store/interface.go`. All existing methods are unchanged.

 ```go
 // SaveAgentGroup persists a group manifest keyed by groupSessionID.
@@ -795,11 +795,12 @@ The existing GC in `pkg/workloadmanager/garbage_collection.go` is extended with
 Because the Router only proxies external traffic directly to the coordinator, only the coordinator's `LastActivityAt` timestamp in the store is updated during active sessions. Internal worker sandboxes that receive no direct external traffic would otherwise retain static `LastActivityAt` values, causing the GC to prematurely delete them while the coordinator is still active.

 To prevent this, the GC evaluates idle timeouts group-wide:
-1. When checking if a sandbox is idle, if its `GroupSessionID` is non-empty, the GC retrieves the group manifest once per GC cycle (cached in a `map[string]*AgentGroupManifest` local to the cycle) and looks up the coordinator sandbox from the manifest.
-2. The idle duration for **all members of the group** is calculated based on the coordinator's `LastActivityAt` timestamp (or the maximum `LastActivityAt` among all group member sandboxes if the coordinator's timestamp is unavailable).
-3. Individual sandboxes in a group are only deleted for inactivity if the group as a whole is determined to be idle.
+1. When checking if a sandbox is idle, if its `GroupSessionID` is non-empty, the GC retrieves the group manifest once per GC cycle via `GetAgentGroup()` (result cached in a `map[string]*AgentGroupManifest` local to that cycle). The manifest's `role:*` fields contain the `SessionID` of each member, including the coordinator.
+2. The coordinator's `SandboxInfo` (including `LastActivityAt`) is fetched with a single `GetSandbox(coordinatorSessionID)` call. This result is also cached per group per cycle, so it is only fetched once regardless of how many worker sandboxes belong to that group. If the coordinator's `SandboxInfo` is unavailable (e.g., already evicted), the GC falls back to the maximum `LastActivityAt` among all group member sandboxes whose `SandboxInfo` can be retrieved.
+3. The idle duration for **all members of the group** is calculated based on the resolved coordinator (or fallback) `LastActivityAt` timestamp.
+4. Individual sandboxes in a group are only deleted for inactivity if the group as a whole is determined to be idle.

-Caching the manifest per group per GC cycle avoids O(N) redundant store lookups where N is the number of worker sandboxes in the group.
+Caching both the group manifest and the coordinator `SandboxInfo` per group per GC cycle reduces the total number of store roundtrips to O(1) per group rather than O(N) per group member.

 ### Group Metadata Cleanup

@@ -908,9 +909,9 @@ This feature is fully backward compatible. No existing behavior changes unless t
 | `pkg/workloadmanager/server.go` | Add 3 new routes under `/v1/multi-agent-runtime` |
 | `pkg/workloadmanager/garbage_collection.go` | Group manifest cleanup when last member sandbox is GC'd |
 | `pkg/store/interface.go` | Add `SaveAgentGroup`, `GetAgentGroup`, `DeleteAgentGroup`, `DeleteAgentGroupRole`, `UpdateAgentGroupRoleStatus` |
-| `pkg/store/store_redis.go` | Implement all 4 group methods |
+| `pkg/store/store_redis.go` | Implement all 5 group methods |
 | `pkg/store/store_redis_test.go` | Group CRUD tests |
-| `pkg/store/store_valkey.go` | Implement all 4 group methods |
+| `pkg/store/store_valkey.go` | Implement all 5 group methods |
 | `pkg/store/store_valkey_test.go` | Group CRUD tests |
 | `pkg/router/session_manager.go` | Add `MultiAgentRuntimeKind` case in endpoint switch |
 | `cmd/workload-manager/main.go` | Phase 1: HTTP routes; Phase 4: reconciler wiring |
@@ -933,7 +934,7 @@ Deliverables that satisfy the mentorship expected outcomes on their own.
 - Role names must be valid DNS label fragments (lowercase alphanumeric and hyphens, max 63 characters).
 - Implement `createSandboxGroup()` with `Atomic` rollback (no `BestEffort` yet).
 - Add `GroupSessionID` + `Role` to `SandboxInfo`; propagate through `buildSandboxPlaceHolder()` + `buildSandboxInfo()`.
-- Implement all 4 store methods in `store_redis.go` + `store_valkey.go` with full unit test coverage.
+- Implement all 5 store methods in `store_redis.go` + `store_valkey.go` with full unit test coverage.
 - Add `MultiAgentRuntimeKind` to Router endpoint switch.
 - Extend GC to clean up `agentgroup:` manifest keys when last member sandbox is deleted.
 - Unit tests: `createSandboxGroup()` with atomic rollback on partial failure, store CRUD, coordinator validation, cycle detection, admission webhook validation.
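
The role-name rule in the first bullet of this hunk can be checked with a short validator. The sketch below is one possible reading, not the proposal's code: `validateRoleName` and `roleNameRE` are hypothetical names, and the requirement that a name start and end with an alphanumeric character is an assumption borrowed from the usual RFC 1123 label rules rather than something the bullet states.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxRoleNameLen is the 63-character limit quoted in the deliverables list.
const maxRoleNameLen = 63

// roleNameRE accepts lowercase alphanumerics and hyphens.
// Assumption: as with RFC 1123 labels, a name may not start or end with a hyphen.
var roleNameRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`)

// validateRoleName returns nil when name is usable as a DNS label fragment.
func validateRoleName(name string) error {
	if len(name) == 0 || len(name) > maxRoleNameLen {
		return fmt.Errorf("role name %q must be 1-%d characters long", name, maxRoleNameLen)
	}
	if !roleNameRE.MatchString(name) {
		return fmt.Errorf("role name %q must contain only lowercase alphanumerics and hyphens", name)
	}
	return nil
}
```

With this reading, `validateRoleName("my-planner")` returns `nil`, while `"My_Planner"`, `"-planner"`, and any name longer than 63 characters are rejected.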
