| 1 | +# Redis Backend for IPP Metrics Aggregation |
| 2 | + |
| 3 | +This guide covers how to deploy, configure, and verify Redis as the external store for IPP cross-replica metrics aggregation. |
| 4 | + |
| 5 | +For the architecture and design rationale, see the [HA Metrics Aggregation design proposal](../../docs/proposals/ha-metrics-aggregation/README.md). |
| 6 | +For background on requirements, see [Issue #79](https://github.com/llm-d/llm-d-inference-payload-processor/issues/79) and [Issue #85](https://github.com/llm-d/llm-d-inference-payload-processor/issues/85). |
| 7 | + |
| 8 | +## How It Works |
| 9 | + |
| 10 | +``` |
| 11 | +Replica 1 Replica 2 Replica 3 |
| 12 | +Put→local Put→local Put→local |
| 13 | +(model-a: reqs=2, (model-a: reqs=3, (model-a: reqs=0, |
| 14 | + tok=8000) tok=4000) tok=0) |
| 15 | + │ │ │ |
| 16 | + └── heartbeat ──────────┼── heartbeat ──────────┘ |
| 17 | + ▼ |
| 18 | + Redis (user-owned) |
| 19 | + r1:model-a:inflight-requests = {Reqs:2,Tok:8000} TTL 10s |
| 20 | + r2:model-a:inflight-requests = {Reqs:3,Tok:4000} TTL 10s |
| 21 | + r3:model-a:inflight-requests = {Reqs:0,Tok:0} TTL 10s |
| 22 | + ▲ |
| 23 | + ┌── refresh (~1s) ──────┼── refresh (~1s) ──────┐ |
| 24 | + ▼ ▼ ▼ |
| 25 | +Get→cache: model-a Get→cache: model-a Get→cache: model-a |
| 26 | + reqs=5, tok=12000 reqs=5, tok=12000 reqs=5, tok=12000 |
| 27 | +Scorer reads Scorer reads Scorer reads |
| 28 | + {Reqs:5, Tok:12000} {Reqs:5, Tok:12000} {Reqs:5, Tok:12000} |
| 29 | +``` |
| 30 | + |
| 31 | +Each replica keeps two in-memory maps (local data and cache) and uses two background goroutines (heartbeat and refresh) to sync through Redis. Scorers always read from the local cache — no network call on the request path. For the full architecture, see the [design proposal](../../docs/proposals/ha-metrics-aggregation/README.md). |
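The two-map, two-loop structure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the IPP implementation: `ReplicaStore`, `heartbeat_once`, and `refresh_once` are hypothetical names, a plain dict stands in for Redis, and the background loops are reduced to single explicit calls so the example stays deterministic.

```python
from collections import defaultdict


class ReplicaStore:
    """Hypothetical per-replica store: a local map plus an aggregated cache."""

    def __init__(self, replica_id, redis):
        self.replica_id = replica_id
        self.redis = redis  # plain dict standing in for Redis (assumption)
        self.local = {}     # (model, metric) -> this replica's own values
        self.cache = {}     # (model, metric) -> values summed across replicas

    def put(self, model, metric, value):
        # Request path: touches memory only.
        self.local[(model, metric)] = dict(value)

    def get(self, model, metric):
        # Request path: served from the aggregated cache, no network call.
        return self.cache.get((model, metric))

    def heartbeat_once(self):
        # Background path: publish this replica's values under its own keys.
        for (model, metric), value in self.local.items():
            self.redis[(self.replica_id, model, metric)] = dict(value)

    def refresh_once(self):
        # Background path: sum every replica's values into a fresh cache.
        totals = defaultdict(lambda: defaultdict(int))
        for (_replica, model, metric), value in self.redis.items():
            for field, amount in value.items():
                totals[(model, metric)][field] += amount
        self.cache = {key: dict(fields) for key, fields in totals.items()}


shared = {}
r1, r2 = ReplicaStore("r1", shared), ReplicaStore("r2", shared)
r1.put("model-a", "inflight-requests", {"Reqs": 2, "Tok": 8000})
r2.put("model-a", "inflight-requests", {"Reqs": 3, "Tok": 4000})

before = r1.get("model-a", "inflight-requests")  # nothing refreshed yet
for replica in (r1, r2):
    replica.heartbeat_once()
for replica in (r1, r2):
    replica.refresh_once()
after = r1.get("model-a", "inflight-requests")
print(before, after)
```

In this sketch a value written with `put` only becomes visible through `get` after one heartbeat and one refresh, which is the staleness window the intervals control.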
| 32 | + |
| 33 | +## Prerequisites |
| 34 | + |
| 35 | +A running Redis instance accessible from the IPP pods. IPP does not deploy or manage Redis — you provide the endpoint (via a Redis Operator, Helm chart, or managed service like ElastiCache). |
| 36 | + |
| 37 | +Redis requirements: |
| 38 | + |
| 39 | +- Redis 6.x or later |
| 40 | +- No persistence needed: all IPP metrics are transient and are reconstructed from live traffic |
| 41 | +- RDB snapshots and AOF can both be disabled (`--save ""`, `--appendonly no`) |
| 42 | +- Single-node Redis is sufficient (if Redis restarts, IPP continues with cached data until keys rebuild) |
| 43 | + |
| 44 | +## Deploy Redis |
| 45 | + |
| 46 | +Choose one of the following options. |
| 47 | + |
| 48 | +### Option 1: Simple Deployment (dev / testing) |
| 49 | + |
| 50 | +```bash |
| 51 | +kubectl apply -f - <<EOF |
| 52 | +apiVersion: apps/v1 |
| 53 | +kind: Deployment |
| 54 | +metadata: |
| 55 | + name: ipp-redis |
| 56 | + namespace: llm-d |
| 57 | +spec: |
| 58 | + replicas: 1 |
| 59 | + selector: |
| 60 | + matchLabels: |
| 61 | + app: ipp-redis |
| 62 | + template: |
| 63 | + metadata: |
| 64 | + labels: |
| 65 | + app: ipp-redis |
| 66 | + spec: |
| 67 | + containers: |
| 68 | + - name: redis |
| 69 | + image: redis:7-alpine |
| 70 | + args: ["--save", "", "--maxmemory", "64mb", "--maxmemory-policy", "allkeys-lru"] |
| 71 | + ports: |
| 72 | + - containerPort: 6379 |
| 73 | + resources: |
| 74 | + requests: |
| 75 | + cpu: 100m |
| 76 | + memory: 64Mi |
| 77 | + limits: |
| 78 | + cpu: 250m |
| 79 | + memory: 128Mi |
| 80 | +--- |
| 81 | +apiVersion: v1 |
| 82 | +kind: Service |
| 83 | +metadata: |
| 84 | + name: ipp-redis |
| 85 | + namespace: llm-d |
| 86 | +spec: |
| 87 | + selector: |
| 88 | + app: ipp-redis |
| 89 | + ports: |
| 90 | + - port: 6379 |
| 91 | +EOF |
| 92 | +``` |
| 93 | + |
| 94 | +The Redis endpoint is `ipp-redis.llm-d.svc.cluster.local:6379`. |
| 95 | + |
| 96 | +### Option 2: Redis Operator (production) |
| 97 | + |
| 98 | +If your cluster runs a Redis Operator (e.g., [Spotahome](https://github.com/spotahome/redis-operator), [OpsTree](https://github.com/OT-CONTAINER-KIT/redis-operator)), create a small instance. The example below uses the Spotahome `RedisFailover` resource, which runs Redis behind Sentinel-managed failover: |
| 99 | + |
| 100 | +```yaml |
| 101 | +apiVersion: databases.spotahome.com/v1 |
| 102 | +kind: RedisFailover |
| 103 | +metadata: |
| 104 | + name: ipp-redis |
| 105 | + namespace: llm-d |
| 106 | +spec: |
| 107 | + sentinel: |
| 108 | + replicas: 3 |
| 109 | + redis: |
| 110 | + replicas: 2 |
| 111 | + customConfig: |
| 112 | + - "save \"\"" |
| 113 | + - "appendonly no" |
| 114 | +``` |
| 115 | + |
| 116 | +The resulting Service name depends on the operator. Refer to its documentation, or list the Services in the namespace: |
| 117 | + |
| 118 | +```bash |
| 119 | +kubectl get svc -n llm-d | grep redis |
| 120 | +``` |
| 121 | + |
| 122 | +Use the Service's DNS name and port (`host:port`) as the `datastore.redis.endpoint` value (see [Configure IPP to Use Redis](#configure-ipp-to-use-redis) below). |
| 123 | + |
| 124 | +### Option 3: Managed Redis (cloud) |
| 125 | + |
| 126 | +Use your cloud provider's managed Redis service: |
| 127 | + |
| 128 | + |
| 129 | +| Provider | Service | Notes | |
| 130 | +| -------- | --------------------- | ------------------------------------- | |
| 131 | +| AWS | ElastiCache for Redis | Disable backups, use `cache.t3.micro` | |
| 132 | +| GCP | Memorystore for Redis | Basic tier, no replicas needed | |
| 133 | +| Azure | Azure Cache for Redis | Basic C0, disable persistence | |
| 134 | + |
| 135 | + |
| 136 | +Use the endpoint provided by the managed service (e.g., `my-redis.abc123.cache.amazonaws.com:6379`). |
| 137 | + |
| 138 | +## Configure IPP to Use Redis |
| 139 | + |
| 140 | +> **Note:** The configuration flags and Helm values below are part of the Redis backend implementation (not yet merged). They document the target interface. |
| 141 | + |
| 142 | +### Helm |
| 143 | + |
| 144 | +Set the Redis backend in your `values.yaml` (add the `datastore` section alongside existing fields): |
| 145 | + |
| 146 | +```yaml |
| 147 | +payloadProcessor: |
| 148 | + replicas: 3 # scale to multiple replicas |
| 149 | + datastore: # NEW — add this section |
| 150 | + backend: redis |
| 151 | + redis: |
| 152 | + endpoint: "ipp-redis.llm-d.svc.cluster.local:6379" |
| 153 | + # password: "" # optional, omit if Redis has no auth |
| 154 | + # heartbeatInterval: "1s" # how often to publish local values |
| 155 | + # refreshInterval: "1s" # how often to read aggregated values |
| 156 | + # keyTTL: "10s" # per-replica key expiry |
| 157 | +``` |
| 158 | + |
| 159 | +Or via `--set`: |
| 160 | + |
| 161 | +```bash |
| 162 | +helm install payload-processor ./config/charts/payload-processor \ |
| 163 | + --set payloadProcessor.replicas=3 \ |
| 164 | + --set payloadProcessor.datastore.backend=redis \ |
| 165 | + --set payloadProcessor.datastore.redis.endpoint=ipp-redis.llm-d.svc.cluster.local:6379 |
| 166 | +``` |
| 167 | + |
| 168 | +### CLI Flags |
| 169 | + |
| 170 | +When running IPP directly (outside Helm): |
| 171 | + |
| 172 | +```bash |
| 173 | +./payload-processor \ |
| 174 | + --datastore-backend=redis \ |
| 175 | + --redis-endpoint=ipp-redis.llm-d.svc.cluster.local:6379 \ |
| 176 | + --redis-heartbeat-interval=1s \ |
| 177 | + --redis-refresh-interval=1s \ |
| 178 | + --redis-key-ttl=10s |
| 179 | +``` |
| 180 | + |
| 181 | +## Configuration |
| 182 | + |
| 183 | + |
| 184 | +| Parameter | Description | Default | |
| 185 | +| ----------------------------------- | ----------------------------------------------------------------- | ------------------------------- | |
| 186 | +| `datastore.backend` | Storage backend: `inmemory` or `redis` | `inmemory` | |
| 187 | +| `datastore.redis.endpoint` | Redis address (`host:port`) | — (required when backend=redis) | |
| 188 | +| `datastore.redis.password` | Redis AUTH password | `""` (no auth) | |
| 189 | +| `datastore.redis.heartbeatInterval` | How often each replica publishes its local values to Redis | `1s` | |
| 190 | +| `datastore.redis.refreshInterval` | How often each replica reads and aggregates all values from Redis | `1s` | |
| 191 | +| `datastore.redis.keyTTL` | TTL for per-replica keys (stale keys auto-expire after this) | `10s` | |
| 192 | + |
| 193 | + |
| 194 | +## Redis Key Schema and Data Flow |
| 195 | + |
| 196 | +Extractors write to the local data map via `Put()`. The heartbeat goroutine reads from that map and publishes per-replica keys to Redis: |
| 197 | + |
| 198 | +``` |
| 199 | +Key format: <replica-id>:<model-name>:<metric-key> = <serialized value> |
| 200 | + |
| 201 | +Example (2 replicas, 2 models): |
| 202 | + ipp-pod-abc:model-a:inflight-requests = {"Requests":2,"Tokens":8000} TTL 10s |
| 203 | + ipp-pod-abc:model-b:inflight-requests = {"Requests":4,"Tokens":16000} TTL 10s |
| 204 | + ipp-pod-def:model-a:inflight-requests = {"Requests":3,"Tokens":4000} TTL 10s |
| 205 | + ipp-pod-def:model-b:inflight-requests = {"Requests":6,"Tokens":22000} TTL 10s |
| 206 | +``` |
| 207 | + |
| 208 | +The refresh goroutine scans all keys via `SCAN`, groups by `<model>:<metric>`, and sums each field across replicas. After refresh, every replica's cache holds the same aggregated result: |
| 209 | + |
| 210 | +``` |
| 211 | +cache["model-a"]["inflight-requests"] = {Requests: 5, Tokens: 12000} // 2+3, 8000+4000 |
| 212 | +cache["model-b"]["inflight-requests"] = {Requests: 10, Tokens: 38000} // 4+6, 16000+22000 |
| 213 | +``` |
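The grouping and summing step can be reproduced with the example keys above. This is a minimal sketch, not the actual refresh code: the `aggregate` function is hypothetical, the input dict stands in for the result of a `SCAN` followed by reads, and it assumes that replica IDs and metric keys never contain a colon.

```python
import json
from collections import defaultdict


def aggregate(entries):
    """Sum each field across replicas, grouped by (model, metric).

    `entries` maps keys of the form <replica-id>:<model-name>:<metric-key>
    to JSON-encoded values.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for key, raw in entries.items():
        # Assumption: the replica ID and metric key contain no colons, so the
        # first and last segments are unambiguous even if the model name has one.
        _replica, rest = key.split(":", 1)
        model, metric = rest.rsplit(":", 1)
        for field, amount in json.loads(raw).items():
            totals[(model, metric)][field] += amount
    return {group: dict(fields) for group, fields in totals.items()}


entries = {
    "ipp-pod-abc:model-a:inflight-requests": '{"Requests":2,"Tokens":8000}',
    "ipp-pod-abc:model-b:inflight-requests": '{"Requests":4,"Tokens":16000}',
    "ipp-pod-def:model-a:inflight-requests": '{"Requests":3,"Tokens":4000}',
    "ipp-pod-def:model-b:inflight-requests": '{"Requests":6,"Tokens":22000}',
}
cache = aggregate(entries)
print(cache[("model-a", "inflight-requests")])
print(cache[("model-b", "inflight-requests")])
```

Splitting from both ends keeps the parse well defined for model names that themselves contain a colon, at the cost of the stated assumption about the other two segments.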
| 214 | + |
| 215 | +Scorers read from the cache via `Get()` — no network call. |
| 216 | + |
| 217 | +``` |
| 218 | +Put() → writes to local data → heartbeat publishes to Redis |
| 219 | +Get() → reads from local cache ← refresh aggregates from Redis |
| 220 | +``` |
| 221 | + |
| 222 | +## Verify |
| 223 | + |
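| 224 | +After deploying Redis and configuring IPP, run the checks below. They assume the Option 1 Deployment named `ipp-redis`; adjust the target for an operator-managed or cloud instance: |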
| 225 | + |
| 226 | +```bash |
| 227 | +# 1. Check Redis is reachable |
| 228 | +kubectl exec -it deploy/ipp-redis -n llm-d -- \ |
| 229 | + redis-cli PING |
| 230 | +# Expected: PONG |
| 231 | + |
| 232 | +# 2. Check keys are being written (after sending a few requests); --scan iterates without blocking Redis, unlike KEYS |
| 233 | +kubectl exec -it deploy/ipp-redis -n llm-d -- \ |
| 234 | + redis-cli --scan --pattern "*" |
| 235 | +# Expected: per-replica keys like ipp-pod-abc:model-a:inflight-requests |
| 236 | + |
| 237 | +# 3. Check TTL is set (substitute a key returned by step 2) |
| 238 | +kubectl exec -it deploy/ipp-redis -n llm-d -- \ |
| 239 | + redis-cli TTL "ipp-pod-abc:model-a:inflight-requests" |
| 240 | +# Expected: a value between 1 and 10; -2 means the key does not exist, -1 means it has no expiry |
| 241 | +``` |
| 242 | + |
| 243 | +## Failure Modes |
| 244 | + |
| 245 | + |
| 246 | +| Scenario | Behavior | |
| 247 | +| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | |
| 248 | +| **Replica crashes** | In-flight requests terminate with the pod. Per-replica keys expire via TTL (~10s). Next refresh excludes them. | |
| 249 | +| **Replica restarts** | New pod starts with an empty local store. Within ~1s the refresh goroutine reads aggregated state from Redis. | |
| 250 | +| **Redis unavailable** | Local cache retains last known aggregated values. Scorers continue with stale data. When Redis recovers, goroutines reconnect automatically. Degraded accuracy, not an outage. | |
| 251 | +| **Rolling update** | Pods restart one at a time. New pods read current state within ~1s. Old pod keys expire naturally via TTL. | |
| 252 | + |
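The replica-crash row can be illustrated with a small simulation. It is a sketch under assumptions: `FakeTTLStore` is a hypothetical in-memory stand-in for Redis keys with an expiry, time is passed in explicitly instead of read from a clock, and only the `Requests` field is tracked.

```python
class FakeTTLStore:
    """In-memory stand-in for Redis keys with an expiry (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def live(self, now):
        # Only keys whose TTL has not elapsed are visible to a refresh.
        return {k: v for k, (v, exp) in self._data.items() if exp > now}


def total_requests(store, now):
    return sum(value["Requests"] for value in store.live(now).values())


KEY_TTL = 10  # seconds, the documented default
store = FakeTTLStore()

# t=0: both replicas publish a heartbeat.
store.set("ipp-pod-abc:model-a:inflight-requests", {"Requests": 2}, KEY_TTL, now=0)
store.set("ipp-pod-def:model-a:inflight-requests", {"Requests": 3}, KEY_TTL, now=0)

# ipp-pod-def crashes right after t=0; ipp-pod-abc keeps renewing every second.
timeline = {}
for t in range(1, 12):
    store.set("ipp-pod-abc:model-a:inflight-requests", {"Requests": 2}, KEY_TTL, now=t)
    timeline[t] = total_requests(store, now=t)

print(timeline[9], timeline[10])
```

In this simulation the crashed replica's last value is still counted until its key expires, so the aggregate over-reports for up to one `keyTTL` after a crash.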
| 253 | + |
| 254 | +## Notes |
| 255 | + |
| 256 | +- All IPP metrics stored in Redis are transient. Redis data loss (restart, eviction) causes temporary accuracy degradation, not an outage — metrics rebuild from live traffic within seconds. |
| 257 | +- The `keyTTL` should be at least 5× the `heartbeatInterval` to tolerate brief network hiccups without premature key expiry (e.g., heartbeat=1s → TTL≥5s, default 10s). |
| 258 | +- The replica ID used in key names is derived from the Kubernetes pod name (`POD_NAME` env var). Each pod produces a unique set of keys. |
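The 5× guideline for `keyTTL` and `heartbeatInterval` is easy to check before deploying. The helper below is a hypothetical sketch, not part of IPP: `parse_duration` and `check_ttl` are illustrative names, and the parser accepts only single-unit durations such as `1s` or `500ms`, not every duration string the real flags may allow.

```python
import re

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text):
    """Parse a single-unit duration such as "1s" or "500ms" into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def check_ttl(heartbeat_interval, key_ttl, min_ratio=5):
    """Return True when key_ttl spans at least min_ratio heartbeats."""
    return parse_duration(key_ttl) >= min_ratio * parse_duration(heartbeat_interval)


print(check_ttl("1s", "10s"))  # the documented defaults
print(check_ttl("1s", "3s"))   # TTL shorter than five heartbeats
```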
| 259 | + |