Commit 0e1396d

Add config/redis README with deployment and configuration guide for HA metrics aggregation
Signed-off-by: noalimoy <nlimoy@redhat.com>

config/redis/README.md

# Redis Backend for IPP Metrics Aggregation

This guide covers how to deploy, configure, and verify Redis as the external store for IPP cross-replica metrics aggregation.

For the architecture and design rationale, see the [HA Metrics Aggregation design proposal](../../docs/proposals/ha-metrics-aggregation/README.md).
For background on requirements, see [Issue #79](https://github.com/llm-d/llm-d-inference-payload-processor/issues/79) and [Issue #85](https://github.com/llm-d/llm-d-inference-payload-processor/issues/85).

## How It Works

```
Replica 1               Replica 2               Replica 3
Put→local               Put→local               Put→local
(model-a: reqs=2,       (model-a: reqs=3,       (model-a: reqs=0,
 tok=8000)               tok=4000)               tok=0)
│                       │                       │
└── heartbeat ──────────┼── heartbeat ──────────┘

               Redis (user-owned)
    r1:model-a:inflight-requests = {Reqs:2,Tok:8000} TTL 10s
    r2:model-a:inflight-requests = {Reqs:3,Tok:4000} TTL 10s
    r3:model-a:inflight-requests = {Reqs:0,Tok:0}    TTL 10s

┌── refresh (~1s) ──────┼── refresh (~1s) ──────┐
▼                       ▼                       ▼
Get→cache: model-a      Get→cache: model-a      Get→cache: model-a
reqs=5, tok=12000       reqs=5, tok=12000       reqs=5, tok=12000
Scorer reads            Scorer reads            Scorer reads
{Reqs:5, Tok:12000}     {Reqs:5, Tok:12000}     {Reqs:5, Tok:12000}
```

Each replica keeps two in-memory maps (local data and cache) and runs two background goroutines (heartbeat and refresh) that sync through Redis. Scorers always read from the local cache, so the request path makes no network call. For the full architecture, see the [design proposal](../../docs/proposals/ha-metrics-aggregation/README.md).
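A minimal Go sketch of the two-map layout, assuming an in-process map stands in for Redis so the example runs on its own. `Store`, `InflightLoad`, `Heartbeat`, and `Refresh` are hypothetical names, not the IPP API, and the sketch omits TTLs and timers; a real backend would publish to Redis with `keyTTL` and run heartbeat and refresh on their configured intervals.

```go
package main

import (
	"fmt"
	"sync"
)

// InflightLoad is a hypothetical metric value: in-flight requests and tokens.
type InflightLoad struct{ Requests, Tokens int }

// sharedStore stands in for Redis in this sketch: replica ID -> model -> value.
type sharedStore struct {
	mu   sync.Mutex
	data map[string]map[string]InflightLoad
}

// Store holds one replica's two in-memory maps.
type Store struct {
	replicaID string
	shared    *sharedStore

	mu    sync.RWMutex
	local map[string]InflightLoad // this replica's own values, written by Put
	cache map[string]InflightLoad // cross-replica aggregate, read by Get
}

func NewStore(id string, shared *sharedStore) *Store {
	return &Store{replicaID: id, shared: shared,
		local: map[string]InflightLoad{}, cache: map[string]InflightLoad{}}
}

// Put records this replica's value for a model. No network call.
func (s *Store) Put(model string, v InflightLoad) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.local[model] = v
}

// Get returns the last aggregated value for a model. No network call.
func (s *Store) Get(model string) InflightLoad {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cache[model]
}

// Heartbeat publishes a snapshot of the local map under this replica's ID.
func (s *Store) Heartbeat() {
	s.mu.RLock()
	snapshot := make(map[string]InflightLoad, len(s.local))
	for model, v := range s.local {
		snapshot[model] = v
	}
	s.mu.RUnlock()

	s.shared.mu.Lock()
	s.shared.data[s.replicaID] = snapshot
	s.shared.mu.Unlock()
}

// Refresh sums every replica's published values and swaps in a new cache.
func (s *Store) Refresh() {
	agg := map[string]InflightLoad{}
	s.shared.mu.Lock()
	for _, perModel := range s.shared.data {
		for model, v := range perModel {
			sum := agg[model]
			sum.Requests += v.Requests
			sum.Tokens += v.Tokens
			agg[model] = sum
		}
	}
	s.shared.mu.Unlock()

	s.mu.Lock()
	s.cache = agg
	s.mu.Unlock()
}

func main() {
	shared := &sharedStore{data: map[string]map[string]InflightLoad{}}
	replicas := []*Store{NewStore("r1", shared), NewStore("r2", shared), NewStore("r3", shared)}
	replicas[0].Put("model-a", InflightLoad{Requests: 2, Tokens: 8000})
	replicas[1].Put("model-a", InflightLoad{Requests: 3, Tokens: 4000})
	replicas[2].Put("model-a", InflightLoad{Requests: 0, Tokens: 0})
	for _, r := range replicas {
		r.Heartbeat()
	}
	for _, r := range replicas {
		r.Refresh()
	}
	fmt.Println(replicas[1].Get("model-a")) // {5 12000}
}
```

The point of the split is that `Put` and `Get` only touch local maps under a lock; all shared-store traffic happens in the background steps, so a slow Redis would never block scoring.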
## Prerequisites

You need a running Redis instance that the IPP pods can reach. IPP does not deploy or manage Redis; you provide the endpoint, for example from a Redis Operator, a Helm chart, or a managed service such as ElastiCache.

Redis requirements:

- Redis 6.x or later
- Persistence disabled: no RDB snapshots (`--save ""`) and no AOF (`appendonly no`). All IPP metrics are transient and are rebuilt from live traffic.
- Single-node Redis is sufficient (if Redis restarts, IPP continues with cached data until the keys are rebuilt)
## Deploy Redis

Choose one of the following options.

### Option 1: Simple Deployment (dev / testing)

```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ipp-redis
  namespace: llm-d
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ipp-redis
  template:
    metadata:
      labels:
        app: ipp-redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          args: ["--save", "", "--maxmemory", "64mb", "--maxmemory-policy", "allkeys-lru"]
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: ipp-redis
  namespace: llm-d
spec:
  selector:
    app: ipp-redis
  ports:
    - port: 6379
EOF
```

The Redis endpoint is `ipp-redis.llm-d.svc.cluster.local:6379`.
### Option 2: Redis Operator (production)

If your cluster runs a Redis Operator (for example, [Spotahome](https://github.com/spotahome/redis-operator) or [OpsTree](https://github.com/OT-CONTAINER-KIT/redis-operator)), create a small instance with persistence disabled. The example below uses the Spotahome `RedisFailover` resource, which runs a Redis master and one replica behind three Sentinels rather than a single standalone node; other operators use different resources.

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: ipp-redis
  namespace: llm-d
spec:
  sentinel:
    replicas: 3
  redis:
    replicas: 2
    customConfig:
      - "save \"\""
      - "appendonly no"
```

Refer to your operator's documentation for the resulting Service endpoint:

```bash
kubectl get svc -n llm-d | grep redis
```

Use that Service's `host:port` as the `datastore.redis.endpoint` value (see [Configure IPP to Use Redis](#configure-ipp-to-use-redis) below). Because IPP writes to Redis, the Service must route to the current master, not to the Sentinels or to a read-only replica.

### Option 3: Managed Redis (cloud)

Use your cloud provider's managed Redis service:

| Provider | Service               | Notes                                 |
| -------- | --------------------- | ------------------------------------- |
| AWS      | ElastiCache for Redis | Disable backups, use `cache.t3.micro` |
| GCP      | Memorystore for Redis | Basic tier, no replicas needed        |
| Azure    | Azure Cache for Redis | Basic C0, disable persistence         |

Use the endpoint provided by the managed service (for example, `my-redis.abc123.cache.amazonaws.com:6379`).
## Configure IPP to Use Redis

> **Note:** The Helm values and CLI flags below belong to the Redis backend implementation, which is not yet merged. They document the target interface.

### Helm

Set the Redis backend in your `values.yaml` by adding a `datastore` section alongside the existing fields:

```yaml
payloadProcessor:
  replicas: 3                 # scale to multiple replicas
  datastore:                  # NEW: add this section
    backend: redis
    redis:
      endpoint: "ipp-redis.llm-d.svc.cluster.local:6379"
      # password: ""             # optional; omit if Redis has no auth
      # heartbeatInterval: "1s"  # how often to publish local values
      # refreshInterval: "1s"    # how often to read aggregated values
      # keyTTL: "10s"            # per-replica key expiry
```

Or set the same values with `--set`:

```bash
helm install payload-processor ./config/charts/payload-processor \
  --set payloadProcessor.replicas=3 \
  --set payloadProcessor.datastore.backend=redis \
  --set payloadProcessor.datastore.redis.endpoint=ipp-redis.llm-d.svc.cluster.local:6379
```

### CLI Flags

When running IPP directly (outside Helm):

```bash
./payload-processor \
  --datastore-backend=redis \
  --redis-endpoint=ipp-redis.llm-d.svc.cluster.local:6379 \
  --redis-heartbeat-interval=1s \
  --redis-refresh-interval=1s \
  --redis-key-ttl=10s
```
## Configuration

Helm value paths in this table are relative to `payloadProcessor` (for example, `payloadProcessor.datastore.backend`).

| Parameter                           | Description                                                        | Default                                   |
| ----------------------------------- | ------------------------------------------------------------------ | ----------------------------------------- |
| `datastore.backend`                 | Storage backend: `inmemory` or `redis`                             | `inmemory`                                |
| `datastore.redis.endpoint`          | Redis address (`host:port`)                                        | None (required when `backend` is `redis`) |
| `datastore.redis.password`          | Redis AUTH password                                                | `""` (no auth)                            |
| `datastore.redis.heartbeatInterval` | How often each replica publishes its local values to Redis         | `1s`                                      |
| `datastore.redis.refreshInterval`   | How often each replica reads and aggregates all values from Redis  | `1s`                                      |
| `datastore.redis.keyTTL`            | TTL for per-replica keys (stale keys auto-expire after this)       | `10s`                                     |
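
The parameters above, together with the CLI flags from the previous section, map onto a small config type. The sketch below is hypothetical: `RedisConfig` and `parseConfig` are illustrative names, the flag names are copied from the CLI Flags section (which documents a not-yet-merged interface), and there is no password flag because none is listed there. It applies the documented defaults with Go's standard `flag` package and rejects `--datastore-backend=redis` without an endpoint.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"time"
)

// RedisConfig is a hypothetical struct mirroring the documented parameters.
type RedisConfig struct {
	Backend           string
	Endpoint          string
	HeartbeatInterval time.Duration
	RefreshInterval   time.Duration
	KeyTTL            time.Duration
}

// parseConfig registers the flags with the documented defaults, parses args,
// and validates the backend/endpoint combination.
func parseConfig(args []string) (RedisConfig, error) {
	var c RedisConfig
	fs := flag.NewFlagSet("payload-processor", flag.ContinueOnError)
	fs.StringVar(&c.Backend, "datastore-backend", "inmemory", "storage backend: inmemory or redis")
	fs.StringVar(&c.Endpoint, "redis-endpoint", "", "Redis address (host:port)")
	fs.DurationVar(&c.HeartbeatInterval, "redis-heartbeat-interval", time.Second, "publish interval")
	fs.DurationVar(&c.RefreshInterval, "redis-refresh-interval", time.Second, "aggregation interval")
	fs.DurationVar(&c.KeyTTL, "redis-key-ttl", 10*time.Second, "per-replica key TTL")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	switch c.Backend {
	case "inmemory":
		return c, nil
	case "redis":
		if c.Endpoint == "" {
			return c, errors.New("--redis-endpoint is required when --datastore-backend=redis")
		}
		return c, nil
	default:
		return c, fmt.Errorf("unknown datastore backend %q", c.Backend)
	}
}

func main() {
	c, err := parseConfig([]string{
		"--datastore-backend=redis",
		"--redis-endpoint=ipp-redis.llm-d.svc.cluster.local:6379",
	})
	fmt.Println(c.Backend, c.Endpoint, c.KeyTTL, err)
	// redis ipp-redis.llm-d.svc.cluster.local:6379 10s <nil>
}
```

Failing fast at startup on a missing endpoint is a design choice in this sketch; it avoids a replica silently running without aggregation.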
## Redis Key Schema and Data Flow

Extractors write to the local data map via `Put()`. The heartbeat goroutine reads from that map and publishes per-replica keys to Redis:

```
Key format: <replica-id>:<model-name>:<metric-key> = <serialized value>

Example (2 replicas, 2 models):
ipp-pod-abc:model-a:inflight-requests = {"Requests":2,"Tokens":8000}  TTL 10s
ipp-pod-abc:model-b:inflight-requests = {"Requests":4,"Tokens":16000} TTL 10s
ipp-pod-def:model-a:inflight-requests = {"Requests":3,"Tokens":4000}  TTL 10s
ipp-pod-def:model-b:inflight-requests = {"Requests":6,"Tokens":22000} TTL 10s
```
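A minimal sketch of building, parsing, and serializing one entry in this schema, assuming JSON encoding as in the example values. `metricKey`, `parseKey`, `encodeValue`, and `InflightLoad` are hypothetical names. Pod names cannot contain `:`, but model names might, so the parser takes the replica ID up to the first colon and the metric key after the last one (assuming metric keys contain no colon), leaving everything in between as the model name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// InflightLoad matches the serialized value shown in the example keys.
type InflightLoad struct {
	Requests int
	Tokens   int
}

// metricKey builds <replica-id>:<model-name>:<metric-key>.
func metricKey(replica, model, metric string) string {
	return replica + ":" + model + ":" + metric
}

// parseKey splits a key, allowing colons inside the model name.
func parseKey(key string) (replica, model, metric string, ok bool) {
	first := strings.Index(key, ":")
	last := strings.LastIndex(key, ":")
	if first < 0 || last == first {
		return "", "", "", false // fewer than two separators
	}
	return key[:first], key[first+1 : last], key[last+1:], true
}

// encodeValue serializes a value as JSON, as in the example keys.
func encodeValue(v InflightLoad) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	key := metricKey("ipp-pod-abc", "model-a", "inflight-requests")
	val, err := encodeValue(InflightLoad{Requests: 2, Tokens: 8000})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s = %s\n", key, val)
	// ipp-pod-abc:model-a:inflight-requests = {"Requests":2,"Tokens":8000}

	replica, model, metric, _ := parseKey("ipp-pod-def:org:model:v1:inflight-requests")
	fmt.Println(replica, model, metric)
	// ipp-pod-def org:model:v1 inflight-requests
}
```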

The refresh goroutine scans all keys via `SCAN`, groups them by `<model>:<metric>`, and sums each field across replicas. After a refresh, every replica's cache holds the same aggregated result:

```
cache["model-a"]["inflight-requests"] = {Requests: 5, Tokens: 12000}   // 2+3, 8000+4000
cache["model-b"]["inflight-requests"] = {Requests: 10, Tokens: 38000}  // 4+6, 16000+22000
```
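The grouping-and-summing step can be sketched as a pure function over the key/value pairs a refresh pass reads back. This assumes the pairs have already been fetched (for example, with `SCAN` followed by `MGET`) into a Go map and that every value has the `Requests`/`Tokens` shape from the example; `aggregate` and `InflightLoad` are hypothetical names. Keys or values that fail to parse are skipped rather than failing the whole refresh.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// InflightLoad is the assumed value shape for every metric in this sketch.
type InflightLoad struct {
	Requests int
	Tokens   int
}

// aggregate groups raw entries by <model>:<metric> and sums each field
// across replicas. Result: model -> metric -> summed value.
func aggregate(raw map[string]string) map[string]map[string]InflightLoad {
	out := map[string]map[string]InflightLoad{}
	for key, val := range raw {
		first, last := strings.Index(key, ":"), strings.LastIndex(key, ":")
		if first < 0 || last == first {
			continue // not <replica>:<model>:<metric>
		}
		model, metric := key[first+1:last], key[last+1:]
		var v InflightLoad
		if err := json.Unmarshal([]byte(val), &v); err != nil {
			continue // skip malformed values
		}
		if out[model] == nil {
			out[model] = map[string]InflightLoad{}
		}
		sum := out[model][metric]
		sum.Requests += v.Requests
		sum.Tokens += v.Tokens
		out[model][metric] = sum
	}
	return out
}

func main() {
	raw := map[string]string{
		"ipp-pod-abc:model-a:inflight-requests": `{"Requests":2,"Tokens":8000}`,
		"ipp-pod-abc:model-b:inflight-requests": `{"Requests":4,"Tokens":16000}`,
		"ipp-pod-def:model-a:inflight-requests": `{"Requests":3,"Tokens":4000}`,
		"ipp-pod-def:model-b:inflight-requests": `{"Requests":6,"Tokens":22000}`,
	}
	cache := aggregate(raw)
	fmt.Println(cache["model-a"]["inflight-requests"]) // {5 12000}
	fmt.Println(cache["model-b"]["inflight-requests"]) // {10 38000}
}
```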

Scorers read from the cache via `Get()`, which makes no network call:

```
Put() → writes to local data  → heartbeat publishes to Redis
Get() → reads from local cache ← refresh aggregates from Redis
```

## Verify

After deploying Redis and configuring IPP, run the following checks. They target the Option 1 Deployment (`deploy/ipp-redis`); for other deployments, run the same `redis-cli` commands from a pod that can reach your Redis endpoint, adding `-h <host> -p <port>`.

```bash
# 1. Check that Redis is reachable
kubectl exec -it deploy/ipp-redis -n llm-d -- \
  redis-cli PING
# Expected: PONG

# 2. Check that keys are being written (after sending a few requests).
#    KEYS blocks Redis while it runs; on a shared instance, use `redis-cli --scan` instead.
kubectl exec -it deploy/ipp-redis -n llm-d -- \
  redis-cli KEYS "*"
# Expected: per-replica keys such as ipp-pod-abc:model-a:inflight-requests

# 3. Check that a TTL is set (use a key name returned by step 2)
kubectl exec -it deploy/ipp-redis -n llm-d -- \
  redis-cli TTL "ipp-pod-abc:model-a:inflight-requests"
# Expected: the remaining seconds, between 1 and 10 with the default keyTTL.
#           -1 means the key has no expiry; -2 means the key does not exist.
```
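If `redis-cli` is not available (for example, with a managed Redis or a minimal container image), the reachability check in step 1 can be approximated from Go with only the standard library: send `PING` encoded as a RESP array and expect a `+PONG` reply. `pingRedis` and `checkPingReply` are hypothetical helpers; they do no AUTH or TLS, so a password-protected server replies with an error instead of `+PONG`, and a TLS-only endpoint will fail.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// checkPingReply validates the first reply line from the server.
func checkPingReply(line string) error {
	if reply := strings.TrimSpace(line); reply != "+PONG" {
		return fmt.Errorf("unexpected reply %q", reply)
	}
	return nil
}

// pingRedis dials addr, sends a RESP-encoded PING, and checks the reply.
// No AUTH and no TLS.
func pingRedis(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	if _, err := conn.Write([]byte("*1\r\n$4\r\nPING\r\n")); err != nil {
		return err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return err
	}
	return checkPingReply(line)
}

func main() {
	addr := "ipp-redis.llm-d.svc.cluster.local:6379" // Option 1 endpoint
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	if err := pingRedis(addr, 2*time.Second); err != nil {
		fmt.Println("Redis not reachable:", err)
		os.Exit(1)
	}
	fmt.Println("PONG")
}
```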
## Failure Modes

| Scenario              | Behavior |
| --------------------- | -------- |
| **Replica crashes**   | In-flight requests terminate with the pod. Its per-replica keys expire via TTL (~10s by default); until then, the aggregate still includes the crashed replica's last published counts. The first refresh after expiry excludes them. |
| **Replica restarts**  | The new pod starts with an empty local store. Within ~1s (one refresh interval), the refresh goroutine loads the aggregated state from Redis. |
| **Redis unavailable** | The local cache retains the last known aggregated values, and scorers continue with stale data. When Redis recovers, the goroutines reconnect automatically. Accuracy degrades, but there is no outage. |
| **Rolling update**    | Pods restart one at a time. New pods read the current state within ~1s. Keys from old pods expire via TTL. |
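
The **Redis unavailable** row relies on the refresh step replacing the cache only after a successful read. A minimal sketch of that rule, assuming a `fetch` callback stands in for the Redis read and aggregation; `Cache`, `refreshOnce`, `runRefresh`, and `errUnavailable` are hypothetical names, and retrying on the next tick is this sketch's stand-in for a client reconnecting.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

type InflightLoad struct{ Requests, Tokens int }

// errUnavailable simulates a Redis connection failure in this sketch.
var errUnavailable = errors.New("redis unavailable")

// Cache holds the last successfully aggregated values.
type Cache struct {
	mu     sync.RWMutex
	values map[string]InflightLoad
}

func (c *Cache) Get(model string) InflightLoad {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.values[model]
}

// refreshOnce swaps in new values only on success; on error the previous
// (possibly stale) values stay in place for scorers.
func (c *Cache) refreshOnce(fetch func() (map[string]InflightLoad, error)) error {
	fresh, err := fetch()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.values = fresh
	c.mu.Unlock()
	return nil
}

// runRefresh calls refreshOnce every interval until ctx is cancelled.
// A failed tick is simply retried on the next one.
func (c *Cache) runRefresh(ctx context.Context, interval time.Duration,
	fetch func() (map[string]InflightLoad, error)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := c.refreshOnce(fetch); err != nil {
				fmt.Println("refresh failed, keeping cached values:", err)
			}
		}
	}
}

func main() {
	c := &Cache{}
	up := func() (map[string]InflightLoad, error) {
		return map[string]InflightLoad{"model-a": {5, 12000}}, nil
	}
	down := func() (map[string]InflightLoad, error) { return nil, errUnavailable }

	_ = c.refreshOnce(up)
	_ = c.refreshOnce(down)
	fmt.Println(c.Get("model-a")) // {5 12000}: stale but still served

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	c.runRefresh(ctx, 10*time.Millisecond, down) // logs failures; cache unchanged
}
```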
## Notes

- All IPP metrics stored in Redis are transient. Redis data loss (restart or eviction) causes a temporary loss of accuracy, not an outage; metrics rebuild from live traffic within seconds.
- Set `keyTTL` to at least 5× `heartbeatInterval` so that a few missed heartbeats (for example, during a brief network blip) do not expire a live replica's keys. With the default 1s heartbeat, that means a TTL of at least 5s; the default is 10s. The trade-off is that a longer TTL keeps a crashed replica's last counts in the aggregate for longer.
- The replica ID used in key names is derived from the Kubernetes pod name (the `POD_NAME` environment variable, typically set from `metadata.name` through the Downward API), so each pod writes its own set of keys.
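Two of these notes can be checked at startup. The sketch below is a hypothetical illustration (`checkTTL`, `replicaID`, and `minTTLRatio` are not IPP names): it enforces the 5× guideline and reads the replica ID from `POD_NAME`. The hostname fallback is an assumption for running outside Kubernetes; inside a pod, the hostname usually defaults to the pod name.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// minTTLRatio encodes the "keyTTL >= 5x heartbeatInterval" guideline.
const minTTLRatio = 5

// checkTTL rejects a TTL that a few missed heartbeats could outlive.
func checkTTL(heartbeat, ttl time.Duration) error {
	if heartbeat <= 0 {
		return errors.New("heartbeatInterval must be positive")
	}
	if ttl < minTTLRatio*heartbeat {
		return fmt.Errorf("keyTTL %v is below %dx heartbeatInterval %v", ttl, minTTLRatio, heartbeat)
	}
	return nil
}

// replicaID prefers POD_NAME and falls back to the hostname (assumption).
func replicaID() (string, error) {
	if name := os.Getenv("POD_NAME"); name != "" {
		return name, nil
	}
	return os.Hostname()
}

func main() {
	fmt.Println(checkTTL(time.Second, 10*time.Second)) // <nil>
	fmt.Println(checkTTL(time.Second, 3*time.Second))  // keyTTL 3s is below 5x heartbeatInterval 1s
	id, err := replicaID()
	fmt.Println("replica ID:", id, err)
}
```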