Commit 09173b9

Aditya-eddy and claude committed
docs(cloud-replay): document k8s-proxy-auth mode + self-hosted troubleshooting

- Complete the --proxy-url → --k8s-proxy-url rename (and cloud.proxyUrl → cloud.k8sProxyUrl) to match the CLI.
- Add "Authenticating through the k8s-proxy": the --k8s-proxy-auth / KEPLOY_K8S_PROXY_AUTH offline/CI mode — required inputs (proxy URL + KEPLOY_API_KEY), what it does and which api-server calls it skips, KEPLOY_TLS_SKIP_VERIFY / SSL_CERT_FILE for self-signed proxy certs, --cluster behaviour, and read+write vs replay:start permissions.
- Add "Troubleshooting self-hosted replay" table (self-signed TLS, empty clusterName 400, missing mock blob, sysctl/PATH, app image pull, eBPF agent image tag, --delay for slow apps, telemetry mock mismatches).
- Note the auth axis in the overview, a k8s-proxy-auth quick example, and a report-upload row in the SaaS-vs-self-hosted table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Aditya Sharma <aditya282003@gmail.com>
1 parent b89a239 commit 09173b9
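
Review note: anyone still carrying the old names in CI scripts or `keploy.yml` could apply the rename from this commit mechanically. The following is a minimal sketch, assuming a `sed` that supports `-E` (GNU and BSD both do); the function name `migrate_proxy_names` is hypothetical and not part of the Keploy CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Keploy): rewrites the old flag and config
# names to the ones introduced by this commit. Reads stdin, writes stdout.
migrate_proxy_names() {
  sed -E \
    -e 's/--proxy-url/--k8s-proxy-url/g' \
    -e 's/cloud\.proxyUrl/cloud.k8sProxyUrl/g' \
    -e 's/^([[:space:]]*)proxyUrl:/\1k8sProxyUrl:/'
}
```

The patterns do not match the new names, so running the filter over an already-migrated file should leave it unchanged. It is a plain text rewrite, so review the result before committing it.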

1 file changed

Lines changed: 95 additions & 11 deletions

versioned_docs/version-4.0.0/keploy-cloud/cloud-replay.md

@@ -2,7 +2,7 @@
 id: cloud-replay
 title: Cloud Replay Command Reference
 sidebar_label: Cloud Replay
-description: Complete reference for every flag of the keploy cloud replay command — app and cluster selection, self-hosted vs SaaS targeting, the --proxy-url ingress override, release-tag and branch selection, mapping flags, and trigger vs local replay, with examples.
+description: Complete reference for every flag of the keploy cloud replay command — app and cluster selection, self-hosted vs SaaS targeting, the --k8s-proxy-url ingress override, release-tag and branch selection, mapping flags, and trigger vs local replay, with examples.
 tags:
   - K8s
   - cloud replay
@@ -12,7 +12,7 @@ tags:
 keywords:
   - keploy cloud replay
   - cloud replay flags
-  - proxy-url
+  - k8s-proxy-url
   - k8s-proxy ingress
   - release-tag
   - trigger replay
@@ -37,7 +37,8 @@ keploy cloud replay --app <namespace>.<deployment> [flags]
 Two axes determine how a run behaves:
 
 - **Where the replay executes** — locally on your machine (the default) or inside the cluster (`--trigger`). See [Trigger vs local replay](#trigger-vs-local-replay).
-- **Where the test data and cluster live** — Keploy **SaaS** (test assets fetched from the api-server) or **self-hosted** (test assets fetched from your in-cluster k8s-proxy ingress). Keploy detects this from the selected cluster's `deployment_type`; some flags (notably [`--proxy-url`](#--proxy-url)) apply only to self-hosted clusters.
+- **Where the test data and cluster live** — Keploy **SaaS** (test assets fetched from the api-server) or **self-hosted** (test assets fetched from your in-cluster k8s-proxy ingress). Keploy detects this from the selected cluster's `deployment_type`; some flags (notably [`--k8s-proxy-url`](#--k8s-proxy-url)) apply only to self-hosted clusters.
+- **How the CLI authenticates** — directly against the api-server (the default), or **through the k8s-proxy** for offline/air-gapped self-hosted CI (`--k8s-proxy-auth`). See [Authenticating through the k8s-proxy](#authenticating-through-the-k8s-proxy).
 
 :::tip
 Every flag is also available via `keploy cloud replay --help`. Run with `-v`/`--debug` to see which ingress URL, cluster, and branch a run resolved to.
@@ -57,10 +58,15 @@ keploy cloud replay --app prod.orders --release-tag docker.io/acme/orders:1.4.2
 
 # Self-hosted: target an explicit k8s-proxy ingress instead of the heartbeat-inferred one
 keploy cloud replay --app prod.orders --cluster my-cluster \
-  --proxy-url https://keploy-proxy.my-cluster.internal
+  --k8s-proxy-url https://keploy-proxy.my-cluster.internal
 
 # In-cluster (trigger) replay of the smart set
 keploy cloud replay --app prod.orders --cluster my-cluster --trigger --replay-source smart-set
+
+# Offline/air-gapped self-hosted CI: authenticate through the k8s-proxy (no direct api-server)
+export KEPLOY_API_KEY=kep_xxxxxxxx
+keploy cloud replay --app prod.orders --cluster my-cluster \
+  --k8s-proxy-auth --k8s-proxy-url https://keploy-proxy.my-cluster.internal
 ```
 
 ---
@@ -80,16 +86,16 @@ The `--app` value `prod.orders` is split on the first `.`: `prod` is the namespa
 
 ## Self-hosted ingress targeting
 
-### `--proxy-url`
+### `--k8s-proxy-url`
 
 **Self-hosted only.** The k8s-proxy ingress URL to target directly, e.g. `https://keploy-proxy.my-cluster.internal`.
 
 ```bash
 keploy cloud replay --app prod.orders --cluster my-cluster \
-  --proxy-url https://keploy-proxy.my-cluster.internal
+  --k8s-proxy-url https://keploy-proxy.my-cluster.internal
 ```
 
-By default, a self-hosted cloud replay discovers the k8s-proxy ingress from the selected cluster's **latest heartbeat**. `--proxy-url` makes that endpoint explicit and overridable: when set, it **overrides** the heartbeat-inferred ingress URL for every k8s-proxy call the run makes (fetching test sets, mocks and mappings, and — under `--trigger` — starting the in-cluster replay and streaming its status).
+By default, a self-hosted cloud replay discovers the k8s-proxy ingress from the selected cluster's **latest heartbeat**. `--k8s-proxy-url` makes that endpoint explicit and overridable: when set, it **overrides** the heartbeat-inferred ingress URL for every k8s-proxy call the run makes (fetching test sets, mocks and mappings, and — under `--trigger` — starting the in-cluster replay and streaming its status).
 
 Reach for it when:
 
@@ -100,17 +106,71 @@ Rules and behaviour:
 
 - Must be an **absolute `http://` or `https://` URL with a host**. Malformed values are rejected up front, before any cluster work, with a clear error. A trailing slash is trimmed automatically.
 - Applies to **self-hosted clusters only**. For a SaaS cluster the value is ignored (replay talks to the api-server directly) and the CLI logs a warning so the flag is never silently dropped.
-- It does **not** bypass cluster registration/activity checks — the cluster must still be registered and (for self-hosted) actively heart-beating. `--proxy-url` only changes *which ingress URL is used*, not whether the cluster is considered valid.
-- Can also be set in `keploy.yml` as `cloud.proxyUrl`.
+- It does **not** bypass cluster registration/activity checks — the cluster must still be registered and (for self-hosted) actively heart-beating. `--k8s-proxy-url` only changes *which ingress URL is used*, not whether the cluster is considered valid.
+- Can also be set in `keploy.yml` as `cloud.k8sProxyUrl`.
 
 ```yaml
 # keploy.yml
 cloud:
-  proxyUrl: https://keploy-proxy.my-cluster.internal
+  k8sProxyUrl: https://keploy-proxy.my-cluster.internal
 ```
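
Review note: the URL rule in this hunk (absolute `http://` or `https://` URL with a host, trailing slash trimmed) can be approximated in a CI script before the CLI is invoked. This is only a sketch of the documented behaviour; the function name `normalize_proxy_url` is hypothetical, and the CLI's own validation may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check that approximates the documented rule; it is not the
# CLI's actual validation code.
normalize_proxy_url() {
  local url="$1"
  # Scheme must be http or https, followed by a non-empty host.
  local re='^https?://[^/?#[:space:]]+'
  if [[ ! "$url" =~ $re ]]; then
    echo "invalid k8s-proxy URL: $url" >&2
    return 1
  fi
  # Trim one trailing slash, as the docs say the CLI does.
  printf '%s\n' "${url%/}"
}
```

A script could then pass `"$(normalize_proxy_url "$url")"` to `--k8s-proxy-url` and stop early when the function returns non-zero.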
 
 ---
 
+## Authenticating through the k8s-proxy
+
+**Self-hosted only.** An additive, opt-in authentication mode for CI runners and boxes that can reach **only** the k8s-proxy — no direct route to the Keploy api-server and no internet. When enabled, the CLI authenticates your API key **through** the k8s-proxy (which validates it against the api-server on your behalf) and then skips every other direct api-server call for the rest of the run.
+
+This is distinct from [`--k8s-proxy-url`](#--k8s-proxy-url): that flag only chooses *which ingress the data plane targets*; `--k8s-proxy-auth` changes *how the CLI authenticates and which control-plane calls it makes*. You normally use both together.
+
+### Enabling it
+
+| Input | Flag | Environment variable | keploy.yml |
+| --- | --- | --- | --- |
+| Turn the mode on | `--k8s-proxy-auth` | `KEPLOY_K8S_PROXY_AUTH=true` | `cloud.k8sProxyAuth: true` |
+| k8s-proxy ingress URL | `--k8s-proxy-url` | `KEPLOY_K8S_PROXY_URL` | `cloud.k8sProxyUrl` |
+| API key (PAT) | `--api-key` | `KEPLOY_API_KEY` | — |
+
+All three are required — the mode needs a proxy URL to reach and a PAT to authenticate. Generate the PAT from **Settings → API Keys** in the dashboard.
+
+```bash
+export KEPLOY_API_KEY=kep_xxxxxxxx
+keploy cloud replay \
+  --k8s-proxy-auth \
+  --k8s-proxy-url https://keploy-proxy.my-cluster.internal \
+  --app prod.orders \
+  --cluster my-cluster
+```
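
Review note: since all three inputs are required, a CI job that configures them purely through environment variables could fail fast when one is missing. The sketch below assumes the environment-variable column of the table above; the function name `check_proxy_auth_env` is hypothetical. It only checks that each variable is non-empty and does not cover inputs supplied via flags or `keploy.yml`.

```bash
#!/usr/bin/env bash
# Hypothetical CI preflight (not part of the Keploy CLI): report every
# missing environment variable, then return non-zero if any was missing.
check_proxy_auth_env() {
  local missing=0 name
  for name in KEPLOY_K8S_PROXY_AUTH KEPLOY_K8S_PROXY_URL KEPLOY_API_KEY; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing required input: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Reporting all missing names in one pass, rather than stopping at the first, tends to save a round trip when a pipeline is first wired up.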
+
+### What it does
+
+- Authenticates by sending your PAT to the proxy's `/relay-auth` endpoint, which relays it to the api-server's `/auth/apikey` and returns only the verdict — the CLI never contacts the api-server directly.
+- Forces the run into **self-hosted** mode: it skips the cluster-list lookup and targets the proxy ingress directly. Because of this, a `--cluster` name is needed (see below).
+- **Skips** every direct api-server step that would otherwise hang offline: the api-key exchange, role/plan checks, IT-usage check, update check, and the end-of-run debug-bundle upload (which targets the api-server). Authorization is still enforced server-side by the proxy's RBAC on every request.
+
+### TLS to a self-signed proxy
+
+Self-hosted proxies commonly serve a self-signed certificate. If the auth preflight fails with `certificate signed by unknown authority`, either trust the CA or skip verification:
+
+```bash
+# Option A — skip verification (matches the self-hosted data plane)
+export KEPLOY_TLS_SKIP_VERIFY=true
+
+# Option B — keep verification on by trusting the proxy's CA
+export SSL_CERT_FILE=/path/to/proxy-ca.pem
+```
+
+### `--cluster` in this mode
+
+Since the mode skips the cluster-list lookup, pass `--cluster <name>` where `<name>` is the cluster the app was **registered/recorded under** (the deployment's `KEPLOY_CLUSTER_NAME`). The CLI adopts it automatically from the app's record when you omit it; a **wrong** name resolves zero apps (`app <ns>.<dep> not found for the cluster <name>`).
+
+### Permissions
+
+- **Local replay** (no `--trigger`) needs a key with **read + write** — enough to fetch tests and upload the report.
+- **In-cluster replay** (`--trigger`) starts an orchestrated replay via the proxy's `/test/start`, which requires the admin-tier `replay:start` permission (or the `ci` scope). Under `KEPLOY_RBAC_MODE=enforce`, a key without it is rejected.
+
+---
+
 ## Trigger vs local replay
 
 ### `--trigger`
@@ -223,8 +283,32 @@ These take precedence over values auto-detected from the CI provider's environme
 | --- | --- | --- |
 | Where test assets come from | Keploy api-server | In-cluster k8s-proxy ingress |
 | `--cluster` | Optional (single active cluster auto-selected) | Recommended/required; cluster must be registered and active |
-| `--proxy-url` | Ignored (warned) | **Honored** — overrides the heartbeat-inferred ingress URL |
+| `--k8s-proxy-url` | Ignored (warned) | **Honored** — overrides the heartbeat-inferred ingress URL |
+| `--k8s-proxy-auth` | N/A | Optional — authenticate through the proxy for offline/CI runs |
 | Auth to the data plane | api-server token | User PAT validated by the k8s-proxy |
+| Test report upload | api-server | k8s-proxy ingress (proxy DB), authenticated with the PAT |
+
+---
+
+## Troubleshooting self-hosted replay
+
+Local (non-`--trigger`) self-hosted replay boots the app **and** the Keploy eBPF agent locally via docker-compose, so it has a few environment prerequisites the SaaS path doesn't. The failures below are the common ones, with fixes:
+
+| Symptom | Cause | Fix |
+| --- | --- | --- |
+| `k8s-proxy-auth: … certificate signed by unknown authority` | Proxy serves a self-signed cert | `export KEPLOY_TLS_SKIP_VERIFY=true` (or `SSL_CERT_FILE=<ca.pem>`) |
+| `failed to get test sets: unexpected status code: 400` | Empty `clusterName` sent to the proxy | Pass `--cluster <name>` (the app's registered cluster) |
+| `app <ns>.<dep> not found for the cluster <name>` | `--cluster` doesn't match the app's recorded cluster | Use the exact `KEPLOY_CLUSTER_NAME` the app was recorded under |
+| `blob not found` / `…/mocks.bin not found` | The mock blob isn't in the proxy's object store (e.g. the store was reset after recording) | Re-record the test set; keep the metadata DB and object store lifecycles in sync |
+| `sysctl: executable file not found in $PATH` | The privileged (sudo) re-exec stripped `/usr/sbin`/`/sbin` from `PATH` | Handled by the CLI automatically; on older builds `export PATH="$PATH:/usr/sbin:/sbin"` |
+| `pull access denied for <app-image>` | The app image lives only in the cluster, not the local docker daemon | Load it locally, pass `--image <local-tag>`, or use `--trigger` (in-cluster) |
+| `manifest for keploy/enterprise:v<ver> not found` | The eBPF agent image tag derived from the CLI version isn't published (a dev build resolves to `v<version>-dev`) | Use a release binary, or make that tag resolvable to the local docker daemon |
+| `connection reset by peer` → `ACTUAL STATUS 0` on the first test | The app wasn't listening yet when the first request fired | Raise `--delay` (e.g. `--delay 20`) for slow-starting apps (JVM, etc.) |
+| `no matching mock found for POST /v1/logs` (or similar) | The app exports OpenTelemetry/telemetry to a collector that wasn't mocked | Disable telemetry export for the replay, add the endpoint to `test.globalNoise`, or re-record |
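
Review note: for the `sysctl` row, the documented workaround on older builds appends `/usr/sbin:/sbin` to `PATH` unconditionally. A CI setup script that is sourced more than once might prefer a variant that only appends what is missing. This is a sketch, not something the CLI ships; the function name `ensure_sbin_on_path` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append /usr/sbin and /sbin to PATH only when absent,
# so repeated sourcing does not keep growing PATH.
ensure_sbin_on_path() {
  local dir
  for dir in /usr/sbin /sbin; do
    case ":$PATH:" in
      *":$dir:"*) ;;              # already on PATH, nothing to do
      *) PATH="$PATH:$dir" ;;     # append at the end, keeping existing order
    esac
  done
  export PATH
}
```

Wrapping `PATH` in colons before matching avoids false positives such as `/usr/sbin-extra` being mistaken for `/usr/sbin`.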
+
+:::tip
+For a cluster-deployed app, in-cluster replay (`--trigger --namespace <ns>`) sidesteps the local docker/image/eBPF prerequisites entirely — it replays where the app and its dependencies already run. It needs a key with `replay:start` (admin-tier or the `ci` scope).
+:::
 
 ---
 