docs(k8s-proxy-llm-workflow): document --cluster for local cloud replay (#865)

charankamarapu · Charan Kamarapu · claude · web-flow · commit c6e9614f098e · 2026-05-29T11:11:11.000+05:30
* docs(k8s-proxy-llm-workflow): document --cluster for local cloud replay

Discovery step 3: agent caches origin.clusterName via getApp (listApps does not return it). Pass it as --cluster on every keploy cloud replay so a local (no --trigger) run resolves the proxy app's identity without requiring an actively-heartbeating cluster.

Routine A re-validate + Routine B B4 + Prompt B table now show the local-replay form with --cluster, -c, --container-name, --disableReportUpload=false; CI / active-cluster runs drop the local flags.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): tighten Case 2b — explicit edit→edit→delete loop

Replace the prose "one or two mock edits then fall back" with a numbered 1→2→delete_recording loop the agent has to follow exactly. The previous wording let an agent linger on update_mock attempts indefinitely or improvise alternative repair strategies (e.g. editing the on-disk mocks.yaml directly) when an edit didn't take.

Add two explicit DO-NOT rules that surfaced during Scenario 4 validation of the LLM workflow:

  * "Do not edit keploy/<test_set>/mocks.yaml on disk." cloud replay re-downloads mocks on every run, so the local file is a per-run snapshot and any local edit is silently overwritten before the test runs. An agent that didn't know this spiralled on grep+edit cycles against a file it couldn't actually change.
  * "Do not recompute hash fields by hand." Some recorded mocks carry derived fingerprints (sqlAstHash on Postgres v3); these are now recomputed by the proxy from the human-readable fields on load (api-server PR #1697 strips them on write, integrations PR #209 recomputes on read). LLMs that previously tried to rewrite the hash to match a SQL change couldn't compute the canonical value (libpg_query isn't reachable from a typical agent runtime) and shipped a mock the matcher couldn't reach. The new rule directs the agent to edit only what it can reason about and let the proxy derive the rest.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): scoped vs whole-set re-record in Case 2b fallback

Split today's single "fall back to delete_recording + re-record" path into two arms based on how many cases in the set are failing.

3a Whole-set: delete_recording (no test_case_ids), re-record all flows, keploy upload test-set with a fresh --name. Use when most of the set is failing.

3b Scoped: delete_recording({test_case_ids: [...]}) tombstones only the failing cases. Re-record just those flows. keploy upload test-set still produces a NEW test-set with a fresh --name; the branch ends with two coexisting test-sets (original-minus-tombstones + the small replacement set) both feeding the next replay. Use when one or a few cases are failing, since it preserves the unrelated passing tests in the same set.

The 3b path is enabled by api-server PR #1697's scope-aware delete_recording (test_case_ids array param). Until it merges, the agent has only 3a available; that's why today's runs were destroying unrelated tests on every Case 2b fallback.

Heuristic: pick 3a when ≥ ~75% of the cases are failing, 3b otherwise. The upload server enforces test-set name uniqueness per app, so the agent must mint a fresh --name on every upload (server returns `test set "X" already exists for this app` on collision).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): forbid AskUserQuestion; drop hash-recompute note

Two changes to Case 2b guidance:

  1. Add "Do NOT ask the dev which path to take": Routine A is autonomous by contract. The previous wording ("announce … otherwise proceed") was read as "ask, then proceed" by an agent that interpreted AskUserQuestion as a safety net; that's wrong. Make the rule explicit: announce in plain text, never call AskUserQuestion, never offer numbered choices, never pause for confirmation. The dev reviews the streamed transcript and Ctrl-Cs if the agent is wrong; that's the contract.
  2. Remove the "Do NOT recompute hash fields by hand" note. It referenced keploy/api-server#1697's strip-on-write commit and keploy/integrations#209's loader recompute, both reverted/closed. The supported path for Postgres mock drift is now plain Case 2b: two update_mock attempts (which the LLM may or may not get right for a Postgres v3 query), then the existing 3a/3b delete + re-record fallback. No special hash-related guidance needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): review-driven Case 2b refinements

Addresses inline review comments on PR #865:

  * **3b availability banner.** The `test_case_ids` parameter on `delete_recording` requires a server build that advertises the scoped delete. Documented the discovery check + the fall-back to 3a so an agent running against an older api-server can still progress rather than hitting an unhelpful error.
  * **`<name-1>` placeholder + inline comment.** Replaced `<id-1>` with `<name-1>` and added a comment line clarifying that the values are the same friendly recording names used elsewhere in the skill (e.g. `get-api-orders-1`). Prevents agents from sending a branch-overlay UUID they don't have.
  * **Naming convention for --name.** Spelled out the convention `<original-set-name>--rerec-<YYYYMMDDHHMM>` (with `<short-git-sha>` as a deterministic alternative). Without a convention, every agent invents its own and the recordings page becomes a one-off-named long-tail mess.
  * **Cap-retry-3 vs Case 2b loop disambiguation.** A4's "cap retry at 3" was being read as the same budget as the 2-attempt mock-edit loop. Made the relationship explicit: 3 = total cloud-replay runs across the Case 2 loop (steps 1, 2, 3); 2 = update_mock attempts within step 2 before forcing the 3a/3b fallback. Two budgets, not one.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): round-2 polish on Case 2b 3b path

  - Availability banner: route agents to MCP `tools/list` (the protocol-level introspection that returns each tool's inputSchema) instead of `listMocks`/`getApp`, which are domain endpoints that don't carry tool shape.
  - Naming convention: original-set-name often contains spaces, parens, or other chars that the api-server's name validator may reject. Switched to slug(original-set-name) and called out the slug rule explicitly.
  - Timestamp: UTC is now explicit (trailing Z is part of the literal name) so two agents running in different timezones at the same wall-clock minute mint the same name, preserving the "original + its re-records sort together" intent for cross-timezone CI integrations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): close slug-rule ambiguities

Round-3 review caught three ambiguities in the round-2 naming convention block that would cause different LLM agents to mint different --name values for the same original-set-name:

  1. Trailing literal `Z` collided with the slug charset `[a-z0-9-]`: a strict-slugger lowercases it; a literal-suffix-preserver doesn't.
  2. The worked example silently applied "collapse runs of -" and "trim leading/trailing -" rules that weren't in the stated rule.
  3. `<slug(original-set-name)>` read as a function-call placeholder that agents might try to invoke (no `slug` tool exists in the MCP toolset; library fallbacks produce divergent outputs).

Replaced with a two-part declarative spec: (1) build the slug part with a fully-specified rule + worked example, (2) append a literal suffix with explicit "do NOT lowercase or slug" instruction. Also spelled out `<short-git-sha>` = first 7 chars of `git rev-parse HEAD` in the deterministic alternative.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(k8s-proxy-llm-workflow): note deterministic alt has no leading dash

Round-4 review surfaced a separator inconsistency between the primary naming form (`<slug>--rerec-<…>`) and the deterministic alternative (`rerec-<sha>-<…>`). An LLM agent reading both could reflexively copy the primary's `--rerec-` literal into the alternative and mint `--rerec-<sha>-<…>`; a leading `--` looks like a CLI flag to shell escapers and log scrapers, with a high chance of producing a bad command.

Added one parenthetical line to the deterministic-alternative description: "The alternative begins with the literal `rerec` (NO leading `-` or `--` — the double-dash in the primary form is the slug/suffix boundary marker, which this form doesn't have because there's no slug part)."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(k8s-proxy-llm-workflow): prettier formatting

Blank lines before fenced code blocks after the 3a/3b headers, and trailing-whitespace trim in one table cell. No content changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Charan Kamarapu <charan@keploy.io>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
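The step-3 choices described in these notes (3a vs 3b, and the re-record `--name` convention) are mechanical enough to script. Below is a minimal Bash sketch, not part of keploy or its MCP toolset. All function names are hypothetical. The "≥ ~75%" heuristic is read as failing/total ≥ 3/4. Whether `delete_recording` advertises `test_case_ids` is assumed to be checked separately and passed in as `yes`/`no`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Case 2b step 3; none of these names exist in keploy.

# choose_fallback <failing> <total> <scoped-delete-available: yes|no>
# Prints "3a" (whole-set) or "3b" (scoped). Reads "≥ ~75% failing" as
# failing/total >= 3/4, checked with integer math (failing*4 >= total*3).
# Falls back to 3a when delete_recording doesn't advertise test_case_ids.
choose_fallback() {
  local failing="$1" total="$2" scoped="$3"
  if [[ "$scoped" != yes ]] || (( failing * 4 >= total * 3 )); then
    echo 3a
  else
    echo 3b
  fi
}

# slug_part <original-set-name>
# Lowercase, replace every maximal run of chars outside [a-z0-9] with a single
# "-", then trim leading/trailing "-". LC_ALL=C keeps the ranges ASCII-only.
slug_part() {
  printf '%s' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# utc_timestamp: YYYYMMDDHHMMZ in UTC; the trailing "Z" is a literal.
utc_timestamp() {
  date -u +%Y%m%d%H%MZ
}

# rerec_name <original-set-name> [timestamp]
# Primary form: <slug-part>--rerec-<utc-timestamp>. The suffix is appended
# verbatim and is never slugged or lowercased.
rerec_name() {
  local slug ts
  slug="$(slug_part "$1")"
  ts="${2:-$(utc_timestamp)}"
  if [[ -z "$slug" ]]; then
    # An empty slug would yield a name starting with "--"; refuse instead.
    echo "rerec_name: slug is empty; use rerec_name_alt" >&2
    return 1
  fi
  printf '%s--rerec-%s\n' "$slug" "$ts"
}

# rerec_name_alt <full-git-sha> [timestamp]
# Deterministic alternative: rerec-<first 7 chars of sha>-<utc-timestamp>,
# with no leading "-" or "--".
rerec_name_alt() {
  local sha="${1:0:7}" ts="${2:-$(utc_timestamp)}"
  printf 'rerec-%s-%s\n' "$sha" "$ts"
}
```

Using the worked example from the naming notes, `rerec_name "Scenario 4 v8 baseline (4 cases)" 202605281430Z` prints `scenario-4-v8-baseline-4-cases--rerec-202605281430Z`. Omitting the timestamp argument stamps the current UTC minute, and `rerec_name_alt "$(git rev-parse HEAD)"` covers the deterministic form. The scoped-delete flag could come from `jq -e '.tools[] | select(.name == "delete_recording") | .inputSchema.properties | has("test_case_ids")' tools.json`, assuming `tools.json` holds the `result` object of a saved MCP `tools/list` response. Refusing an empty slug, rather than emitting a name that starts with `--`, is a choice made for this sketch.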
diff --git a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
@@ -134,8 +134,9 @@ You handle EVERYTHING else autonomously. Discover the app, the branch, the faili
 
 1. **App.** `basename $(pwd)` → `listApps({q: <basename>})` → pick the unambiguous match. Cache `app_id` for the session.
 2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name: <git branch>})` → cache `branch_id`. If `git rev-parse` returns `HEAD` or exits non-zero, ask the dev for a branch name ONCE.
+3. **Cluster.** `getApp({appId: app_id})` → read `origin.clusterName` (the proxy app's bound cluster). Cache as `cluster_name`. `listApps` does **not** return this field, so you must call `getApp`. You'll pass it to `--cluster` on every `keploy cloud replay` so that a local run (one without `--trigger`) resolves the app's identity without requiring an actively heartbeating cluster.
 
-Both values are sticky for the rest of the conversation. Don't re-discover unless the dev switches git branches.
+All three values are sticky for the rest of the conversation. Don't re-discover unless the dev switches git branches.
 
 ---
 
@@ -187,7 +188,66 @@ The contract changed on purpose; the test's recorded baseline is stale. Read `os
 
 **2b—Test diff plus a mock mismatch that's plausibly causing the diff.** The recorded mock is what's out of date—the downstream call's shape changed. Look at `oss_report.mock_mismatches.expected_mocks` (what the recorder captured) vs `actual_mocks` (what the replayer actually consumed) — entries that appear in `actual_mocks` but not `expected_mocks` are the new outgoing calls you need to capture. Update the mock via `update_mock({app_id, test_set_id, mock_id, branch_id, mock_yaml: <updated yaml>})`. Read the existing mock with `getMock` first to preserve fields you're not changing, then re-run replay.
 
-- If the test still fails after one or two mock edits, the recorded baseline is too far gone to patch piecemeal. Fall back: drop the stale test data (`delete_recording` on the affected test set) and re-capture from scratch using Routine B's flow (`keploy record` against the current behavior, then `keploy upload test-set --branch <git branch>` to land it on the branch).
+**Case 2b loop — follow exactly, do not improvise:**
+
+After step 1's `update_mock` lands and the re-replay is still red, your repertoire collapses to **exactly two moves**: another `update_mock` (step 2) or `delete_recording` + re-record (step 3). Anything else — editing `keploy/<test_set_id>/mocks.yaml`, passing `--useLocalMock`, `--disableMockUpload`, or `--useLocalTests` to `keploy cloud replay`, comparing local YAML to spot "what changed", restarting the agent — is **off the menu**. Those flags exist for OSS-level proxy debugging; they appear to work but pin a divergent state to your laptop while the Keploy branch the team replays against stays broken. The contract is "the cloud branch is the only source of truth"; respect it or get out of Routine A.
+
+1. **First edit.** `getMock` → mutate the canonical YAML to reflect the new contract → `update_mock`. Re-run replay.
+2. **Second edit (only if step 1 is still red on the same mock for the same reason).** Re-read `oss_report.mock_mismatches` for the new run; the diff between `expected_mocks` and `actual_mocks` should now be tighter. `getMock` again (the server may have rewritten derived fields), mutate, `update_mock`. Re-run replay.
+3. **Fallback (if step 2 is still red).** The recorded baseline is too far gone to patch piecemeal, so choose between a **whole-set** drop and a **scoped** drop based on how many test cases are failing:
+
+   **3a — Whole-set re-record (most / all cases in the set are failing):**
+
+   ```
+   delete_recording({app_id, test_set_id, branch_id})           # drops the entire set
+   keploy record -c "<dev run command>" --sync                   # captures all flows
+   # drive curls covering the same surface the original set covered
+   keploy upload test-set \
+     --app <ns.deployment> --branch <git branch> \
+     --test-set keploy/test-set-N --name <fresh-descriptive-name>
+   # re-run keploy cloud replay
+   ```
+
+   **3b — Scoped re-record (only one or a few cases in the set are failing):**
+
+   > **Availability**: 3b requires an api-server that advertises `test_case_ids` on the `delete_recording` MCP tool. To check, look at the `delete_recording` tool's `inputSchema.properties` in the MCP server's `tools/list` response — if `test_case_ids` (type: array of string) is listed, 3b is available. If it isn't, fall back to 3a until the deployment includes the scoped-delete change.
+
+   ```
+   delete_recording({                                            # tombstones JUST those cases;
+     app_id, test_set_id, branch_id,                             # the rest of the set + its
+     test_case_ids: [<name-1>, <name-2>, ...]                    # mocks stay intact.
+   })                                                            # Names are the same friendly
+                                                                 # identifiers used elsewhere
+                                                                 # (e.g. "get-api-orders-1").
+   keploy record -c "<dev run command>" --sync                   # capture only the dropped flows
+   # drive curls for ONLY the test cases you tombstoned — use the
+   # recorded request body of each as the curl shape
+   keploy upload test-set \
+     --app <ns.deployment> --branch <git branch> \
+     --test-set keploy/test-set-N \
+     --name <slug-part>--rerec-<utc-timestamp-part>              # build per "Naming convention details" below; do NOT post-process the result
+                                                                 # (shared prefix so re-records cluster with the set they refresh)
+   # re-run keploy cloud replay
+   ```
+
+   In 3b the branch ends with two coexisting test-sets: the original (minus the tombstoned cases) and the new small one with the replacements — both contribute to the next replay. The server rejects duplicate names with `test set "X" already exists for this app`, so the convention below mints a unique `--name` and the shared prefix keeps the recordings page self-grouping (original + its re-records sort together).
+
+   > **Naming convention details.** Build the name in two parts and concatenate them verbatim — do NOT post-process the whole result.
+   >
+   > 1. **Slug part** — take `<original-set-name>`, lowercase it, replace every maximal run of characters outside `[a-z0-9]` (spaces, parens, dots, underscores, etc.) with a single `-`, then trim leading/trailing `-`. Worked example: `Scenario 4 v8 baseline (4 cases)` → `scenario-4-v8-baseline-4-cases`. This avoids the api-server's name validator rejecting spaces/parens/special chars.
+   > 2. **Suffix part** — append the literal `--rerec-<utc-timestamp>`, where `<utc-timestamp>` is `YYYYMMDDHHMMZ` (the trailing `Z` is uppercase, in UTC, and is part of the literal — do NOT lowercase or slug it). Example suffix: `--rerec-202605281430Z`.
+   >
+   > Combined example: `scenario-4-v8-baseline-4-cases--rerec-202605281430Z`. Because the timestamp is UTC, agents running in different timezones stamp the same instant with the same value, so re-records minted by cross-timezone CI integrations still sort together with the original and in chronological order.
+   >
+   > **Deterministic alternative.** If the original-set-name isn't available at name-mint time, use `rerec-<short-git-sha>-<utc-timestamp>` where `<short-git-sha>` is the first 7 characters of `git rev-parse HEAD`. The alternative begins with the literal `rerec` (NO leading `-` or `--` — the double-dash in the primary form is the slug/suffix boundary marker, which this form doesn't have because there's no slug part). This form drops the original-set prefix, so re-records won't sort-group with the original on the recordings page — only use it when the prefix is genuinely unavailable.
+
+   Pick 3a when ≥ ~75% of the set's cases fail, 3b otherwise. Defaulting to 3a when only one case is failing destroys unrelated passing tests for no reason.
+
+**Do NOT inspect or edit `keploy/<test_set_id>/mocks.yaml` on the local filesystem.** `keploy cloud replay` re-downloads mocks from the Keploy branch on every run; any local edit is silently overwritten before the next replay. All mock changes go through `update_mock`. If a local-edit + `--useLocalMock`-style workaround tempts you because the cloud round-trip changed the shape of a value (e.g. a Postgres NUMERIC `price` came back as `{int: []}` instead of `{int: "1250", exp: -2}`), that is a step-2 retry signal: go to step 2 of the loop with a corrected payload, or to step 3 if step 2 also fails. Do not paper over the round-trip locally; the next replay on a CI runner or another laptop will undo your fix and the team-visible branch state will diverge from what you tested.
+
+**Do NOT pass `--useLocalMock`, `--useLocalTests`, or `--disableMockUpload` to `keploy cloud replay`.** Those flags belong to OSS-level proxy debugging and pin a laptop-local divergent state — the Keploy branch the rest of the team replays against stays broken regardless of what your local run reports. Every replay you trigger in Routine A must round-trip through the cloud branch; otherwise you are not in Routine A any more.
+
+**Do NOT ask the dev which path to take.** Routine A is autonomous. Announce the file:line you intend to change in plain text (so the dev can interrupt if they object), then proceed. Do not call `AskUserQuestion`, do not offer numbered choices, do not pause for confirmation. If two repair paths look equally valid, pick the one the skill recommends (Case 2a noise > body update, Case 2b update_mock > delete + re-record) and proceed. The dev is reviewing the streamed transcript and will Ctrl-C if you're wrong; that's the contract.
 
 Multiple failing test cases can land in different cases—handle each independently.
 
@@ -196,10 +256,13 @@ Multiple failing test cases can land in different cases—handle each independen
 After every Case-1 (app code edit) or Case-2 (test data edit) fix, run via Bash:
 
 ```bash
-keploy cloud replay --app <ns.deployment> --branch-name <git branch>
+keploy cloud replay --app <ns.deployment> --cluster "<cluster_name>" --branch-name <git branch> \
+  -c "<dev run command>" --container-name <app container> --disableReportUpload=false
 ```
 
-If still failing, re-enter Phase A2 with the new `test_run_id`. If passing, proceed to A5. Cap retry attempts at 3—if it's still red, the failures are likely a keploy-side proxy issue (your fixes aren't taking effect). Report the residual failures honestly with the `test_run_id` and the run-report URL so the dev can file a keploy bug, then stop.
+`--cluster` resolves the proxy app's identity without requiring an active heartbeat (use the `cluster_name` cached in Discovery). `-c` + `--container-name` start the app locally; omit them (and `--disableReportUpload`) in CI / active-cluster runs and let the in-cluster agent run the deployment. `--disableReportUpload=false` makes the `/tr` report persist locally (the CLI silently sets it to `true` for OAuth sessions otherwise).
+
+If still failing, re-enter Phase A2 with the new `test_run_id`. If passing, proceed to A5. Cap retries at 3 cloud-replay runs in total across the Case 2 loop (step 1's replay + step 2's replay + step 3's replay). This budget is independent of the Case 2b limit of 2 `update_mock` attempts before the 3a/3b fallback is forced. If it's still red after 3 cloud-replay runs, the failures are likely a keploy-side proxy issue (your fixes aren't taking effect). Report the residual failures honestly with the `test_run_id` and the run-report URL so the dev can file a keploy bug, then stop.
 
 ### Phase A5—Report (exact format)
 
@@ -262,10 +325,15 @@ keploy upload test-set \
 
 ### Phase B4—Validate
 
+For **local** validation on the dev's laptop, pass `--cluster` (from Discovery) and start the app yourself via `-c` + `--container-name`:
+
 ```bash
-keploy cloud replay --app <ns.deployment> --branch-name <git branch>
+keploy cloud replay --app <ns.deployment> --cluster "<cluster_name>" --branch-name <git branch> \
+  -c "<dev run command>" --container-name <app container> --disableReportUpload=false
 ```
 
+For **CI / active-cluster** runs, omit `-c`/`--container-name`/`--disableReportUpload` and let the in-cluster agent run the deployment.
+
 If anything failed, enter Routine A from Phase A2—the diagnosis routine handles it.
 
 ### Phase B5—Report (exact format)
@@ -344,7 +412,7 @@ What happens behind the scenes for each:
 | B1    | `git diff origin/main...HEAD` to find handler files that changed; extract added/modified endpoints.                                                                                                                                                                                                                                                                                |
 | B2    | Pre-flight: discover the dev's run command from the repo (Makefile → docker-compose.yml → Procfile → package.json → README), start the app, curl any 200-returning endpoint to confirm it's serving traffic, stop it. Then run `keploy record -c "<dev run command>" --sync`, drive a realistic curl per new endpoint, stop the recorder. Recording lands at `keploy/test-set-N/`. |
 | B3    | `keploy upload test-set --app <ns.deployment> --branch <git branch> --test-set keploy/test-set-N --name <descriptive-name>` to land the bundle on the Keploy branch.                                                                                                                                                                                                               |
-| B4    | `keploy cloud replay --app <ns.deployment> --branch-name <git branch>` to validate. On failure, drop into Routine A.                                                                                                                                                                                                                                                               |
+| B4    | `keploy cloud replay --app <ns.deployment> --cluster "<cluster_name>" --branch-name <git branch> -c "<dev run command>" --container-name <app container> --disableReportUpload=false` to validate locally (drop the local flags in CI / active-cluster). On failure, drop into Routine A.                                                                                          |
 | B5    | Report: captured endpoints table + replay result + next-step (open PR) + branch-diff URL + run-report URL.                                                                                                                                                                                                                                                                         |
 
 For everything not covered by these two prompts—manually inspecting test data, editing one mock, listing recordings—use the manual flow on the [Developer Workflow](/docs/quickstart/k8s-proxy-developer-workflow) page directly. The two-prompt workflow handles the 90% case; the manual flow is the escape hatch.
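The `keploy cloud replay` invocations in Phases A4 and B4, and in the B4 table row above, differ only in whether the local-run flags are appended. A minimal Bash sketch of that split follows. `replay_args` and `REPLAY_ARGS` are hypothetical names, and the helper uses only the flags documented on this page (`--app`, `--cluster`, `--branch-name`, `-c`, `--container-name`, `--disableReportUpload=false`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the keploy CLI): builds the argument list
# for `keploy cloud replay` using only the flags documented on this page.
#
# Usage: replay_args <local|ci> <ns.deployment> <cluster_name> <git-branch> [run-cmd] [container]
# Result: the global array REPLAY_ARGS (Bash functions cannot return arrays).
replay_args() {
  local mode="$1" app="$2" cluster="$3" branch="$4"

  # Common to every run: --cluster carries the cluster_name cached in
  # Discovery step 3 (getApp -> origin.clusterName).
  REPLAY_ARGS=(cloud replay --app "$app" --cluster "$cluster" --branch-name "$branch")

  case "$mode" in
    local)
      # Local run: start the app ourselves and pass --disableReportUpload=false.
      if [[ $# -lt 6 ]]; then
        echo "replay_args local: need <run-cmd> and <container>" >&2
        return 1
      fi
      REPLAY_ARGS+=(-c "$5" --container-name "$6" --disableReportUpload=false)
      ;;
    ci)
      # CI / active-cluster run: the in-cluster agent runs the deployment,
      # so the local-only flags are omitted.
      ;;
    *)
      echo "replay_args: mode must be 'local' or 'ci', got '$mode'" >&2
      return 1
      ;;
  esac
}
```

Usage might look like `replay_args local staging.orders-api "$cluster_name" "$(git rev-parse --abbrev-ref HEAD)" "docker compose up" orders-api && keploy "${REPLAY_ARGS[@]}"`, where `staging.orders-api`, `docker compose up`, and `orders-api` are placeholder values. Keeping the arguments in an array rather than a single string preserves the quoting of the multi-word run command.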