From 7c74166725355c1c4ae234540ed4eccedc776986 Mon Sep 17 00:00:00 2001 From: charankamarapu Date: Tue, 30 Jun 2026 00:23:26 +0530 Subject: [PATCH 1/6] docs(keploy-cloud): add AI Agent for Smart Test Sets guide Document the ready-made smart-set agent skill: how an AI coding assistant (Claude Code, Cursor, ...) diagnoses a failing smart-set replay and adds new smart tests on a branch via the Keploy MCP tools, plus the mandatory replay flags, the branch boundary, and the full skill file to install. Adds 'rebase' to the Vale vocabulary (a git term the branching docs use). Co-Authored-By: Claude Opus 4.8 Signed-off-by: charankamarapu --- .../config/vocabularies/Base/accept.txt | 1 + .../keploy-cloud/smart-set-agent.md | 193 ++++++++++++++++++ .../version-4.0.0-sidebars.json | 1 + 3 files changed, 195 insertions(+) create mode 100644 versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md diff --git a/vale_styles/config/vocabularies/Base/accept.txt b/vale_styles/config/vocabularies/Base/accept.txt index 6f77a50a3..a516a7f97 100644 --- a/vale_styles/config/vocabularies/Base/accept.txt +++ b/vale_styles/config/vocabularies/Base/accept.txt @@ -26,6 +26,7 @@ [Pp]assthrough [Pp]refill[s]? [Rr]eachability +[Rr]ebase[ds]? [Rr]efcount[s]? [Rr]ehydrate[ds]? 
[Rr]eplayer diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md new file mode 100644 index 000000000..e631a1c49 --- /dev/null +++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md @@ -0,0 +1,193 @@ +--- +id: smart-set-agent +title: AI Agent for Smart Test Sets +sidebar_label: AI Agent (Smart Tests) +description: Let AI coding agents like Claude Code and Cursor diagnose failing smart-set replays and add new smart tests on a branch, using the Keploy MCP tools +tags: + - AI Agent + - Smart Test Set + - Claude Code + - Cursor + - MCP + - branch +keywords: + - smart test set agent + - Claude Code + - Cursor + - Keploy MCP + - schema_ref + - branch-native testing + - failing replay fix +--- + +import ProductTier from '@site/src/components/ProductTier'; + + + +## Overview + +Keploy's [smart test set](/docs/keploy-cloud/deduplication/) is a content-addressed test substrate: cases are keyed by a `schema_ref` (a hash of the contract shape — method, path, status, content-types, and the request/response body & query **shapes**), deduplicated per application, and edited **branch-natively** (the `main` view is read-only; edits live on a branch until a human or CI merges them). + +This page describes a ready-made **agent skill** that lets an AI coding assistant (Claude Code, Cursor, and similar) operate that substrate end-to-end. Given one of two plain-English prompts, the agent: + +1. **Diagnoses a failing smart-set replay** — finds the app, branch, failing run, and the relevant code changes, classifies each failure, and fixes it **on a branch**. +2. **Adds new smart tests** for your latest code changes — records traffic, uploads it as a smart set onto the branch, and validates it. + +The agent always stops at a **verified branch** and reports back. Merging to `main` stays a human/CI decision — it is intentionally not something the agent does. 
+ +## Prerequisites + +- Keploy Enterprise with [smart test sets enabled](/docs/keploy-cloud/deduplication/) on the app (`EnableSmartTestSet=true`). +- The Keploy **MCP server** configured in your agent (see [MCP Server setup](/docs/running-keploy/agent-test-generation/#mcp-server-recommended-for-ai-agents)). The smart-set workflow uses the same `/client/v1/mcp` endpoint and the same authentication. +- A Personal Access Token (PAT) or API key with access to the app. +- The application's recording cluster reachable for `keploy cloud replay`. + +## What the agent needs from you + +The developer only ever says one of two things — the skill handles everything else (discovering the app, branch, failing run, and code changes) autonomously: + +| Prompt | Routine | +| --------------------------------------------------------------------- | ------------------------------------------------ | +| _"my keploy smart-set replay is failing, please analyze and fix it."_ | Routine A — diagnose & fix on a branch | +| _"Add new keploy smart tests for my changes."_ | Routine B — record, upload, validate on a branch | + +## Installing the skill + +The skill is a single Markdown file that teaches your agent the smart-set workflow and guardrails. Drop it into your project so the agent picks it up automatically. + +### Cursor + +Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor loads project skills automatically; the agent invokes it when your prompt matches a failing smart-set replay or a request to add smart tests. + +### Claude Code + +Save the skill under your project's skills directory (e.g. `.claude/skills/smart-set/SKILL.md`) or reference its content from `CLAUDE.md`. Claude Code reads project-level skill and context files automatically. + +The full skill content is included at the end of this page under [Skill reference](#skill-reference). 
+ +## How it works + +### Key concepts the agent relies on + +- **Branch-first, enforced by the substrate.** Every edit, delete, obsolete, or mock write is branch-scoped; a write without a `branch_id` is rejected. The Keploy branch name mirrors your git branch name (`git rev-parse --abbrev-ref HEAD`), and `create_branch` is idempotent (find-or-create). +- **`schema_ref` identity.** **Value edits** (response body, noise, assertions, mock re-links) keep the same `schema_ref` and are safe in place. **Shape edits** (changing method/path/status/content-type or the body/query structure) recompute the `schema_ref`; if the new ref collides with another case you get a typed `SchemaRefConflict` to resolve, not retry. +- **Non-destructive re-record.** Re-recording a same-shape contract replaces the case data in place and carries your noise/assertions/obsolete flags forward. A re-record that changes the shape lands a **new** `schema_ref` — the stale case is then deleted so the suite doesn't keep a red duplicate. +- **The boundary is the branch.** The agent never runs a merge or rebase to `main` — it reports a verified branch and the dashboard URLs, and you (or CI) merge. + +### Routine A — fix a failing replay + +1. **Resolve the failing run.** For a local failure, the agent fetches the newest `FAILED` report on the branch; for a CI failure, it extracts the `test_run_id` from the pasted CI/dashboard URL. +2. **Fetch the report**, projected to just the failing cases (a focused field set instead of the full ~34k-token report). +3. **Classify each failing case** after an unconditional working-tree check (`git status`/`git diff`): + - **Regression** — code changed and broke a correct contract → the agent **fixes the source and rebuilds**, never edits the test to match a bug. + - **Value drift** — a field/header/body value legitimately changed → `updateSmartTestCase` (golden body or `noise` for non-deterministic fields). 
+ - **Shape drift** — the contract structure changed → `updateSmartTestCase` with the new request/response shape, resolving any `SchemaRefConflict`. + - **Mock drift** — a downstream response changed → `upsertSmartMock` (or re-record when the request itself changed). +4. **Verify on the branch** via `keploy cloud replay --replay-source smart-set` and iterate (capped retries). +5. **Report and stop** with a diagnosis table, the fixes applied, and dashboard URLs for the branch diff and run report. + +### Routine B — add new smart tests + +1. **Identify changed endpoints** from the git diff. +2. **Capture traffic** with `keploy record --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint. +3. **Upload onto the branch** as a smart set (new contracts ingest as `imported-*`, deduplicated by `schema_ref`; existing ones are skipped). +4. **Validate on the branch** with `keploy cloud replay`. +5. **Report and stop** — you review the branch diff and merge; merge reconciles `imported-*` to stable `test-N`. + +## Replay flags the agent always uses + +When the agent runs `keploy cloud replay` for a smart-set app, these flags are required: + +| Flag | Why | +| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. | +| `--cluster ` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. | +| `--branch-name ` | Replay the branch view, including the agent's edits. | +| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. 
See [Time freezing](/docs/keploy-cloud/time-freezing/). | +| `--disableReportUpload=false` | Write the `/tr` report row so the run is visible on the dashboard. | +| `--strict-failure` | Keep response-divergent cases failing instead of silently demoting them. | + +## Limitations + +- The agent's scope ends at a verified branch — it never runs a merge or rebase to `main`. +- Replaying connection-oriented data store mocks (e.g. some PostgreSQL flows) can require additional recorder support; if a replay can't go fully green for reasons outside the test data, the agent reports the blocker rather than masking it by editing the golden output. + +## Skill reference + +The complete skill file to install (`SKILL.md`): + +```markdown +--- +name: keploy-smart-set +description: Keploy SMART-SET MCP workflow — when a smart-set cloud replay is failing (analyze and fix on a branch), or to add new smart tests for code changes. Drives schema_ref-keyed, branch-native record/replay and smart-case/mock edits via the Keploy MCP tools. The agent fixes on a branch and reports; merging to main is the dev's (or CI's) call. +--- + +# Keploy SMART-SET playbook — autonomous developer workflow + +Smart test sets are Keploy's content-addressed test substrate: cases are keyed by `schema_ref` (a hash of the contract shape — method, path, status, content-types, request/response body & query SHAPES), deduped per app, and edited **branch-natively** (main is read-only; edits live on a branch until a human/CI merges). Re-recording is **non-destructive for same-shape refreshes** — it replaces a case's data in place by `schema_ref` and preserves history, so user edits (noise, assertions, obsolete) carry forward; but a re-record that changes the shape lands a new `schema_ref` and you must delete the stale old case (Hard rule 5). + +## Entry points + +The developer will only ever say one of two things to you: + +- **Prompt A:** "my keploy smart-set replay is failing, please analyze and fix it." 
(local: find the latest failing test_run on the branch) OR "the keploy smart-set pipeline is failing, please analyze and fix it." (CI: extract `test_run_id` from the pasted CI log/dashboard URL). +- **Prompt B:** "Add new keploy smart tests for my changes." + +You handle EVERYTHING else autonomously — discover the app, the branch, the failing run, the code changes. Execute fixes **on a branch**, report what you did, and tell the dev to review & merge. + +## Hard rules + +0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `search_tools`, `get_tool_schema`, `invoke_tool`), the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `listBranches`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`. +1. **Branch-first — the substrate ENFORCES it.** Every edit/delete/obsolete/mock-write is branch-scoped; a write without a `branch_id` is rejected. Resolve `branch_id` before any write. +2. **Keploy branch name = git branch name** (`git rev-parse --abbrev-ref HEAD`). Pass it to `create_branch` (find-or-create, idempotent); reuse the returned `branch_id`. Never target the reserved `main` branch. +3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. One match → use it; zero/ambiguous → narrow by compose-service name, else ask once. +4. **schema_ref awareness.** VALUE edits keep `schema_ref` (`noiseJson`, `assertionsJson`, `description`, `mockReferencesJson`, `respBody`). SHAPE edits change it (`requestJson`/`responseJson`); a colliding new ref yields a `SchemaRefConflict` — don't retry blindly. 
All `*Json` args are STRINGIFIED JSON, not objects. +5. **Re-record replaces in place only if `schema_ref` is unchanged.** If the re-record changes the shape it lands a NEW `schema_ref` as a separate case — then `deleteSmartTestCase` the stale old one. +6. **Your boundary is the branch. NEVER merge or rebase.** After your fix is green on the branch, STOP and report — the dev/CI merges. +7. **Don't ask what you can find out** (`git log`, `git diff`, file reads, api-server calls). +8. **Always end with two dashboard URLs** — the branch diff page and the test-run report page. + +## Discovery (run once at the start) + +1. **App.** `basename $(pwd)` → `listApps({q})` → cache `app_id`. +2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`. +3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})` — you need `origin.clusterName` for `--cluster`. +4. **`--replay-source smart-set` is MANDATORY on every replay** (the CLI defaults to `latest-release`). +5. **`--freezeTime` is MANDATORY** if the app is built with the Go faketime agent. + +## Routine A — failing smart-set replay (ON A BRANCH) + +- **A1 — Resolve `test_run_id`.** Local → `listTestReports({appId, branch_id, status:"FAILED", limit:5})` exactly once, take `data[0].id` (`status` is case-sensitive). CI → extract from the pasted URL. +- **A2 — Fetch the report**, projected with `failed_only:true` + a `fields=` list (drops ~34k → ~1–2k tokens). For mock failures, a second call with `mock_mismatches_only:true` to get the `mock-N` ids. +- **A3 — Diagnose.** Unconditional working-tree check first (`git status -s`, `git diff`). **Code-change gate:** if a code change touches the same field the report says drifted, that's a regression by default — fix the source, don't bake it into the golden body. 
Classify each case: + - **Case 1 — App regression.** Edit/revert the application source, rebuild the image, replay. Don't touch the test. + - **Case A — Value drift.** `updateSmartTestCase` — `noiseJson` for non-deterministic fields, `respBody` for a real value change. + - **Case B — Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting/deleting the twin, never by blind retry. + - **Case C — Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored. +- **A4 — Verify on the branch.** Rebuild first after a Case 1 edit. Replay piping output through `tail`/`grep`. All cases failing "connection reset"/status 0 = a stale leftover replay container on the app port (`docker rm -f` it), not a code bug. Cap retries at 3. +- **A5 — Report and STOP.** Diagnosis table + fixes applied + the two dashboard URLs. Tell the dev to review & merge. + +## Routine B — add new smart tests + +- **B1 — Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, list each endpoint's method+path. +- **B2 — Capture traffic.** Pre-flight the run command, then `keploy record -c "" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, stop the recorder by PID. +- **B3 — Upload onto the branch.** `keploy upload test-set --app --branch --test-set keploy/test-set-N --smart-test-set --name ` (ingests new contracts as `imported-*`, dedup by `schema_ref`). +- **B4 — Validate** with `keploy cloud replay … --replay-source smart-set --freezeTime`. On failure, enter Routine A from A2. +- **B5 — Report and STOP.** Captured/skipped table + replay result + dashboard URLs; the dev merges (merge reconciles `imported-*` → `test-N`). + +## When you MAY ask the dev + +- PAT missing/invalid → ask for a fresh PAT. +- Detached `HEAD`/non-zero from `git rev-parse` → ask for a branch name once. 
+- `listApps` ambiguous and unnarrowable → list candidates, ask once. +- Pre-flight can't start the app → name the command + error, ask once. +- A `SchemaRefConflict` where both cases are legitimately distinct → surface it; "merge into existing" is the dev's call. + +## Anti-patterns (refuse these) + +- Merging or rebasing the branch to main. +- Editing on `main` (every mutation needs `branch_id`). +- Treating a `SchemaRefConflict` as retryable. +- Re-recording a shape-changed contract but forgetting to delete the stale case. +- Editing handler code on a Case A/B/C (contract-change) failure. +``` diff --git a/versioned_sidebars/version-4.0.0-sidebars.json b/versioned_sidebars/version-4.0.0-sidebars.json index 55f463295..20c90c93c 100644 --- a/versioned_sidebars/version-4.0.0-sidebars.json +++ b/versioned_sidebars/version-4.0.0-sidebars.json @@ -56,6 +56,7 @@ "keploy-cloud/auto-test-generation", "keploy-cloud/deduplication", "keploy-cloud/static-deduplication", + "keploy-cloud/smart-set-agent", { "type": "category", "label": "QuickStarts", From 14046ef0ac0740f94a9602be6af691ed7c58fe7d Mon Sep 17 00:00:00 2001 From: charankamarapu Date: Tue, 30 Jun 2026 03:34:20 +0530 Subject: [PATCH 2/6] docs: note MCP tool-search default in the legacy agent flows MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The lean-MCP change makes tool-search the default for all MCP clients: tools/list shows only the meta-tools, and the full catalog is reached by name via get_tool_schema/search_tools + invoke_tool (tools stay callable — hiding affects discovery, not reachability). Update the two legacy agent docs that still assumed the full catalog is listed: - k8s-proxy-llm-workflow.md: add the 'only meta-tools = tool-search mode, not a missing config' guidance to Hard rule 0 (ported from the local keploy agent skill), so an agent doesn't misread the short list as 'MCP not configured'. 
- agent-test-generation.md: add a tool-search note by the MCP section and fix the 'discovers tools via tools/list' step in How it Works. Co-Authored-By: Claude Opus 4.8 Signed-off-by: charankamarapu --- .../version-4.0.0/quickstart/k8s-proxy-llm-workflow.md | 2 ++ .../version-4.0.0/running-keploy/agent-test-generation.md | 6 +++++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md index 21e6c6a9b..04c7c59d0 100644 --- a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md +++ b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md @@ -148,6 +148,8 @@ You handle EVERYTHING else autonomously. Discover the app, the branch, the faili 0. **Native MCP transport only — NEVER Python+urllib shell fallback.** Before any other discovery step, verify Keploy MCP tools are loaded in your tool list (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `keploy-create_branch`, etc. — names vary by editor: `keploy-` or `mcp__keploy*__`). If you see ZERO keploy MCP tools, the dev's editor MCP config is missing — ask them once to install `~/.cursor/mcp.json` / `~/.claude.json` (see the page's Step 1) and STOP. **Do NOT fall back to `python3 -c 'import urllib.request; ...'` heredocs hitting the api-server's `/client/v1/mcp` endpoint directly** — diagnosed against the validation harness 2026-06-08: the shell fallback embeds the full JSON-RPC envelope + auth token + response in each shell command, inflating per-turn cache_read by ~55K tokens (3× the cost of native MCP) because every heredoc + every result becomes new context bytes instead of structured tool envelopes the cache can reuse. Native MCP is required, not optional. 
+ **If your tool list shows ONLY the meta-tools** (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`), that is NOT a missing config — the server runs in **tool-search mode** (the default): the full catalog is hidden from `tools/list` to save context, but every tool stays reachable by name. Fetch the schemas for the tools you need in ONE batched `get_tool_schema({names:[...]})` call — you already know the names (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `create_branch`, …) — then run each via `invoke_tool({name, arguments})`. Use `search_tools(query)` ONLY to discover a name you don't already know; never for a name you have (fuzzy search returns roughly 10× the tokens). A hidden tool can also be called directly by exact name if your client allows it — hiding affects discovery, not reachability. Only treat the list as "MCP not configured" when you see NEITHER the meta-tools NOR the keploy tools. + 1. **Branch-first.** Every write to mocks / tests / recordings is branch-scoped. Resolve `branch_id` before any write. If a tool returns "branch_id is required", you skipped this—fix and retry, don't ask the dev. 2. **Keploy branch name = git branch name.** Detect via `git rev-parse --abbrev-ref HEAD`. Pass that string to `create_branch` (find-or-create, idempotent). Reuse the returned `branch_id` for every subsequent write in this session. 3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. Exactly one match → use it. Multiple → pick the one whose name most specifically matches the dev's compose service. Zero matches → ask the dev once. 
diff --git a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md index d124ba1d4..da6496a19 100644 --- a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md +++ b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md @@ -72,6 +72,10 @@ Keploy provides an MCP (Model Context Protocol) endpoint that gives AI agents ** The MCP endpoint is built into the Keploy API server at `/client/v1/mcp`. Tools are auto-generated from the OpenAPI spec—when the API evolves, tools update automatically. +:::note Tool-search mode (default) +To keep the per-session context small, the server runs in **tool-search mode**: `tools/list` shows only a handful of meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) instead of the full catalog. The tools in the table below are still all available — you reach them by name: `get_tool_schema({names:[...]})` to fetch schemas you already know, `search_tools(query)` to discover ones you don't, then `invoke_tool({name, arguments})` to run them. Hiding affects discovery only, not reachability — a tool can still be called directly by its exact name. +::: + ### Available Tools | Tool | What it does | @@ -180,7 +184,7 @@ Antigravity (formerly Windsurf) supports MCP servers. Add to your Antigravity MC ### How it Works -1. The agent discovers available tools via the MCP `tools/list` method +1. The agent sees the meta-tools on `tools/list` (tool-search mode, above) and reaches the specific tool it needs by name via `get_tool_schema` / `search_tools` + `invoke_tool` 2. When you ask "generate API tests", the agent calls `generate_and_wait` with your OpenAPI spec 3. The tool triggers AI generation on the Keploy platform, polls until complete, and returns the created suites 4. 
The agent calls `run_and_report` to execute suites against your API From 944d393683d17d82d2d392280d2c6b502d82c9b2 Mon Sep 17 00:00:00 2001 From: charankamarapu Date: Tue, 30 Jun 2026 03:36:31 +0530 Subject: [PATCH 3/6] docs: reconcile the MCP-wiring verify step with tool-search mode The 'Verify the MCP wiring' step still said 'you should see ~100 tools; zero means config failed', which contradicted the new Hard rule 0 (only the meta-tools showing is normal tool-search mode). Reworded it to accept either the full catalog OR just the meta-tools as a healthy state; only zero keploy tools means the config didn't load. Co-Authored-By: Claude Opus 4.8 Signed-off-by: charankamarapu --- .../version-4.0.0/quickstart/k8s-proxy-llm-workflow.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md index 04c7c59d0..d46127f42 100644 --- a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md +++ b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md @@ -97,7 +97,7 @@ After restarting your editor, ask the agent: > _"List your available tools — do you see any prefixed `keploy-` or `mcp__keploy`?"_ -You should see ~100 keploy MCP tools (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.). If you see **zero** keploy tools, the config didn't load — check: +You should see Keploy MCP tools. Depending on the server version you'll see **either** the full catalog (~100 tools: `listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.) **or** — when **tool-search mode** is on (the default on newer servers) — just the handful of meta-tools (`search_tools`, `get_tool_schema`, `invoke_tool`, `get_auth_status`, `get_setup_instructions`). 
**Either is fine**: the playbook's Hard rule 0 handles both and reaches the full catalog by name. Only if you see **zero** keploy tools did the config fail to load — check: - The file path matches your editor's expected location (see the table above). - The PAT in `Bearer kep_...` has no quotes / trailing whitespace issues. From bcc6b64af2b599c15aa6caf1a9f794b6b560be49 Mon Sep 17 00:00:00 2001 From: charankamarapu Date: Tue, 30 Jun 2026 04:18:40 +0530 Subject: [PATCH 4/6] docs(smart-set-agent): make the skill use all the replay flags it documents + verify Cursor path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review feedback on #883: - Flag table vs skill mismatch: the 'Replay flags the agent always uses' table listed 6 flags, but the embedded skill only mandated --replay-source/--freezeTime (so an installed agent would omit --cluster and hit 'no active clusters found'). Gave the skill's Discovery a single canonical replay command listing ALL the flags, and pointed A4/B4 at it — the skill now matches the table. - Cursor install path: verified .cursor/skills//SKILL.md is correct against Cursor's Agent Skills docs (cursor.com/docs/context/skills, auto-discovered) and the existing k8s-proxy-llm-workflow doc; added the docs link and the skills-vs-.cursor/rules distinction so readers can confirm. (The canonical command is inline, not a nested code fence — a fenced block inside the embedded SKILL.md block was closing it early.) 
Co-Authored-By: Claude Opus 4.8 Signed-off-by: charankamarapu --- .../version-4.0.0/keploy-cloud/smart-set-agent.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md index e631a1c49..c9b61d050 100644 --- a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md +++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md @@ -57,7 +57,7 @@ The skill is a single Markdown file that teaches your agent the smart-set workfl ### Cursor -Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor loads project skills automatically; the agent invokes it when your prompt matches a failing smart-set replay or a request to add smart tests. +Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor auto-discovers [Agent Skills](https://cursor.com/docs/context/skills) from `.cursor/skills/` and invokes this one on demand when your prompt matches a failing smart-set replay or a request to add smart tests. This is the on-demand **skill** mechanism — distinct from always-on `.cursor/rules/*.mdc` project rules, which would bill the full skill on every turn. ### Claude Code @@ -152,8 +152,7 @@ You handle EVERYTHING else autonomously — discover the app, the branch, the fa 1. **App.** `basename $(pwd)` → `listApps({q})` → cache `app_id`. 2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`. 3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})` — you need `origin.clusterName` for `--cluster`. -4. **`--replay-source smart-set` is MANDATORY on every replay** (the CLI defaults to `latest-release`). -5. **`--freezeTime` is MANDATORY** if the app is built with the Go faketime agent. +4. 
**Canonical replay command — use ALL these flags on every replay:** `keploy cloud replay --app --branch-name --cluster --replay-source smart-set --freezeTime --disableReportUpload=false --strict-failure`. Why each: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from origin.clusterName; omit it and the CLI errors "no active clusters found"), `--freezeTime` (when the app is built with the Go faketime agent), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases). These match the "Replay flags" table above. ## Routine A — failing smart-set replay (ON A BRANCH) @@ -164,7 +163,7 @@ You handle EVERYTHING else autonomously — discover the app, the branch, the fa - **Case A — Value drift.** `updateSmartTestCase` — `noiseJson` for non-deterministic fields, `respBody` for a real value change. - **Case B — Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting/deleting the twin, never by blind retry. - **Case C — Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored. -- **A4 — Verify on the branch.** Rebuild first after a Case 1 edit. Replay piping output through `tail`/`grep`. All cases failing "connection reset"/status 0 = a stale leftover replay container on the app port (`docker rm -f` it), not a code bug. Cap retries at 3. +- **A4 — Verify on the branch.** Rebuild first after a Case 1 edit. Replay with the **canonical command from Discovery (all flags)**, piping output through `tail`/`grep`. All cases failing "connection reset"/status 0 = a stale leftover replay container on the app port (`docker rm -f` it), not a code bug. Cap retries at 3. - **A5 — Report and STOP.** Diagnosis table + fixes applied + the two dashboard URLs. Tell the dev to review & merge. 
## Routine B — add new smart tests @@ -172,7 +171,7 @@ You handle EVERYTHING else autonomously — discover the app, the branch, the fa - **B1 — Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, list each endpoint's method+path. - **B2 — Capture traffic.** Pre-flight the run command, then `keploy record -c "" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, stop the recorder by PID. - **B3 — Upload onto the branch.** `keploy upload test-set --app --branch --test-set keploy/test-set-N --smart-test-set --name ` (ingests new contracts as `imported-*`, dedup by `schema_ref`). -- **B4 — Validate** with `keploy cloud replay … --replay-source smart-set --freezeTime`. On failure, enter Routine A from A2. +- **B4 — Validate** with the **canonical replay command from Discovery (all flags)**. On failure, enter Routine A from A2. - **B5 — Report and STOP.** Captured/skipped table + replay result + dashboard URLs; the dev merges (merge reconciles `imported-*` → `test-N`). ## When you MAY ask the dev From af641cba6911cc85141491e61ce726cf46daf946 Mon Sep 17 00:00:00 2001 From: charankamarapu Date: Tue, 30 Jun 2026 04:32:30 +0530 Subject: [PATCH 5/6] docs(smart-set-agent): consistency fixes from review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Meta-tool list: aligned to the same five as the legacy pages (added get_setup_instructions) and keyed tool-search detection on 'none of the domain tools listed' so an extra onboarding tool can't cause misclassification. - --freezeTime: it's conditional (faketime builds only), so dropped the 'always uses' table title and marked it optional in both the table intro and the skill's canonical command ([--freezeTime], 'omit for non-faketime apps') — the doc no longer tells the agent to pass it unconditionally. 
- Routine B record line: added -c "" (the body had dropped it, so a literal copy recorded nothing); now matches SKILL.md B2.
- Dropped listBranches from the tool-name list — no routine uses it (branch is resolved via create_branch).

Co-Authored-By: Claude Opus 4.8
Signed-off-by: charankamarapu
---
 .../version-4.0.0/keploy-cloud/smart-set-agent.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
index c9b61d050..7e84648fb 100644
--- a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
+++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
@@ -89,14 +89,14 @@ The full skill content is included at the end of this page under [Skill referenc
 ### Routine B — add new smart tests

 1. **Identify changed endpoints** from the git diff.
-2. **Capture traffic** with `keploy record --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint.
+2. **Capture traffic** with `keploy record -c "" --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint.
 3. **Upload onto the branch** as a smart set (new contracts ingest as `imported-*`, deduplicated by `schema_ref`; existing ones are skipped).
 4. **Validate on the branch** with `keploy cloud replay`.
 5. **Report and stop** — you review the branch diff and merge; merge reconciles `imported-*` to stable `test-N`.
-## Replay flags the agent always uses
+## Replay flags

-When the agent runs `keploy cloud replay` for a smart-set app, these flags are required:
+When the agent runs `keploy cloud replay` for a smart-set app, these flags are required on **every** replay — except `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent:

 | Flag | Why |
 | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -137,7 +137,7 @@ You handle EVERYTHING else autonomously — discover the app, the branch, the fa

 ## Hard rules

-0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `search_tools`, `get_tool_schema`, `invoke_tool`), the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `listBranches`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
+0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) and none of the Smart-set names below, the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
 1. **Branch-first — the substrate ENFORCES it.** Every edit/delete/obsolete/mock-write is branch-scoped; a write without a `branch_id` is rejected. Resolve `branch_id` before any write.
 2. **Keploy branch name = git branch name** (`git rev-parse --abbrev-ref HEAD`). Pass it to `create_branch` (find-or-create, idempotent); reuse the returned `branch_id`. Never target the reserved `main` branch.
 3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. One match → use it; zero/ambiguous → narrow by compose-service name, else ask once.
@@ -152,7 +152,7 @@ You handle EVERYTHING else autonomously — discover the app, the branch, the fa
 1. **App.** `basename $(pwd)` → `listApps({q})` → cache `app_id`.
 2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`.
 3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})` — you need `origin.clusterName` for `--cluster`.
-4. **Canonical replay command — use ALL these flags on every replay:** `keploy cloud replay --app --branch-name --cluster --replay-source smart-set --freezeTime --disableReportUpload=false --strict-failure`. Why each: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from origin.clusterName; omit it and the CLI errors "no active clusters found"), `--freezeTime` (when the app is built with the Go faketime agent), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases). These match the "Replay flags" table above.
+4. **Canonical replay command — these flags on every replay** (drop `--freezeTime` for non-faketime apps): `keploy cloud replay --app --branch-name --cluster --replay-source smart-set --disableReportUpload=false --strict-failure [--freezeTime]`. Why each: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from origin.clusterName; omit it and the CLI errors "no active clusters found"), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases), and `--freezeTime` ONLY when the app is built with the Go faketime agent (omit it otherwise). These match the "Replay flags" table above.

 ## Routine A — failing smart-set replay (ON A BRANCH)

From 37c97bccdb586bd216c6fed44a6e81ad32162d8e Mon Sep 17 00:00:00 2001
From: charankamarapu
Date: Tue, 30 Jun 2026 04:43:44 +0530
Subject: [PATCH 6/6] docs(smart-set-agent): add --app to flags table + note the replay/upload branch-flag asymmetry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Review follow-ups on #883:
- Verified against the CLI: keploy cloud replay scopes a branch with --branch-name (its --branch is a CI test-data field), while keploy upload test-set has only --branch (find-or-create). The asymmetry is real, not a typo — added a note on the --branch-name table row so a reader doesn't 'fix' it.
- Added the --app row to the Replay flags table (it's required by every replay/upload but was only in the canonical command).
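
A minimal sketch of the branch-flag asymmetry and the optional `--freezeTime` described above. It only prints the two commands and uses only flags already named in this series; the function names and the app, branch, cluster, and test-set values are hypothetical placeholders, not real resources.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running keploy.
# All argument values used below are hypothetical placeholders.

# Replay scopes the branch with --branch-name.
# Pass "faketime" as the 4th argument to append the optional --freezeTime.
replay_cmd() {
  cmd="keploy cloud replay --app $1 --branch-name $2 --cluster $3"
  cmd="$cmd --replay-source smart-set --disableReportUpload=false --strict-failure"
  if [ "${4:-}" = "faketime" ]; then
    cmd="$cmd --freezeTime"
  fi
  printf '%s\n' "$cmd"
}

# Upload scopes the branch with --branch (not --branch-name).
upload_cmd() {
  printf '%s\n' "keploy upload test-set --app $1 --branch $2 --test-set $3 --smart-test-set --name $4"
}

replay_cmd "demo-ns.demo-app" "feature-x" "demo-cluster" faketime
upload_cmd "demo-ns.demo-app" "feature-x" "keploy/test-set-0" "feature-x-tests"
```

Keeping the two builders separate makes it harder to copy `--branch-name` into the upload command by accident.
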

Co-Authored-By: Claude Opus 4.8
Signed-off-by: charankamarapu
---
 .../keploy-cloud/smart-set-agent.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
index 7e84648fb..d1872dc14 100644
--- a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
+++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
@@ -98,14 +98,15 @@ The full skill content is included at the end of this page under [Skill referenc

 When the agent runs `keploy cloud replay` for a smart-set app, these flags are required on **every** replay — except `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent:

-| Flag | Why |
-| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. |
-| `--cluster ` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. |
-| `--branch-name ` | Replay the branch view, including the agent's edits. |
-| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
-| `--disableReportUpload=false` | Write the `/tr` report row so the run is visible on the dashboard. |
-| `--strict-failure` | Keep response-divergent cases failing instead of silently demoting them. |
+| Flag | Why |
+| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `--app ` | The app to replay, as `namespace.deployment`. Required by every `keploy cloud replay` (and `keploy upload test-set`). |
+| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. |
+| `--cluster ` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. |
+| `--branch-name ` | Replay the branch view, including the agent's edits. **Flag-name asymmetry (not a typo):** `keploy cloud replay` scopes by branch with `--branch-name`, while `keploy upload test-set` (Routine B3) uses `--branch` — different subcommands, different flag names. |
+| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
+| `--disableReportUpload=false` | Write the `/tr` report row so the run is visible on the dashboard. |
+| `--strict-failure` | Keep response-divergent cases failing instead of silently demoting them. |

 ## Limitations