---
id: smart-set-agent
title: AI Agent for Smart Test Sets
sidebar_label: AI Agent (Smart Tests)
description: Let AI coding agents like Claude Code and Cursor diagnose failing smart-set replays and add new smart tests on a branch, using the Keploy MCP tools
tags:
  - AI Agent
  - Smart Test Set
  - Claude Code
  - Cursor
  - MCP
  - branch
keywords:
  - smart test set agent
  - Claude Code
  - Cursor
  - Keploy MCP
  - schema_ref
  - branch-native testing
  - failing replay fix
---

import ProductTier from '@site/src/components/ProductTier';

<ProductTier tiers="Enterprise" offerings="Self-Hosted, Dedicated" />

## Overview

Keploy's [smart test set](/docs/keploy-cloud/deduplication/) is a content-addressed test substrate. Each case is keyed by a `schema_ref`, a hash of the contract shape (method, path, status, content types, and the **shapes** of the request/response bodies and query). Cases are deduplicated per application and edited **branch-natively**: the `main` view is read-only, and edits live on a branch until a human or CI merges them.

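To make the content addressing described above concrete, here is a toy sketch in Python. It is **not** Keploy's `schema_ref` algorithm: the canonicalization, the truncated SHA-256, and the `shape_of`/`toy_schema_ref` names are assumptions chosen for illustration. It only shows why two responses that differ in values but share a structure map to the same key, while a structural change produces a new one.

```python
import hashlib
import json
from typing import Any

# Toy illustration only; Keploy's real schema_ref computation is not shown here.


def shape_of(value: Any) -> Any:
    """Reduce a JSON value to its shape: keep object keys and value types, drop values."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Toy rule: a list's shape is the shape of its first element.
        return [shape_of(value[0])] if value else []
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


def toy_schema_ref(*, method: str, path: str, status: int,
                   request_content_type: str, response_content_type: str,
                   query: Any = None, request_body: Any = None,
                   response_body: Any = None) -> str:
    """Hash the contract shape (not its values) into a short, stable identifier."""
    contract = {
        "method": method.upper(),
        "path": path,
        "status": status,
        "content_types": [request_content_type, response_content_type],
        "query": shape_of(query),
        "request": shape_of(request_body),
        "response": shape_of(response_body),
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    common = dict(method="GET", path="/users/42", status=200,
                  request_content_type="", response_content_type="application/json")
    a = toy_schema_ref(**common, response_body={"id": 42, "name": "Ada"})
    b = toy_schema_ref(**common, response_body={"id": 42, "name": "Grace"})
    c = toy_schema_ref(**common, response_body={"id": "42", "name": "Ada"})
    print(a == b)  # True (value change only)
    print(a == c)  # False (id changed from number to string)
```

Because only structure feeds the hash, editing a golden value leaves the key stable, while changing a field's type or a status code moves the case to a new key that may collide with an existing one.
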
This page describes a ready-made **agent skill** that lets an AI coding assistant (Claude Code, Cursor, and similar) operate that substrate end to end. Given one of two plain-English prompts, the agent:

1. **Diagnoses a failing smart-set replay:** it finds the app, branch, failing run, and relevant code changes, classifies each failure, and fixes it **on a branch**.
2. **Adds new smart tests** for your latest code changes: it records traffic, uploads it as a smart set onto the branch, and validates it.

The agent always stops at a **verified branch** and reports back. Merging to `main` remains a human or CI decision; the agent intentionally never does it.

## Prerequisites

- Keploy Enterprise with [smart test sets enabled](/docs/keploy-cloud/deduplication/) on the app (`EnableSmartTestSet=true`).
- The Keploy **MCP server** configured in your agent (see [MCP Server setup](/docs/running-keploy/agent-test-generation/#mcp-server-recommended-for-ai-agents)). The smart-set workflow uses the same `/client/v1/mcp` endpoint and the same authentication.
- A Personal Access Token (PAT) or API key with access to the app.
- The application's recording cluster, reachable from wherever `keploy cloud replay` runs.

## What the agent needs from you

You only ever need to say one of two things. The skill handles everything else (discovering the app, branch, failing run, and code changes) autonomously:

| Prompt                                                                | Routine                                          |
| --------------------------------------------------------------------- | ------------------------------------------------ |
| _"my keploy smart-set replay is failing, please analyze and fix it."_ | Routine A: diagnose & fix on a branch            |
| _"Add new keploy smart tests for my changes."_                        | Routine B: record, upload, validate on a branch  |

## Installing the skill

The skill is a single Markdown file that teaches your agent the smart-set workflow and its guardrails. Add it to your project so the agent picks it up automatically.

### Cursor

Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor auto-discovers [Agent Skills](https://cursor.com/docs/context/skills) from `.cursor/skills/` and invokes this one on demand when your prompt matches a failing smart-set replay or a request to add smart tests. Use this on-demand **skill** mechanism rather than an always-on `.cursor/rules/*.mdc` project rule: a rule would load (and bill) the full skill text on every turn.

### Claude Code

Save the skill in your project's skills directory (for example, `.claude/skills/smart-set/SKILL.md`) or reference its content from `CLAUDE.md`. Claude Code reads project-level skill and context files automatically.

The full skill content is included at the end of this page under [Skill reference](#skill-reference).

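If you want the same skill available to both Cursor and Claude Code, a small script can copy one `SKILL.md` into each location listed above. This is a convenience sketch only: the `install_skill` helper and the idea of keeping a single source copy of the file are hypothetical, not part of Keploy.

```python
import shutil
from pathlib import Path

# Project-relative skill locations described in the Cursor and Claude Code sections.
SKILL_DIRS = {
    "cursor": Path(".cursor/skills/smart-set"),
    "claude": Path(".claude/skills/smart-set"),
}


def install_skill(source: Path, project_root: Path,
                  agents: tuple = ("cursor", "claude")) -> list:
    """Copy one SKILL.md into each requested agent's skills directory (hypothetical helper)."""
    installed = []
    for agent in agents:
        target_dir = project_root / SKILL_DIRS[agent]
        target_dir.mkdir(parents=True, exist_ok=True)  # create the directory tree if missing
        target = target_dir / "SKILL.md"
        shutil.copyfile(source, target)  # overwrites an older copy
        installed.append(target)
    return installed
```

For example, `install_skill(Path("SKILL.md"), Path.cwd())` installs both copies from a `SKILL.md` in the current directory; pass `agents=("claude",)` to install only one.
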
## How it works

### Key concepts the agent relies on

- **Branch-first, enforced by the substrate.** Every edit, delete, obsolete, or mock write is branch-scoped; a write without a `branch_id` is rejected. The Keploy branch name mirrors your git branch name (`git rev-parse --abbrev-ref HEAD`), and `create_branch` is idempotent (find-or-create).
- **`schema_ref` identity.** **Value edits** (response body, noise, assertions, mock re-links) keep the same `schema_ref` and are safe to apply in place. **Shape edits** (changing the method, path, status, content type, or body/query structure) recompute the `schema_ref`; if the new ref collides with another case, you get a typed `SchemaRefConflict` that must be resolved, not retried.
- **Non-destructive re-record.** Re-recording a same-shape contract replaces the case data in place and carries your noise, assertions, and obsolete flags forward. A re-record that changes the shape lands a **new** `schema_ref`, so the stale case is then deleted to keep the suite from carrying a red duplicate.
- **The boundary is the branch.** The agent never merges or rebases the branch to `main`. It reports a verified branch plus the dashboard URLs, and you (or CI) merge.

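The branch-first and `schema_ref` rules above can be expressed as client-side guards. The sketch below is hypothetical: `keploy_branch_name` and `build_update_args` are illustrative helpers, not part of Keploy, and the argument dict only covers names taken from the skill (`branch_id` plus the value and shape fields of `updateSmartTestCase`). Other required arguments, such as the test-case identifier, are deliberately left out; fetch the real tool schema with `get_tool_schema` before calling it.

```python
import json
from typing import Any, Dict, Tuple

# Field groups as listed in the skill's Hard rule 4.
VALUE_FIELDS = {"noiseJson", "assertionsJson", "description", "mockReferencesJson", "respBody"}
SHAPE_FIELDS = {"requestJson", "responseJson"}


def keploy_branch_name(rev_parse_output: str) -> str:
    """Map `git rev-parse --abbrev-ref HEAD` output to a Keploy branch name."""
    name = rev_parse_output.strip()
    if name == "HEAD":  # git prints HEAD when the checkout is detached
        raise ValueError("detached HEAD: ask the developer for a branch name")
    if name == "main":  # refusing here is this sketch's choice, not documented behavior
        raise ValueError("main is reserved and read-only; work on a feature branch")
    return name


def build_update_args(branch_id: str, fields: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Assemble partial updateSmartTestCase arguments; report whether schema_ref may change."""
    if not branch_id:
        raise ValueError("writes are branch-scoped: resolve branch_id with create_branch first")
    unknown = set(fields) - VALUE_FIELDS - SHAPE_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    args: Dict[str, Any] = {"branch_id": branch_id}
    for name, value in fields.items():
        # *Json arguments must be stringified JSON, not nested objects.
        if name.endswith("Json") and not isinstance(value, str):
            value = json.dumps(value)
        args[name] = value
    shape_edit = bool(SHAPE_FIELDS & set(fields))
    return args, shape_edit
```

For example, passing `{"noiseJson": ["body.created_at"]}` (a hypothetical noise format) returns `noiseJson` as the string `'["body.created_at"]'` with `shape_edit` set to `False`; including `responseJson` flips `shape_edit` to `True`, the signal to expect a possible `SchemaRefConflict`.
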
### Routine A: fix a failing replay

1. **Resolve the failing run.** For a local failure, the agent fetches the newest `FAILED` report on the branch; for a CI failure, it extracts the `test_run_id` from the pasted CI log or dashboard URL.
2. **Fetch the report**, projected to just the failing cases (a focused field set of roughly 1–2k tokens instead of the full ~34k-token report).
3. **Classify each failing case** after an unconditional working-tree check (`git status` / `git diff`):
   - **Regression:** code changed and broke a correct contract. The agent **fixes the source and rebuilds**; it never edits the test to match a bug.
   - **Value drift:** a field, header, or body value legitimately changed. The agent calls `updateSmartTestCase` (a new golden body, or `noise` for non-deterministic fields).
   - **Shape drift:** the contract structure changed. The agent calls `updateSmartTestCase` with the new request/response shape and resolves any `SchemaRefConflict`.
   - **Mock drift:** a downstream response changed. The agent calls `upsertSmartMock`, or re-records when the outbound request itself changed.
4. **Verify on the branch** by running `keploy cloud replay` with the full set of [replay flags](#replay-flags), including `--replay-source smart-set`, and iterating with retries capped at three.
5. **Report and stop** with a diagnosis table, the fixes applied, and dashboard URLs for the branch diff and the run report.

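Steps 1 and 3 above can be sketched as a small triage module. This is an illustration under assumptions, not the skill's implementation: the `FailingCase` fields (`kind`, `drifted_fields`), the `change_is_intended` override of the "regression by default" rule, and matching drifted field names against fields touched by the diff are all hypothetical simplifications of what the agent reads from the report and `git diff`. Only the `data[0].id` lookup and the code-change gate come from the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Set


class Fix(Enum):
    REGRESSION = "fix the source, rebuild, replay"
    VALUE_DRIFT = "updateSmartTestCase with respBody or noiseJson"
    SHAPE_DRIFT = "updateSmartTestCase with requestJson/responseJson"
    MOCK_DRIFT = "upsertSmartMock, or re-record if the outbound request changed"


@dataclass
class FailingCase:
    case_id: str
    kind: str  # hypothetical label derived from the report: "value", "shape", or "mock"
    drifted_fields: Set[str] = field(default_factory=set)


def newest_failed_run_id(list_reports_response: Dict[str, Any]) -> str:
    """Pick data[0].id from a listTestReports(status="FAILED") response."""
    runs = list_reports_response.get("data") or []
    if not runs:
        raise LookupError("no FAILED report on this branch")
    return runs[0]["id"]


def classify(case: FailingCase, fields_touched_by_diff: Set[str],
             change_is_intended: bool = False) -> Fix:
    # Code-change gate: a diff that touches the drifted field is a regression by default.
    if case.drifted_fields & fields_touched_by_diff and not change_is_intended:
        return Fix.REGRESSION
    if case.kind == "mock":
        return Fix.MOCK_DRIFT
    if case.kind == "shape":
        return Fix.SHAPE_DRIFT
    return Fix.VALUE_DRIFT
```

Ordering the gate first mirrors the rule that a test is never edited to match a bug: only when the diff does not explain the drift (or the change is clearly intended) does the agent reach for a test or mock edit.
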
### Routine B: add new smart tests

1. **Identify changed endpoints** from the git diff.
2. **Capture traffic** with `keploy record -c "<run command>" --sync --disable-mapping=false`, driving one realistic request per new or changed endpoint.
3. **Upload onto the branch** as a smart set with `keploy upload test-set --smart-test-set`. New contracts are ingested as `imported-*` cases, deduplicated by `schema_ref`; contracts that already exist are skipped.
4. **Validate on the branch** with `keploy cloud replay` and the full set of replay flags. Any failure is handed to Routine A.
5. **Report and stop.** You review the branch diff and merge; the merge reconciles `imported-*` cases to stable `test-N` IDs.

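Steps 1 to 3 can be scripted. The sketch below only assembles argument lists from the commands given in the skill (`keploy record ... --sync --disable-mapping=false` and `keploy upload test-set ... --smart-test-set`). The helper names, the sample app/branch/test-set values, and the filename heuristic used to spot HTTP handlers in `git diff origin/main...HEAD --name-only` output are hypothetical and need adapting to your repository.

```python
import shlex
from typing import Iterable, List


def likely_handler_files(diff_name_only: str,
                         markers: Iterable[str] = ("handler", "route", "controller")) -> List[str]:
    """Hypothetical filter over `git diff origin/main...HEAD --name-only` output."""
    lowered = tuple(m.lower() for m in markers)
    return [path for path in diff_name_only.splitlines()
            if any(m in path.lower() for m in lowered)]


def record_argv(run_command: str) -> List[str]:
    # --sync and --disable-mapping=false are both mandatory for smart-set capture.
    return ["keploy", "record", "-c", run_command, "--sync", "--disable-mapping=false"]


def upload_argv(app: str, git_branch: str, test_set_dir: str, name: str) -> List[str]:
    # Upload scopes by branch with --branch (cloud replay uses --branch-name).
    return ["keploy", "upload", "test-set",
            "--app", app,
            "--branch", git_branch,
            "--test-set", test_set_dir,
            "--smart-test-set",
            "--name", name]


if __name__ == "__main__":
    print(shlex.join(record_argv("go run ./cmd/server")))
    # prints: keploy record -c 'go run ./cmd/server' --sync --disable-mapping=false
    print(shlex.join(upload_argv("shop.orders-api", "feature/refunds",
                                 "keploy/test-set-0", "refunds")))
```

Keeping the arguments as lists (rather than one shell string) means a run command containing spaces is passed to `-c` as a single argument.
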
## Replay flags

When the agent runs `keploy cloud replay` for a smart-set app, it passes the following flags on **every** replay. The one exception is `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent.

| Flag | Why |
| --- | --- |
| `--app <ns.deployment>` | The app to replay, as `namespace.deployment`. Required by every `keploy cloud replay` (and by `keploy upload test-set`). |
| `--replay-source smart-set` | Replays the deduplicated smart-set cases. Without it, the CLI defaults to `latest-release` and replays the raw per-release recordings instead. |
| `--cluster <name>` | The recording cluster (`origin.clusterName`). A `no active clusters found` error usually means this flag was omitted. |
| `--branch-name <git branch>` | Replays the branch view, including the agent's edits. **Flag-name asymmetry (not a typo):** `keploy cloud replay` scopes by branch with `--branch-name`, while `keploy upload test-set` (Routine B, step 3) uses `--branch`. |
| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so that `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
| `--disableReportUpload=false` | Writes the `/tr` report row so the run is visible on the dashboard. |
| `--strict-failure` | Keeps response-divergent cases failing instead of silently demoting them. |

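Because the same flag set has to be repeated on every replay, it helps to build it in one place. A minimal sketch, assuming you already have the app's `namespace.deployment`, the git branch name, and `origin.clusterName` from `getApp`; `replay_argv` and the sample values are hypothetical, and whether the app uses `faketime` is left to the caller.

```python
import shlex
from typing import List


def replay_argv(app: str, git_branch: str, cluster: str, faketime: bool = False) -> List[str]:
    """Canonical `keploy cloud replay` invocation for a smart-set app (hypothetical helper)."""
    argv = [
        "keploy", "cloud", "replay",
        "--app", app,                     # namespace.deployment
        "--branch-name", git_branch,      # replay the branch view (upload uses --branch)
        "--cluster", cluster,             # origin.clusterName
        "--replay-source", "smart-set",   # otherwise the CLI defaults to latest-release
        "--disableReportUpload=false",    # write the /tr report row
        "--strict-failure",               # keep response-divergent cases failing
    ]
    if faketime:
        argv.append("--freezeTime")       # only for apps built with the Go faketime agent
    return argv


if __name__ == "__main__":
    print(shlex.join(replay_argv("shop.orders-api", "feature/refunds", "staging-east")))
```

Passing the list directly to `subprocess.run(argv)` avoids shell-quoting problems with branch names, and toggling only `faketime` keeps every other flag identical across runs.
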
## Limitations

- The agent's scope ends at a verified branch; it never merges or rebases to `main`.
- Replaying connection-oriented data-store mocks (for example, some PostgreSQL flows) can require additional recorder support. If a replay can't go fully green for reasons outside the test data, the agent reports the blocker rather than masking it by editing the golden output.

## Skill reference

The complete skill file to install (`SKILL.md`):

```markdown
---
name: keploy-smart-set
description: Keploy SMART-SET MCP workflow, for when a smart-set cloud replay is failing (analyze and fix on a branch) or when adding new smart tests for code changes. Drives schema_ref-keyed, branch-native record/replay and smart-case/mock edits via the Keploy MCP tools. The agent fixes on a branch and reports; merging to main is the dev's (or CI's) call.
---

# Keploy SMART-SET playbook: autonomous developer workflow

Smart test sets are Keploy's content-addressed test substrate: cases are keyed by `schema_ref` (a hash of the contract shape: method, path, status, content types, and the request/response body and query SHAPES), deduplicated per app, and edited **branch-natively** (main is read-only; edits live on a branch until a human or CI merges them). Re-recording is **non-destructive for same-shape refreshes**: it replaces a case's data in place by `schema_ref` and preserves history, so user edits (noise, assertions, obsolete) carry forward. A re-record that changes the shape, however, lands a new `schema_ref`, and you must delete the stale old case (Hard rule 5).

## Entry points

The developer will only ever say one of two things to you:

- **Prompt A:** "my keploy smart-set replay is failing, please analyze and fix it." (local: find the latest failing test_run on the branch) OR "the keploy smart-set pipeline is failing, please analyze and fix it." (CI: extract `test_run_id` from the pasted CI log or dashboard URL).
- **Prompt B:** "Add new keploy smart tests for my changes."

You handle EVERYTHING else autonomously: discover the app, the branch, the failing run, and the code changes. Execute fixes **on a branch**, report what you did, and tell the dev to review and merge.

## Hard rules

0. **Native MCP transport only.** Verify that the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) and none of the smart-set names below, the real tools are hidden server-side to save context: fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
1. **Branch-first: the substrate ENFORCES it.** Every edit/delete/obsolete/mock-write is branch-scoped; a write without a `branch_id` is rejected. Resolve `branch_id` before any write.
2. **Keploy branch name = git branch name** (`git rev-parse --abbrev-ref HEAD`). Pass it to `create_branch` (find-or-create, idempotent) and reuse the returned `branch_id`. Never target the reserved `main` branch.
3. **App resolution from cwd.** `basename "$(pwd)"` → `listApps({q: <basename>})`. One match → use it; zero or ambiguous → narrow by compose-service name, else ask once.
4. **schema_ref awareness.** VALUE edits keep `schema_ref` (`noiseJson`, `assertionsJson`, `description`, `mockReferencesJson`, `respBody`). SHAPE edits change it (`requestJson`/`responseJson`); a colliding new ref yields a `SchemaRefConflict`, so don't retry blindly. All `*Json` args are STRINGIFIED JSON, not objects.
5. **Re-record replaces in place only if `schema_ref` is unchanged.** If the re-record changes the shape, it lands a NEW `schema_ref` as a separate case; then `deleteSmartTestCase` the stale old one.
6. **Your boundary is the branch. NEVER merge or rebase.** After your fix is green on the branch, STOP and report; the dev or CI merges.
7. **Don't ask what you can find out** (`git log`, `git diff`, file reads, api-server calls).
8. **Always end with two dashboard URLs:** the branch diff page and the test-run report page.

## Discovery (run once at the start)

1. **App.** `basename "$(pwd)"` → `listApps({q})` → cache `app_id`.
2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`.
3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})`; you need `origin.clusterName` for `--cluster`.
4. **Canonical replay command: use these flags on every replay** (drop `--freezeTime` for non-faketime apps): `keploy cloud replay --app <ns.deployment> --branch-name <git branch> --cluster <origin.clusterName> --replay-source smart-set --disableReportUpload=false --strict-failure [--freezeTime]`. Why each one: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from origin.clusterName; omit it and the CLI errors with "no active clusters found"), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases), and `--freezeTime` ONLY when the app is built with the Go faketime agent (omit it otherwise). These are the same flags listed in the "Replay flags" table of the Keploy docs for this skill.

## Routine A: failing smart-set replay (ON A BRANCH)

- **A1: Resolve `test_run_id`.** Local → call `listTestReports({appId, branch_id, status:"FAILED", limit:5})` exactly once and take `data[0].id` (`status` is case-sensitive). CI → extract it from the pasted URL.
- **A2: Fetch the report**, projected with `failed_only:true` plus a `fields=` list (cuts ~34k tokens to ~1–2k). For mock failures, make a second call with `mock_mismatches_only:true` to get the `mock-N` ids.
- **A3: Diagnose.** Run an unconditional working-tree check first (`git status -s`, `git diff`). **Code-change gate:** if a code change touches the same field the report says drifted, treat it as a regression by default; fix the source instead of baking the change into the golden body. Classify each case:
  - **Case 1: App regression.** Edit or revert the application source, rebuild the image, and replay. Don't touch the test.
  - **Case A: Value drift.** `updateSmartTestCase` with `noiseJson` for non-deterministic fields, or `respBody` for a real value change.
  - **Case B: Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting or deleting the twin, never by blind retry.
  - **Case C: Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored.
- **A4: Verify on the branch.** After a Case 1 edit, rebuild first. Replay with the **canonical command from Discovery (all flags)**, piping output through `tail`/`grep`. If all cases fail with "connection reset" or status 0, a stale leftover replay container is holding the app port (`docker rm -f` it); it is not a code bug. Cap retries at 3.
- **A5: Report and STOP.** Diagnosis table + fixes applied + the two dashboard URLs. Tell the dev to review and merge.

## Routine B: add new smart tests

- **B1: Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, and list each endpoint's method + path.
- **B2: Capture traffic.** Pre-flight the run command, then `keploy record -c "<cmd>" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, and stop the recorder by PID.
- **B3: Upload onto the branch.** `keploy upload test-set --app <ns.deployment> --branch <git branch> --test-set keploy/test-set-N --smart-test-set --name <name>` (ingests new contracts as `imported-*`, deduplicated by `schema_ref`).
- **B4: Validate** with the **canonical replay command from Discovery (all flags)**. On failure, enter Routine A at A2.
- **B5: Report and STOP.** Captured/skipped table + replay result + dashboard URLs; the dev merges (the merge reconciles `imported-*` → `test-N`).

## When you MAY ask the dev

- PAT missing or invalid → ask for a fresh PAT.
- Detached `HEAD` or a non-zero exit from `git rev-parse` → ask for a branch name once.
- `listApps` ambiguous and impossible to narrow → list the candidates and ask once.
- Pre-flight can't start the app → name the command and the error, and ask once.
- A `SchemaRefConflict` where both cases are legitimately distinct → surface it; "merge into existing" is the dev's call.

## Anti-patterns (refuse these)

- Merging or rebasing the branch to main.
- Editing on `main` (every mutation needs a `branch_id`).
- Treating a `SchemaRefConflict` as retryable.
- Re-recording a shape-changed contract but forgetting to delete the stale case.
- Editing handler code on a Case A/B/C (contract-change) failure.
```