diff --git a/vale_styles/config/vocabularies/Base/accept.txt b/vale_styles/config/vocabularies/Base/accept.txt
index 6f77a50a3..a516a7f97 100644
--- a/vale_styles/config/vocabularies/Base/accept.txt
+++ b/vale_styles/config/vocabularies/Base/accept.txt
@@ -26,6 +26,7 @@
 [Pp]assthrough
 [Pp]refill[s]?
 [Rr]eachability
+[Rr]ebase[ds]?
 [Rr]efcount[s]?
 [Rr]ehydrate[ds]?
 [Rr]eplayer
diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
new file mode 100644
index 000000000..d1872dc14
--- /dev/null
+++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
@@ -0,0 +1,193 @@
+---
+id: smart-set-agent
+title: AI Agent for Smart Test Sets
+sidebar_label: AI Agent (Smart Tests)
+description: Let AI coding agents like Claude Code and Cursor diagnose failing smart-set replays and add new smart tests on a branch, using the Keploy MCP tools
+tags:
+  - AI Agent
+  - Smart Test Set
+  - Claude Code
+  - Cursor
+  - MCP
+  - branch
+keywords:
+  - smart test set agent
+  - Claude Code
+  - Cursor
+  - Keploy MCP
+  - schema_ref
+  - branch-native testing
+  - failing replay fix
+---
+
+import ProductTier from '@site/src/components/ProductTier';
+
+
+
+## Overview
+
+Keploy's [smart test set](/docs/keploy-cloud/deduplication/) is a content-addressed test substrate: cases are keyed by a `schema_ref` (a hash of the contract shape — method, path, status, content-types, and the request/response body & query **shapes**), deduplicated per application, and edited **branch-natively** (the `main` view is read-only; edits live on a branch until a human or CI merges them).
+
+This page describes a ready-made **agent skill** that lets an AI coding assistant (Claude Code, Cursor, and similar) operate that substrate end-to-end. Given one of two plain-English prompts, the agent:
+
+1. **Diagnoses a failing smart-set replay** — finds the app, branch, failing run, and the relevant code changes, classifies each failure, and fixes it **on a branch**.
+2. **Adds new smart tests** for your latest code changes — records traffic, uploads it as a smart set onto the branch, and validates it.
+
+The agent always stops at a **verified branch** and reports back. Merging to `main` stays a human/CI decision — it is intentionally not something the agent does.
+
+## Prerequisites
+
+- Keploy Enterprise with [smart test sets enabled](/docs/keploy-cloud/deduplication/) on the app (`EnableSmartTestSet=true`).
+- The Keploy **MCP server** configured in your agent (see [MCP Server setup](/docs/running-keploy/agent-test-generation/#mcp-server-recommended-for-ai-agents)). The smart-set workflow uses the same `/client/v1/mcp` endpoint and the same authentication.
+- A Personal Access Token (PAT) or API key with access to the app.
+- The application's recording cluster reachable for `keploy cloud replay`.
+
+## What the agent needs from you
+
+The developer only ever says one of two things — the skill handles everything else (discovering the app, branch, failing run, and code changes) autonomously:
+
+| Prompt | Routine |
+| --------------------------------------------------------------------- | ------------------------------------------------ |
+| _"my keploy smart-set replay is failing, please analyze and fix it."_ | Routine A — diagnose & fix on a branch |
+| _"Add new keploy smart tests for my changes."_ | Routine B — record, upload, validate on a branch |
+
+## Installing the skill
+
+The skill is a single Markdown file that teaches your agent the smart-set workflow and guardrails. Drop it into your project so the agent picks it up automatically.
+
+### Cursor
+
+Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor auto-discovers [Agent Skills](https://cursor.com/docs/context/skills) from `.cursor/skills/` and invokes this one on demand when your prompt matches a failing smart-set replay or a request to add smart tests. This is the on-demand **skill** mechanism — distinct from always-on `.cursor/rules/*.mdc` project rules, which would bill the full skill on every turn.
+
+### Claude Code
+
+Save the skill under your project's skills directory (e.g. `.claude/skills/smart-set/SKILL.md`) or reference its content from `CLAUDE.md`. Claude Code reads project-level skill and context files automatically.
+
+The full skill content is included at the end of this page under [Skill reference](#skill-reference).
+
+## How it works
+
+### Key concepts the agent relies on
+
+- **Branch-first, enforced by the substrate.** Every edit, delete, obsolete, or mock write is branch-scoped; a write without a `branch_id` is rejected. The Keploy branch name mirrors your git branch name (`git rev-parse --abbrev-ref HEAD`), and `create_branch` is idempotent (find-or-create).
+- **`schema_ref` identity.** **Value edits** (response body, noise, assertions, mock re-links) keep the same `schema_ref` and are safe in place. **Shape edits** (changing method/path/status/content-type or the body/query structure) recompute the `schema_ref`; if the new ref collides with another case you get a typed `SchemaRefConflict` to resolve, not retry.
+- **Non-destructive re-record.** Re-recording a same-shape contract replaces the case data in place and carries your noise/assertions/obsolete flags forward. A re-record that changes the shape lands a **new** `schema_ref` — the stale case is then deleted so the suite doesn't keep a red duplicate.
+- **The boundary is the branch.** The agent never runs a merge or rebase to `main` — it reports a verified branch and the dashboard URLs, and you (or CI) merge.
+
+### Routine A — fix a failing replay
+
+1. **Resolve the failing run.** For a local failure, the agent fetches the newest `FAILED` report on the branch; for a CI failure, it extracts the `test_run_id` from the pasted CI/dashboard URL.
+2. **Fetch the report**, projected to just the failing cases (a focused field set instead of the full ~34k-token report).
+3. **Classify each failing case** after an unconditional working-tree check (`git status`/`git diff`):
+   - **Regression** — code changed and broke a correct contract → the agent **fixes the source and rebuilds**, never edits the test to match a bug.
+   - **Value drift** — a field/header/body value legitimately changed → `updateSmartTestCase` (golden body or `noise` for non-deterministic fields).
+   - **Shape drift** — the contract structure changed → `updateSmartTestCase` with the new request/response shape, resolving any `SchemaRefConflict`.
+   - **Mock drift** — a downstream response changed → `upsertSmartMock` (or re-record when the request itself changed).
+4. **Verify on the branch** via `keploy cloud replay --replay-source smart-set` and iterate (capped retries).
+5. **Report and stop** with a diagnosis table, the fixes applied, and dashboard URLs for the branch diff and run report.
+
+### Routine B — add new smart tests
+
+1. **Identify changed endpoints** from the git diff.
+2. **Capture traffic** with `keploy record -c "" --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint.
+3. **Upload onto the branch** as a smart set (new contracts ingest as `imported-*`, deduplicated by `schema_ref`; existing ones are skipped).
+4. **Validate on the branch** with `keploy cloud replay`.
+5. **Report and stop** — you review the branch diff and merge; merge reconciles `imported-*` to stable `test-N`.
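
Step 4 of both routines runs the same replay command. As a rough sketch only, assuming a thin Python wrapper around the CLI: the helper name `build_replay_command` and the example values are hypothetical, while the flags are the ones this page documents for `keploy cloud replay`.

```python
from typing import List


def build_replay_command(app: str, branch_name: str, cluster: str,
                         freeze_time: bool = False) -> List[str]:
    """Assemble the argv for a smart-set replay on a branch.

    Hypothetical helper: only the flag names come from this page.
    """
    cmd = [
        "keploy", "cloud", "replay",
        "--app", app,                      # namespace.deployment
        "--branch-name", branch_name,      # replay the branch view
        "--cluster", cluster,              # recording cluster (origin.clusterName)
        "--replay-source", "smart-set",    # otherwise the CLI defaults to latest-release
        "--disableReportUpload=false",     # keep the run visible on the dashboard
        "--strict-failure",                # do not demote response-divergent cases
    ]
    if freeze_time:
        # Only for apps built with the Go faketime agent.
        cmd.append("--freezeTime")
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not real identifiers.
    print(" ".join(build_replay_command("shop.orders", "feature-x", "dev-cluster")))
```

Keeping `--freezeTime` behind an explicit boolean mirrors the rule that it is added only for faketime builds, while every other flag is unconditional.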
+
+## Replay flags
+
+When the agent runs `keploy cloud replay` for a smart-set app, these flags are required on **every** replay — except `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent:
+
+| Flag | Why |
+| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `--app ` | The app to replay, as `namespace.deployment`. Required by every `keploy cloud replay` (and `keploy upload test-set`). |
+| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. |
+| `--cluster ` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. |
+| `--branch-name ` | Replay the branch view, including the agent's edits. **Flag-name asymmetry (not a typo):** `keploy cloud replay` scopes by branch with `--branch-name`, while `keploy upload test-set` (Routine B3) uses `--branch` — different subcommands, different flag names. |
+| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
+| `--disableReportUpload=false` | Write the `/tr` report row so the run is visible on the dashboard. |
+| `--strict-failure` | Keep response-divergent cases failing instead of silently demoting them. |
+
+## Limitations
+
+- The agent's scope ends at a verified branch — it never runs a merge or rebase to `main`.
+- Replaying connection-oriented data store mocks (e.g. some PostgreSQL flows) can require additional recorder support; if a replay can't go fully green for reasons outside the test data, the agent reports the blocker rather than masking it by editing the golden output.
+
+## Skill reference
+
+The complete skill file to install (`SKILL.md`):
+
+```markdown
+---
+name: keploy-smart-set
+description: Keploy SMART-SET MCP workflow — when a smart-set cloud replay is failing (analyze and fix on a branch), or to add new smart tests for code changes. Drives schema_ref-keyed, branch-native record/replay and smart-case/mock edits via the Keploy MCP tools. The agent fixes on a branch and reports; merging to main is the dev's (or CI's) call.
+---
+
+# Keploy SMART-SET playbook — autonomous developer workflow
+
+Smart test sets are Keploy's content-addressed test substrate: cases are keyed by `schema_ref` (a hash of the contract shape — method, path, status, content-types, request/response body & query SHAPES), deduped per app, and edited **branch-natively** (main is read-only; edits live on a branch until a human/CI merges). Re-recording is **non-destructive for same-shape refreshes** — it replaces a case's data in place by `schema_ref` and preserves history, so user edits (noise, assertions, obsolete) carry forward; but a re-record that changes the shape lands a new `schema_ref` and you must delete the stale old case (Hard rule 5).
+
+## Entry points
+
+The developer will only ever say one of two things to you:
+
+- **Prompt A:** "my keploy smart-set replay is failing, please analyze and fix it." (local: find the latest failing test_run on the branch) OR "the keploy smart-set pipeline is failing, please analyze and fix it." (CI: extract `test_run_id` from the pasted CI log/dashboard URL).
+- **Prompt B:** "Add new keploy smart tests for my changes."
+
+You handle EVERYTHING else autonomously — discover the app, the branch, the failing run, the code changes. Execute fixes **on a branch**, report what you did, and tell the dev to review & merge.
+
+## Hard rules
+
+0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) and none of the Smart-set names below, the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
+1. **Branch-first — the substrate ENFORCES it.** Every edit/delete/obsolete/mock-write is branch-scoped; a write without a `branch_id` is rejected. Resolve `branch_id` before any write.
+2. **Keploy branch name = git branch name** (`git rev-parse --abbrev-ref HEAD`). Pass it to `create_branch` (find-or-create, idempotent); reuse the returned `branch_id`. Never target the reserved `main` branch.
+3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. One match → use it; zero/ambiguous → narrow by compose-service name, else ask once.
+4. **schema_ref awareness.** VALUE edits keep `schema_ref` (`noiseJson`, `assertionsJson`, `description`, `mockReferencesJson`, `respBody`). SHAPE edits change it (`requestJson`/`responseJson`); a colliding new ref yields a `SchemaRefConflict` — don't retry blindly. All `*Json` args are STRINGIFIED JSON, not objects.
+5. **Re-record replaces in place only if `schema_ref` is unchanged.** If the re-record changes the shape it lands a NEW `schema_ref` as a separate case — then `deleteSmartTestCase` the stale old one.
+6. **Your boundary is the branch. NEVER merge or rebase.** After your fix is green on the branch, STOP and report — the dev/CI merges.
+7. **Don't ask what you can find out** (`git log`, `git diff`, file reads, api-server calls).
+8. **Always end with two dashboard URLs** — the branch diff page and the test-run report page.
+
+## Discovery (run once at the start)
+
+1. **App.** `basename $(pwd)` → `listApps({q})` → cache `app_id`.
+2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`.
+3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})` — you need `origin.clusterName` for `--cluster`.
+4. **Canonical replay command — these flags on every replay** (drop `--freezeTime` for non-faketime apps): `keploy cloud replay --app --branch-name --cluster --replay-source smart-set --disableReportUpload=false --strict-failure [--freezeTime]`. Why each: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from origin.clusterName; omit it and the CLI errors "no active clusters found"), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases), and `--freezeTime` ONLY when the app is built with the Go faketime agent (omit it otherwise). These match the "Replay flags" table above.
+
+## Routine A — failing smart-set replay (ON A BRANCH)
+
+- **A1 — Resolve `test_run_id`.** Local → `listTestReports({appId, branch_id, status:"FAILED", limit:5})` exactly once, take `data[0].id` (`status` is case-sensitive). CI → extract from the pasted URL.
+- **A2 — Fetch the report**, projected with `failed_only:true` + a `fields=` list (drops ~34k → ~1–2k tokens). For mock failures, a second call with `mock_mismatches_only:true` to get the `mock-N` ids.
+- **A3 — Diagnose.** Unconditional working-tree check first (`git status -s`, `git diff`). **Code-change gate:** if a code change touches the same field the report says drifted, that's a regression by default — fix the source, don't bake it into the golden body. Classify each case:
+  - **Case 1 — App regression.** Edit/revert the application source, rebuild the image, replay. Don't touch the test.
+  - **Case A — Value drift.** `updateSmartTestCase` — `noiseJson` for non-deterministic fields, `respBody` for a real value change.
+  - **Case B — Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting/deleting the twin, never by blind retry.
+  - **Case C — Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored.
+- **A4 — Verify on the branch.** Rebuild first after a Case 1 edit. Replay with the **canonical command from Discovery (all flags)**, piping output through `tail`/`grep`. All cases failing "connection reset"/status 0 = a stale leftover replay container on the app port (`docker rm -f` it), not a code bug. Cap retries at 3.
+- **A5 — Report and STOP.** Diagnosis table + fixes applied + the two dashboard URLs. Tell the dev to review & merge.
+
+## Routine B — add new smart tests
+
+- **B1 — Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, list each endpoint's method+path.
+- **B2 — Capture traffic.** Pre-flight the run command, then `keploy record -c "" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, stop the recorder by PID.
+- **B3 — Upload onto the branch.** `keploy upload test-set --app --branch --test-set keploy/test-set-N --smart-test-set --name ` (ingests new contracts as `imported-*`, dedup by `schema_ref`).
+- **B4 — Validate** with the **canonical replay command from Discovery (all flags)**. On failure, enter Routine A from A2.
+- **B5 — Report and STOP.** Captured/skipped table + replay result + dashboard URLs; the dev merges (merge reconciles `imported-*` → `test-N`).
+
+## When you MAY ask the dev
+
+- PAT missing/invalid → ask for a fresh PAT.
+- Detached `HEAD`/non-zero from `git rev-parse` → ask for a branch name once.
+- `listApps` ambiguous and unnarrowable → list candidates, ask once.
+- Pre-flight can't start the app → name the command + error, ask once.
+- A `SchemaRefConflict` where both cases are legitimately distinct → surface it; "merge into existing" is the dev's call.
+
+## Anti-patterns (refuse these)
+
+- Merging or rebasing the branch to main.
+- Editing on `main` (every mutation needs `branch_id`).
+- Treating a `SchemaRefConflict` as retryable.
+- Re-recording a shape-changed contract but forgetting to delete the stale case.
+- Editing handler code on a Case A/B/C (contract-change) failure.
+```
diff --git a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
index 21e6c6a9b..d46127f42 100644
--- a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
+++ b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
@@ -97,7 +97,7 @@ After restarting your editor, ask the agent:
 
 > _"List your available tools — do you see any prefixed `keploy-` or `mcp__keploy`?"_
 
-You should see ~100 keploy MCP tools (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.). If you see **zero** keploy tools, the config didn't load — check:
+You should see Keploy MCP tools. Depending on the server version you'll see **either** the full catalog (~100 tools: `listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.) **or** — when **tool-search mode** is on (the default on newer servers) — just the handful of meta-tools (`search_tools`, `get_tool_schema`, `invoke_tool`, `get_auth_status`, `get_setup_instructions`). **Either is fine**: the playbook's Hard rule 0 handles both and reaches the full catalog by name. Only if you see **zero** keploy tools did the config fail to load — check:
 
 - The file path matches your editor's expected location (see the table above).
 - The PAT in `Bearer kep_...` has no quotes / trailing whitespace issues.
@@ -148,6 +148,8 @@ You handle EVERYTHING else autonomously. Discover the app, the branch, the faili
 
 0. **Native MCP transport only — NEVER Python+urllib shell fallback.** Before any other discovery step, verify Keploy MCP tools are loaded in your tool list (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `keploy-create_branch`, etc. — names vary by editor: `keploy-` or `mcp__keploy*__`). If you see ZERO keploy MCP tools, the dev's editor MCP config is missing — ask them once to install `~/.cursor/mcp.json` / `~/.claude.json` (see the page's Step 1) and STOP. **Do NOT fall back to `python3 -c 'import urllib.request; ...'` heredocs hitting the api-server's `/client/v1/mcp` endpoint directly** — diagnosed against the validation harness 2026-06-08: the shell fallback embeds the full JSON-RPC envelope + auth token + response in each shell command, inflating per-turn cache_read by ~55K tokens (3× the cost of native MCP) because every heredoc + every result becomes new context bytes instead of structured tool envelopes the cache can reuse. Native MCP is required, not optional.
 
+   **If your tool list shows ONLY the meta-tools** (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`), that is NOT a missing config — the server runs in **tool-search mode** (the default): the full catalog is hidden from `tools/list` to save context, but every tool stays reachable by name. Fetch the schemas for the tools you need in ONE batched `get_tool_schema({names:[...]})` call — you already know the names (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `create_branch`, …) — then run each via `invoke_tool({name, arguments})`. Use `search_tools(query)` ONLY to discover a name you don't already know; never for a name you have (fuzzy search returns roughly 10× the tokens). A hidden tool can also be called directly by exact name if your client allows it — hiding affects discovery, not reachability. Only treat the list as "MCP not configured" when you see NEITHER the meta-tools NOR the keploy tools.
+
 1. **Branch-first.** Every write to mocks / tests / recordings is branch-scoped. Resolve `branch_id` before any write. If a tool returns "branch_id is required", you skipped this—fix and retry, don't ask the dev.
 2. **Keploy branch name = git branch name.** Detect via `git rev-parse --abbrev-ref HEAD`. Pass that string to `create_branch` (find-or-create, idempotent). Reuse the returned `branch_id` for every subsequent write in this session.
 3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. Exactly one match → use it. Multiple → pick the one whose name most specifically matches the dev's compose service. Zero matches → ask the dev once.
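
Hard rule 0 in the hunk above distinguishes three tool-list states. The decision can be sketched as follows; this is an illustration, not the server's or any editor's actual logic. The function name `classify_tool_list` and the returned labels are hypothetical, and the sketch assumes editor prefixes such as `keploy-` have already been stripped from the names.

```python
from typing import Iterable

# Meta-tools named in the playbook; they are all that tool-search mode exposes.
META_TOOLS = {
    "get_auth_status", "get_setup_instructions",
    "search_tools", "get_tool_schema", "invoke_tool",
}

# A sample of catalog tool names quoted in the playbook (not the full catalog).
KNOWN_CATALOG_TOOLS = {
    "listApps", "getApp", "listTestReports", "getTestReportFull",
    "getMock", "getTestCase", "updateTestCase", "create_branch",
}


def classify_tool_list(tool_names: Iterable[str]) -> str:
    """Return a hypothetical label for the state of the agent's tool list."""
    names = set(tool_names)
    if names & KNOWN_CATALOG_TOOLS:
        return "full-catalog"        # call the tools directly
    if names & META_TOOLS:
        return "tool-search-mode"    # get_tool_schema, then invoke_tool
    return "not-configured"          # neither set is visible: ask the dev once


if __name__ == "__main__":
    print(classify_tool_list(["search_tools", "get_tool_schema", "invoke_tool"]))
```

Checking for catalog names first means a server that exposes both sets is treated as the full catalog, which matches the rule that only a list with neither set counts as a missing config.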
diff --git a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
index d124ba1d4..da6496a19 100644
--- a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
+++ b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
@@ -72,6 +72,10 @@ Keploy provides an MCP (Model Context Protocol) endpoint that gives AI agents **
 
 The MCP endpoint is built into the Keploy API server at `/client/v1/mcp`. Tools are auto-generated from the OpenAPI spec—when the API evolves, tools update automatically.
 
+:::note Tool-search mode (default)
+To keep the per-session context small, the server runs in **tool-search mode**: `tools/list` shows only a handful of meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) instead of the full catalog. The tools in the table below are still all available — you reach them by name: `get_tool_schema({names:[...]})` to fetch schemas you already know, `search_tools(query)` to discover ones you don't, then `invoke_tool({name, arguments})` to run them. Hiding affects discovery only, not reachability — a tool can still be called directly by its exact name.
+:::
+
 ### Available Tools
 
 | Tool | What it does |
@@ -180,7 +184,7 @@ Antigravity (formerly Windsurf) supports MCP servers. Add to your Antigravity MC
 
 ### How it Works
 
-1. The agent discovers available tools via the MCP `tools/list` method
+1. The agent sees the meta-tools on `tools/list` (tool-search mode, above) and reaches the specific tool it needs by name via `get_tool_schema` / `search_tools` + `invoke_tool`
 2. When you ask "generate API tests", the agent calls `generate_and_wait` with your OpenAPI spec
 3. The tool triggers AI generation on the Keploy platform, polls until complete, and returns the created suites
 4. The agent calls `run_and_report` to execute suites against your API
diff --git a/versioned_sidebars/version-4.0.0-sidebars.json b/versioned_sidebars/version-4.0.0-sidebars.json
index 55f463295..20c90c93c 100644
--- a/versioned_sidebars/version-4.0.0-sidebars.json
+++ b/versioned_sidebars/version-4.0.0-sidebars.json
@@ -56,6 +56,7 @@
     "keploy-cloud/auto-test-generation",
     "keploy-cloud/deduplication",
     "keploy-cloud/static-deduplication",
+    "keploy-cloud/smart-set-agent",
     {
     "type": "category",
     "label": "QuickStarts",