diff --git a/vale_styles/config/vocabularies/Base/accept.txt b/vale_styles/config/vocabularies/Base/accept.txt
index 6f77a50a3..a516a7f97 100644
--- a/vale_styles/config/vocabularies/Base/accept.txt
+++ b/vale_styles/config/vocabularies/Base/accept.txt
@@ -26,6 +26,7 @@
[Pp]assthrough
[Pp]refill[s]?
[Rr]eachability
+[Rr]ebase[ds]?
[Rr]efcount[s]?
[Rr]ehydrate[ds]?
[Rr]eplayer
diff --git a/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
new file mode 100644
index 000000000..d1872dc14
--- /dev/null
+++ b/versioned_docs/version-4.0.0/keploy-cloud/smart-set-agent.md
@@ -0,0 +1,193 @@
+---
+id: smart-set-agent
+title: AI Agent for Smart Test Sets
+sidebar_label: AI Agent (Smart Tests)
+description: Let AI coding agents like Claude Code and Cursor diagnose failing smart-set replays and add new smart tests on a branch, using the Keploy MCP tools
+tags:
+ - AI Agent
+ - Smart Test Set
+ - Claude Code
+ - Cursor
+ - MCP
+ - branch
+keywords:
+ - smart test set agent
+ - Claude Code
+ - Cursor
+ - Keploy MCP
+ - schema_ref
+ - branch-native testing
+ - failing replay fix
+---
+
+import ProductTier from '@site/src/components/ProductTier';
+
+
+
+## Overview
+
+Keploy's [smart test set](/docs/keploy-cloud/deduplication/) is a content-addressed test substrate: cases are keyed by a `schema_ref` (a hash of the contract shape — method, path, status, content-types, and the request/response body & query **shapes**), deduplicated per application, and edited **branch-natively** (the `main` view is read-only; edits live on a branch until a human or CI merges them).
+
+This page describes a ready-made **agent skill** that lets an AI coding assistant (Claude Code, Cursor, and similar) operate that substrate end-to-end. Given one of two plain-English prompts, the agent:
+
+1. **Diagnoses a failing smart-set replay** — finds the app, branch, failing run, and the relevant code changes, classifies each failure, and fixes it **on a branch**.
+2. **Adds new smart tests** for your latest code changes — records traffic, uploads it as a smart set onto the branch, and validates it.
+
+The agent always stops at a **verified branch** and reports back. Merging to `main` stays a human/CI decision — it is intentionally not something the agent does.
+
+## Prerequisites
+
+- Keploy Enterprise with [smart test sets enabled](/docs/keploy-cloud/deduplication/) on the app (`EnableSmartTestSet=true`).
+- The Keploy **MCP server** configured in your agent (see [MCP Server setup](/docs/running-keploy/agent-test-generation/#mcp-server-recommended-for-ai-agents)). The smart-set workflow uses the same `/client/v1/mcp` endpoint and the same authentication.
+- A Personal Access Token (PAT) or API key with access to the app.
+- The application's recording cluster reachable for `keploy cloud replay`.
+
+## What the agent needs from you
+
+The developer only ever says one of two things — the skill handles everything else (discovering the app, branch, failing run, and code changes) autonomously:
+
+| Prompt | Routine |
+| --------------------------------------------------------------------- | ------------------------------------------------ |
+| _"my keploy smart-set replay is failing, please analyze and fix it."_ | Routine A — diagnose & fix on a branch |
+| _"Add new keploy smart tests for my changes."_ | Routine B — record, upload, validate on a branch |
+
+## Installing the skill
+
+The skill is a single Markdown file that teaches your agent the smart-set workflow and guardrails. Drop it into your project so the agent picks it up automatically.
+
+### Cursor
+
+Save the skill as `.cursor/skills/smart-set/SKILL.md` in your project root. Cursor auto-discovers [Agent Skills](https://cursor.com/docs/context/skills) from `.cursor/skills/` and invokes this one on demand when your prompt matches a failing smart-set replay or a request to add smart tests. This is the on-demand **skill** mechanism, distinct from always-on `.cursor/rules/*.mdc` project rules, which would load the full skill into context (and bill its tokens) on every turn.
+
+### Claude Code
+
+Save the skill under your project's skills directory (e.g. `.claude/skills/smart-set/SKILL.md`) or reference its content from `CLAUDE.md`. Claude Code reads project-level skill and context files automatically.
+
+The full skill content is included at the end of this page under [Skill reference](#skill-reference).
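
As a minimal sketch of the two install locations above, the following shell function copies a locally saved skill file into both directories. It assumes the skill text has already been saved to a file. `install_skill` and its arguments are hypothetical names used for illustration, and the Claude Code path is the example location mentioned above rather than a required one.

```shell
# Hypothetical helper: copy a saved skill file into the Cursor and Claude Code
# skill directories named above. This is not part of the Keploy CLI.
install_skill() {
  src="$1"   # path to the saved SKILL.md (assumed to exist)
  root="$2"  # project root
  for dir in "$root/.cursor/skills/smart-set" "$root/.claude/skills/smart-set"; do
    mkdir -p "$dir" || return 1
    cp "$src" "$dir/SKILL.md" || return 1
  done
}

# Example: install_skill ./SKILL.md .
```

Installing into both locations is harmless if you only use one editor; drop the directory you don't need.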
+
+## How it works
+
+### Key concepts the agent relies on
+
+- **Branch-first, enforced by the substrate.** Every edit, delete, obsolete, or mock write is branch-scoped; a write without a `branch_id` is rejected. The Keploy branch name mirrors your git branch name (`git rev-parse --abbrev-ref HEAD`), and `create_branch` is idempotent (find-or-create).
+- **`schema_ref` identity.** **Value edits** (response body, noise, assertions, mock re-links) keep the same `schema_ref` and are safe in place. **Shape edits** (changing method/path/status/content-type or the body/query structure) recompute the `schema_ref`; if the new ref collides with another case, you get a typed `SchemaRefConflict` that must be resolved rather than retried.
+- **Non-destructive re-record.** Re-recording a same-shape contract replaces the case data in place and carries your noise/assertions/obsolete flags forward. A re-record that changes the shape lands a **new** `schema_ref` — the stale case is then deleted so the suite doesn't keep a red duplicate.
+- **The boundary is the branch.** The agent never runs a merge or rebase to `main` — it reports a verified branch and the dashboard URLs, and you (or CI) merge.
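
The branch rules above can be sketched as a small guard. This is an illustration only: `check_branch_name` is a hypothetical helper, not a Keploy command. It relies on the fact that `git rev-parse --abbrev-ref HEAD` prints `HEAD` when the checkout is detached, and it refuses `main` because the `main` view is read-only.

```shell
# Hypothetical guard: decide whether a git branch name is usable as the
# Keploy branch name. Prints the name on success, explains the refusal on stderr.
check_branch_name() {
  name="$1"
  case "$name" in
    ""|HEAD)
      # Empty output (git failed) or a detached checkout: no branch to mirror.
      echo "no usable git branch; ask the developer for a branch name" >&2
      return 1 ;;
    main)
      # The main view is read-only, so edits must never target it.
      echo "refusing to write to the read-only main view" >&2
      return 1 ;;
    *)
      printf '%s\n' "$name" ;;
  esac
}

# Usage inside a git checkout:
#   branch="$(check_branch_name "$(git rev-parse --abbrev-ref HEAD 2>/dev/null)")"
```

The accepted name is what would be passed to `create_branch`; the `branch_id` it returns is then reused for every write.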
+
+### Routine A — fix a failing replay
+
+1. **Resolve the failing run.** For a local failure, the agent fetches the newest `FAILED` report on the branch; for a CI failure, it extracts the `test_run_id` from the pasted CI/dashboard URL.
+2. **Fetch the report**, projected to just the failing cases (a focused field set instead of the full ~34k-token report).
+3. **Classify each failing case** after an unconditional working-tree check (`git status`/`git diff`):
+ - **Regression** — code changed and broke a correct contract → the agent **fixes the source and rebuilds**, never edits the test to match a bug.
+ - **Value drift** — a field/header/body value legitimately changed → `updateSmartTestCase` (golden body or `noise` for non-deterministic fields).
+ - **Shape drift** — the contract structure changed → `updateSmartTestCase` with the new request/response shape, resolving any `SchemaRefConflict`.
+ - **Mock drift** — a downstream response changed → `upsertSmartMock` (or re-record when the request itself changed).
+4. **Verify on the branch** via `keploy cloud replay --replay-source smart-set` and iterate (capped retries).
+5. **Report and stop** with a diagnosis table, the fixes applied, and dashboard URLs for the branch diff and run report.
+
+### Routine B — add new smart tests
+
+1. **Identify changed endpoints** from the git diff.
+2. **Capture traffic** with `keploy record -c "" --sync --disable-mapping=false`, driving one realistic request per new/changed endpoint.
+3. **Upload onto the branch** as a smart set (new contracts ingest as `imported-*`, deduplicated by `schema_ref`; existing ones are skipped).
+4. **Validate on the branch** with `keploy cloud replay`.
+5. **Report and stop** — you review the branch diff and merge; merge reconciles `imported-*` to stable `test-N`.
+
+## Replay flags
+
+When the agent runs `keploy cloud replay` for a smart-set app, these flags are required on **every** replay. The one exception is `--freezeTime`, which is added **only** when the app is built with the Go `faketime` agent:
+
+| Flag | Why |
+| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `--app ` | The app to replay, as `namespace.deployment`. Required by every `keploy cloud replay` (and `keploy upload test-set`). |
+| `--replay-source smart-set` | Replay the deduplicated smart-set cases. Without it the CLI defaults to `latest-release` and replays raw per-release recordings instead. |
+| `--cluster ` | The recording cluster (`origin.clusterName`); a `no active clusters found` error usually means this flag was omitted. |
+| `--branch-name ` | Replay the branch view, including the agent's edits. **Flag-name asymmetry (not a typo):** `keploy cloud replay` scopes by branch with `--branch-name`, while `keploy upload test-set` (Routine B3) uses `--branch` — different subcommands, different flag names. |
+| `--freezeTime` | Required when the app is built with the Go `faketime` agent, so `time.Now()` matches the recording and timestamp-bearing mocks still match. See [Time freezing](/docs/keploy-cloud/time-freezing/). |
+| `--disableReportUpload=false` | Write the `/tr` report row so the run is visible on the dashboard. |
+| `--strict-failure` | Keep response-divergent cases failing instead of silently demoting them. |
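
The table above can be condensed into one assembled command line. The sketch below only prints the command and does not run it. `build_replay_cmd` is a hypothetical helper, and the app, branch, and cluster values in the usage line are made-up examples; only the flag names come from the table.

```shell
# Hypothetical helper: print the replay command with the flags from the table.
# Arguments: app (namespace.deployment), branch name, cluster, faketime (true|false).
build_replay_cmd() {
  app="$1"; branch="$2"; cluster="$3"; faketime="${4:-false}"
  # Flags required on every smart-set replay.
  set -- keploy cloud replay \
    --app "$app" \
    --branch-name "$branch" \
    --cluster "$cluster" \
    --replay-source smart-set \
    --disableReportUpload=false \
    --strict-failure
  # Added only when the app is built with the Go faketime agent.
  if [ "$faketime" = "true" ]; then
    set -- "$@" --freezeTime
  fi
  printf '%s\n' "$*"
}

# Example values are illustrative, not real resources.
build_replay_cmd "shop.orders" "feature/checkout" "staging-cluster"
```

Printing the command instead of executing it makes it easy to review the flags before a replay. Passing `true` as the fourth argument appends `--freezeTime`.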
+
+## Limitations
+
+- The agent's scope ends at a verified branch — it never runs a merge or rebase to `main`.
+- Replaying connection-oriented data store mocks (e.g. some PostgreSQL flows) can require additional recorder support; if a replay can't go fully green for reasons outside the test data, the agent reports the blocker rather than masking it by editing the golden output.
+
+## Skill reference
+
+The complete skill file to install (`SKILL.md`):
+
+```markdown
+---
+name: keploy-smart-set
+description: Keploy SMART-SET MCP workflow — when a smart-set cloud replay is failing (analyze and fix on a branch), or to add new smart tests for code changes. Drives schema_ref-keyed, branch-native record/replay and smart-case/mock edits via the Keploy MCP tools. The agent fixes on a branch and reports; merging to main is the dev's (or CI's) call.
+---
+
+# Keploy SMART-SET playbook — autonomous developer workflow
+
+Smart test sets are Keploy's content-addressed test substrate: cases are keyed by `schema_ref` (a hash of the contract shape — method, path, status, content-types, request/response body & query SHAPES), deduped per app, and edited **branch-natively** (main is read-only; edits live on a branch until a human/CI merges). Re-recording is **non-destructive for same-shape refreshes** — it replaces a case's data in place by `schema_ref` and preserves history, so user edits (noise, assertions, obsolete) carry forward; but a re-record that changes the shape lands a new `schema_ref` and you must delete the stale old case (Hard rule 5).
+
+## Entry points
+
+The developer will only ever say one of two things to you:
+
+- **Prompt A:** "my keploy smart-set replay is failing, please analyze and fix it." (local: find the latest failing test_run on the branch) OR "the keploy smart-set pipeline is failing, please analyze and fix it." (CI: extract `test_run_id` from the pasted CI log/dashboard URL).
+- **Prompt B:** "Add new keploy smart tests for my changes."
+
+You handle EVERYTHING else autonomously — discover the app, the branch, the failing run, the code changes. Execute fixes **on a branch**, report what you did, and tell the dev to review & merge.
+
+## Hard rules
+
+0. **Native MCP transport only.** Verify the Keploy MCP tools are loaded. If your tool list shows only the meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) and none of the Smart-set names below, the real tools are hidden server-side to save context — fetch their schemas in ONE batched `get_tool_schema({names:[…]})` call, then run each via `invoke_tool({name, arguments})`. Smart-set names: `listApps`, `getApp`, `create_branch`, `listTestReports`, `getTestReportFull`, `listSmartTestCases`, `updateSmartTestCase`, `setSmartTestCaseObsolete`, `deleteSmartTestCase`, `upsertSmartMock`, `deleteSmartMock`, `getMock`, `uploadRecordingBundle`.
+1. **Branch-first — the substrate ENFORCES it.** Every edit/delete/obsolete/mock-write is branch-scoped; a write without a `branch_id` is rejected. Resolve `branch_id` before any write.
+2. **Keploy branch name = git branch name** (`git rev-parse --abbrev-ref HEAD`). Pass it to `create_branch` (find-or-create, idempotent); reuse the returned `branch_id`. Never target the reserved `main` branch.
+3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. One match → use it; zero/ambiguous → narrow by compose-service name, else ask once.
+4. **schema_ref awareness.** VALUE edits keep `schema_ref` (`noiseJson`, `assertionsJson`, `description`, `mockReferencesJson`, `respBody`). SHAPE edits change it (`requestJson`/`responseJson`); a colliding new ref yields a `SchemaRefConflict` — don't retry blindly. All `*Json` args are STRINGIFIED JSON, not objects.
+5. **Re-record replaces in place only if `schema_ref` is unchanged.** If the re-record changes the shape it lands a NEW `schema_ref` as a separate case — then `deleteSmartTestCase` the stale old one.
+6. **Your boundary is the branch. NEVER merge or rebase.** After your fix is green on the branch, STOP and report — the dev/CI merges.
+7. **Don't ask what you can find out** (`git log`, `git diff`, file reads, api-server calls).
+8. **Always end with two dashboard URLs** — the branch diff page and the test-run report page.
+
+## Discovery (run once at the start)
+
+1. **App.** `basename $(pwd)` → `listApps({q})` → cache `app_id`.
+2. **Branch.** `git rev-parse --abbrev-ref HEAD` → `create_branch({app_id, name})` → cache `branch_id`.
+3. **App context (once).** `getApp({appId, fields:["name","namespace","deployment","origin.clusterName","origin.namespace","origin.deployment"]})` — you need `origin.clusterName` for `--cluster`.
+4. **Canonical replay command: these flags on every replay** (drop `--freezeTime` for non-faketime apps): `keploy cloud replay --app --branch-name --cluster --replay-source smart-set --disableReportUpload=false --strict-failure [--freezeTime]`. Why each: `--replay-source smart-set` (the CLI defaults to latest-release), `--cluster` (from `origin.clusterName`; omit it and the CLI errors "no active clusters found"), `--disableReportUpload=false` (writes the /tr report row so the run shows on the dashboard), `--strict-failure` (don't silently demote response-divergent cases), and `--freezeTime` ONLY when the app is built with the Go faketime agent (omit it otherwise). These match the "Replay flags" table in the Keploy docs page for this skill.
+
+## Routine A — failing smart-set replay (ON A BRANCH)
+
+- **A1 — Resolve `test_run_id`.** Local → `listTestReports({appId, branch_id, status:"FAILED", limit:5})` exactly once, take `data[0].id` (`status` is case-sensitive). CI → extract from the pasted URL.
+- **A2 — Fetch the report**, projected with `failed_only:true` + a `fields=` list (drops ~34k → ~1–2k tokens). For mock failures, a second call with `mock_mismatches_only:true` to get the `mock-N` ids.
+- **A3 — Diagnose.** Unconditional working-tree check first (`git status -s`, `git diff`). **Code-change gate:** if a code change touches the same field the report says drifted, that's a regression by default — fix the source, don't bake it into the golden body. Classify each case:
+ - **Case 1 — App regression.** Edit/revert the application source, rebuild the image, replay. Don't touch the test.
+ - **Case A — Value drift.** `updateSmartTestCase` — `noiseJson` for non-deterministic fields, `respBody` for a real value change.
+ - **Case B — Shape drift.** `updateSmartTestCase` with `requestJson`/`responseJson`; resolve a `SchemaRefConflict` by obsoleting/deleting the twin, never by blind retry.
+ - **Case C — Mock drift.** `upsertSmartMock` for an in-place value drift; re-record when the outbound request changed or the match key can't be hand-authored.
+- **A4 — Verify on the branch.** Rebuild first after a Case 1 edit. Replay with the **canonical command from Discovery (all flags)**, piping output through `tail`/`grep`. All cases failing "connection reset"/status 0 = a stale leftover replay container on the app port (`docker rm -f` it), not a code bug. Cap retries at 3.
+- **A5 — Report and STOP.** Diagnosis table + fixes applied + the two dashboard URLs. Tell the dev to review & merge.
+
+## Routine B — add new smart tests
+
+- **B1 — Identify changes.** `git diff origin/main...HEAD --name-only`, filter to HTTP handlers, list each endpoint's method+path.
+- **B2 — Capture traffic.** Pre-flight the run command, then `keploy record -c "" --sync --disable-mapping=false` (both flags mandatory), drive one realistic request per endpoint, stop the recorder by PID.
+- **B3 — Upload onto the branch.** `keploy upload test-set --app --branch --test-set keploy/test-set-N --smart-test-set --name ` (ingests new contracts as `imported-*`, dedup by `schema_ref`).
+- **B4 — Validate** with the **canonical replay command from Discovery (all flags)**. On failure, enter Routine A from A2.
+- **B5 — Report and STOP.** Captured/skipped table + replay result + dashboard URLs; the dev merges (merge reconciles `imported-*` → `test-N`).
+
+## When you MAY ask the dev
+
+- PAT missing/invalid → ask for a fresh PAT.
+- Detached `HEAD` or a non-zero exit from `git rev-parse` → ask for a branch name once.
+- `listApps` ambiguous and unnarrowable → list candidates, ask once.
+- Pre-flight can't start the app → name the command + error, ask once.
+- A `SchemaRefConflict` where both cases are legitimately distinct → surface it; "merge into existing" is the dev's call.
+
+## Anti-patterns (refuse these)
+
+- Merging or rebasing the branch to main.
+- Editing on `main` (every mutation needs `branch_id`).
+- Treating a `SchemaRefConflict` as retryable.
+- Re-recording a shape-changed contract but forgetting to delete the stale case.
+- Editing handler code on a Case A/B/C (contract-change) failure.
+```
diff --git a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
index 21e6c6a9b..d46127f42 100644
--- a/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
+++ b/versioned_docs/version-4.0.0/quickstart/k8s-proxy-llm-workflow.md
@@ -97,7 +97,7 @@ After restarting your editor, ask the agent:
> _"List your available tools — do you see any prefixed `keploy-` or `mcp__keploy`?"_
-You should see ~100 keploy MCP tools (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.). If you see **zero** keploy tools, the config didn't load — check:
+You should see Keploy MCP tools. Depending on the server version, you'll see **either** the full catalog (~100 tools: `listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, etc.) **or**, when **tool-search mode** is on (the default on newer servers), just the handful of meta-tools (`search_tools`, `get_tool_schema`, `invoke_tool`, `get_auth_status`, `get_setup_instructions`). **Either is fine**: the playbook's Hard rule 0 handles both and reaches the full catalog by name. The config failed to load only if you see **zero** Keploy tools. In that case, check:
- The file path matches your editor's expected location (see the table above).
- The PAT in `Bearer kep_...` has no quotes / trailing whitespace issues.
@@ -148,6 +148,8 @@ You handle EVERYTHING else autonomously. Discover the app, the branch, the faili
0. **Native MCP transport only — NEVER Python+urllib shell fallback.** Before any other discovery step, verify Keploy MCP tools are loaded in your tool list (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `keploy-create_branch`, etc. — names vary by editor: `keploy-` or `mcp__keploy*__`). If you see ZERO keploy MCP tools, the dev's editor MCP config is missing — ask them once to install `~/.cursor/mcp.json` / `~/.claude.json` (see the page's Step 1) and STOP. **Do NOT fall back to `python3 -c 'import urllib.request; ...'` heredocs hitting the api-server's `/client/v1/mcp` endpoint directly** — diagnosed against the validation harness 2026-06-08: the shell fallback embeds the full JSON-RPC envelope + auth token + response in each shell command, inflating per-turn cache_read by ~55K tokens (3× the cost of native MCP) because every heredoc + every result becomes new context bytes instead of structured tool envelopes the cache can reuse. Native MCP is required, not optional.
+ **If your tool list shows ONLY the meta-tools** (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`), that is NOT a missing config — the server runs in **tool-search mode** (the default): the full catalog is hidden from `tools/list` to save context, but every tool stays reachable by name. Fetch the schemas for the tools you need in ONE batched `get_tool_schema({names:[...]})` call — you already know the names (`listApps`, `getApp`, `listTestReports`, `getTestReportFull`, `getMock`, `getTestCase`, `updateTestCase`, `create_branch`, …) — then run each via `invoke_tool({name, arguments})`. Use `search_tools(query)` ONLY to discover a name you don't already know; never for a name you have (fuzzy search returns roughly 10× the tokens). A hidden tool can also be called directly by exact name if your client allows it — hiding affects discovery, not reachability. Only treat the list as "MCP not configured" when you see NEITHER the meta-tools NOR the keploy tools.
+
1. **Branch-first.** Every write to mocks / tests / recordings is branch-scoped. Resolve `branch_id` before any write. If a tool returns "branch_id is required", you skipped this—fix and retry, don't ask the dev.
2. **Keploy branch name = git branch name.** Detect via `git rev-parse --abbrev-ref HEAD`. Pass that string to `create_branch` (find-or-create, idempotent). Reuse the returned `branch_id` for every subsequent write in this session.
3. **App resolution from cwd.** `basename $(pwd)` → `listApps({q: })`. Exactly one match → use it. Multiple → pick the one whose name most specifically matches the dev's compose service. Zero matches → ask the dev once.
diff --git a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
index d124ba1d4..da6496a19 100644
--- a/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
+++ b/versioned_docs/version-4.0.0/running-keploy/agent-test-generation.md
@@ -72,6 +72,10 @@ Keploy provides an MCP (Model Context Protocol) endpoint that gives AI agents **
The MCP endpoint is built into the Keploy API server at `/client/v1/mcp`. Tools are auto-generated from the OpenAPI spec—when the API evolves, tools update automatically.
+:::note Tool-search mode (default)
+To keep the per-session context small, the server runs in **tool-search mode**: `tools/list` shows only a handful of meta-tools (`get_auth_status`, `get_setup_instructions`, `search_tools`, `get_tool_schema`, `invoke_tool`) instead of the full catalog. The tools in the table below are all still available, and you reach them by name: `get_tool_schema({names:[...]})` to fetch schemas for tools whose names you already know, `search_tools(query)` to discover ones you don't, then `invoke_tool({name, arguments})` to run them. Hiding affects discovery only, not reachability: a tool can still be called directly by its exact name if your client allows it.
+:::
+
### Available Tools
| Tool | What it does |
@@ -180,7 +184,7 @@ Antigravity (formerly Windsurf) supports MCP servers. Add to your Antigravity MC
### How it Works
-1. The agent discovers available tools via the MCP `tools/list` method
+1. The agent sees the meta-tools on `tools/list` (tool-search mode, above) and reaches the specific tool it needs by name via `get_tool_schema` / `search_tools` + `invoke_tool`
2. When you ask "generate API tests", the agent calls `generate_and_wait` with your OpenAPI spec
3. The tool triggers AI generation on the Keploy platform, polls until complete, and returns the created suites
4. The agent calls `run_and_report` to execute suites against your API
diff --git a/versioned_sidebars/version-4.0.0-sidebars.json b/versioned_sidebars/version-4.0.0-sidebars.json
index 55f463295..20c90c93c 100644
--- a/versioned_sidebars/version-4.0.0-sidebars.json
+++ b/versioned_sidebars/version-4.0.0-sidebars.json
@@ -56,6 +56,7 @@
"keploy-cloud/auto-test-generation",
"keploy-cloud/deduplication",
"keploy-cloud/static-deduplication",
+ "keploy-cloud/smart-set-agent",
{
"type": "category",
"label": "QuickStarts",