---
name: test-docs
description: >
  Test ToolHive documentation by executing the steps in tutorials and guides
  against a real environment, verifying that examples are correct and ToolHive
  has not regressed. Use when the user asks to test, validate, or verify
  documentation such as: "test the vault integration tutorial",
  "verify the K8s quickstart", "check the CLI install docs",
  "test toolhive vault integration in kubernetes", or any request to run
  through a doc's instructions to confirm they work. Supports both
  Kubernetes-based docs (tutorials, K8s guides) and CLI-based docs.
---

# Test Docs

Test ToolHive documentation by running each step against a live environment,
reporting pass/fail with evidence, and recommending fixes for failures.

## Workflow

1. **Find the documentation** - launch an Explore agent to locate the
   relevant doc file
2. **Read and parse** - extract testable steps (bash code blocks, YAML
   manifests, expected outputs)
3. **Check prerequisites** - run `scripts/check-prereqs.sh`; confirm the
   `thv` version with the user
4. **Prepare the environment** - create a dedicated test namespace unless
   the doc specifies otherwise
5. **Execute steps** - run each step sequentially, capture output
6. **Report results** - pass/fail per step with evidence
7. **Clean up** - delete all resources created during testing
8. **Recommend fixes** - classify failures as doc issues or ToolHive bugs

## Step 1: Find the documentation

If the user provides a **file path** to the doc, use it directly. Assume
you are running in the repository that contains the file.

Otherwise, use the Task tool with `subagent_type=Explore` to search the
docs directory for the page matching the user's request. Documentation
lives under:

- `docs/toolhive/tutorials/` - step-by-step tutorials
- `docs/toolhive/guides-cli/` - CLI how-to guides
- `docs/toolhive/guides-k8s/` - Kubernetes how-to guides
- `docs/toolhive/guides-mcp/` - MCP server guides
- `docs/toolhive/guides-vmcp/` - Virtual MCP Server guides
- `docs/toolhive/guides-registry/` - registry guides

All doc files use the `.mdx` extension. Search by keywords from the user's
request. If multiple docs match, ask the user which one to test.
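When no Explore agent is available, a plain keyword grep over the `.mdx`
files is a workable fallback. A minimal sketch (the `find_docs` helper and
its default root are illustrative, not part of the skill):

```bash
find_docs() {
  # Case-insensitive keyword search over .mdx files; prints matching paths.
  local keyword=$1 root=${2:-docs/toolhive}
  grep -ril --include='*.mdx' -- "$keyword" "$root"
}

# Example: find_docs "vault" docs/toolhive/tutorials
```

If this prints more than one path, ask the user which doc to test rather
than guessing.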

## Step 2: Read and parse the doc

Read the full doc file. Extract an ordered list of **testable steps**:

- **Bash code blocks** (`` ```bash ``) - commands to execute
- **YAML code blocks** with `title="*.yaml"` - manifests to apply with
  `kubectl apply`
- **Expected outputs** - text code blocks (`` ```text ``) that follow a
  command, or prose describing expected behavior ("You should see...")

Build a test plan: a numbered list of steps, each with:

- The command(s) to run
- What constitutes a pass (expected output, exit code 0, resource created)
- Any dependencies on previous steps (variables, resources)

Present the test plan to the user for approval before executing.

### Handling placeholders

Docs often contain placeholder values like `ghp_your_github_token_here` or
`your-org`. Before executing, scan for obvious placeholders (ALL_CAPS
patterns, "your-\*", "example-\*", "replace-\*") and ask the user for real
values. If a step requires a secret that the user cannot provide, mark it
as **skipped** with a note explaining why.
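The placeholder scan can be mechanized with `grep`. A sketch (the
`scan_placeholders` name and the exact patterns are assumptions; extend
them as new placeholder styles appear in the docs):

```bash
scan_placeholders() {
  # Print line-numbered candidate placeholders in a doc file.
  # Patterns are heuristics: "your-*"/"your_*", "example-*", "replace-*",
  # and <ALL_CAPS> tokens. Exits 0 whether or not anything matches.
  grep -nE 'your[-_][a-z_]+|example-|replace-|<[A-Z_]{3,}>' "$1" || true
}
```

Present any hits to the user before executing, so real values can be
substituted or the affected steps marked as skipped.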

## Step 3: Check prerequisites

Run the prerequisites checker:

```bash
bash <skill-path>/scripts/check-prereqs.sh
```

Where `<skill-path>` is the absolute path to this skill's directory.

If the script reports errors, inform the user and stop. For K8s docs, the
script checks for a kind cluster named "toolhive". For CLI docs, confirm
with the user that the installed `thv` version matches what the doc expects.
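`check-prereqs.sh` owns this check; for reference, a minimal equivalent of
the kind-cluster check might look like the following sketch (the helper
reads the cluster list on stdin so it can be exercised without kind
installed; the function name is illustrative):

```bash
check_kind_cluster() {
  # Succeed if a kind cluster named "toolhive" appears in the list on stdin.
  if grep -qx 'toolhive'; then
    echo "kind cluster 'toolhive' found"
  else
    echo "kind cluster 'toolhive' not found" >&2
    return 1
  fi
}

# Real usage: kind get clusters | check_kind_cluster
```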

## Step 4: Prepare the environment

For Kubernetes docs, ask the user about existing infrastructure before
deploying anything:

- **Operator CRDs and operator**: if the doc includes steps to install
  CRDs or the ToolHive operator, ask the user whether these are already
  installed. If so, skip those installation steps and proceed to the
  doc-specific content. Use `AskUserQuestion` with options like
  "Already installed", "Install fresh", "Reinstall (upgrade)".
- **Namespaces**: create a dedicated namespace `test-docs-<timestamp>`
  (e.g., `test-docs-1706886400`) unless the doc explicitly uses a
  specific namespace (like `toolhive-system`). If the doc requires
  `toolhive-system`, use it but track all resources created for cleanup.
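The namespace convention above can be generated mechanically. A sketch
(the `make_test_ns` helper is illustrative):

```bash
make_test_ns() {
  # Generate a unique test namespace name, e.g. test-docs-1706886400.
  echo "test-docs-$(date +%s)"
}

# Real usage (requires a cluster):
# ns=$(make_test_ns)
# kubectl create namespace "$ns"
```

Record the generated name; Step 7's cleanup deletes it by the same
`test-docs-` prefix.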

For CLI docs:

- Use a temporary directory for any files created
- Note the state of running MCP servers before testing (`thv list`)
- **Use unique server names**: when running `thv run <server>`, always use
  `--name <server>-test` to avoid conflicts with existing servers. The
  default name is derived from the registry entry, which may already be
  running. Example: `thv run --name osv-test osv`
- **Verify registry configuration**: if a registry server isn't found,
  confirm with the user that they're using the default registry (not a
  custom registry configuration)
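The unique-name rule can be applied uniformly by building the command once.
A sketch (the `unique_run_cmd` helper is illustrative; `thv run --name` is
the flag the doc itself prescribes):

```bash
unique_run_cmd() {
  # Build a `thv run` invocation with a -test suffixed name to avoid
  # "workload with name already exists" conflicts.
  local server=$1
  echo thv run --name "${server}-test" "$server"
}

# Real usage: $(unique_run_cmd osv)   # runs: thv run --name osv-test osv
```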

## Step 5: Execute steps

Run each step sequentially. For each step:

1. Print the step number and a short description
2. Run the command(s)
3. Capture stdout, stderr, and exit code
4. Compare against expected output or success criteria
5. Record pass/fail with evidence

### Execution rules

- **Wait for readiness**: when a step deploys resources, wait for them
  (e.g., `kubectl wait --for=condition=ready`) before proceeding. If the
  doc includes a wait command, use it. Otherwise add a reasonable wait
  (up to 120s for pods).
- **Variable propagation**: some steps set shell variables used by later
  steps (e.g., `VAULT_POD=$(kubectl get pods ...)`). Maintain these
  across steps by running in the same shell session.
- **Skip non-testable steps**: informational code blocks (showing file
  contents, expected output) are not commands to run. Use them as
  verification targets instead.
- **Timeout**: if any command hangs for more than 5 minutes, kill it and
  mark the step as failed with a timeout note.
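The capture-and-compare loop above can be sketched as a small helper. This
is a sketch, not the skill's required implementation; it assumes GNU
coreutils `timeout`, and the `run_step` name is illustrative:

```bash
run_step() {
  # Run one doc step: capture stdout/stderr/exit code with a 5-minute
  # timeout, print PASS/FAIL with evidence, and return the exit code.
  local desc=$1; shift
  local errfile out rc
  errfile=$(mktemp)
  out=$(timeout 300 "$@" 2>"$errfile"); rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "FAIL (timeout after 300s): $desc"
  elif [ "$rc" -eq 0 ]; then
    echo "PASS: $desc"
    [ -n "$out" ] && printf '  stdout: %s\n' "$out"
  else
    echo "FAIL (exit $rc): $desc"
    sed 's/^/  stderr: /' "$errfile"
  fi
  rm -f "$errfile"
  return "$rc"
}
```

Run all steps in the same shell session so variables set by earlier steps
remain visible to later ones.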

### Known pitfalls

- **Server name conflicts**: `thv run <server>` uses the registry entry
  name by default, which may already exist. Always use
  `thv run --name <server>-test <server>` to create a uniquely-named
  instance. This avoids "workload with name already exists" errors.
- **Port-forward needs time**: when testing via `kubectl port-forward`,
  wait at least 5 seconds before issuing `curl`. Start the port-forward
  in the background, sleep 5, then curl. Kill the port-forward PID
  after the check.
- **MCPServer status field**: the MCPServer CRD uses `.status.phase`
  (not `.status.state`) for the running state. When polling for
  readiness, use:
  `kubectl get mcpservers <name> -n <ns> -o jsonpath='{.status.phase}'`
  or simply poll `kubectl get mcpservers` and check the STATUS column.
- **MCPServer readiness polling**: after `kubectl apply` for an
  MCPServer, it may take 30-60 seconds to reach `Running`. Poll with
  5-second intervals up to 120 seconds before declaring a timeout.
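The readiness-polling pitfalls can be handled with one generic poller. A
sketch (the `wait_for_phase` name is illustrative; timeout and interval
are parameters so the doc's 120s/5s values are just the recommended call):

```bash
wait_for_phase() {
  # Poll a command until it prints "Running".
  # Usage: wait_for_phase <timeout_s> <interval_s> <command...>
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    [ "$("$@" 2>/dev/null)" = "Running" ] && return 0
    sleep "$interval"
  done
  echo "timed out after ${timeout}s waiting for phase Running" >&2
  return 1
}

# Real usage, polling .status.phase (not .status.state):
# wait_for_phase 120 5 kubectl get mcpservers <name> -n <ns> \
#   -o jsonpath='{.status.phase}'
```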

## Step 6: Report results

After all steps complete, produce a summary table:

```text
## Test Results: <doc title>

| Step | Description          | Result | Notes                    |
|------|----------------------|--------|--------------------------|
| 1    | Install Vault        | PASS   |                          |
| 2    | Configure auth       | PASS   |                          |
| 3    | Store secrets        | PASS   |                          |
| 4    | Deploy MCPServer     | FAIL   | Pod CrashLoopBackOff     |
| 5    | Verify integration   | SKIP   | Depends on step 4        |
```

For each **FAIL**, include:

- The command that failed
- Actual output vs expected output
- A classification:
  - **Doc issue**: the command or example in the doc is wrong or outdated
    (e.g., wrong image tag, deprecated flag, missing step)
  - **ToolHive bug**: the doc appears correct but ToolHive behaves
    unexpectedly (e.g., crash, wrong output from a correct command)
  - **Environment issue**: missing prerequisite, network problem, or
    transient error
- A specific recommendation (e.g., "Update image tag from `v0.7.0` to
  `v0.8.2`" or "File a bug: MCPServer pods fail to start when Vault
  annotations are present")

## Step 7: Clean up

Always clean up after testing:

- Delete test namespaces: `kubectl delete namespace <test-namespace>`
- Remove Helm releases installed during testing
- Delete any temporary files or directories
- If resources were created in `toolhive-system`, delete them individually
  by name

If cleanup fails, report what could not be cleaned up so the user can
handle it manually.
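Namespace cleanup can be sketched as a helper that refuses to touch
anything outside the `test-docs-` prefix, so a stray argument can never
delete a real namespace (the `cleanup_namespaces` name and the `KUBECTL`
override are illustrative conventions, not part of the skill):

```bash
cleanup_namespaces() {
  # Delete test-docs-* namespaces; skip anything else. Override KUBECTL
  # (e.g. KUBECTL=echo) for a dry run without a cluster.
  local kc=${KUBECTL:-kubectl} ns
  for ns in "$@"; do
    case "$ns" in
      test-docs-*)
        $kc delete namespace "$ns" \
          || echo "could not delete $ns; clean up manually" ;;
      *)
        echo "skipping $ns (not a test-docs namespace)" ;;
    esac
  done
}
```

Resources created in shared namespaces like `toolhive-system` still need
individual deletion by name, as listed above.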
| 207 | + |
| 208 | +## Example session |
| 209 | + |
| 210 | +User: "test the vault integration tutorial" |
| 211 | + |
| 212 | +1. Explore agent finds `docs/toolhive/tutorials/vault-integration.mdx` |
| 213 | +2. Parse the doc: 5 major steps with 12 bash commands and 1 YAML manifest |
| 214 | +3. Run `check-prereqs.sh` - kind cluster "toolhive" exists, thv v0.8.2 |
| 215 | +4. Ask user: "The doc uses a placeholder GitHub token |
| 216 | + `ghp_your_github_token_here`. Provide a real token or skip step 2?" |
| 217 | +5. Present test plan, user approves |
| 218 | +6. Execute steps 1-5, report results |
| 219 | +7. Clean up: delete vault namespace, remove MCPServer resource |
| 220 | +8. Summary: 4/5 PASS, 1 FAIL with recommendation |