---
name: test-docs
description: >
  Test ToolHive documentation by executing the steps in tutorials and guides
  against a real environment, verifying that examples are correct and ToolHive
  has not regressed. Use when the user asks to test, validate, or verify
  documentation such as: "test the vault integration tutorial",
  "verify the K8s quickstart", "check the CLI install docs",
  "test toolhive vault integration in kubernetes", or any request to run
  through a doc's instructions to confirm they work. Supports both
  Kubernetes-based docs (tutorials, K8s guides) and CLI-based docs.
---

# Test Docs

Test ToolHive documentation by running each step against a live environment,
reporting pass/fail with evidence, and recommending fixes for failures.

## Workflow

1. **Find the documentation** - launch an Explore agent to locate the
   relevant doc file
2. **Read and parse** - extract testable steps (bash code blocks, YAML
   manifests, expected outputs)
3. **Check prerequisites** - run `scripts/check-prereqs.sh`; confirm the
   `thv` version with the user
4. **Prepare the environment** - create a dedicated test namespace unless
   the doc specifies otherwise
5. **Execute steps** - run each step sequentially, capture output
6. **Report results** - pass/fail per step with evidence
7. **Clean up** - delete all resources created during testing
8. **Recommend fixes** - classify failures as doc issues or ToolHive bugs

## Step 1: Find the documentation

If the user provides a **file path** to the doc, use it directly. Assume
you are running in the repository that contains the file.

Otherwise, use the Task tool with `subagent_type=Explore` to search the
docs directory for the page matching the user's request. Documentation
lives under:

- `docs/toolhive/tutorials/` - step-by-step tutorials
- `docs/toolhive/guides-cli/` - CLI how-to guides
- `docs/toolhive/guides-k8s/` - Kubernetes how-to guides
- `docs/toolhive/guides-mcp/` - MCP server guides
- `docs/toolhive/guides-vmcp/` - Virtual MCP Server guides
- `docs/toolhive/guides-registry/` - registry guides

All doc files use the `.mdx` extension. Search by keywords from the user's
request. If multiple docs match, ask the user which one to test.
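When no Explore agent is available, a plain keyword grep over the `.mdx`
files is a workable fallback. A minimal sketch (the `find_docs` helper and
its default root are illustrative, not part of the skill):

```bash
find_docs() {
  # Case-insensitive keyword search over .mdx files; prints matching paths.
  local keyword=$1 root=${2:-docs/toolhive}
  grep -ril --include='*.mdx' -- "$keyword" "$root"
}

# Example: find_docs "vault" docs/toolhive/tutorials
```

If this prints more than one path, ask the user which doc to test rather
than guessing.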

## Step 2: Read and parse the doc

Read the full doc file. Extract an ordered list of **testable steps**:

- **Bash code blocks** (`` ```bash ``) - commands to execute
- **YAML code blocks** with `title="*.yaml"` - manifests to apply with
  `kubectl apply`
- **Expected outputs** - text code blocks (`` ```text ``) that follow a
  command, or prose describing expected behavior ("You should see...")

Build a test plan: a numbered list of steps, each with:

- The command(s) to run
- What constitutes a pass (expected output, exit code 0, resource created)
- Any dependencies on previous steps (variables, resources)

Present the test plan to the user for approval before executing.

### Handling placeholders

Docs often contain placeholder values like `ghp_your_github_token_here` or
`your-org`. Before executing, scan for obvious placeholders (ALL_CAPS
patterns, "your-\*", "example-\*", "replace-\*") and ask the user for real
values. If a step requires a secret that the user cannot provide, mark it
as **skipped** with a note explaining why.
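The placeholder scan can be mechanized with `grep`. A sketch (the
`scan_placeholders` name and the exact patterns are assumptions; extend
them as new placeholder styles appear in the docs):

```bash
scan_placeholders() {
  # Print line-numbered candidate placeholders in a doc file.
  # Patterns are heuristics: "your-*"/"your_*", "example-*", "replace-*",
  # and <ALL_CAPS> tokens. Exits 0 whether or not anything matches.
  grep -nE 'your[-_][a-z_]+|example-|replace-|<[A-Z_]{3,}>' "$1" || true
}
```

Present any hits to the user before executing, so real values can be
substituted or the affected steps marked as skipped.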

## Step 3: Check prerequisites

Run the prerequisites checker:

```bash
bash <skill-path>/scripts/check-prereqs.sh
```

Where `<skill-path>` is the absolute path to this skill's directory.

If the script reports errors, inform the user and stop. For K8s docs, the
script checks for a kind cluster named "toolhive". For CLI docs, confirm
with the user that the installed `thv` version matches what the doc expects.
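`check-prereqs.sh` owns this check; for reference, a minimal equivalent of
the kind-cluster check might look like the following sketch (the helper
reads the cluster list on stdin so it can be exercised without kind
installed; the function name is illustrative):

```bash
check_kind_cluster() {
  # Succeed if a kind cluster named "toolhive" appears in the list on stdin.
  if grep -qx 'toolhive'; then
    echo "kind cluster 'toolhive' found"
  else
    echo "kind cluster 'toolhive' not found" >&2
    return 1
  fi
}

# Real usage: kind get clusters | check_kind_cluster
```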

## Step 4: Prepare the environment

For Kubernetes docs, ask the user about existing infrastructure before
deploying anything:

- **Operator CRDs and operator**: if the doc includes steps to install
  CRDs or the ToolHive operator, ask the user whether these are already
  installed. If so, skip those installation steps and proceed to the
  doc-specific content. Use `AskUserQuestion` with options like
  "Already installed", "Install fresh", "Reinstall (upgrade)".
- **Namespaces**: create a dedicated namespace `test-docs-<timestamp>`
  (e.g., `test-docs-1706886400`) unless the doc explicitly uses a
  specific namespace (like `toolhive-system`). If the doc requires
  `toolhive-system`, use it but track all resources created for cleanup.
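The namespace convention above can be generated mechanically. A sketch
(the `make_test_ns` helper is illustrative):

```bash
make_test_ns() {
  # Generate a unique test namespace name, e.g. test-docs-1706886400.
  echo "test-docs-$(date +%s)"
}

# Real usage (requires a cluster):
# ns=$(make_test_ns)
# kubectl create namespace "$ns"
```

Record the generated name; Step 7's cleanup deletes it by the same
`test-docs-` prefix.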

For CLI docs:

- Use a temporary directory for any files created
- Note the state of running MCP servers before testing (`thv list`)
- **Use unique server names**: when running `thv run <server>`, always use
  `--name <server>-test` to avoid conflicts with existing servers. The
  default name is derived from the registry entry, which may already be
  running. Example: `thv run --name osv-test osv`
- **Verify registry configuration**: if a registry server isn't found,
  confirm with the user that they're using the default registry (not a
  custom registry configuration)
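The unique-name rule can be applied uniformly by building the command once.
A sketch (the `unique_run_cmd` helper is illustrative; `thv run --name` is
the flag the doc itself prescribes):

```bash
unique_run_cmd() {
  # Build a `thv run` invocation with a -test suffixed name to avoid
  # "workload with name already exists" conflicts.
  local server=$1
  echo thv run --name "${server}-test" "$server"
}

# Real usage: $(unique_run_cmd osv)   # runs: thv run --name osv-test osv
```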

## Step 5: Execute steps

Run each step sequentially. For each step:

1. Print the step number and a short description
2. Run the command(s)
3. Capture stdout, stderr, and exit code
4. Compare against expected output or success criteria
5. Record pass/fail with evidence

### Execution rules

- **Wait for readiness**: when a step deploys resources, wait for them
  (e.g., `kubectl wait --for=condition=ready`) before proceeding. If the
  doc includes a wait command, use it. Otherwise add a reasonable wait
  (up to 120s for pods).
- **Variable propagation**: some steps set shell variables used by later
  steps (e.g., `VAULT_POD=$(kubectl get pods ...)`). Maintain these
  across steps by running in the same shell session.
- **Skip non-testable steps**: informational code blocks (showing file
  contents, expected output) are not commands to run. Use them as
  verification targets instead.
- **Timeout**: if any command hangs for more than 5 minutes, kill it and
  mark the step as failed with a timeout note.
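The capture-and-compare loop above can be sketched as a small helper. This
is a sketch, not the skill's required implementation; it assumes GNU
coreutils `timeout`, and the `run_step` name is illustrative:

```bash
run_step() {
  # Run one doc step: capture stdout/stderr/exit code with a 5-minute
  # timeout, print PASS/FAIL with evidence, and return the exit code.
  local desc=$1; shift
  local errfile out rc
  errfile=$(mktemp)
  out=$(timeout 300 "$@" 2>"$errfile"); rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "FAIL (timeout after 300s): $desc"
  elif [ "$rc" -eq 0 ]; then
    echo "PASS: $desc"
    [ -n "$out" ] && printf '  stdout: %s\n' "$out"
  else
    echo "FAIL (exit $rc): $desc"
    sed 's/^/  stderr: /' "$errfile"
  fi
  rm -f "$errfile"
  return "$rc"
}
```

Run all steps in the same shell session so variables set by earlier steps
remain visible to later ones.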

### Known pitfalls

- **Server name conflicts**: `thv run <server>` uses the registry entry
  name by default, which may already exist. Always use
  `thv run --name <server>-test <server>` to create a uniquely-named
  instance. This avoids "workload with name already exists" errors.
- **Port-forward needs time**: when testing via `kubectl port-forward`,
  wait at least 5 seconds before issuing `curl`. Start the port-forward
  in the background, sleep 5, then curl. Kill the port-forward PID
  after the check.
- **MCPServer status field**: the MCPServer CRD uses `.status.phase`
  (not `.status.state`) for the running state. When polling for
  readiness, use:
  `kubectl get mcpservers <name> -n <ns> -o jsonpath='{.status.phase}'`
  or simply poll `kubectl get mcpservers` and check the STATUS column.
- **MCPServer readiness polling**: after `kubectl apply` for an
  MCPServer, it may take 30-60 seconds to reach `Running`. Poll with
  5-second intervals up to 120 seconds before declaring a timeout.
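The readiness-polling pitfalls can be handled with one generic poller. A
sketch (the `wait_for_phase` name is illustrative; timeout and interval
are parameters so the doc's 120s/5s values are just the recommended call):

```bash
wait_for_phase() {
  # Poll a command until it prints "Running".
  # Usage: wait_for_phase <timeout_s> <interval_s> <command...>
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    [ "$("$@" 2>/dev/null)" = "Running" ] && return 0
    sleep "$interval"
  done
  echo "timed out after ${timeout}s waiting for phase Running" >&2
  return 1
}

# Real usage, polling .status.phase (not .status.state):
# wait_for_phase 120 5 kubectl get mcpservers <name> -n <ns> \
#   -o jsonpath='{.status.phase}'
```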

## Step 6: Report results

After all steps complete, produce a summary table:

```text
## Test Results: <doc title>

| Step | Description          | Result | Notes                    |
|------|----------------------|--------|--------------------------|
| 1    | Install Vault        | PASS   |                          |
| 2    | Configure auth       | PASS   |                          |
| 3    | Store secrets        | PASS   |                          |
| 4    | Deploy MCPServer     | FAIL   | Pod CrashLoopBackOff     |
| 5    | Verify integration   | SKIP   | Depends on step 4        |
```

For each **FAIL**, include:

- The command that failed
- Actual output vs expected output
- A classification:
  - **Doc issue**: the command or example in the doc is wrong or outdated
    (e.g., wrong image tag, deprecated flag, missing step)
  - **ToolHive bug**: the doc appears correct but ToolHive behaves
    unexpectedly (e.g., crash, wrong output from a correct command)
  - **Environment issue**: missing prerequisite, network problem, or
    transient error
- A specific recommendation (e.g., "Update image tag from `v0.7.0` to
  `v0.8.2`" or "File a bug: MCPServer pods fail to start when Vault
  annotations are present")

## Step 7: Clean up

Always clean up after testing:

- Delete test namespaces: `kubectl delete namespace <test-namespace>`
- Remove Helm releases installed during testing
- Delete any temporary files or directories
- If resources were created in `toolhive-system`, delete them individually
  by name

If cleanup fails, report what could not be cleaned up so the user can
handle it manually.
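Namespace cleanup can be sketched as a helper that refuses to touch
anything outside the `test-docs-` prefix, so a stray argument can never
delete a real namespace (the `cleanup_namespaces` name and the `KUBECTL`
override are illustrative conventions, not part of the skill):

```bash
cleanup_namespaces() {
  # Delete test-docs-* namespaces; skip anything else. Override KUBECTL
  # (e.g. KUBECTL=echo) for a dry run without a cluster.
  local kc=${KUBECTL:-kubectl} ns
  for ns in "$@"; do
    case "$ns" in
      test-docs-*)
        $kc delete namespace "$ns" \
          || echo "could not delete $ns; clean up manually" ;;
      *)
        echo "skipping $ns (not a test-docs namespace)" ;;
    esac
  done
}
```

Resources created in shared namespaces like `toolhive-system` still need
individual deletion by name, as listed above.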
| 207 | + |
| 208 | +## Example session |
| 209 | + |
| 210 | +User: "test the vault integration tutorial" |
| 211 | + |
| 212 | +1. Explore agent finds `docs/toolhive/tutorials/vault-integration.mdx` |
| 213 | +2. Parse the doc: 5 major steps with 12 bash commands and 1 YAML manifest |
| 214 | +3. Run `check-prereqs.sh` - kind cluster "toolhive" exists, thv v0.8.2 |
| 215 | +4. Ask user: "The doc uses a placeholder GitHub token |
| 216 | + `ghp_your_github_token_here`. Provide a real token or skip step 2?" |
| 217 | +5. Present test plan, user approves |
| 218 | +6. Execute steps 1-5, report results |
| 219 | +7. Clean up: delete vault namespace, remove MCPServer resource |
| 220 | +8. Summary: 4/5 PASS, 1 FAIL with recommendation |