Add tools/captureToolSchemas.mjs + main-README listing for tool-schemas/ (follow-up to #24) #25
YiRaaaan wants to merge 3 commits into
📝 Walkthrough
Adds a CLI (tools/captureToolSchemas.mjs) that runs a local stub server and captures per-tool …
Sequence Diagram(s)
sequenceDiagram
participant ClaudeCLI as claude CLI
participant StubServer as Stub HTTP Server
participant CaptureScript as captureToolSchemas.mjs
participant FileSystem as File System
ClaudeCLI->>StubServer: POST /v1/messages (request body with tools[])
StubServer->>CaptureScript: deliver parsed JSON (tools[])
CaptureScript->>CaptureScript: merge tools across runs, kebab-case names, remove StructuredOutput
CaptureScript->>FileSystem: write tool-schemas/<kebab-name>.json (sorted keys)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/captureToolSchemas.mjs`:
- Around line 58-64: The toKebab function (and its use with
NAME_TO_KEBAB_OVERRIDES when writing files into OUT_DIR) must sanitize tool
names to prevent path traversal; update toKebab to first strip any directory
components (e.g., via path.basename or by removing path separators), reject or
replace suspicious characters (.., /, \, null bytes), and constrain the output
to a safe whitelist pattern and length (e.g., only a-z0-9 and hyphens) before
returning; ensure callers that pass the kebab name into path.join(OUT_DIR, ...)
only receive this sanitized value so generated filenames cannot escape OUT_DIR.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d098088-aa88-40b9-9289-d50b68b98086
📒 Files selected for processing (1)
tools/captureToolSchemas.mjs
e54a658 to 20d5f47
Complements Piebald-AI#24, which adds the `tool-schemas/` directory. Per the PR discussion on Piebald-AI#24, the README listing should surface these alongside the existing prompt categories.

Behavior:
- `getToolSchemas()` scans `tool-schemas/*.json` alphabetically; returns empty when the directory is absent, so the patch is a no-op on repos that haven't merged the schemas yet (safe to land in either order).
- `schemaFileToDisplayName()` does kebab→PascalCase with one explicit override (`lsp` → `LSP`). Add more overrides as needed.
- A new `### Tool Schemas` section is appended after the existing Builtin Tool Descriptions section, with a one-line intro and one bullet per file linking to `./tool-schemas/<file>.json`.

No changes to existing categories, prompt extraction, token counting, or any other behavior. Only one new code path, guarded by directory existence.
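To make the listing behavior concrete, here is a minimal sketch of the kebab→PascalCase mapping and the sorted listing. It is not the code from `tools/updatePrompts.js`: `listToolSchemas` and `toBullet` are hypothetical helpers, the override table is assumed to hold only the `lsp` entry mentioned above, and the exact bullet text is a guess at the described format. The real `getToolSchemas()` reads the directory itself; the sketch takes a list of filenames so it stays free of file-system calls.

```javascript
// Assumed contents: the commit message only confirms the lsp -> LSP override.
const SCHEMA_DISPLAY_NAME_OVERRIDES = new Map([['lsp', 'LSP']]);

// Convert "ask-user-question.json" to "AskUserQuestion".
function schemaFileToDisplayName(filename) {
  const base = filename.replace(/\.json$/, '');
  const override = SCHEMA_DISPLAY_NAME_OVERRIDES.get(base);
  if (override !== undefined) return override;
  return base
    .split('-')
    .filter(Boolean) // ignore empty segments from doubled hyphens
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join('');
}

// Hypothetical helper: keep only .json files, sort them, attach display names.
function listToolSchemas(filenames) {
  return filenames
    .filter((name) => name.endsWith('.json'))
    .sort()
    .map((filename) => ({ filename, displayName: schemaFileToDisplayName(filename) }));
}

// Hypothetical bullet format linking to ./tool-schemas/<file>.json.
function toBullet({ filename, displayName }) {
  return `- [${displayName}](./tool-schemas/${filename})`;
}
```

Passing an empty list yields an empty result, which mirrors the "no-op when the directory is absent" behavior described above.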
Bundles a tiny HTTP intercept server with the four-run capture loop so
that, on any machine with `claude -p` installed, running
node tools/captureToolSchemas.mjs
regenerates the full `tool-schemas/` directory in ~15s, with no external
proxy, no API key, no Anthropic account, no upstream call at all.
How it works:
1. Server listens on 127.0.0.1:4099 (configurable via CAPTURE_PORT).
2. Spawns `claude -p "ok"` four times, each with the env that surfaces
a different tool set (default, --agent-teams, local-agent
entrypoint, brief / KAIROS).
3. On each spawn, claude POSTs to /v1/messages through ANTHROPIC_BASE_URL
pointed at us. The server pulls `tools[]` out of the request body
and replies with a stub 403 — claude exits immediately without
retrying, and we move on to the next env.
4. After all four runs, the unioned `tools[]` (first-seen wins per
name) is sorted by tool name, each `input_schema` is recursively
lexically key-sorted for byte-stable diffs, StructuredOutput is
dropped (its schema is caller-supplied), and one file per tool is
written under tool-schemas/.
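
Step 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: `sortKeysDeep` and `mergeTools` are hypothetical names, and each run is assumed to be a plain array of `{ name, input_schema, ... }` objects as captured from the request body.

```javascript
// Recursively rebuild objects with keys in sorted order so that
// JSON.stringify output is stable. Arrays keep their element order.
// Caveat: JavaScript always enumerates integer-like keys first.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = sortKeysDeep(value[key]);
    return out;
  }
  return value;
}

// Union the tools from all runs (first-seen wins per name), drop
// StructuredOutput, sort by tool name, and key-sort each input_schema.
function mergeTools(runs) {
  const byName = new Map();
  for (const tools of runs) {
    for (const tool of tools) {
      if (tool.name === 'StructuredOutput') continue;
      if (!byName.has(tool.name)) byName.set(tool.name, tool);
    }
  }
  return [...byName.values()]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((tool) => ({ ...tool, input_schema: sortKeysDeep(tool.input_schema) }));
}
```

Sorting keys before serialization is what makes two captures comparable byte for byte, since the key order of the original request body is not guaranteed to be stable.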
Why a stub 403 instead of forwarding to api.anthropic.com:
Anthropic's OAuth bearer tokens are bound to the TLS / connection
profile of the proxy that established them. A fresh Node proxy
forwarding identical headers gets `403 Request not allowed`
consistently. So we don't forward — we capture the request body
(which is what we actually want) and stub a fast-exit error
response. Claude still emits the full tools[] before we reply.
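
A capture-and-stub handler along these lines might look like the sketch below. It assumes Node's built-in `http` module; `captured`, `handleRequest`, and `startStub` are hypothetical names, and the JSON error body is an assumed shape, since the commit message only says the reply is a stub 403.

```javascript
// Collected tools[] arrays, one entry per intercepted request.
const captured = [];

// Read the whole request body, keep tools[] if present, then reply 403
// without forwarding anything upstream.
function handleRequest(req, res) {
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    try {
      const body = JSON.parse(Buffer.concat(chunks).toString('utf8'));
      if (Array.isArray(body.tools)) captured.push(body.tools);
    } catch {
      // Non-JSON bodies are ignored; the stub reply is sent regardless.
    }
    res.writeHead(403, { 'content-type': 'application/json' });
    // Assumed error payload, for illustration only.
    res.end(JSON.stringify({ type: 'error', error: { type: 'permission_error', message: 'stub' } }));
  });
}

// Bind to loopback only. Dynamic import keeps the sketch usable from
// both CommonJS and ES module files.
async function startStub(port = 4099) {
  const http = await import('node:http');
  const server = http.createServer(handleRequest);
  await new Promise((resolve) => server.listen(port, '127.0.0.1', resolve));
  return server;
}
```

With the stub listening, pointing `ANTHROPIC_BASE_URL` at `http://127.0.0.1:4099` for the spawned process is what routes the request here instead of upstream.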
Verified end-to-end on Claude Code v2.1.168: 33/35 output files are
byte-identical to PR Piebald-AI#24's committed schemas; the 2 deltas (agent.json,
…) reflect real-world wire changes since Piebald-AI#24 was captured. Two
back-to-back runs produce byte-identical output.
20d5f47 to 41f0045
Two schemas changed since the initial capture against v2.1.168:
- agent.json: model enum gained "fable" (Fable 5 family).
- send-message.json: added length constraints — a 200-char maxLength
on the top-level `summary` field and a matching ^[^\n\r]{1,200}$
regex on the same field inside the nested shutdown_request /
status_update message variants.
Regenerated with tools/captureToolSchemas.mjs (PR Piebald-AI#25). No other
schema bytes changed across the four runs.
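
For readers unfamiliar with the new `summary` constraint, the quoted pattern accepts a single-line string of 1 to 200 characters. The check below is only an illustration of what the regex admits; `isValidNestedSummary` is a hypothetical helper and not part of any schema or script in this PR.

```javascript
// Pattern quoted from the regenerated send-message.json description above.
const SUMMARY_PATTERN = /^[^\n\r]{1,200}$/;

// Hypothetical helper: true when the value is a non-empty, single-line
// string of at most 200 UTF-16 code units.
function isValidNestedSummary(value) {
  return typeof value === 'string' && SUMMARY_PATTERN.test(value);
}
```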
Per @coderabbitai on Piebald-AI#25: even though tool names come from Claude Code's own bundle (and would be a supply-chain compromise on Anthropic's side before they're attacker-controlled), checking the derived slug against a strict /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/ pattern is cheap and prevents the script from silently writing outside tool-schemas/ if Claude Code ever emits an unexpected character. Verified: the six current canonical names (Bash, AskUserQuestion, LSP, WebFetch, CronCreate, SendUserMessage) all pass; ../etc/passwd, foo/bar, '..', and 'foo bar' all throw and abort the run.
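
The slug check described above could be sketched like this. Only the regex, the function name `toKebab`, and the `NAME_TO_KEBAB_OVERRIDES` name come from this PR; the camel-case splitting logic and the override contents are assumptions made for illustration.

```javascript
// Strict whitelist quoted from the commit message.
const SAFE_SLUG = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

// Assumed contents; a Map avoids accidental hits on Object.prototype keys.
const NAME_TO_KEBAB_OVERRIDES = new Map([['LSP', 'lsp']]);

// Derive a kebab-case slug from a tool name and refuse anything that
// could not be a safe filename component.
function toKebab(name) {
  const slug = NAME_TO_KEBAB_OVERRIDES.get(name) ?? name
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')      // fooBar -> foo-Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1-$2')   // HTTPServer -> HTTP-Server
    .toLowerCase();
  if (!SAFE_SLUG.test(slug)) {
    throw new Error(`Unsafe tool name, refusing to derive a filename: ${JSON.stringify(name)}`);
  }
  return slug;
}
```

Validating the derived slug rather than the raw name keeps the check in one place: whatever the conversion does, nothing containing `/`, `\`, `.`, or whitespace can reach `path.join(OUT_DIR, ...)`.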
Follow-up to #24, as requested in this comment.
Two pieces; you can think of them as "refresh" and "surface":

1. `tools/captureToolSchemas.mjs`: one-command schema regeneration

On any machine that has `claude -p` installed, this regenerates the full `tool-schemas/` directory in ~8s. No Anthropic account, no API key, no upstream call. The script runs a tiny intercept server on `127.0.0.1:4099`, spawns `claude -p "ok"` four times with the env that surfaces each gated tool set (default, `--agent-teams`, `local-agent` entrypoint, `brief` / KAIROS), captures `tools[]` out of each request body, and replies with a stub 403 so claude exits immediately without ever talking to Anthropic.

Output is byte-stable across runs (all object keys are lexically sorted) and across machines (no env-dependent code paths). `StructuredOutput` is intentionally skipped: its `input_schema` is supplied per-call by the workflow, see `tool-schemas/README.md`.

Verified end-to-end on Claude Code v2.1.172 against the schemas committed to #24: 33/35 files reproduce byte-identical; the two deltas are real upstream changes (`agent.json` gained `"fable"` in its model enum; `send-message.json` gained a 200-char `maxLength` plus matching regex on `summary`). Independent fresh-clone smoke test passed: 35/35 valid JSON, ~8.3s wall clock.

2. `tools/updatePrompts.js`: main README listing

A small extension so the next time `updatePrompts.js` runs, the main `README.md` will include a `### Tool Schemas` section listing every file in `tool-schemas/`, one bullet per schema, linking to the file.

The new code path is fully guarded by `existsSync(TOOL_SCHEMAS_DIR)`: if `tool-schemas/` isn't in the working tree (e.g. on a branch where #24 hasn't merged yet), the script behaves identically to before. So the two PRs are safe to land in either order.

What the new section looks like

After the existing `### Builtin Tool Descriptions` section, the script will append:

Why a directory scan, not a JSON-file argument

The existing prompt flow is: `tweakcc/promptExtractor.js` → `prompts-X.X.X.json` → `updatePrompts.js` reads that JSON → writes `system-prompts/*.md`.

Schemas don't have a tweakcc-side extractor. Per #24 they're verbatim wire captures committed directly, so the JSON files in `tool-schemas/` are themselves the source of truth. The capture script writes there directly, and `updatePrompts.js` just lists what's present in the README. No intermediate JSON, no second source of truth.

Implementation notes

`captureToolSchemas.mjs`
- Node built-in modules only (`http`, `https`, `child_process`, `fs`, `path`, `url`). No dependencies.
- `toKebab()` validates its output against `/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/` and throws on any name that wouldn't produce a safe slug (defense in depth against unexpected tool names; see the path-traversal thread on this PR).
- The port is configurable via the `CAPTURE_PORT` env (default `4099`).

`updatePrompts.js`
- `getToolSchemas()` reads `tool-schemas/`, returns `[]` if absent, else a sorted list of `{filename, displayName}`.
- `schemaFileToDisplayName()` does kebab → PascalCase with one explicit override (`lsp` → `LSP`). New overrides go in `SCHEMA_DISPLAY_NAME_OVERRIDES`; the matching capture-side override is `NAME_TO_KEBAB_OVERRIDES` in `captureToolSchemas.mjs`.

What this does NOT do
- Modify `tweakcc/promptExtractor.js`. Per the design discussion in #24 (Add tool-schemas/: JSON input_schema for 35 builtin tools, refs #22), the schema-capture path stays independent of the prompt-extraction pipeline.