Add tools/captureToolSchemas.mjs + main-README listing for tool-schemas/ (follow-up to #24) #25

Open
YiRaaaan wants to merge 3 commits into Piebald-AI:main from YiRaaaan:updatePrompts-tool-schemas

@YiRaaaan YiRaaaan commented Jun 9, 2026

Follow-up to #24, as requested in this comment.

Two pieces; you can think of them as "refresh" and "surface":

1. tools/captureToolSchemas.mjs — one-command schema regeneration

node tools/captureToolSchemas.mjs

On any machine that has claude -p installed, this regenerates the full tool-schemas/ directory in ~8s. No Anthropic account, no API key, no upstream call. The script runs a tiny intercept server on 127.0.0.1:4099, spawns claude -p "ok" four times with the env that surfaces each gated tool set (default, --agent-teams, local-agent entrypoint, brief/KAIROS), captures tools[] out of each request body, and replies with a stub 403 so claude exits immediately without ever talking to Anthropic.
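The intercept step can be sketched roughly as below. This is a minimal illustration, not the script's actual code: `extractTools` and `startStubServer` are hypothetical names, and the shape of the stub error body is an assumption. The per-run environment that surfaces each gated tool set is not shown because the description does not spell it out.

```javascript
// Pure helper: pull tools[] out of a /v1/messages request body.
function extractTools(bodyText) {
  try {
    const parsed = JSON.parse(bodyText);
    return Array.isArray(parsed.tools) ? parsed.tools : [];
  } catch {
    return []; // non-JSON or unexpected body: nothing to capture
  }
}

// Starts a stub on 127.0.0.1; onTools receives each captured tools[].
// The dynamic import keeps this sketch loadable as CommonJS or as an ES module.
async function startStubServer(port, onTools) {
  const { createServer } = await import("node:http");
  const server = createServer((req, res) => {
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      if (req.method === "POST" && req.url.startsWith("/v1/messages")) {
        onTools(extractTools(Buffer.concat(chunks).toString("utf8")));
      }
      // Reply with a stub 403 so the client gives up; body shape is assumed.
      res.writeHead(403, { "content-type": "application/json" });
      res.end(
        JSON.stringify({
          type: "error",
          error: { type: "permission_error", message: "stub" },
        })
      );
    });
  });
  await new Promise((resolve) => server.listen(port, "127.0.0.1", resolve));
  return server;
}
```

A caller would start the stub on the configured port, point the CLI at it, and close the server once all runs have reported their tools.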

Output is byte-stable across runs (all object keys are lexically sorted) and across machines (no env-dependent code paths). StructuredOutput is intentionally skipped — its input_schema is supplied per-call by the workflow, see tool-schemas/README.md.
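As an illustration of why sorted keys give byte-stable output, here is a minimal sketch of a recursive key sort. `sortKeysDeep` is a hypothetical name, and leaving array order untouched is an assumption about the desired behavior (arrays such as `required` or `enum` carry their own ordering).

```javascript
// Recursively rebuild objects with keys in sorted order.
// Arrays keep their element order; only object keys are reordered.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key]);
    }
    return sorted;
  }
  return value; // primitives pass through unchanged
}

// Two captures that differ only in key order serialize identically.
const first = JSON.stringify(sortKeysDeep({ type: "object", properties: { b: 1, a: 2 } }));
const second = JSON.stringify(sortKeysDeep({ properties: { a: 2, b: 1 }, type: "object" }));
console.log(first === second); // true
```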

Verified end-to-end on Claude Code v2.1.172 against the schemas committed to #24: 33/35 files reproduce byte-identical; the two deltas are real upstream changes (agent.json gained "fable" in its model enum; send-message.json gained a 200-char maxLength plus matching regex on summary). Independent fresh-clone smoke test passed: 35/35 valid JSON, ~8.3s wall clock.

2. tools/updatePrompts.js — main README listing

A small extension so the next time updatePrompts.js runs, the main README.md will include a ### Tool Schemas section listing every file in tool-schemas/, one bullet per schema, linking to the file.

The new code path is fully guarded by existsSync(TOOL_SCHEMAS_DIR): if tool-schemas/ isn't in the working tree (e.g. on a branch where #24 hasn't merged yet), the script behaves identically to before. The two PRs are therefore safe to land in either order.

What the new section looks like

After the existing ### Builtin Tool Descriptions section, the script will append:

### Tool Schemas

JSON `input_schema` for each builtin tool, captured verbatim from the Anthropic API payload. See [`tool-schemas/README.md`](./tool-schemas/README.md) for grouping by surface condition.

- [Tool Schema: Agent](./tool-schemas/agent.json)
- [Tool Schema: AskUserQuestion](./tool-schemas/ask-user-question.json)
- [Tool Schema: Bash](./tool-schemas/bash.json)
- [Tool Schema: Workflow](./tool-schemas/workflow.json)
- [Tool Schema: Write](./tool-schemas/write.json)
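Generating that section from a list of schema entries could look roughly like this. It is a sketch only: `renderToolSchemasSection` is a hypothetical name, and the `{filename, displayName}` entry shape is taken from the implementation notes in this description.

```javascript
// Build the README section text from [{ filename, displayName }, ...].
// Returns an empty string when there is nothing to list, so the README
// stays unchanged on branches without tool-schemas/.
function renderToolSchemasSection(schemas) {
  if (schemas.length === 0) return "";
  const lines = [
    "### Tool Schemas",
    "",
    "JSON `input_schema` for each builtin tool, captured verbatim from the Anthropic API payload. " +
      "See [`tool-schemas/README.md`](./tool-schemas/README.md) for grouping by surface condition.",
    "",
    ...schemas.map(
      (s) => `- [Tool Schema: ${s.displayName}](./tool-schemas/${s.filename})`
    ),
  ];
  return lines.join("\n") + "\n";
}

console.log(
  renderToolSchemasSection([
    { filename: "agent.json", displayName: "Agent" },
    { filename: "bash.json", displayName: "Bash" },
  ])
);
```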

Why a directory scan, not a JSON-file argument

The existing prompt flow is: tweakcc/promptExtractor.js → prompts-X.X.X.json → updatePrompts.js reads that JSON → writes system-prompts/*.md.

Schemas don't have a tweakcc-side extractor — per #24 they're verbatim wire captures committed directly, so the JSON files in tool-schemas/ are themselves the source of truth. The capture script writes there directly, and updatePrompts.js just lists what's present in the README. No intermediate JSON, no second source of truth.

Implementation notes

captureToolSchemas.mjs

  • Single file, Node stdlib only (http, https, child_process, fs, path, url). No dependencies.
  • toKebab() validates its output against /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/ and throws on any name that wouldn't produce a safe slug (defense in depth against unexpected tool names — see the path-traversal thread on this PR).
  • Configurable port via CAPTURE_PORT env (default 4099).
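A minimal sketch of the slug validation described above. The regex is the one quoted in this PR; the PascalCase-to-kebab conversion and the contents of the override table are assumptions made for illustration and may differ from the script.

```javascript
// Override table: the name comes from this PR, the contents are assumed.
const NAME_TO_KEBAB_OVERRIDES = new Map([["LSP", "lsp"]]);

// Safe-slug pattern quoted in the PR description.
const SAFE_SLUG = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

function toKebab(name) {
  // Assumed conversion: hyphen between a lowercase/digit and an uppercase letter.
  const slug =
    NAME_TO_KEBAB_OVERRIDES.get(name) ??
    name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
  if (!SAFE_SLUG.test(slug)) {
    // Abort rather than write a file whose name could escape the output directory.
    throw new Error(`Unsafe tool name: ${JSON.stringify(name)}`);
  }
  return slug;
}

console.log(toKebab("AskUserQuestion")); // ask-user-question
```

Because the check runs on the derived slug, inputs containing `/`, `..`, or spaces are rejected before any path is built.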

updatePrompts.js

  • getToolSchemas() reads tool-schemas/, returns [] if absent, else a sorted list of {filename, displayName}.
  • schemaFileToDisplayName() does kebab → PascalCase with one explicit override (lsp → LSP). New overrides go in SCHEMA_DISPLAY_NAME_OVERRIDES; the matching capture-side override is NAME_TO_KEBAB_OVERRIDES in captureToolSchemas.mjs.
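The display-name mapping might be sketched as follows. The function and override-table names are the ones mentioned above; their bodies, the use of a `Map`, and the helper `toSchemaEntries` are assumptions. The directory read and the existsSync guard are left out so the sketch stays pure.

```javascript
// Override table: name from this PR, representation (Map) assumed.
const SCHEMA_DISPLAY_NAME_OVERRIDES = new Map([["lsp", "LSP"]]);

// "ask-user-question.json" -> "AskUserQuestion"; "lsp.json" -> "LSP".
function schemaFileToDisplayName(filename) {
  const base = filename.replace(/\.json$/, "");
  const override = SCHEMA_DISPLAY_NAME_OVERRIDES.get(base);
  if (override !== undefined) return override;
  return base
    .split("-")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Hypothetical helper: turn a directory listing into sorted entries,
// skipping non-JSON files such as tool-schemas/README.md.
function toSchemaEntries(filenames) {
  return filenames
    .filter((f) => f.endsWith(".json"))
    .sort()
    .map((filename) => ({
      filename,
      displayName: schemaFileToDisplayName(filename),
    }));
}

console.log(toSchemaEntries(["lsp.json", "README.md", "ask-user-question.json"]));
```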

What this does NOT do

  • Does not version-stamp the schemas (no token-counting analog — schemas aren't text content).
  • Does not maintain a separate CHANGELOG entry for schema changes — they're captured in the verbatim files themselves and surface naturally in git diffs.
  • Does not modify tweakcc/promptExtractor.js. Per the design discussion in #24 (Add tool-schemas/: JSON input_schema for 35 builtin tools, refs #22), the schema-capture path stays independent of the prompt-extraction pipeline.

Summary by CodeRabbit

  • New Features

    • README now includes an auto-generated "Tool Schemas" section that lists available tool schemas when present, with human-friendly display names.
  • Chores

    • Added a local capture utility that collects tool definitions across runs, consolidates them deterministically, and emits sorted per-tool schema files for documentation and discovery.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a CLI (tools/captureToolSchemas.mjs) that runs a local stub server and captures per-tool input_schema from /v1/messages requests into tool-schemas/*.json, and updates tools/updatePrompts.js to discover those schemas and conditionally append a "Tool Schemas" section with links into README.md.

Changes

Tool Schemas

| Layer / File(s) | Summary |
|---|---|
| Capture CLI: parse, merge, write schemas (tools/captureToolSchemas.mjs) | Runs a local stub server, spawns the claude CLI across four capture passes, captures tools[] from the first /v1/messages POST, merges by first-seen name, removes StructuredOutput, maps names to kebab-case (with overrides), sorts JSON keys, and writes one deterministic tool-schemas/<name>.json per tool. |
| Schema discovery infrastructure (tools/updatePrompts.js) | Adds TOOL_SCHEMAS_DIR and a filename→display-name mapper (override lsp → LSP) plus getToolSchemas() to enumerate and sort tool-schemas/*.json, returning an empty list if the directory is absent. |
| README generation for schemas (tools/updatePrompts.js) | Extends updateReadme() to conditionally append a ### Tool Schemas section with a brief description and a bullet list of links to each discovered schema file when schemas exist. |

Sequence Diagram(s)

sequenceDiagram
  participant ClaudeCLI as claude CLI
  participant StubServer as Stub HTTP Server
  participant CaptureScript as captureToolSchemas.mjs
  participant FileSystem as File System

  ClaudeCLI->>StubServer: POST /v1/messages (request body with tools[])
  StubServer->>CaptureScript: deliver parsed JSON (tools[])
  CaptureScript->>CaptureScript: merge tools across runs, kebab-case names, remove StructuredOutput
  CaptureScript->>FileSystem: write tool-schemas/<kebab-name>.json (sorted keys)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"A rabbit found schemas in a row, 🐇
nibbling names that softly glow,
LSP and friends in tidy files,
README hums across the miles,
hopping schemas to and fro."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely identifies the main changes: adding captureToolSchemas.mjs and updating the README to list tool schemas, with context that it follows PR #24. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/captureToolSchemas.mjs`:
- Around line 58-64: The toKebab function (and its use with
NAME_TO_KEBAB_OVERRIDES when writing files into OUT_DIR) must sanitize tool
names to prevent path traversal; update toKebab to first strip any directory
components (e.g., via path.basename or by removing path separators), reject or
replace suspicious characters (.., /, \, null bytes), and constrain the output
to a safe whitelist pattern and length (e.g., only a-z0-9 and hyphens) before
returning; ensure callers that pass the kebab name into path.join(OUT_DIR, ...)
only receive this sanitized value so generated filenames cannot escape OUT_DIR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d098088-aa88-40b9-9289-d50b68b98086

📥 Commits

Reviewing files that changed from the base of the PR and between 281cf76 and e54a658.

📒 Files selected for processing (1)
  • tools/captureToolSchemas.mjs

Comment thread tools/captureToolSchemas.mjs
@YiRaaaan force-pushed the updatePrompts-tool-schemas branch from e54a658 to 20d5f47 on June 11, 2026 03:27
YiRaaaan added 2 commits June 11, 2026 11:31
Complements Piebald-AI#24, which adds the `tool-schemas/` directory. Per the PR
discussion on Piebald-AI#24, the README listing should surface these alongside the
existing prompt categories.

Behavior:
- `getToolSchemas()` scans `tool-schemas/*.json` alphabetically; returns
  empty when the directory is absent, so the patch is a no-op on repos
  that haven't merged the schemas yet (safe to land in either order).
- `schemaFileToDisplayName()` does kebab→PascalCase with one explicit
  override (`lsp` → `LSP`). Add more overrides as needed.
- A new `### Tool Schemas` section is appended after the existing
  Builtin Tool Descriptions section, with a one-line intro and one
  bullet per file linking to `./tool-schemas/<file>.json`.

No changes to existing categories, prompt extraction, token counting,
or any other behavior. Only one new code path, guarded by directory
existence.
Bundles a tiny HTTP intercept server with the four-run capture loop so
that, on any machine with `claude -p` installed, running

  node tools/captureToolSchemas.mjs

regenerates the full `tool-schemas/` directory in ~15s, with no external
proxy, no API key, no Anthropic account, no upstream call at all.

How it works:
  1. Server listens on 127.0.0.1:4099 (configurable via CAPTURE_PORT).
  2. Spawns `claude -p "ok"` four times, each with the env that surfaces
     a different tool set (default, --agent-teams, local-agent
     entrypoint, brief / KAIROS).
  3. On each spawn, claude POSTs to /v1/messages through ANTHROPIC_BASE_URL
     pointed at us. The server pulls `tools[]` out of the request body
     and replies with a stub 403 — claude exits immediately without
     retrying, and we move on to the next env.
  4. After all four runs, the unioned `tools[]` (first-seen wins per
     name) is sorted by tool name, each `input_schema` is recursively
     lexically key-sorted for byte-stable diffs, StructuredOutput is
     dropped (its schema is caller-supplied), and one file per tool is
     written under tool-schemas/.
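
The merge in step 4 can be sketched as below. This is an illustrative sketch, not the script's code: `mergeCapturedTools` is a hypothetical name, and each run is assumed to be the `tools[]` array captured from one spawn.

```javascript
// runs: array of captured tools[] arrays, in spawn order.
function mergeCapturedTools(runs) {
  const byName = new Map();
  for (const tools of runs) {
    for (const tool of tools) {
      // First-seen wins: later runs never overwrite an earlier capture.
      if (!byName.has(tool.name)) byName.set(tool.name, tool);
    }
  }
  // StructuredOutput is dropped because its schema is caller-supplied.
  byName.delete("StructuredOutput");
  // Sort by tool name for a stable write order.
  return [...byName.values()].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0
  );
}

const merged = mergeCapturedTools([
  [{ name: "Bash", run: 1 }, { name: "StructuredOutput", run: 1 }],
  [{ name: "Agent", run: 2 }, { name: "Bash", run: 2 }],
]);
console.log(merged.map((t) => `${t.name}@run${t.run}`)); // [ 'Agent@run2', 'Bash@run1' ]
```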

Why a stub 403 instead of forwarding to api.anthropic.com:
  Anthropic's OAuth bearer tokens are bound to the TLS / connection
  profile of the proxy that established them. A fresh Node proxy
  forwarding identical headers gets `403 Request not allowed`
  consistently. So we don't forward — we capture the request body
  (which is what we actually want) and stub a fast-exit error
  response. Claude still emits the full tools[] before we reply.

Verified end-to-end on Claude Code v2.1.168: 33/35 output files are
byte-identical to PR Piebald-AI#24's committed schemas; the 2 deltas (agent.json,
…) reflect real-world wire changes since Piebald-AI#24 was captured. Two
back-to-back runs produce byte-identical output.
@YiRaaaan force-pushed the updatePrompts-tool-schemas branch from 20d5f47 to 41f0045 on June 11, 2026 03:31
YiRaaaan added a commit to YiRaaaan/claude-code-system-prompts that referenced this pull request Jun 11, 2026
Two schemas changed since the initial capture against v2.1.168:

- agent.json: model enum gained "fable" (Fable 5 family).
- send-message.json: added length constraints — a 200-char maxLength
  on the top-level `summary` field and a matching ^[^\n\r]{1,200}$
  regex on the same field inside the nested shutdown_request /
  status_update message variants.

Regenerated with tools/captureToolSchemas.mjs (PR Piebald-AI#25). No other
schema bytes changed across the four runs.
Per @coderabbitai on Piebald-AI#25: even though tool names come from Claude Code's
own bundle (and would be a supply-chain compromise on Anthropic's side
before they're attacker-controlled), checking the derived slug against
a strict /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/ pattern is cheap and prevents
the script from silently writing outside tool-schemas/ if Claude Code
ever emits an unexpected character.

Verified: the six current canonical names (Bash, AskUserQuestion, LSP,
WebFetch, CronCreate, SendUserMessage) all pass; ../etc/passwd, foo/bar,
'..', and 'foo bar' all throw and abort the run.
@YiRaaaan YiRaaaan changed the title Surface tool-schemas/ in the main README via updatePrompts.js (follow-up to #24) Add tools/captureToolSchemas.mjs + main-README listing for tool-schemas/ (follow-up to #24) Jun 11, 2026