chore(commands): modernize slash command frontmatter (v1.2.3) by JonyanDunh · Pull Request #12 · JonyanDunh/claude-code-watchdog

JonyanDunh · 2026-04-11T02:55:24Z

Summary

Sweep all three slash command `.md` files under `commands/` so they use only documented frontmatter fields from the canonical Claude Code reference at https://code.claude.com/docs/en/skills.md#frontmatter-reference.

Changes

Removed `hide-from-slash-command-tool` from `commands/start.md` and `commands/stop.md`. This field is not documented anywhere in the official Claude Code reference; it was inherited from the upstream ralph-loop plugin as a legacy, undocumented convention. Claude Code silently ignores unknown fields, so removing it changes no behavior, but it cleans up the file and removes the implication that we know something the docs don't.
Added `disable-model-invocation: true` to all three slash commands. This is the documented way to block programmatic invocation by the agent, so `/watchdog:start`, `/watchdog:stop`, and `/watchdog:help` can only be triggered by a user typing them directly.
Added explicit `argument-hint` to `stop.md` and `help.md` (empty string for no-args). `start.md` already had one; tightened it to `"<PROMPT>" [--max-iterations N]` to hint at the required double quotes around the prompt.
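
As a rough illustration of the three changes above, the frontmatter of `commands/stop.md` and `commands/start.md` might look like the sketch below after this PR. The `description` values are hypothetical placeholders, not copied from the repository; the `argument-hint` and `disable-model-invocation` values follow the text of this PR.

```yaml
# commands/stop.md (frontmatter only; description text is a hypothetical placeholder)
---
description: "Stop the active watchdog loop"
argument-hint: ""                  # empty string: the command takes no arguments
disable-model-invocation: true     # only a user typing the command can trigger it
---

# commands/start.md (frontmatter only; description text is a hypothetical placeholder)
---
description: "Start a watchdog loop for the given prompt"
argument-hint: "\"<PROMPT>\" [--max-iterations N]"   # escaped quotes hint that the prompt must be quoted
disable-model-invocation: true
---
```

Note that `hide-from-slash-command-tool` no longer appears in either file; since unknown fields are ignored, its absence should not change how the commands behave.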

Threat model clarification

Watchdog was never designed to survive an adversarial agent deliberately trying to escape — only a confused or lazy agent. With just `disable-model-invocation: true`:

The agent cannot call `/watchdog:stop` programmatically even if it wanted to
The classifier subprocess lives in a separate claude process the agent has no handle on
The Stop hook fires regardless of any agent output; the agent cannot suppress or spoof it
Even if the agent deletes `.claude/watchdog.claudepid.*.local.json` to break the loop, that action shows up in the session transcript and is visible to the user — a self-owning "win"

So the level of protection provided by the removed undocumented field was zero in practice — it didn't buy any adversarial resistance, and the documented fields give all the protection the realistic threat model needs.

Scope

All changes confined to `commands/*.md`. Zero changes to `lib/`, `hooks/`, `scripts/`, or tests. Bumped `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json` to 1.2.3.

Test plan

80-test suite passes locally (78 active + 2 skipped-inside-Claude-Code, 0 failures)
CI will confirm on 9 matrix jobs (ubuntu/macos/windows × Node 18/20/22)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update all three slash command .md files (start, stop, help) to use only documented frontmatter fields from the canonical Claude Code reference at https://code.claude.com/docs/en/skills.md#frontmatter-reference.

## What changed

- Removed `hide-from-slash-command-tool` from start.md and stop.md. This field is NOT documented in the official Claude Code reference. It was inherited from the upstream ralph-loop plugin, where it appears as a legacy / undocumented convention. Claude Code silently ignores unknown fields, so removing it has no behavioral downside. Confirmed absent from:
  - https://code.claude.com/docs/en/skills.md
  - https://code.claude.com/docs/en/plugins-reference.md
  - https://code.claude.com/docs/en/commands.md
- Added `disable-model-invocation: true` to all three commands. This IS the documented way to block programmatic invocation of a slash command by the agent. It ensures /watchdog:start, /watchdog:stop, and /watchdog:help can only be triggered by the user typing them directly, never by the agent auto-invoking them via the Skill tool.
- Added explicit `argument-hint` to stop.md and help.md (empty string for no-args commands, so autocomplete shows the command takes no arguments). start.md already had `argument-hint` from earlier, tightened to `"\"<PROMPT>\" [--max-iterations N]"`, which hints at the required double quotes around the prompt.

## Threat model clarification

Even without any "hide from agent" mechanism, the watchdog plugin is robust against a *confused or lazy* agent, which is the realistic threat model for a self-referential loop:

- `disable-model-invocation: true` means the agent cannot call /watchdog:stop programmatically even if it wanted to
- The classifier subprocess lives in a separate claude process the agent has no handle on
- The Stop hook fires regardless of any output the agent produces; the agent cannot suppress or spoof it
- Even if the agent deletes `.claude/watchdog.claudepid.*.local.json` to break the loop, that action shows up in the session transcript and is visible to the user: a self-owning "win"

Watchdog is not designed to survive an *adversarial* agent that is deliberately trying to escape a sandbox; it is designed to keep a task-following agent honest about whether it has actually done the work.

## No code changes

All changes are confined to the three .md files under `commands/`. Zero changes to lib/, hooks/, scripts/, or tests. The 80-test suite still passes unchanged (78 active + 2 skipped-inside-Claude-Code, 0 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JonyanDunh merged commit cc2d6d0 into main Apr 11, 2026
11 checks passed

JonyanDunh deleted the chore/slash-command-frontmatter-v1.2.3 branch April 11, 2026 02:56
