
security: pass Haiku classifier prompt via stdin, not argv #1157

Open
garagon wants to merge 2 commits into garrytan:main from garagon:security/haiku-stdin-prompt

@garagon (Contributor) commented Apr 23, 2026

Summary

The Haiku transcript classifier passes scanned content (user messages + tool outputs, up to 8KB) as a CLI argument to claude -p <prompt>. This makes the full prompt visible via ps aux or /proc/<pid>/cmdline for up to 15 seconds per classification call.

On shared Linux hosts (default hidepid=0), any local user can read the scanned content — which may include page text, tool outputs, and potentially tokens or credentials visible on the page when the classifier fires.

On macOS 10.15+ the exposure is lower (Full Disk Access required to read other users' processes), but still present for same-user monitoring.
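
To make the Linux exposure concrete, here is a minimal sketch of how any local process could recover another process's arguments from `/proc/<pid>/cmdline` when `hidepid=0`. The helper names `parseCmdline` and `readArgv` are hypothetical and exist only for this illustration; they are not part of this PR.

```typescript
import { readFileSync } from "node:fs";

// /proc/<pid>/cmdline stores argv as NUL-separated strings,
// normally with a trailing NUL after the last argument.
function parseCmdline(raw: string): string[] {
  const parts = raw.split("\0");
  // Drop the empty element produced by the trailing NUL.
  if (parts.length > 0 && parts[parts.length - 1] === "") parts.pop();
  return parts;
}

// Hypothetical helper: read the argv of an arbitrary pid (Linux only).
// With hidepid=0 this file is readable by every local user.
function readArgv(pid: number): string[] {
  return parseCmdline(readFileSync(`/proc/${pid}/cmdline`, "utf8"));
}
```

With the prompt passed as an argument, the element following `-p` in the returned array would be the full scanned content.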

What this PR does

Two changes in checkTranscript() (security-classifier.ts):

  1. Prompt via stdin: claude -p (no argument) reads from stdin. The prompt is written to p.stdin and the pipe is then closed, which keeps it off the process argument list entirely.

  2. Scoped child env: the spawned process now inherits only PATH, HOME, and ANTHROPIC_API_KEY instead of the full parent process.env. This prevents leaking unrelated secrets (other API keys, tokens) into the child's environment.
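
The two changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in security-classifier.ts: `scopedEnv`, `runWithStdin`, and the 15-second default timeout are hypothetical names and values chosen for the example (the timeout mirrors the 15-second window mentioned in the summary). It uses the Node `child_process` API, which Bun also provides.

```typescript
import { spawn } from "node:child_process";

// Allowlist taken from the PR description; everything else is dropped.
const ALLOWED_ENV_KEYS = ["PATH", "HOME", "ANTHROPIC_API_KEY"];

// Build a child environment containing only the allowlisted variables.
function scopedEnv(parent: Record<string, string | undefined>): Record<string, string> {
  const child: Record<string, string> = {};
  for (const key of ALLOWED_ENV_KEYS) {
    const value = parent[key];
    if (value !== undefined) child[key] = value;
  }
  return child;
}

// Spawn a command and deliver `input` over stdin, so it never appears in argv.
function runWithStdin(
  cmd: string,
  args: string[],
  input: string,
  env: Record<string, string>,
  timeoutMs = 15_000, // assumed value for this sketch
): Promise<string> {
  return new Promise((resolve, reject) => {
    const p = spawn(cmd, args, { env });
    let out = "";
    const timer = setTimeout(() => {
      p.kill();
      reject(new Error("child process timed out"));
    }, timeoutMs);

    p.stdout.setEncoding("utf8");
    p.stdout.on("data", (chunk: string) => { out += chunk; });
    p.stderr.resume(); // drain stderr so the child cannot block on it
    p.on("error", (err) => { clearTimeout(timer); reject(err); });
    p.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve(out);
      else reject(new Error(`child exited with code ${code}`));
    });

    p.stdin.on("error", () => { /* ignore EPIPE if the child exits early */ });
    p.stdin.end(input); // write the prompt, then close the pipe
  });
}
```

Under these assumptions, the classifier call would look like `runWithStdin("claude", ["-p", "--model", "haiku", "--output-format", "json"], prompt, scopedEnv(process.env))`, so only the flags are visible in the process list.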

Before / After

# BEFORE: visible in ps output
claude -p "You are a prompt-injection detector...\n\nINPUTS:\n{\"user_message\":\"<scanned content>\"..." --model haiku --output-format json

# AFTER: only flags visible
claude -p --model haiku --output-format json

Test plan

  • bun test browse/test/security-classifier.test.ts — 16/16 pass
  • bun test — full free suite passes
  • Manual: run classification, verify ps aux | grep claude shows no prompt content
  • Manual: verify Haiku classifier still returns valid verdicts (stdin path works)

garagon added 2 commits April 22, 2026 20:39
When both TestSavantAI and Haiku transcript classifiers fail to load,
preSpawnSecurityCheck silently returns safe and the agent spawns with
zero ML prompt injection defense. This adds a fail-closed gate that
blocks agent spawn when all classifiers are inactive, with an explicit
opt-out via GSTACK_SECURITY_ALLOW_INACTIVE=1.
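
The fail-closed behavior described in this commit message could look roughly like the sketch below. The types and the function name `failClosedGate` are hypothetical and used only for illustration; the only detail taken from the commit message is the GSTACK_SECURITY_ALLOW_INACTIVE=1 opt-out.

```typescript
// Hypothetical shapes for this sketch; the real code may differ.
type ClassifierStatus = { name: string; active: boolean };
type GateResult = { allowed: boolean; reason: string };

// Block agent spawn when every classifier is inactive,
// unless the operator has explicitly opted out.
function failClosedGate(
  classifiers: ClassifierStatus[],
  env: Record<string, string | undefined>,
): GateResult {
  if (classifiers.some((c) => c.active)) {
    return { allowed: true, reason: "at least one classifier is active" };
  }
  if (env["GSTACK_SECURITY_ALLOW_INACTIVE"] === "1") {
    return { allowed: true, reason: "explicit opt-out via GSTACK_SECURITY_ALLOW_INACTIVE" };
  }
  return { allowed: false, reason: "all classifiers inactive" };
}
```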

Scanned content (user messages, tool outputs up to 8KB) was passed as
a CLI argument to `claude -p <prompt>`, making it visible in `ps aux`
and `/proc/<pid>/cmdline` for up to 15 seconds per classification.
On shared Linux hosts (default hidepid=0) any local user could read it.

Fix: pipe the prompt through stdin (`claude -p` reads from stdin when
no argument follows) and scope the child env to PATH + HOME +
ANTHROPIC_API_KEY only.