feat: Ollama + MCP bridge so local models can drive bolt #7

Open
Sajawal007 wants to merge 6 commits into CyberStrikeus:main from Sajawal007:claude/recursing-lehmann

Conversation

@Sajawal007

Summary

Adds an end-to-end bridge that lets a local Ollama model use bolt's MCP toolset without depending on Claude/Anthropic, plus a stdlib HTTP API so external services (e.g. a Rails threat-intel pipeline) can call it over the wire. Nothing in bolt's existing code paths is changed; everything here is additive.

What changed

| File | Purpose |
| --- | --- |
| ollama-bridge.sh | CLI wrapper around mcphost. Pulls bolt's MCP_ADMIN_TOKEN from the running container, health-checks ollama + bolt, applies the default system prompt, max-steps and temperature. |
| .mcphost.json | mcphost remote-server config pointing at bolt's HTTP MCP endpoint with bearer auth via ${env://BOLT_ADMIN_TOKEN} substitution. |
| Modelfile.qwen7b-bolt, Modelfile.qwen14b-bolt | Derived Ollama models that bake num_ctx=32768 (default 4096 was too small for bolt's 25-tool schema + system prompt) and a low temperature for decisive tool calls. |
| prompts/recon-agent.md | Agent role + default playbook telling the model to classify the target, plan, and chain tools (subfinder → dnsx → httpx → nuclei) instead of asking the user which tool to invoke. |
| bridge-api.py | Stdlib-only HTTP wrapper. POST /scan accepts {target, targets, iocs, infra, instructions, prompt, system_prompt, model, max_steps, timeout, verbose}; auto-builds a prompt or accepts a raw one; supports per-request system-prompt override; returns JSON {ok, output, duration_ms, model, error}. GET /health verifies ollama, bolt, and the bridge script. |

Why

Bolt today is consumed by Claude Desktop / Claude Code — i.e. an Anthropic-hosted client drives the toolset. This PR makes bolt usable from any local Ollama model and from any HTTP-speaking app. The pieces are intentionally split:

  • The shell wrapper handles the messy environment glue (token, health, mcphost flags).
  • The Modelfiles solve a real footgun: Ollama loads qwen2.5:* with a 4096-token context by default, which silently truncates bolt's 25-tool schema and makes the model "forget" which tools it has.
  • The system prompt fixes the second footgun: small models will happily generate tool-use examples in text instead of issuing function calls — the prompt forces them to invoke instead of describe.
  • bridge-api.py is the integration point for a Rails / FastAPI / other backend that wants to fire scans without spawning processes itself.

How to use

# one-shot CLI
./ollama-bridge.sh -p "Target: scanme.nmap.org. Recon it."

# interactive REPL
./ollama-bridge.sh

# HTTP API for external apps
python3 bridge-api.py            # listens on :8080
curl -X POST http://localhost:8080/scan \
  -H 'Content-Type: application/json' \
  -d '{"target":"scanme.nmap.org","iocs":["45.33.32.156"]}'

Override per call via environment variables, e.g. MODEL=qwen2.5:14b-bolt, MAX_STEPS=60, SYSTEM_PROMPT=/path/to/role.md.
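For callers that prefer Python over curl, a minimal stdlib client for POST /scan might look like the sketch below. It assumes the bridge is listening on localhost:8080 as in the example above; `build_scan_request` and `run_scan` are illustrative names, not part of this PR.

```python
import json
import urllib.request

# Assumption: bridge-api.py is running locally on its default port.
BRIDGE_URL = "http://localhost:8080/scan"


def build_scan_request(target, iocs=None, max_steps=None, url=BRIDGE_URL):
    """Build (but do not send) a POST /scan request with a JSON body."""
    body = {"target": target}
    if iocs:
        body["iocs"] = list(iocs)
    if max_steps is not None:
        body["max_steps"] = int(max_steps)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scan(target, timeout=300, **kwargs):
    """Send the request and decode the JSON reply ({ok, output, ...})."""
    req = build_scan_request(target, **kwargs)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Since a scan can run for minutes, the client timeout should probably be set higher than whatever `timeout` value is sent in the request body.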

Known issue worth a follow-up

Bolt runs nmap (and any other plugin marked requiresRoot: true) through sudo via packages/core/src/executor.ts. The shipped Docker image runs as root and does not include sudo, so every root-required tool fails with Executable not found in $PATH: "sudo". As a runtime workaround we drop a passthrough shim into the running container:

docker exec bolt sh -c 'printf "#!/bin/sh\nexec \"\$@\"\n" > /usr/local/bin/sudo && chmod +x /usr/local/bin/sudo'

The clean upstream fix is a one-liner in executor.ts: skip the sudo prefix when process.getuid?.() === 0. Happy to send a separate PR for that if maintainers prefer.

Test plan

  • ollama-bridge.sh -p "say OK" returns OK (~5s, plain chat path)
  • ollama-bridge.sh -p "Target: scanme.nmap.org. Recon it." issues real bolt__nmap, bolt__httpx etc. tool calls visible in docker logs bolt
  • bridge-api.py GET /health returns ok with all three checks green
  • POST /scan with bare {} returns 400 with the right message
  • POST /scan with targets array of wrong type is rejected with a clear error
  • POST /scan with target only, targets only, and both, all auto-build a prompt and run end-to-end
  • system_prompt override is written to a tempfile, passed via SYSTEM_PROMPT env var to mcphost, and cleaned up after the run
  • Modelfile-built qwen2.5:7b-bolt reports context_length=32768 from /api/ps
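The tempfile handling for the system_prompt override can be sketched roughly as follows. This is an illustration of the write / pass-via-env / clean-up pattern, not the code in bridge-api.py; `system_prompt_env` and the file prefix are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def system_prompt_env(system_prompt, base_env=None):
    """Yield an env dict whose SYSTEM_PROMPT points at a temp file.

    The file is removed when the block exits, even if the run raises.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not system_prompt:
        yield env  # no override: leave SYSTEM_PROMPT untouched
        return
    fd, path = tempfile.mkstemp(prefix="bolt-sysprompt-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(system_prompt)
        env["SYSTEM_PROMPT"] = path
        yield env
    finally:
        if os.path.exists(path):
            os.unlink(path)
```

The yielded dict would be passed as `env=` to the subprocess that launches the bridge script, so the override never leaks into the server's own environment.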

🤖 Generated with Claude Code

Sajawal007 and others added 6 commits April 30, 2026 22:48
End-to-end bridge that lets a local Ollama model use bolt's MCP toolset
without depending on Claude/anthropic, plus a stdlib HTTP API so external
services (e.g. a Rails threat-intel pipeline) can call it over the wire.

- ollama-bridge.sh: wrapper around mcphost. Pulls bolt's MCP_ADMIN_TOKEN
  from the running container, health-checks ollama + bolt, applies a
  default system prompt, max-steps and low temperature. Single CLI
  entrypoint for both interactive and one-shot use.
- .mcphost.json: mcphost remote-server config pointing at bolt's HTTP MCP
  endpoint with bearer auth via ${env://BOLT_ADMIN_TOKEN} substitution.
- Modelfile.qwen{7,14}b-bolt: derived Ollama models that bake
  num_ctx=32768 (default 4096 was too small for the 25-tool schema +
  system prompt) and a low temperature for decisive tool calls.
- prompts/recon-agent.md: agent role + default playbook telling the model
  to classify the target, plan, and chain tools (subfinder -> dnsx ->
  httpx -> nuclei) instead of asking the user which tool to invoke.
- bridge-api.py: stdlib-only HTTP wrapper. POST /scan accepts
  {target, targets, iocs, infra, instructions, prompt, system_prompt,
  model, max_steps, timeout, verbose}. Auto-builds a prompt or accepts a
  raw one; supports per-request system-prompt override; returns JSON
  {ok, output, duration_ms, model, error}. GET /health verifies ollama,
  bolt and the bridge script.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /scan now returns a `tool_calls` array alongside `output` so the
Rails app can see exactly which bolt tools the model fired, with what
arguments, and a snippet of each result — without parsing freeform
narrative or running with `verbose: true`.

Each entry:
  { seq, tool, args, status, result }
where status is "ok" or "error" and result is truncated to 500 chars
to keep responses sane on big nuclei/ffuf dumps.
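As a rough illustration of the entry shape, assuming the parser has already extracted the tool name, arguments and raw result (`make_tool_call_entry` is a hypothetical helper):

```python
MAX_RESULT_CHARS = 500  # truncation limit stated above


def make_tool_call_entry(seq, tool, args, result, is_error=False):
    """Build one tool_calls entry with a truncated result snippet."""
    text = result if isinstance(result, str) else str(result)
    return {
        "seq": seq,
        "tool": tool,
        "args": args,
        "status": "error" if is_error else "ok",
        "result": text[:MAX_RESULT_CHARS],
    }
```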

Implementation notes:
- Drop --quiet from the mcphost invocation so we capture the full
  transcript; --quiet is no longer needed since we surface only the
  fields the caller asked for.
- New parse_mcphost_output() walks mcphost's --compact TUI output and
  extracts tool calls + the final assistant frame. mcphost has no
  machine-readable mode in this version, so it's regex-based.
- Carriage returns (mcphost's per-token line redraws) are now turned
  into newlines instead of stripped, so spinner frames don't glue onto
  the start of a tool-open marker.
- Final assistant message is reassembled from the LAST "<" frame plus
  its word-wrapped continuation lines; a small dewrap heuristic fuses
  hyphen-split words and orphan continuations back together.
- New `verbose: true` flag adds `raw_transcript` to the response (full
  ANSI-stripped, noise-filtered transcript) for debugging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
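The carriage-return handling described in this commit can be illustrated with a small sketch. The marker text in the usage note (`[tool] ...`) is made up for the example; mcphost's real markers are not shown in this PR, and `normalize_transcript` is a hypothetical name.

```python
import re

# Matches CSI escape sequences such as colour codes and cursor moves.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def normalize_transcript(raw):
    """Strip ANSI codes and turn every \\r into a line break.

    Converting (rather than deleting) carriage returns keeps a spinner
    frame from being glued onto the marker that overwrites it.
    """
    text = ANSI_CSI.sub("", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line.rstrip() for line in text.split("\n") if line.strip()]
```

With an input like `"/ Thinking\r[tool] bolt__httpx\n"`, deleting the `\r` would yield the single line `/ Thinking[tool] bolt__httpx`, whereas the version above keeps the marker on its own line where a start-of-line regex can find it.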
/scan/stream takes the same body shape as /scan, but instead of blocking until the agent loop
finishes, the response streams one JSON event per line as state
transitions happen. Lets the Rails-side Solid Queue job (or any HTTP
client) update the UI in real time as bolt tools are invoked, instead
of staring at a "running…" spinner for 1–3 minutes.

Event types (one JSON object per line, application/x-ndjson):
  {type:"start",      ts_ms, model}
  {type:"tool_start", ts_ms, seq, tool, args}
  {type:"tool_end",   ts_ms, seq, status, result}   # deferred to next marker
  {type:"final",      ts_ms, ok, output, tool_calls, duration_ms, exit_code}
  {type:"error",      ts_ms, message}
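
A consumer of this event stream could be sketched as below. It only assumes the event shapes listed above; `iter_events` and `summarize` are illustrative names, and raising on an `error` event is one possible policy rather than anything the bridge mandates.

```python
import json


def iter_events(lines):
    """Yield one dict per non-empty NDJSON line (str or bytes)."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if raw:
            yield json.loads(raw)


def summarize(events):
    """Track tool status by seq and return (tools, final_event)."""
    tools, final = {}, None
    for ev in events:
        kind = ev.get("type")
        if kind == "tool_start":
            tools[ev["seq"]] = {"tool": ev["tool"], "status": "running"}
        elif kind == "tool_end":
            tools.setdefault(ev["seq"], {})["status"] = ev["status"]
        elif kind == "error":
            raise RuntimeError(ev.get("message", "stream error"))
        elif kind == "final":
            final = ev
    return tools, final
```

An `http.client.HTTPResponse` returned by `urllib.request.urlopen` iterates line by line, so it can be passed straight to `iter_events`.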

Implementation:
- New StreamingParser: incremental version of parse_mcphost_output. Feed
  it logical lines; it yields events on state transitions. tool_end
  emission is deferred until the next marker so the result field carries
  the complete output, not a half-arrived prefix.
- run_bridge_stream(): spawns mcphost, uses select.select on the stdout
  fd so a stuck/long Thinking phase doesn't block the timeout check,
  splits incoming bytes on both \r and \n so TUI redraws produce
  separate logical lines.
- _handle_scan_stream() writes events with explicit flush() after every
  chunk, returns Connection: close + X-Accel-Buffering: no so proxies
  don't buffer the stream.
- Subprocess is terminated on client disconnect (BrokenPipeError ->
  generator GC -> finally block).
- Sync /scan endpoint refactored to share validation logic with the
  stream endpoint via _read_and_validate_body().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
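The select-based read loop described in this commit might look roughly like the sketch below. It is a simplified stand-in for run_bridge_stream(), not the actual code: `split_logical_lines` and `stream_lines` are hypothetical names, and selecting on a pipe file descriptor is POSIX-only.

```python
import os
import re
import select
import subprocess
import time

LINE_BREAK = re.compile(rb"[\r\n]+")


def split_logical_lines(buffer):
    """Split on \\r and \\n; return (complete_lines, unfinished_tail)."""
    parts = LINE_BREAK.split(buffer)
    tail = parts.pop()  # bytes after the last break (may be b"")
    return [p for p in parts if p], tail


def stream_lines(cmd, timeout_s, poll_s=0.5):
    """Yield logical lines from cmd's stdout; stop the child on timeout or early exit."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + timeout_s
    pending = b""
    try:
        while True:
            if time.monotonic() > deadline:
                raise TimeoutError(f"no completion within {timeout_s}s")
            ready, _, _ = select.select([fd], [], [], poll_s)
            if not ready:
                continue  # nothing to read yet; the loop re-checks the deadline
            chunk = os.read(fd, 4096)
            if not chunk:  # EOF: child closed stdout
                break
            lines, pending = split_logical_lines(pending + chunk)
            for line in lines:
                yield line.decode("utf-8", errors="replace")
        if pending:
            yield pending.decode("utf-8", errors="replace")
    finally:
        # Also runs when the consumer abandons the generator (client disconnect).
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
        proc.stdout.close()
```

Because `select.select` returns after at most `poll_s` seconds, a child that prints nothing for a long time still lets the loop notice that the deadline has passed.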
Symptoms reported: small models (qwen2.5:7b) only run recon tools and
fail those with wrong-shaped arguments.

Root causes (none in bridge-api.py logic itself):

1. recon-agent.md identified the model as a "reconnaissance agent" and
   defined success as "findings to report" — so any subdomain list was
   treated as the final answer and the run stopped after step 1 of the
   playbook. Renamed to "offensive-security analyst", made the four
   phases (enumerate → resolve → probe → vuln-scan) MANDATORY, and
   added an explicit "stop conditions" block that disallows stopping
   before vuln-scan has run.

2. The prompt listed tool names but never their parameter schemas, so
   small models guessed field names (target vs domain vs hosts) and the
   first call errored. Added a per-tool cheat-sheet with the correct
   field name, type (string vs array), and an example invocation for
   the 16 tools the model picks most often.

3. build_prompt() ended every request with "Recon this target now",
   reinforcing the recon-only framing. Changed to "Begin the engagement
   now …Run the full pipeline…". Type guidance is now always included,
   not only when both target and targets are set.

4. MAX_STEPS default of 30 was tight for full-pipeline runs (5 phases x
   1-2 retries each). Bumped to 50.

bridge-api.py code (validation, streaming, parsing, subprocess wiring)
is unchanged — those weren't the problem.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 7b model was hitting its agentic ceiling — choking on multi-step
chains and getting tool argument shapes wrong even with the tightened
system prompt. The 14b variant (already built locally via
Modelfile.qwen14b-bolt with num_ctx=32768) is materially better at
both tool selection and argument fidelity.

Override per-run with:  MODEL=qwen2.5:7b-bolt ./ollama-bridge.sh

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First pass of the production harness. Two enforcement layers:

1. Server-side scope allow-list (scope.json.example -> scope.json).
   bridge-api.py loads it once at startup and validates every
   target/targets entry on /scan and /scan/stream. Out-of-scope
   requests return HTTP 403 with {error:"target out of scope",
   out_of_scope:[...]} BEFORE mcphost is ever spawned — so no model
   inference, no tool calls, no bolt invocations against the wrong
   asset, even if Rails is tricked into widening scope.

   Rules:
   - allowed_domain_patterns: list of regexes matched against the
     hostname (scheme/port/path stripped)
   - allowed_cidrs: list of CIDR blocks for raw-IP targets
   - disabled_tools: list of bolt tool names the model must not call

   If scope.json is absent OR has no allow rules, scope checks are
   skipped — keeps local dev frictionless. Activation is reported in
   the startup banner.

   Raw `prompt`-only requests bypass scope (we can't reliably extract
   targets from free text); Rails callers should prefer structured
   fields when scope matters.

2. bolt__run_command disabled by default. The system prompt now
   carries a hard rule against calling it — a generic shell escape
   that defeats the structured tool surface and is the easiest sink
   for prompt-injection via scanned content. build_prompt() also
   injects the active disabled-tools list as a per-request reminder
   so the model sees it on every call, not only via the cached
   system prompt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
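The allow-list check from layer 1 can be sketched as follows. This is an illustration under assumptions, not the code in bridge-api.py: the helper names are hypothetical, patterns are applied with `fullmatch` (the commit only says "matched against the hostname"), and IPv6 targets would need bracketed form to be parsed.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def extract_host(target):
    """Strip scheme, port and path; return the bare hostname or IP."""
    candidate = target if "://" in target else "//" + target
    return (urlsplit(candidate).hostname or "").lower()


def out_of_scope(targets, domain_patterns, cidrs):
    """Return the targets that match no allow rule (empty list = all in scope)."""
    patterns = [re.compile(p) for p in domain_patterns]
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    rejected = []
    for target in targets:
        host = extract_host(target)
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal: treat it as a hostname (assumption: full match).
            allowed = bool(host) and any(p.fullmatch(host) for p in patterns)
        else:
            allowed = any(addr in net for net in networks)
        if not allowed:
            rejected.append(target)
    return rejected
```

A non-empty return value would map onto the 403 response's `out_of_scope` list, and the check runs before any subprocess is spawned.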