feat: Ollama + MCP bridge so local models can drive bolt #7
Open
Sajawal007 wants to merge 6 commits into
End-to-end bridge that lets a local Ollama model use bolt's MCP toolset
without depending on Claude/Anthropic, plus a stdlib HTTP API so external
services (e.g. a Rails threat-intel pipeline) can call it over the wire.
- ollama-bridge.sh: wrapper around mcphost. Pulls bolt's MCP_ADMIN_TOKEN
from the running container, health-checks ollama + bolt, applies a
default system prompt, max-steps and low temperature. Single CLI
entrypoint for both interactive and one-shot use.
- .mcphost.json: mcphost remote-server config pointing at bolt's HTTP MCP
endpoint with bearer auth via ${env://BOLT_ADMIN_TOKEN} substitution.
- Modelfile.qwen{7,14}b-bolt: derived Ollama models that bake
num_ctx=32768 (default 4096 was too small for the 25-tool schema +
system prompt) and a low temperature for decisive tool calls.
- prompts/recon-agent.md: agent role + default playbook telling the model
to classify the target, plan, and chain tools (subfinder -> dnsx ->
httpx -> nuclei) instead of asking the user which tool to invoke.
- bridge-api.py: stdlib-only HTTP wrapper. POST /scan accepts
{target, targets, iocs, infra, instructions, prompt, system_prompt,
model, max_steps, timeout, verbose}. Auto-builds a prompt or accepts a
raw one; supports per-request system-prompt override; returns JSON
{ok, output, duration_ms, model, error}. GET /health verifies ollama,
bolt and the bridge script.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
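The "auto-builds a prompt or accepts a raw one" behaviour of POST /scan can be pictured roughly as below. This is a minimal sketch, not the code in bridge-api.py: the function signature, the prompt wording, and the error message are assumptions made for illustration; only the field names (target, targets, instructions, prompt) come from the commit message.

```python
from typing import Optional, Sequence


def build_prompt(target: Optional[str] = None,
                 targets: Optional[Sequence[str]] = None,
                 instructions: Optional[str] = None,
                 prompt: Optional[str] = None) -> str:
    """Return the raw prompt if one was sent, else assemble one from fields.

    Hypothetical layout: the real prompt text in bridge-api.py may differ.
    """
    if prompt:
        # A raw prompt wins over the structured fields.
        return prompt

    all_targets = []
    if target:
        all_targets.append(target)
    if targets:
        all_targets.extend(targets)
    if not all_targets:
        raise ValueError("provide target, targets, or prompt")

    lines = ["Targets:"] + [f"- {t}" for t in all_targets]
    if instructions:
        lines += ["", "Instructions:", instructions]
    return "\n".join(lines)
```

A caller that sends both target and targets gets a single merged list, which keeps the model from having to reconcile two fields itself.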
POST /scan now returns a `tool_calls` array alongside `output` so the
Rails app can see exactly which bolt tools the model fired, with what
arguments, and a snippet of each result — without parsing freeform
narrative or running with `verbose: true`.
Each entry:
{ seq, tool, args, status, result }
where status is "ok" or "error" and result is truncated to 500 chars
to keep responses sane on big nuclei/ffuf dumps.
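For reference, one entry could be assembled along these lines. A sketch only: the helper name and the `failed` flag are hypothetical, while the five keys and the 500-character cap follow the description above.

```python
MAX_RESULT_CHARS = 500  # cap from the commit message


def make_tool_call(seq, tool, args, result, failed=False):
    """Build one tool_calls entry with a truncated result snippet.

    Hypothetical helper: the real code may derive status differently.
    """
    return {
        "seq": seq,
        "tool": tool,
        "args": args,
        "status": "error" if failed else "ok",
        "result": result[:MAX_RESULT_CHARS],
    }
```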
Implementation notes:
- Drop --quiet from the mcphost invocation so we capture the full
transcript; --quiet is no longer needed since we surface only the
fields the caller asked for.
- New parse_mcphost_output() walks mcphost's --compact TUI output and
extracts tool calls + the final assistant frame. mcphost has no
machine-readable mode in this version, so it's regex-based.
- Carriage returns (mcphost's per-token line redraws) are now turned
into newlines instead of stripped, so spinner frames don't glue onto
the start of a tool-open marker.
- Final assistant message is reassembled from the LAST "<" frame plus
its word-wrapped continuation lines; a small dewrap heuristic fuses
hyphen-split words and orphan continuations back together.
- New `verbose: true` flag adds `raw_transcript` to the response (full
ANSI-stripped, noise-filtered transcript) for debugging.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
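The carriage-return handling and the dewrap step described above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the ANSI regex only covers CSI sequences, and the real heuristic in parse_mcphost_output() is not shown in this PR text, so the hyphen rule here is a simplification (it would also fuse a genuinely hyphenated word that happens to break at the hyphen).

```python
import re
from typing import List

# CSI escape sequences (colours, cursor movement). Other escape
# families are not handled in this sketch.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def normalize_transcript(raw: str) -> List[str]:
    """Strip ANSI codes and turn carriage returns into line breaks."""
    text = ANSI_RE.sub("", raw)
    # Converting \r to \n (instead of deleting it) keeps each spinner
    # redraw on its own line, so it cannot glue onto the next marker.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [ln.rstrip() for ln in text.split("\n") if ln.strip()]


def dewrap(lines: List[str]) -> str:
    """Join word-wrapped lines, fusing words split at a trailing hyphen."""
    out = ""
    for ln in lines:
        if out.endswith("-"):
            out = out[:-1] + ln.lstrip()
        elif out:
            out += " " + ln.strip()
        else:
            out = ln.strip()
    return out
```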
/scan/stream takes the same body shape as /scan, but instead of blocking until the agent loop
finishes, the response streams one JSON event per line as state
transitions happen. Lets the Rails-side Solid Queue job (or any HTTP
client) update the UI in real time as bolt tools are invoked, instead
of staring at a "running…" spinner for 1–3 minutes.
Event types (one JSON object per line, application/x-ndjson):
{type:"start", ts_ms, model}
{type:"tool_start", ts_ms, seq, tool, args}
{type:"tool_end", ts_ms, seq, status, result} # deferred to next marker
{type:"final", ts_ms, ok, output, tool_calls, duration_ms, exit_code}
{type:"error", ts_ms, message}
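A sketch of how events in this shape could be written and read back as NDJSON. The helper names are hypothetical and the example arguments are illustrative; only the event types and the one-object-per-line framing come from the list above.

```python
import json
import time


def encode_event(event_type, **fields):
    """Serialize one event as a single NDJSON line (bytes, newline-terminated)."""
    event = {"type": event_type, "ts_ms": int(time.time() * 1000), **fields}
    # Compact separators keep each event on one short line.
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def iter_events(lines):
    """Client side: parse an iterable of text lines into event dicts."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank keep-alive lines
            yield json.loads(line)
```

On the consuming side, a job can dispatch on `event["type"]` and stop reading once it sees `final` or `error`.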
Implementation:
- New StreamingParser: incremental version of parse_mcphost_output. Feed
it logical lines; it yields events on state transitions. tool_end
emission is deferred until the next marker so the result field carries
the complete output, not a half-arrived prefix.
- run_bridge_stream(): spawns mcphost, uses select.select on the stdout
fd so a stuck/long Thinking phase doesn't block the timeout check,
splits incoming bytes on both \r and \n so TUI redraws produce
separate logical lines.
- _handle_scan_stream() writes events with explicit flush() after every
chunk, returns Connection: close + X-Accel-Buffering: no so proxies
don't buffer the stream.
- Subprocess is terminated on client disconnect (BrokenPipeError ->
generator GC -> finally block).
- Sync /scan endpoint refactored to share validation logic with the
stream endpoint via _read_and_validate_body().
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
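The read loop described for run_bridge_stream() can be approximated as follows. This is a sketch, not the PR's code: the function name, the 0.5 s poll interval, and the 5 s grace period before kill are assumptions, and select.select on a pipe works on Unix-like systems only.

```python
import os
import re
import select
import subprocess
import time


def stream_lines(cmd, timeout_s):
    """Yield logical lines from a child process, enforcing an overall timeout.

    Bytes are split on both \\r and \\n so TUI redraws become separate lines.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + timeout_s
    buf = b""
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("child exceeded timeout")
            # Wake up regularly even if the child prints nothing, so a
            # silent phase cannot block the timeout check.
            ready, _, _ = select.select([fd], [], [], min(remaining, 0.5))
            if not ready:
                continue
            chunk = os.read(fd, 4096)
            if not chunk:  # EOF: child closed stdout
                break
            buf += chunk
            # Everything before the last separator is complete; keep the tail.
            *complete, buf = re.split(rb"[\r\n]", buf)
            for line in complete:
                if line:
                    yield line.decode("utf-8", "replace")
        if buf:
            yield buf.decode("utf-8", "replace")
    finally:
        # Also runs when the consumer stops iterating (e.g. client disconnect).
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
        proc.wait()
        proc.stdout.close()
```

Putting the cleanup in `finally` is what lets generator teardown stop the subprocess when the HTTP client goes away mid-stream.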
Symptoms reported: small models (qwen2.5:7b) only run recon tools and
fail those with wrong-shaped arguments.
Root causes (none in bridge-api.py logic itself):
1. recon-agent.md identified the model as a "reconnaissance agent" and
defined success as "findings to report", so any subdomain list was
treated as the final answer and the run stopped after step 1 of the
playbook. Renamed to "offensive-security analyst", made the four
phases (enumerate → resolve → probe → vuln-scan) MANDATORY, and added
an explicit "stop conditions" block that disallows stopping before
vuln-scan has run.
2. The prompt listed tool names but never their parameter schemas, so
small models guessed field names (target vs domain vs hosts) and the
first call errored. Added a per-tool cheat-sheet with the correct
field name, type (string vs array), and an example invocation for the
16 tools the model picks most often.
3. build_prompt() ended every request with "Recon this target now",
reinforcing the recon-only framing. Changed to "Begin the engagement
now …Run the full pipeline…". Type guidance is now always included,
not only when both target and targets are set.
4. MAX_STEPS default of 30 was tight for full-pipeline runs (5 phases x
1-2 retries each). Bumped to 50.
bridge-api.py code (validation, streaming, parsing, subprocess wiring)
is unchanged; those weren't the problem.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 7b model was hitting its agentic ceiling: choking on multi-step
chains and getting tool argument shapes wrong even with the tightened
system prompt. The 14b variant (already built locally via
Modelfile.qwen14b-bolt with num_ctx=32768) is materially better at both
tool selection and argument fidelity.
Override per-run with:
MODEL=qwen2.5:7b-bolt ./ollama-bridge.sh
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First pass of the production harness. Two enforcement layers:
1. Server-side scope allow-list (scope.json.example -> scope.json).
bridge-api.py loads it once at startup and validates every
target/targets entry on /scan and /scan/stream. Out-of-scope
requests return HTTP 403 with {error:"target out of scope",
out_of_scope:[...]} BEFORE mcphost is ever spawned — so no model
inference, no tool calls, no bolt invocations against the wrong
asset, even if Rails is tricked into widening scope.
Rules:
- allowed_domain_patterns: list of regexes matched against the
hostname (scheme/port/path stripped)
- allowed_cidrs: list of CIDR blocks for raw-IP targets
- disabled_tools: list of bolt tool names the model must not call
If scope.json is absent OR has no allow rules, scope checks are
skipped — keeps local dev frictionless. Activation is reported in
the startup banner.
Raw `prompt`-only requests bypass scope (we can't reliably extract
targets from free text); Rails callers should prefer structured
fields when scope matters.
2. bolt__run_command disabled by default. The system prompt now
carries a hard rule against calling it — a generic shell escape
that defeats the structured tool surface and is the easiest sink
for prompt-injection via scanned content. build_prompt() also
injects the active disabled-tools list as a per-request reminder
so the model sees it on every call, not only via the cached
system prompt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
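The scope check in layer 1 could be sketched like this. The rule keys (allowed_domain_patterns, allowed_cidrs) are from the commit message; the function names, the use of a full-string regex match, and the host-extraction approach are assumptions, so treat it as an illustration rather than the code in bridge-api.py.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def extract_host(target):
    """Strip scheme, port and path from a target; return the lowercased host."""
    t = target.strip()
    # urlsplit only fills in hostname when a "//" authority marker is present.
    parts = urlsplit(t if "://" in t else "//" + t)
    return (parts.hostname or "").lower()


def out_of_scope(targets, scope):
    """Return the targets that match no allow rule (empty list = all in scope)."""
    patterns = [re.compile(p) for p in scope.get("allowed_domain_patterns", [])]
    networks = [ipaddress.ip_network(c) for c in scope.get("allowed_cidrs", [])]
    if not patterns and not networks:
        return []  # no allow rules: scope checks are skipped

    rejected = []
    for target in targets:
        host = extract_host(target)
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            # Not a raw IP: treat as hostname. fullmatch is an assumption
            # here; it avoids accepting "example.com.evil.test".
            allowed = any(p.fullmatch(host) for p in patterns)
        else:
            allowed = any(ip in net for net in networks)
        if not allowed:
            rejected.append(target)
    return rejected
```

A handler would call `out_of_scope()` before spawning anything and answer 403 with the returned list when it is non-empty.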
Summary
Adds an end-to-end bridge that lets a local Ollama model use bolt's MCP toolset without depending on Claude/Anthropic, plus a stdlib HTTP API so external services (e.g. a Rails threat-intel pipeline) can call it over the wire. Nothing in bolt's existing code paths is changed; everything here is additive.
What changed
- ollama-bridge.sh: wrapper around mcphost. Pulls bolt's MCP_ADMIN_TOKEN from the running container, health-checks ollama + bolt, applies the default system prompt, max-steps and temperature.
- .mcphost.json: mcphost remote-server config pointing at bolt's HTTP MCP endpoint with bearer auth via ${env://BOLT_ADMIN_TOKEN} substitution.
- Modelfile.qwen7b-bolt, Modelfile.qwen14b-bolt: derived Ollama models that bake num_ctx=32768 (default 4096 was too small for bolt's 25-tool schema + system prompt) and a low temperature for decisive tool calls.
- prompts/recon-agent.md: agent role + default playbook.
- bridge-api.py: stdlib-only HTTP wrapper. POST /scan accepts {target, targets, iocs, infra, instructions, prompt, system_prompt, model, max_steps, timeout, verbose}; auto-builds a prompt or accepts a raw one; supports per-request system-prompt override; returns JSON {ok, output, duration_ms, model, error}. GET /health verifies ollama, bolt, and the bridge script.
Why
Bolt today is consumed by Claude Desktop / Claude Code — i.e. an Anthropic-hosted client drives the toolset. This PR makes bolt usable from any local Ollama model and from any HTTP-speaking app. The pieces are intentionally split:
- Modelfile.qwen{7,14}b-bolt: Ollama runs qwen2.5:* with a 4K context by default, which silently truncates bolt's 25-tool schema and makes the model "forget" what tools it has.
- bridge-api.py is the integration point for a Rails / FastAPI / other backend that wants to fire scans without spawning processes itself.
How to use
Override per call:
MODEL=qwen2.5:14b-bolt, MAX_STEPS=60, SYSTEM_PROMPT=/path/to/role.md, etc.
Known issue worth a follow-up
Bolt's nmap (and any other plugin marked requiresRoot: true) wraps the binary with sudo via packages/core/src/executor.ts. The shipped Docker image runs as root and does not include sudo, so every root-required tool fails with Executable not found in $PATH: "sudo". As a runtime workaround we drop a passthrough shim into the running container.
The clean upstream fix is a one-liner in executor.ts: skip the sudo prefix when process.getuid?.() === 0. Happy to send a separate PR for that if maintainers prefer.
Test plan
- ollama-bridge.sh -p "say OK" returns OK (~5s, plain chat path)
- ollama-bridge.sh -p "Target: scanme.nmap.org. Recon it." issues real bolt__nmap, bolt__httpx etc. tool calls visible in docker logs bolt
- bridge-api.py GET /health returns ok with all three checks green
- POST /scan with bare {} returns 400 with the right message
- POST /scan with targets array of wrong type is rejected with a clear error
- POST /scan with target only, targets only, and both, all auto-build a prompt and run end-to-end
- system_prompt override is written to a tempfile, passed via SYSTEM_PROMPT env var to mcphost, and cleaned up after the run
- qwen2.5:7b-bolt reports context_length=32768 from /api/ps
🤖 Generated with Claude Code
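The system_prompt item in the test plan (tempfile, SYSTEM_PROMPT env var, cleanup) could be exercised against something shaped like the sketch below. The context-manager name, the file prefix, and the `base_env` parameter are hypothetical; only the tempfile-plus-SYSTEM_PROMPT mechanism comes from the test plan.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def system_prompt_env(system_prompt, base_env=None):
    """Yield an env dict for the subprocess; remove the tempfile afterwards.

    With system_prompt=None the environment is passed through unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    path = None
    try:
        if system_prompt is not None:
            fd, path = tempfile.mkstemp(prefix="bolt-sysprompt-", suffix=".md")
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                fh.write(system_prompt)
            env["SYSTEM_PROMPT"] = path
        yield env
    finally:
        # Runs on success, on exceptions, and on timeouts raised by the caller.
        if path is not None:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(path)
```

The caller would pass the yielded dict as `env=` when spawning the bridge script, keeping the override scoped to a single request.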