Merged
21 commits
b2dbaa1
fix(claude): use proper namespace for tool (mcp__plugin_<plugin>_<ser…
esifea Jun 9, 2026
48947d9
docs(skill): update activate/configure MCP workflow
esifea Jun 9, 2026
51bb701
docs: fix mismatched reference
esifea Jun 9, 2026
c0c635c
fix(bootstrap): add fail-fast install to prevent consuming budget on …
esifea Jun 9, 2026
db90dfd
chore(cli): fix typo
esifea Jun 10, 2026
04dff6e
fix(cli): harden initial bootstrap considering network failures and s…
esifea Jun 9, 2026
91e3182
chore: fix typo and misleading comment
esifea Jun 10, 2026
6204444
fix(bootstrap): make test hardening for lock test
esifea Jun 11, 2026
2313204
test(cli): add tests for waitForFile
esifea Jun 11, 2026
ca2be0f
test(cli): add test for 'install --json'
esifea Jun 11, 2026
b9d21cb
test(supervisor): test SIGKILL escalation
esifea Jun 11, 2026
c0327a7
test(bootstrap): add more install/extract tests
esifea Jun 11, 2026
6d142f3
fix(cli): fall back to RUNE_MANIFEST env if empty argument
esifea Jun 11, 2026
47cc161
fix(cli): send flag parse and usage error to stderr rather than stdout
esifea Jun 11, 2026
7d51bf1
test(cli): add more specified tests
esifea Jun 12, 2026
0f3b3e6
feat(ci): run tests on release branches also
esifea Jun 12, 2026
31b5a29
feat(ci): run tests on release branches also
esifea Jun 12, 2026
1a65558
fix(bootstrap): repair corrupted/staled binaries on install
esifea Jun 15, 2026
85ee009
docs: remove internal v0.4 Go migration design docs
jh-lee-cryptolab Jun 16, 2026
4fc3608
Merge pull request #175 from jh-lee-cryptolab/chore/remove-internal-v…
jh-lee-cryptolab Jun 16, 2026
dc8bc28
feat(cli): 'version' subcommand shows installed rune-mcp/runed
esifea Jun 16, 2026
4 changes: 2 additions & 2 deletions .github/workflows/pr-tests.yml
@@ -4,9 +4,9 @@ name: PR Tests

 on:
   pull_request:
-    branches: [main]
+    branches: [main, "release/**"]
   push:
-    branches: [main]
+    branches: [main, "release/**"]

 permissions:
   contents: read
32 changes: 19 additions & 13 deletions AGENT_INTEGRATION.md
@@ -1,19 +1,23 @@
# Agent Integration Guide

Rune works with all major AI agents via native MCP (Model Context Protocol)
-support. In v0.4 the MCP server is a single Go binary
-(`bin/rune-mcp`) that the host CLI auto-spawns over stdio — no Python
-runtime, no `pip install`, no manual `mcp add` for the supported CLIs.
+support. In v0.4 the MCP server is a single Go binary (`rune-mcp`) that the
+host CLI auto-spawns over stdio through the committed bash wrapper
+`bin/rune mcp-server` — no Python runtime, no `pip install`, no manual
+`mcp add` for the supported CLIs.

## Integration Principles

### Cross-agent common (single source of truth)
-- The Go binary at `cmd/rune-mcp/` is the only MCP server entry point.
-Plugin / extension manifests point each CLI at the same binary.
-- Runtime preparation happens at install time (the binary is already
-built and shipped with the plugin tarball — see Task #30 for the
-release pipeline). Nothing needs to be (re)bootstrapped at session
-start.
+- The CLI entry point is `cmd/rune/` (the `rune` binary). Plugin /
+extension manifests point each CLI at the committed bash wrapper
+`bin/rune` invoked as `rune mcp-server`, which execs the downloaded
+`rune-mcp` MCP server.
+- Runtime preparation happens on the first MCP spawn, not at plugin
+install: the wrapper self-installs the `rune` CLI and downloads the
+pinned `rune-mcp` binary (per `.release-pins.yaml`) into `~/.rune/bin/`,
+then execs it — so the server comes online in the same session with no
+manual `/mcp` reconnect or restart.

### Agent-specific adapters (thin layer only)
- Codex-only tasks: `codex mcp add/remove/list` registration flows
@@ -48,10 +52,12 @@ $ claude plugin install rune
> /plugin install rune
```

-The plugin manifest (`.claude-plugin/plugin.json`) declares the binary
-path; Claude Code spawns `${CLAUDE_PLUGIN_ROOT}/bin/rune-mcp` via stdio
-on session start. enVector Cloud credentials are delivered automatically
-via the Vault bundle — you never set `ENVECTOR_*` env vars directly.
+The plugin manifest (`.claude-plugin/plugin.json`) declares the wrapper
+path; Claude Code spawns `${CLAUDE_PLUGIN_ROOT}/bin/rune mcp-server` via
+stdio on session start (on a fresh install the wrapper self-installs
+rune-mcp first, then execs it). enVector Cloud credentials are delivered
+automatically via the Vault bundle — you never set `ENVECTOR_*` env vars
+directly.

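The "self-install on first spawn, then exec" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the real `bin/rune`: `ensure_binary`, `fake_install`, and the temporary directory are hypothetical names, and the stand-in installer writes a tiny script instead of downloading a release asset. A real wrapper would finish with `exec "$target" "$@"` rather than capturing output.

```shell
# Hypothetical sketch of an install-if-missing wrapper; all names are illustrative.
installs=0

# ensure_binary TARGET INSTALLER [ARGS...]: run the installer only when TARGET
# is missing or not executable, then confirm TARGET is executable.
ensure_binary() {
  target_path="$1"; shift
  if [ ! -x "$target_path" ]; then
    "$@"
  fi
  [ -x "$target_path" ]
}

# Stand-in for the download step so the sketch runs offline.
fake_install() {
  mkdir -p "$(dirname "$1")"
  printf '#!/bin/sh\necho "server: $*"\n' > "$1"
  chmod +x "$1"
  installs=$((installs + 1))
}

demo_home="$(mktemp -d)"
target="$demo_home/bin/rune-mcp"

ensure_binary "$target" fake_install "$target"   # first spawn: installs
ensure_binary "$target" fake_install "$target"   # later spawns: no-op
first_run="$("$target" mcp-server)"
```

Because the check runs on every spawn, a fresh install and a warm start take the same code path; only the first one pays the download cost.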
### Configure credentials

54 changes: 33 additions & 21 deletions SKILL.md
@@ -109,12 +109,15 @@ If in Active state but operations fail:

Note: enVector credentials are delivered automatically via the Vault bundle — no user input needed.

-4. Create `~/.rune/config.json` with `state: "active"` and the values
-above (`mkdir -p ~/.rune && chmod 700 ~/.rune`, then `chmod 600` the
-file).
-5. Call the `reload_pipelines` MCP tool. The MCP server's boot loop
-dials Vault, fetches the agent manifest (EncKey + envector
-creds), connects to enVector, and transitions to Active.
+4. Call the `configure` MCP tool with the collected values
+(`endpoint`, `token`, `ca_cert_path`, `tls_disable`). The server does
+the atomic 0600 write to `~/.rune/config.json`, sets `state: "active"`,
+refreshes `metadata.lastUpdated`, and runs a best-effort Vault probe.
+The agent never writes the config file itself.
+5. Call the `activate` MCP tool to bring pipelines online. It runs the
+prereq checks server-side and drives the boot loop: dials Vault,
+fetches the agent manifest (EncKey + enVector creds), connects to
+enVector, and transitions to Active.
6. Confirm health by calling `diagnostics` and applying the
**Boot Failure — Fast-Fail Rule** (see section below). If
`vault.last_boot_error` is present, surface its `hint` verbatim
@@ -195,28 +198,37 @@ Recommendations:

**Note**: In most cases, simply asking naturally ("Why did we choose PostgreSQL?") triggers Retriever automatically — no command needed.

-### `/rune:activate` (or `/rune:wakeup`)
+### `/rune:activate`
(or `$rune activate` for Codex CLI)

**Purpose**: Attempt to activate plugin after infrastructure is ready

**Use Case**: Infrastructure was not ready during configure, but now it's deployed and running.

**Steps**:
-1. Check if config exists
-- NO → Redirect to `/rune:configure` (or `$rune configure` for Codex CLI)
-- YES → Continue
-2. If `state` is already `"active"`, skip to step 4 (just verify health).
-3. If `state` is `"dormant"`, set it to `"active"` and clear any
-`dormant_reason` / `dormant_since` fields.
-4. Call the `reload_pipelines` MCP tool. From a terminal Dormant the
-boot loop is re-spawned; from Active it is a no-op.
-5. Call `diagnostics` and apply the **Boot Failure — Fast-Fail Rule**
-(section below).
-6. If `vault.last_boot_error` is present: surface its `hint` verbatim,
-suggest the matching recovery action, and stop. Do NOT loop on
-`reload_pipelines` or probe with shell tools — the classifier has
-already done that work. Otherwise render the per-subsystem snapshot.
+1. Call the `activate` MCP tool — no Read, no Edit, no manual state
+inspection. It runs the prereq checks server-side (config present,
+runed socket reachable + Health probe) and only triggers the boot
+loop when everything is ready. It returns a `status`:
+`configure_required` | `install_pending` | `waiting_for_bootstrap` |
+`active` | `waiting_for_vault` | `dormant`.
+2. Branch on `status`:
+- `configure_required` → redirect to `/rune:configure`; use the `hint`
+verbatim and stop.
+- `install_pending` → invoke the recovery in `hint` (the agent runs
+`rune install`, never the user), then retry `/rune:activate` once.
+- `waiting_for_bootstrap` → runed is still downloading llama-server /
+the embedding model; summarize `.bootstrap` progress, tell the user
+no further action is needed, and stop (do NOT poll).
+- `active` → optionally call `diagnostics` once and render the
+per-subsystem snapshot.
+- `waiting_for_vault` / `dormant` → apply the **Boot Failure —
+Fast-Fail Rule** (below): surface `reload.last_boot_error.hint`
+verbatim, suggest one recovery, and stop.
+
+(Older rune-mcp binaries without the `activate` tool fall back to the
+legacy flow: set `state: "active"`, call `reload_pipelines` directly, and
+branch on `diagnostics.vault.last_boot_error`.)

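The status branching in the new `/rune:activate` steps amounts to a single dispatch on the returned `status`. A rough sketch is shown here; the status strings come from the text above, while `next_step` and the action wording are hypothetical and only summarize each branch.

```shell
# next_step STATUS: print the single follow-up action for an `activate` status.
# Hypothetical helper; the real agent acts on the tool result directly.
next_step() {
  case "$1" in
    configure_required)        echo "redirect to /rune:configure and stop" ;;
    install_pending)           echo "run the recovery in hint, then retry once" ;;
    waiting_for_bootstrap)     echo "summarize bootstrap progress and stop" ;;
    active)                    echo "optionally call diagnostics once" ;;
    waiting_for_vault|dormant) echo "apply the fast-fail rule and stop" ;;
    *)                         echo "unknown status: $1"; return 1 ;;
  esac
}

next_step install_pending
next_step dormant
```

Every branch ends in either one action or a stop, which is what keeps the agent from polling or looping on the tool.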
### `/rune:reset`
(or `$rune reset` for Codex CLI)
152 changes: 132 additions & 20 deletions bin/rune
@@ -44,30 +44,135 @@ mkdir -p "$RUNE_HOME"
LOCK_DIR="$RUNE_HOME/bootstrap.lock.d"
TMP=""
SUMS=""
OWNER_TOKEN=""
cleanup() {
[ -n "$TMP" ] && rm -f "$TMP"
[ -n "$SUMS" ] && rm -f "$SUMS"
rmdir "$LOCK_DIR" 2>/dev/null || true
# Release the lock only if the owner token is still ours
if [ -n "$OWNER_TOKEN" ] && [ "$(cat "$LOCK_DIR/owner" 2>/dev/null || true)" = "$OWNER_TOKEN" ]; then
rm -f "$LOCK_DIR/owner" 2>/dev/null || true
rmdir "$LOCK_DIR" 2>/dev/null || true
fi
}

# Network time budget
# - `mcp-server`: MCP entrypoint run by a Claude Code session with a ~30s spawn timeout.
# A SIGKILL after the timeout skips cleanup and leaves the bootstrap lock unreleased,
# so the overall time must stay under 30s.
# Worst case: API resolve up to 7s + binary download up to 13s + checksum up to 7s
# - other: matched to each downloaded binary's deadline
if [ "${1:-}" = mcp-server ]; then
NET_RETRY=3; NET_RETRY_DELAY=1; NET_RETRY_MAXTIME=3
NET_API_MAXTIME=4; NET_BIN_MAXTIME=10; NET_CHECKSUM_MAXTIME=4
else
NET_RETRY=3; NET_RETRY_DELAY=2; NET_RETRY_MAXTIME=60
NET_API_MAXTIME=20; NET_BIN_MAXTIME=120; NET_CHECKSUM_MAXTIME=30
fi

# Retries only fast transient errors such as GitHub CDN failures (504, timeouts);
# slow/hung requests are intentionally not retried to stay within the spawn budget.
# Callers pass the matching NET_{API|BIN|CHECKSUM}_MAXTIME via --max-time on each step.
fetch() {
curl --fail --silent --show-error --location --connect-timeout 5 \
--retry "$NET_RETRY" --retry-delay "$NET_RETRY_DELAY" \
--retry-max-time "$NET_RETRY_MAXTIME" "$@"
}

# Lock wait budget: give up before the Claude Code MCP spawn timeout (~30s)
LOCK_WAIT_BUDGET="${RUNE_LOCK_WAIT_BUDGET:-20}"
# Wall-clock age after which a lock counts as stale, to guard against an alive-but-stuck holder
# Worst case: NET_RETRY_MAXTIME + NET_{API|BIN|CHECKSUM}_MAXTIME (about 350s) when !mcp-server
LOCK_STALE_AFTER="${RUNE_LOCK_STALE_AFTER:-360}"

# Atomically take stale lock and remove it
clear_stale_lock() {
if mv "$LOCK_DIR" "$LOCK_DIR.reclaim.$$" 2>/dev/null; then
rm -rf "$LOCK_DIR.reclaim.$$" 2>/dev/null || true
fi
return 0
}

waited=0
while ! mkdir "$LOCK_DIR" 2>/dev/null; do # another session hold lock
wait_count=0
while true; do
# Claim lock atomically
if mkdir "$LOCK_DIR" 2>/dev/null; then
OWNER_TOKEN="$$ $(date +%s)" # "<pid> <timestamp>"
if ( set -C; printf '%s\n' "$OWNER_TOKEN" > "$LOCK_DIR/owner" ) 2>/dev/null; then
trap cleanup EXIT INT TERM
# Double-check that the lock was not reclaimed in the gap between mkdir and the owner write
if [ "$(cat "$LOCK_DIR/owner" 2>/dev/null || true)" = "$OWNER_TOKEN" ]; then
break
fi
trap - EXIT INT TERM
OWNER_TOKEN=""
continue
fi

OWNER_TOKEN=""
if [ ! -d "$LOCK_DIR" ]; then
continue # lock is cleared, retry claim
fi

# Real write error (disk full, permission, or others)
if [ ! -e "$LOCK_DIR/owner" ]; then
echo "rune: cannot record install bootstrap lock owner (file write failed)" >&2
exit 1
fi
fi

# Wait for another process as we failed to claim lock
if [ -x "$TARGET" ]; then
exec "$TARGET" "$@" # bootstrap finished
fi

# Validate owner
owner="$(cat "$LOCK_DIR/owner" 2>/dev/null || true)"
pid="${owner%% *}"
case "$owner" in
*" "*) ts="${owner##* }" ;;
*) ts="" ;;
esac

if [ -z "$owner" ]; then
# Dir is created but no owner yet; holder in the middle of claim or died
wait_count=$((wait_count + 1))
if [ "$wait_count" -ge 5 ]; then
echo "rune: bootstrap lock not claimed for ${wait_count}s; reclaiming" >&2
clear_stale_lock; wait_count=0; continue
fi
else
wait_count=0
if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
# Holder process not found; lock is leaked
echo "rune: bootstrap lock holder (pid $pid) is not found; reclaiming" >&2
clear_stale_lock; continue
fi

# Check wall-clock age
case "$ts" in
''|*[!0-9]*) age=0 ;;
*) age=$(( $(date +%s) - ts )) ;;
esac

if [ "$age" -ge "$LOCK_STALE_AFTER" ]; then
echo "rune: bootstrap lock stale (${age}s); reclaiming" >&2
clear_stale_lock; continue
fi

if [ "$waited" -ge "$LOCK_WAIT_BUDGET" ]; then
echo "rune: another rune bootstrap is in progress over MCP spawn budget." >&2
echo " Retry in a moment, or run it out-of-band:" >&2
echo " bash -c \"\${CLAUDE_PLUGIN_ROOT:-.}/bin/rune install\"" >&2
exit 1
fi
fi

sleep 1
waited=$((waited + 1))
if [ "$waited" -ge 120 ]; then
echo "rune: bootstrap lock held >120s, reclaiming" >&2
rmdir "$LOCK_DIR" 2>/dev/null || true
waited=0
fi
done

# Check error
trap cleanup EXIT INT TERM

# Double-check: install is completed right before we won the lock
if [ -x "$TARGET" ]; then
cleanup
trap - EXIT INT TERM
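The core of the new locking scheme is "create the directory atomically, record an owner token, and release only if the token still matches". The sketch below isolates that idea under simplifying assumptions: `try_lock`, `release_lock`, and the temporary lock path are hypothetical, and the stale-lock and wait-budget handling from the hunk above is left out.

```shell
# Hypothetical helpers illustrating an mkdir lock with an owner token.
lock_dir="$(mktemp -d)/bootstrap.lock.d"

# try_lock TOKEN: succeed only if we created the directory and recorded TOKEN.
try_lock() {
  mkdir "$lock_dir" 2>/dev/null || return 1
  # noclobber (set -C) makes the redirect fail if an owner file already exists.
  ( set -C; printf '%s\n' "$1" > "$lock_dir/owner" ) 2>/dev/null || return 1
  [ "$(cat "$lock_dir/owner" 2>/dev/null || true)" = "$1" ]
}

# release_lock TOKEN: remove the lock only when TOKEN still matches the owner,
# so a process whose lock was reclaimed cannot delete the new holder's lock.
release_lock() {
  [ "$(cat "$lock_dir/owner" 2>/dev/null || true)" = "$1" ] || return 1
  rm -f "$lock_dir/owner"
  rmdir "$lock_dir"
}

token_a="$$ $(date +%s)"   # "<pid> <timestamp>", as in the diff
token_b="99999 0"          # a second, made-up contender

try_lock "$token_a" && a_got=yes || a_got=no
try_lock "$token_b" && b_got=yes || b_got=no
release_lock "$token_b" && b_rel=yes || b_rel=no
release_lock "$token_a" && a_rel=yes || a_rel=no
```

The token check on release is what the old unconditional `rmdir` lacked: without it, a late cleanup could remove a lock that another session had since reclaimed.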
@@ -85,12 +190,9 @@ if [ -z "$RUNE_VERSION" ]; then
# Use token if exist
token="${GITHUB_TOKEN:-${GH_TOKEN:-}}"
if [ -n "$token" ]; then
-body="$(curl --fail --silent --show-error --location --connect-timeout 10 --max-time 20 \
---retry 3 --retry-delay 2 \
---header "Authorization: Bearer $token" "$api" || true)"
+body="$(fetch --max-time "$NET_API_MAXTIME" --header "Authorization: Bearer $token" "$api" || true)"
 else
-body="$(curl --fail --silent --show-error --location --connect-timeout 10 --max-time 20 \
---retry 3 --retry-delay 2 "$api" || true)"
+body="$(fetch --max-time "$NET_API_MAXTIME" "$api" || true)"
fi
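The worst-case figures quoted in the script's comments (7s + 13s + 7s for `mcp-server`, about 350s otherwise) can be reproduced with simple arithmetic, assuming each step costs at most its retry window plus one per-request cap. That cost model is taken from the comments rather than from curl's documentation, and `worst_case` is a hypothetical helper.

```shell
# worst_case RETRY_MAXTIME STEP_MAXTIME...: sum (retry window + per-request cap)
# over all network steps. Illustrative model only.
worst_case() {
  retry="$1"; shift
  total=0
  for step in "$@"; do
    total=$((total + retry + step))
  done
  echo "$total"
}

# Arguments: NET_RETRY_MAXTIME, then NET_API/BIN/CHECKSUM_MAXTIME from the diff.
mcp_budget="$(worst_case 3 4 10 4)"       # mcp-server profile
cli_budget="$(worst_case 60 20 120 30)"   # every other subcommand

echo "mcp-server worst case: ${mcp_budget}s (spawn timeout is about 30s)"
echo "other worst case: ${cli_budget}s (LOCK_STALE_AFTER defaults to 360s)"
```

Under this model the `mcp-server` profile stays below the roughly 30s spawn timeout, and the default `LOCK_STALE_AFTER` sits just above the longest legitimate install.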

RUNE_VERSION="$(printf '%s' "$body" \
@@ -128,12 +230,22 @@ mkdir -p "$(dirname "$TARGET")"
TMP="$(mktemp "$(dirname "$TARGET")/.rune-bootstrap-XXXXXX")"
SUMS="$(mktemp -t rune-bootstrap-sums-XXXXXX)"

-# --retry rides out transient GitHub CDN failures (504, timeouts) instead
-# of aborting the whole bootstrap on the first blip.
-curl --fail --silent --show-error --location --connect-timeout 10 --max-time 120 --retry 3 --retry-delay 2 "$RELEASE_BASE/$ASSET" -o "$TMP"
-curl --fail --silent --show-error --location --connect-timeout 10 --max-time 30 --retry 3 --retry-delay 2 "$RELEASE_BASE/checksums.txt" -o "$SUMS"
+if ! fetch --max-time "$NET_BIN_MAXTIME" "$RELEASE_BASE/$ASSET" -o "$TMP"; then
+echo "rune: could not download $ASSET ($RUNE_VERSION) after retries." >&2
+echo " The release endpoint may be slow or temporarily unavailable (e.g. HTTP 504)." >&2
+echo " Recover out-of-band, then reconnect /mcp:" >&2
+echo " bash -c \"\${CLAUDE_PLUGIN_ROOT:-.}/bin/rune install\"" >&2
+exit 1
+fi
+if ! fetch --max-time "$NET_CHECKSUM_MAXTIME" "$RELEASE_BASE/checksums.txt" -o "$SUMS"; then
+echo "rune: could not download checksums.txt ($RUNE_VERSION) after retries." >&2
+echo " The release endpoint may be slow or temporarily unavailable (e.g. HTTP 504)." >&2
+echo " Recover out-of-band, then reconnect /mcp:" >&2
+echo " bash -c \"\${CLAUDE_PLUGIN_ROOT:-.}/bin/rune install\"" >&2
+exit 1
+fi

-EXPECTED="$(grep " $ASSET\$" "$SUMS" | cut -d' ' -f1)"
+EXPECTED="$(grep " $ASSET\$" "$SUMS" | cut -d' ' -f1 || true)"
if [ -z "$EXPECTED" ]; then
echo "rune: $ASSET not listed in checksums.txt for $RUNE_VERSION" >&2
exit 1
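The added `|| true` on the checksum lookup matters if the script runs with `errexit` and `pipefail`, which is an assumption here since the relevant `set` line is outside the shown hunks. In that mode a `grep` with no match would abort the script before the "not listed in checksums.txt" message could print. The sketch below demonstrates the difference; the asset names and hashes are made up, and `lookup_sum` is a hypothetical helper.

```shell
# lookup_sum ASSET FILE: print the checksum recorded for ASSET, or nothing.
lookup_sum() {
  grep " $1\$" "$2" | cut -d' ' -f1 || true
}

sums="$(mktemp)"
# Made-up entries in "hash  filename" form.
printf '%s\n' 'abc123  rune-linux-amd64' 'def456  rune-darwin-arm64' > "$sums"

found="$(lookup_sum rune-linux-amd64 "$sums")"
absent="$(lookup_sum rune-plan9-mips "$sums")"

# Same lookup under errexit + pipefail, with and without the guard.
# Without it the child shell exits before "reached" is printed.
unguarded="$(bash -c 'set -eo pipefail
  x="$(grep " missing\$" "$1" | cut -d" " -f1)"
  echo reached' _ "$sums" || true)"
guarded="$(bash -c 'set -eo pipefail
  x="$(grep " missing\$" "$1" | cut -d" " -f1 || true)"
  echo reached' _ "$sums" || true)"
```

With the guard in place an empty result reaches the explicit `-z "$EXPECTED"` check, so the user sees a specific error instead of a silent exit.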
16 changes: 15 additions & 1 deletion cmd/rune/install.go
@@ -6,6 +6,7 @@ import (
 	"flag"
 	"fmt"
 	"io"
+	"os"

 	"github.com/CryptoLabInc/rune-cli/internal/bootstrap"
 )
@@ -20,8 +21,21 @@ func runInstall(ctx context.Context, args []string, stdout, stderr io.Writer) in
 		return 2
 	}

+	// Fall back to RUNE_MANIFEST before failing
 	if *manifest == "" {
-		fmt.Fprintln(stderr, "rune install: no manifest URL configured (set --manifest-url or RUNE_MANIFEST)")
+		if env := os.Getenv("RUNE_MANIFEST"); env != "" {
+			*manifest = env
+		}
+	}
+
+	if *manifest == "" {
+		const msg = "no manifest URL configured (set --manifest-url or RUNE_MANIFEST)"
+		if *jsonOut {
+			_ = json.NewEncoder(stdout).Encode(jsonEvent{Event: "summary", Error: msg})
+		} else {
+			fmt.Fprintln(stderr, "rune install: "+msg)
+		}
+
 		return 2
 	}
