Epic: Phase 1 — Go runtime migration + unified binary #61

@jh-lee-cryptolab

Context

Replace the Python gRPC vault (vault/vault_grpc_server.py + vault/vault_admin_cli.py + pyenvector) with a single Go binary runevault built on github.com/CryptoLabInc/envector-go-sdk@v0.1.0. The binary is both the daemon (runevault daemon start) and the admin CLI (runevault token …, runevault role …, runevault daemon stop, etc.). The vault_service.proto surface and vault-{tokens,roles}.yml schemas are preserved; runtime configuration moves from VAULT_* / ENVECTOR_* env vars to a single YAML file runevault.conf.

Docker artifacts (Dockerfile, docker-compose.yml, docker-entrypoint.sh, mise docker tasks, .github/workflows/docker-publish.yml) are deleted in this phase. Dev invocation is go run ./cmd/runevault daemon start; the public install flow is restored in Phase 3 (#64).

flowchart TB
  subgraph Daemon["runevault daemon start — single long-lived process"]
    direction TB
    GS[gRPC listener<br/>:50051]:::listener
    AS[admin HTTP listener<br/>admin.sock]:::listener
    Core[shared state<br/>TokenStore + VaultCore + rate limiters]
    GS --> Core
    AS --> Core
  end

  Plugin[[rune plugin]] -->|gRPC over TLS| GS

  subgraph Invocations["short-lived CLI invocations (separate processes)"]
    direction LR
    stp[runevault daemon stop<br/>sends SIGTERM via PID file]
    tok["runevault token ..."]
    rol["runevault role ..."]
    sts[runevault status]
    ver[runevault version<br/>no IPC, no daemon needed]
  end
  tok -->|UDS HTTP| AS
  rol -->|UDS HTTP| AS
  sts -->|UDS HTTP| AS

  classDef listener fill:#dff,stroke:#06a

Arrows point from the connecting side to the listener. The daemon owns both listeners and a shared in-memory TokenStore; admin subcommands are separate short-lived processes that dial admin.sock. daemon stop is the exception: it signals the running daemon via the PID file rather than making a UDS round-trip.

Non-goals

Design

Go module layout:

vault/
  cmd/runevault/
    main.go                 # thin entry: commands.Execute()
  internal/
    commands/
      root.go               # cobra root + global flags (--config)
      adminclient.go        # AdminClient interface + HTTP-over-UDS impl
      daemon.go             # `daemon {start,stop,restart}` — PID file lifecycle, signals
      token.go              # `token {issue,revoke,rotate,list}`
      role.go               # `role {list,create,update,delete}`
      status.go             # `status` (health + PID + liveness)
      version.go            # works offline
    server/
      grpc.go          # [server side] VaultService impl
      admin.go         # [server side] UDS HTTP handlers — shares JSON wire contract with commands/adminclient.go
      interceptors.go  # validation + auth + rate-limit gRPC interceptors
      audit.go         # audit log emission (file / stdout / both)
      config.go        # YAML loader for runevault.conf
    crypto/
      keys.go          # FHE key lifecycle (envector-go-sdk, SecKey only) + Decrypt wrapper for scores
      metadata.go      # HKDF-SHA256 + AES-GCM metadata decrypt (stdlib)
    tokens/
      store.go         # in-memory tokens + async YAML persistence (shared by both listeners)
      roles.go         # role model + scope/rate limits
  pkg/vaultpb/              # generated from vault_service.proto (buf)
  buf.gen.yaml, buf.yaml
go.mod / go.sum

Configuration: daemon reads runevault.conf (YAML). Default path lookup: /opt/rune-vault/configs/runevault.conf, else ./runevault.conf (dev / cwd). Override via --config <path> (global flag on any subcommand). Admin and daemon subcommands use the same lookup — admin reads server.admin.socket, daemon also reads daemon.pid_file.

Schema:

daemon:
  pid_file: /opt/rune-vault/.runevault.pid   # hidden; daemon internal bookkeeping

server:
  grpc:
    host: 0.0.0.0
    port: 50051
    tls:
      cert: /opt/rune-vault/certs/server.pem
      key: /opt/rune-vault/certs/server.key
      disable: false          # true for dev only
  admin:
    socket: /opt/rune-vault/admin.sock

keys:
  path: /opt/rune-vault/vault-keys       # Enc/Sec/Eval key files
  index_name: my-team
  embedding_dim: 1024

envector:
  endpoint: https://envector.example.com
  api_key: <opaque-token-string>

tokens:
  team_secret: <random-hex-32-bytes>
  roles_file: /opt/rune-vault/configs/roles.yml
  tokens_file: /opt/rune-vault/configs/tokens.yml

audit:
  mode: file+stdout                # file | stdout | file+stdout
  path: /opt/rune-vault/logs/audit.log

Secret fields (envector.api_key, tokens.team_secret) are both stored inline by default. Without an external keystore (HashiCorp Vault, AWS Secrets Manager, K8s Secrets) every sensitive value ultimately lives on the same vault-user-owned filesystem anyway — splitting them into separate 0600 files is cosmetic, not a real security boundary.

The schema still accepts envector.api_key_file / tokens.team_secret_file indirection for deployments that DO integrate with a secret manager (e.g., a K8s-mounted secret path) — but that is not the default.

runevault.conf is mode 0600, vault-user owned.

Daemon lifecycle via CLI:

  • runevault daemon start — runs the server in the foreground (blocks). Writes its PID atomically to daemon.pid_file. Clean shutdown (SIGTERM) removes the PID file. Crash leaves the PID stale; next start detects + overwrites with a warning. systemd / launchd invoke this form directly.
  • runevault daemon stop — reads daemon.pid_file, sends SIGTERM to that PID, waits up to 10s for exit (configurable via --timeout). Works regardless of who started the daemon (systemd, launchctl, or an operator running it under nohup).
  • runevault daemon restart: runs stop, then start.
  • runevault status — combined health + PID + socket liveness report (see Subcommand surface).

Only one daemon start can succeed per PID-file path — it uses flock or equivalent to fail cleanly when another daemon is already running.

FHE: OpenKeysFromFile(..., WithKeyParts(KeyPartSec)) — decrypt-only, enforced at the SDK boundary. keys.Decrypt(blob) maps directly onto DecryptScores.

Metadata decrypt: SDK returns opaque string; HKDF-SHA256 + AES-GCM ported to Go stdlib crypto (parity with vault_core.py).
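The HKDF-SHA256 + AES-GCM combination can be expressed with standard-library packages alone, as in the sketch below. The wire layout used here (12-byte nonce prepended to the ciphertext, no additional authenticated data) and the salt and info values are assumptions for illustration; the real layout has to match vault_core.py. HKDF is written out by hand over crypto/hmac following RFC 5869 so the sketch does not depend on a particular Go release.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// hkdfSHA256 implements RFC 5869 extract-then-expand with HMAC-SHA256.
func hkdfSHA256(secret, salt, info []byte, n int) []byte {
	ext := hmac.New(sha256.New, salt)
	ext.Write(secret)
	prk := ext.Sum(nil)

	var okm, prev []byte
	for counter := byte(1); len(okm) < n; counter++ {
		h := hmac.New(sha256.New, prk)
		h.Write(prev)
		h.Write(info)
		h.Write([]byte{counter})
		prev = h.Sum(nil)
		okm = append(okm, prev...)
	}
	return okm[:n]
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealMetadata exists only to make the sketch round-trippable.
// Assumed layout: nonce || ciphertext || tag.
func sealMetadata(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openMetadata splits off the nonce and authenticates + decrypts the rest.
func openMetadata(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("blob shorter than nonce")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	// Placeholder inputs; the real salt/info strings are not given in the issue.
	key := hkdfSHA256([]byte("team-secret"), []byte("salt"), []byte("metadata"), 32)
	blob, err := sealMetadata(key, []byte(`{"title":"example"}`))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plain, err := openMetadata(key, blob)
	fmt.Println(string(plain), err)
}
```

A parity test against ciphertexts produced by the Python implementation would be the real check that the layout assumptions hold.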

Admin transport: net/http over Unix domain socket. See "Admin endpoint" below.

Tokens/roles: port token_store.py; same YAML schemas for vault-tokens.yml / vault-roles.yml runtime state.

Validation: gRPC unary interceptor backed by protovalidate-go.

Subcommand surface

| New | Replaces |
| --- | --- |
| runevault daemon start | vault_grpc_server.py (plus PID file write) |
| runevault daemon stop | docker compose down / systemctl stop rune-vault / kill $(cat ...) |
| runevault daemon restart | stop + start convenience |
| runevault token {issue,revoke,rotate,list} | vault_admin_cli.py token … |
| runevault role {list,create,update,delete} | vault_admin_cli.py role … |
| runevault status | curl 127.0.0.1:8081/health (plus PID + socket liveness) |
| runevault version | n/a (build tag + commit SHA; offline) |

Admin subcommands resolve the socket path by loading runevault.conf from the default lookup path and reading server.admin.socket. Override: --config <path> selects an alternate config, or --admin-socket <path> overrides just the socket field. runevault version skips config loading entirely.

vault_admin_cli.py becomes a thin exec runevault "$@" shim; removed after one release.

Compatibility table

| Surface | Contract |
| --- | --- |
| gRPC | :50051 default, TLS via server.grpc.tls.{cert,key}, insecure via server.grpc.tls.disable |
| Admin HTTP | UDS at server.admin.socket (default <install-dir>/admin.sock). Routes unchanged. Mode 0600. |
| Token | evt_ + 32 hex = 36 chars |
| Runtime state | configs/{tokens,roles}.yml. YAML schema unchanged from Python's vault-{tokens,roles}.yml; the files just move and are renamed into configs/ |
| Key files | {keys.path}/vault-key/{Enc,Sec,Eval}Key.json |
| Config | runevault.conf (YAML). Replaces VAULT_* / ENVECTOR_* env vars from the Python deployment. |
| PID file | daemon.pid_file (default <install-dir>/.runevault.pid, hidden). Written by daemon start, removed on clean exit. |
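To make the token contract concrete, here is a small sketch that produces and checks the evt_ + 32 hex shape. It assumes tokens are 16 random bytes rendered as lowercase hex; the issue fixes only the prefix and length, so the generation method and the lowercase restriction are assumptions, and newToken / validToken are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"regexp"
)

// tokenPattern encodes the contract: "evt_" followed by 32 hex characters.
// Lowercase-only hex is an assumption of this sketch.
var tokenPattern = regexp.MustCompile(`^evt_[0-9a-f]{32}$`)

// newToken draws 16 random bytes, which hex-encode to exactly 32 characters.
func newToken() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "evt_" + hex.EncodeToString(buf), nil
}

// validToken reports whether s has the documented shape.
func validToken(s string) bool {
	return tokenPattern.MatchString(s)
}

func main() {
	tok, err := newToken()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(tok), validToken(tok))
}
```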

Admin endpoint — socket-based access

Admin surface moves from loopback TCP + no-auth (Python) to a Unix domain socket. Filesystem permissions are the access control.

  • Transport: net.Listen("unix", path) + http.Server. Routes identical to Python.
  • Path: server.admin.socket in runevault.conf (default <install-dir>/admin.sock). Override via --admin-socket on daemon start.
  • Mode: 0600, vault-user owned (daemon umasks 0077 before listening).
  • Stale socket: daemon unlinks any leftover path before Listen; graceful shutdown removes it.
  • Health probes: external monitoring uses the gRPC health service on :50051. Admin UDS still serves GET /health for local curl --unix-socket diagnostics.

Acceptance

  • Go ports of tests/{unit,integration}/ pass on linux/amd64 and linux/arm64
  • Golden compat test: Python vs Go gRPC responses match for the recorded request corpus (CI gate until Python runtime retired)
  • CLI compat test: shim-bridged vault_admin_cli.py and runevault produce identical stdout/exit for every documented command
  • go run ./cmd/runevault daemon start boots cleanly against existing ~/rune-vault/ install
  • runevault --help flags match vault_admin_cli.py 1:1
  • runevault daemon start writes PID file with the process's own PID; clean SIGTERM removes it
  • runevault daemon stop signals the PID from the file, waits for exit, returns non-zero if the process does not exit in time
  • Second runevault daemon start while another is running aborts with a clear conflict message (no PID file clobber)
  • runevault token issue --user alice --role member returns evt_ + 32 hex
  • Admin socket mode 0600, vault-user owned; connect() from another user → permission denied
  • runevault --config /tmp/alt.conf daemon start loads the alternate config; missing config file produces an error naming the lookup paths searched
  • server.admin.socket change in runevault.conf (or --admin-socket override) takes effect on daemon restart
  • Both inline secrets and *_file indirection work for envector.api_key / tokens.team_secret; *_file pointing to a non-0600 file logs a warning
  • Graceful shutdown removes the socket; stale socket on startup is recovered
  • runevault version works with no daemon and no socket
  • Secret key material never appears in logs, responses, or metrics
  • Dockerfile, docker-compose.yml, docker-entrypoint.sh, mise docker tasks, and docker-publish.yml removed from the repo

Open questions

  • Verify on-disk key format compatibility between pyenvector and envector-go-sdk before skeleton lands; add a one-shot migration helper if formats diverge.
  • Keep scripts/generate-test-fixtures.py in Python — dev-only.
  • systemd unit Type=simple (treat daemon start as foreground) vs Type=notify (daemon sends sd_notify(READY=1) after listeners are up) — decide during skeleton PR. notify gives cleaner ordering for After= dependencies.

Sequencing

Phase 1 of 3 → Phase 2 (#63, release artifacts) → Phase 3 (#64, installer rewrite).

Labels: epic, migration, vault
