Restricts Apollo's /services/* endpoints so that only known Lightning instances
can call them, and makes Apollo use its own per-client Anthropic API key for each
request rather than trusting anything the caller sends.
This is server-layer code: the runtime auth hook, the shared hash, and the internal-call
token live here under platform/src/auth/; the operator tooling sits alongside in
platform/src/auth/client/ (the client CLI). The lightning_clients table is
created and kept current by the migration runner (platform/src/db/migrate.ts,
migrations under platform/migrations/).
- The credential is the api_key the caller already sends in the request body, the same field Lightning sends today. There is no bearer token, no Authorization header, and no change required on the Lightning side.
- A single Postgres table, lightning_clients, is the allow-list. Each row has a name, the SHA-256 hash of that client's api_key (never the plaintext), and an optional anthropic_api_key.
- On every /services/* request the server reads api_key from the body, hashes it, and looks for a matching row. The inbound api_key is treated purely as a credential and is never forwarded to the LLM on a known match.
- On a match it is replaced with the client's stored anthropic_api_key, so all LLM usage for that request bills to the key Apollo controls. A known client must have a stored key: a NULL anthropic_api_key is a server-side misconfiguration, so such a request is rejected with 500 (reported to Sentry), never silently billed to the global key. The caller's key never passes through to the LLM.
- Performance: lookups are cached per client on a ~60s TTL with single-flight, stale-while-revalidate refresh, so the database is queried at most once per minute per process per token, never on the per-request path to Anthropic. The per-request cost is a hash plus a map lookup.
- Known-client-only: the auth hook is always active. A request that carries an api_key must resolve to a known Lightning client or it is rejected, whatever the key's shape (sk-ant- or not). The rejection splits on whose fault the failure is:
  - 401 when the lookup completes and confirms no such client (a verified unknown key, which must not reach the LLM).
  - 503 when the client store can't be reached (DB never came up, or the read threw). We can't verify the caller, so we don't guess: this is our outage and is retryable, never a misleading 401.
  - 500 when the lookup finds the client but its stored anthropic_api_key is NULL. A recognised client with no key is a server-side misconfiguration, reported to Sentry, not a caller error.

  A request with no api_key at all is served by the global ANTHROPIC_API_KEY when one is configured (the field is simply dropped); when no global key is configured there is nothing to serve it, so it is rejected with 401.
- The health endpoints (/livez, /status, /) sit outside /services/* and are never subject to the auth hook. Internal Apollo-to-Apollo apollo() calls are exempt via a per-process internal token (APOLLO_INTERNAL_TOKEN), not by network position.
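The decision rules above can be sketched as a single function. This is an illustrative sketch, not the code under platform/src/auth/: the names hashApiKey, resolveLlmKey, ClientRow and AuthResult are hypothetical, the lowercase-hex encoding of the SHA-256 hash is an assumption, and the lookup is modelled as a synchronous read from the in-memory cache that throws when the client store is unreachable.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; only the column names come from the text above.
interface ClientRow {
  name: string;
  anthropic_api_key: string | null;
}

// Hypothetical lookup: returns the row for a hash, undefined for a verified
// unknown key, and throws when the client store can't be reached.
type Lookup = (hash: string) => ClientRow | undefined;

type AuthResult =
  | { status: 200; llmKey: string }
  | { status: 401 | 500 | 503 };

// Assumption: the stored hash is the lowercase hex SHA-256 of the api_key.
function hashApiKey(apiKey: string): string {
  return createHash("sha256").update(apiKey, "utf8").digest("hex");
}

function resolveLlmKey(
  apiKey: string | undefined,
  lookup: Lookup,
  globalKey: string | undefined,
): AuthResult {
  // No api_key at all: serve with the global key if one is configured.
  if (apiKey === undefined) {
    return globalKey ? { status: 200, llmKey: globalKey } : { status: 401 };
  }
  let row: ClientRow | undefined;
  try {
    row = lookup(hashApiKey(apiKey));
  } catch {
    // Store unreachable: our outage, retryable, never a misleading 401.
    return { status: 503 };
  }
  // Lookup completed and found nothing: a verified unknown key.
  if (row === undefined) return { status: 401 };
  // Known client without a stored key: server-side misconfiguration.
  if (row.anthropic_api_key === null) return { status: 500 };
  // Known client: the caller's key is replaced by the stored one.
  return { status: 200, llmKey: row.anthropic_api_key };
}
```

Note that the caller's api_key only ever feeds the hash; on every success path the key handed to the LLM comes from the row or from the global configuration.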
The lightning_clients table is reached via APOLLO_CLIENTS_DB_URL, which falls
back to POSTGRES_URL when it isn't set. The TS auth code, the migration runner
(bun run migrate), and the client CLI all resolve the URL the same way, so they
always agree on which database they're touching.
- Local dev: set only POSTGRES_URL. The clients table, the auth code, and the Python docs services all share that one database, exactly as before this var existed. You don't need to set a second URL to get started.
- Production: point APOLLO_CLIENTS_DB_URL at a separate database (its own least-privilege user) so the per-client credentials (including the encrypted Anthropic keys) don't co-locate with the docs data on POSTGRES_URL. This is the advisable setup for any deployment holding real client secrets: a leak or a loose grant on the docs DB then doesn't expose the credentials table, and the clients DB can be locked down independently.
The Python docs services (adaptor_function_docs) always use POSTGRES_URL and are
unaffected by the split. One caveat to keep in mind: with the two URLs pointing at
different databases, the TS side (clients) and Python side (docs) genuinely live
apart, so when you run a migration or register a client, make sure
APOLLO_CLIENTS_DB_URL resolves to the database you mean. On startup Apollo logs
which one it opened (clients DB: using APOLLO_CLIENTS_DB_URL /
...falling back to POSTGRES_URL).
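The fallback rule can be sketched as a small pure function. This is a sketch under assumptions, not the resolver the auth code, migration runner, and client CLI share: resolveClientsDbUrl and ResolvedDbUrl are hypothetical names, and treating an empty or whitespace-only value as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical result shape: the URL plus which variable supplied it.
interface ResolvedDbUrl {
  url: string;
  source: "APOLLO_CLIENTS_DB_URL" | "POSTGRES_URL";
}

function resolveClientsDbUrl(env: Env): ResolvedDbUrl | undefined {
  // Prefer the dedicated clients database when it is configured.
  const dedicated = env.APOLLO_CLIENTS_DB_URL?.trim();
  if (dedicated) return { url: dedicated, source: "APOLLO_CLIENTS_DB_URL" };

  // Otherwise fall back to the shared database.
  const shared = env.POSTGRES_URL?.trim();
  if (shared) return { url: shared, source: "POSTGRES_URL" };

  // Neither variable is set: there is no clients database to open.
  return undefined;
}
```

Calling resolveClientsDbUrl(process.env) from every entry point is one way to keep all tools agreeing on the database, and the returned source field is enough to drive a startup log line like the one described above.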
bun run client is the canonical way to manage Lightning clients. It carries four
subcommands — add / rotate / encrypt / verify. Run them from the repo root
so Bun loads .env (APOLLO_ENC_KEY, and APOLLO_CLIENTS_DB_URL or POSTGRES_URL). The Anthropic key is read
from stdin (a pipe or an interactive prompt), never from argv, so it never
lands in shell history or ps; the client name is a positional argument.
- Bring the schema up to date. The migration runner does this automatically at Apollo startup when a clients DB URL is set, so usually no step is needed. To run it on its own (e.g. before provisioning against a fresh DB):

      bun run migrate

  This applies only the platform/auth schema (lightning_clients, _migrations). The Python services own and self-initialise their own table (adaptor_function_docs), so bun run migrate does not and should not touch it.
- Set a master encryption key in .env (once); the CLI uses it to encrypt each client's Anthropic key at rest:

      echo "APOLLO_ENC_KEY=$(openssl rand -base64 32)" >> .env

- Add the client with a name and the Anthropic key Apollo should use for it (key on stdin; needs a clients DB URL set too, since it writes the row itself):

      echo "$KEY" | bun run client add acme
      # or pull the key from a secret without it touching the shell:
      cat /run/secrets/anthropic | bun run client add acme

  This writes the row to lightning_clients and prints only the api_key to give the Lightning instance. No SQL to run by hand. Re-running add for an existing name fails with a "use rotate" message rather than a raw constraint error.
- The client is active as soon as its row is in the table; there is no flag to set and no restart needed. The startup log shows "Apollo instance auth: lightning_clients lookup ready." once the DB is reachable. (If the table is missing or the DB is down, the log warns and callers with an api_key get a retryable 503: we can't verify them, so known-client swaps just won't resolve until the DB is back.)
- Give the printed api_key to the Lightning instance. It keeps sending it as api_key exactly as it does today, with no other Lightning-side change.
- Rotate the Anthropic key (keeping the same api_key/credential, so the Lightning side needs no re-credentialling):

      echo "$NEWKEY" | bun run client rotate acme

- Verify that a client's stored key resolves under the current APOLLO_ENC_KEY. The command reports decrypts / plaintext / DECRYPT_FAILED and exits non-zero on failure. A NULL stored key is an invalid (keyless) client row, not a usable state:

      bun run client verify acme

- Revoke: run DELETE FROM lightning_clients WHERE name = '...'; directly in the DB. Changes are picked up within ~60s (the server caches each client briefly); restart Apollo to apply a revocation immediately.
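The ~60s pickup delay follows from the per-client cache: a TTL with single-flight, stale-while-revalidate refresh. A minimal sketch of that pattern is below. SwrCache and its members are hypothetical names, not Apollo's implementation; the injectable clock exists only to make the behaviour easy to test, and details such as eviction and what to do when a background refresh keeps failing are left out.

```typescript
type Loader<V> = (key: string) => Promise<V | undefined>;

interface Entry<V> {
  value: V | undefined; // undefined caches a verified "no such client"
  fetchedAt: number;
}

class SwrCache<V> {
  private entries = new Map<string, Entry<V>>();
  private inFlight = new Map<string, Promise<V | undefined>>();
  private load: Loader<V>;
  private ttlMs: number;
  private now: () => number;

  constructor(load: Loader<V>, ttlMs = 60_000, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): Promise<V | undefined> {
    const entry = this.entries.get(key);
    // Cold key: the caller has to wait for the first load.
    if (!entry) return this.refresh(key);
    if (this.now() - entry.fetchedAt >= this.ttlMs) {
      // Stale: serve the old value now and refresh in the background.
      this.refresh(key).catch(() => {});
    }
    return Promise.resolve(entry.value);
  }

  // Single-flight: concurrent refreshes for one key share one load.
  private refresh(key: string): Promise<V | undefined> {
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const p = this.load(key)
      .then((value) => {
        this.entries.set(key, { value, fetchedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, p);
    return p;
  }
}
```

Under this pattern a deleted row keeps being served until its entry turns stale and the background refresh completes, which is why a restart is the way to apply a revocation immediately.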
bun run client encrypt prints the enc:v1:… value for the key on stdin and makes
no DB write. Useful for manual SQL / row-seeding when you need to write the row
yourself. Every client row needs a non-NULL anthropic_api_key: a recognised
client with no stored key is a misconfiguration that the auth hook rejects with
500. Pair the printed value with an auth_token_hash you compute yourself:

    echo "$KEY" | bun run client encrypt

anthropic_api_key is stored encrypted (AES-256-GCM) when written via the client
CLI (add/rotate/encrypt); plaintext rows are still accepted for backward
compatibility.
- Fail closed. If an enc:v1: row can't be decrypted (wrong/missing APOLLO_ENC_KEY or corrupt value), that client is dropped from the allow-list and its requests get 401. Apollo never falls back to the global key for an encrypted-but-undecryptable row. A recognised client must have a usable stored key; a NULL anthropic_api_key is a misconfiguration that yields 500.
- Rotation is manual: re-encrypt every enc:v1: row with the new key, then swap APOLLO_ENC_KEY and restart.
- What it protects. The ciphertext is useless without APOLLO_ENC_KEY, so this guards DB dumps, backups, read replicas, and accidental SELECTs in logs. It does not protect a full Apollo host/process compromise: the running process necessarily holds both the key and the decrypted values in memory. Protect the table at rest (restricted access, DB encryption) regardless.
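To make the at-rest scheme concrete, here is a minimal AES-256-GCM sketch using Node's crypto module (also available under Bun). Only the enc:v1: prefix, the cipher, the plaintext-row compatibility, and the base64 32-byte master key come from the text above. The function names are hypothetical, and the payload layout after the prefix (base64 of a 12-byte IV, then a 16-byte auth tag, then the ciphertext) is an assumption for illustration, not the CLI's actual format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const PREFIX = "enc:v1:";

// Decode the base64 master key and insist on the 32 bytes AES-256 needs.
function decodeMasterKey(masterKeyB64: string): Buffer {
  const key = Buffer.from(masterKeyB64, "base64");
  if (key.length !== 32) throw new Error("master key must decode to 32 bytes");
  return key;
}

function encryptKey(plaintext: string, masterKeyB64: string): string {
  const iv = randomBytes(12); // fresh IV for every encryption
  const cipher = createCipheriv("aes-256-gcm", decodeMasterKey(masterKeyB64), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Assumed layout: iv | tag | ciphertext, base64-encoded after the prefix.
  return PREFIX + Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptKey(stored: string, masterKeyB64: string): string {
  // Rows without the prefix are treated as legacy plaintext.
  if (!stored.startsWith(PREFIX)) return stored;
  const raw = Buffer.from(stored.slice(PREFIX.length), "base64");
  const decipher = createDecipheriv(
    "aes-256-gcm",
    decodeMasterKey(masterKeyB64),
    raw.subarray(0, 12),
  );
  decipher.setAuthTag(raw.subarray(12, 28));
  // final() throws if the key is wrong or the value was tampered with.
  const plain = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
  return plain.toString("utf8");
}
```

Because GCM is authenticated, a wrong master key or a corrupt value raises an error instead of yielding garbage, which gives the caller a clear signal on which to fail closed.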
The clients' api_key credentials are only ever stored and compared as hashes.