This cookbook is the F1 deployment authority delivered by Epic #11720. It describes the current Agent OS deployment baseline, the D0-decided target topology, and the post-MVP residual map that remains outside the shipped MVP.
This is not the day-0 tutorial. The executable first-run path lives in Day-0 Cloud Deployment Tutorial. For the older shared-KB/MC background and threat model, see Shared Deployment MVP. For the cloud topology decision record, see ADR 0014.
The current reference compose file in ai/deploy/ is a
profile-structured Agent OS stack. The default profile starts the MCP baseline:
chroma, kb-server, and mc-server. The cloud profile adds the
cloud-safe orchestrator; the ingress profile adds the Caddy reverse proxy;
the optional local-model profile adds a self-hosted OpenAI-compatible provider
runtime without changing the default external-provider posture.
| Service / profile | Current baseline | D0 target |
|---|---|---|
| default profile | chroma, kb-server, and mc-server; all three declare per-service deploy.resources.limits and Docker readiness gates. Chroma uses a TCP probe; KB and MC use an MCP /mcp healthcheck tool call. | Keep as the baseline MCP stack: Chroma as the unified vector-store primitive and KB/MC as separate request-serving MCP containers with production readiness semantics. |
| cloud profile | Adds the orchestrator service with NEO_AI_DEPLOYMENT_MODE=cloud, shared SQLite volume access, its own resource envelope, and startup gated on healthy KB/MC services. | Keep as the Agent OS maintenance control-plane container, running only the cloud-safe scheduler lanes from ADR 0014 after the MCP substrate is ready. |
| ingress profile | Adds the Caddy reverse proxy for TLS termination and public `/kb/*` / `/mc/*` path routing while KB and MC remain internal-only via expose. | Keep as the public boundary for auth/header stripping and MCP URL routing. |
| local-model profile | Adds a disabled-by-default local-model service using the configurable NEO_LOCAL_MODEL_IMAGE image, persistent model volume, healthcheck, and resource envelope. | Optional self-hosted provider variant. Operators must opt KB/MC/orchestrator consumers into openAiCompatible; external provider endpoints remain the MVP default. |
The service boundary is intentional: KB and MC serve MCP requests, Chroma stores vectors, and the orchestrator owns background Agent OS maintenance. Do not collapse them into a mono-container unless a later ADR explicitly changes the resource-isolation model.
ADR 0014 classifies every orchestrator scheduler lane before the orchestrator is placed into a cloud container:
| Lane set | Cloud profile behavior |
|---|---|
| summary, backup, dream, golden-path | Cloud-deployable maintenance lanes. They need reachable model/provider and storage substrates, but no local maintainer checkout. |
| bridgeDaemon, mlx, kbSync, primary-dev-sync | Local-only lanes. They must be disabled in a tenant cloud deployment. |
| chroma | Shared primitive. Compose or the platform owns the Chroma process in cloud; the orchestrator does not supervise it. |
Sub A (#11722) delivered the config-level deployment-mode surface. A cloud
orchestrator profile sets NEO_AI_DEPLOYMENT_MODE=cloud; the config resolver
then disables local-only lanes unless an operator explicitly opts an individual
lane back in. The explicit env overrides are:
| Env var | Cloud default intent |
|---|---|
| `NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED=false` | Prevents git fetch / git pull, worktree discovery, .sync-metadata.json resets, and local KB-sync cascades. |
| `NEO_ORCHESTRATOR_KB_SYNC_ENABLED=false` | Prevents the local Neo checkout full-corpus ai:sync-kb loop. Tenant KB content arrives through push/bulk ingestion instead. |
| `NEO_ORCHESTRATOR_BRIDGE_DAEMON_ENABLED=false` | Prevents desktop wake delivery through osascript / tmux. A2A message storage remains Memory Core behavior. |
| `NEO_ORCHESTRATOR_GOLDEN_PATH_REPO_ENRICHMENT_ENABLED=false` | Keeps tenant deployments from emitting Neo-maintainer repo backlog/PR enrichment sections. |
| `NEO_ORCHESTRATOR_MLX_ENABLED=false` | Keeps Apple-Silicon local inference out of the cloud profile; the local-model profile is a separate provider service, not an orchestrator child process. |
| `NEO_MAILBOX_DEFAULT_REPLY_POLICY=blocked` | Keeps cloud A2A message writes tenant-bound through the Memory Core CAN_REPLY_TO / reachable-counterparty policy; local wake delivery remains disabled by the orchestrator bridge toggle. |
Sub D (#11725) owns the CI-safe negative proof that the cloud profile cannot run
the forbidden local-only behavior. The current unit substrate already asserts
that cloud-mode default resolution disables primary-dev-sync, kbSync,
bridgeDaemon, and Golden Path repo enrichment unless an operator explicitly
opts a lane back in.
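The resolution rule described above (cloud mode turns local-only lanes off unless an env override opts one back in) can be sketched as follows. This is a hypothetical illustration, not the Neo config resolver: the lane keys, `parseFlag`, and `resolveLaneToggles` are invented names, and treating every lane as default-on outside cloud mode is a simplifying assumption.

```javascript
// Hypothetical sketch of cloud-mode lane resolution; lane keys and function names
// are illustrative, not the Neo config resolver's API.
const LOCAL_ONLY_LANE_ENV = {
    primaryDevSync: 'NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED',
    kbSync: 'NEO_ORCHESTRATOR_KB_SYNC_ENABLED',
    bridgeDaemon: 'NEO_ORCHESTRATOR_BRIDGE_DAEMON_ENABLED',
    mlx: 'NEO_ORCHESTRATOR_MLX_ENABLED',
    goldenPathRepoEnrichment: 'NEO_ORCHESTRATOR_GOLDEN_PATH_REPO_ENRICHMENT_ENABLED'
};

// Returns true/false for an explicit 'true'/'false'; undefined when unset or unrecognized.
function parseFlag(value) {
    if (value === 'true') return true;
    if (value === 'false') return false;
    return undefined;
}

function resolveLaneToggles(env) {
    const cloud = env.NEO_AI_DEPLOYMENT_MODE === 'cloud';
    const toggles = {};
    for (const [lane, envVar] of Object.entries(LOCAL_ONLY_LANE_ENV)) {
        const explicit = parseFlag(env[envVar]);
        // An explicit operator value wins; otherwise cloud mode defaults the lane off.
        // ASSUMPTION: outside cloud mode every lane defaults on (a simplification).
        toggles[lane] = explicit ?? !cloud;
    }
    return toggles;
}
```

For example, `resolveLaneToggles({ NEO_AI_DEPLOYMENT_MODE: 'cloud', NEO_ORCHESTRATOR_KB_SYNC_ENABLED: 'true' })` re-enables only kbSync and leaves every other local-only lane off, which mirrors the negative assertions Sub D describes.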
Use per-service containers. Sub B (#11723) delivered the
profile-structured compose baseline: default MCP stack, cloud orchestrator
profile, ingress/local-model profile slots, and per-service resource limits.
Sub C (#11724) added the reference ingress and redeploy-safe backup volume
wiring. Sub D (#11725) delivered KB/MC container readiness semantics and gates
the cloud orchestrator on those MCP server healthchecks. Post-MVP residual
#11734 turns the local-model slot into an opt-in service profile while
preserving the external-provider default.
Required production-profile properties:
Delivered by Sub B:
- Dedicated containers for `chroma`, `kb-server`, `mc-server`, and the cloud-safe `orchestrator`.
- Resource envelopes for each declared service.
- Default, `cloud`, `ingress`, and `local-model` compose profile structure.
Delivered by Sub C / Sub D:
- Reverse proxy / TLS ingress and public MCP URL wiring.
- Volumes for backup bundles that survive container rebuilds.
- Healthcheck/readiness semantics for KB and MC, plus cloud-orchestrator startup gating on healthy MCP server containers.
Still owned by follow-up deployment hardening:
- Optional platform variants for Kubernetes, managed Chroma, managed SQL, and external model providers without changing the logical service model.
The reverse proxy is the public security boundary. It terminates TLS, enforces OAuth/OIDC or equivalent identity, strips spoofable client identity headers, and injects the trusted identity headers consumed by the MCP servers.
The current internal compose ports are:
| Service | Internal port |
|---|---|
| kb-server | 3000 |
| mc-server | 3001 |
Sub C (#11724) owns the production ingress wiring. A path-routed deployment can
publish /kb/* and /mc/* on one hostname, or the operator can use separate
hostnames. In either shape, set each server's NEO_PUBLIC_URL to the canonical
public MCP URL that agents will use.
Header rule: the proxy must remove any incoming X-PREFERRED-USERNAME or
X-AUTH-REQUEST-PREFERRED-USERNAME header before injecting its own verified
value. With proxy-auth mode enabled, set NEO_AUTH_TRUST_PROXY_IDENTITY=true.
For direct OIDC mode, configure the issuer/client values instead of trusting the
proxy header path.
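The header rule can be illustrated with a small function. This is a hypothetical sketch of the logic only: the reference deployment enforces it in Caddy, the function name is invented, and the injected header name `X-Preferred-Username` is an assumption about which trusted header the MCP servers read.

```javascript
// Hypothetical illustration of the ingress header rule; the reference deployment
// implements this in the Caddy reverse proxy, not in JavaScript.
const SPOOFABLE_IDENTITY_HEADERS = new Set([
    'x-preferred-username',
    'x-auth-request-preferred-username'
]);

// ASSUMPTION: the proxy injects the verified identity as X-Preferred-Username.
const INJECTED_IDENTITY_HEADER = 'X-Preferred-Username';

function sanitizeIdentityHeaders(incoming, verifiedUsername) {
    const forwarded = {};
    for (const [name, value] of Object.entries(incoming)) {
        // Header names are case-insensitive, so compare in lowercase and drop every
        // client-supplied identity header, whatever its casing.
        if (!SPOOFABLE_IDENTITY_HEADERS.has(name.toLowerCase())) {
            forwarded[name] = value;
        }
    }
    // Only a value the proxy verified itself reaches the MCP server.
    if (verifiedUsername) {
        forwarded[INJECTED_IDENTITY_HEADER] = verifiedUsername;
    }
    return forwarded;
}
```

The ordering matters: stripping happens unconditionally and injection happens only after verification, so an unauthenticated request reaches the server with no identity header at all rather than a client-chosen one.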
The deployment substrates have different recovery properties:
- Chroma data is shared by KB and MC but collection-scoped by substrate.
- KB content is a cache/index over Neo's curated corpus plus tenant-pushed repo content. A KB wipe is recoverable by re-sync/re-push, but the operational cost scales with tenant count.
- Memory Core graph/session data is a primary store. A wipe between backups is data loss.
- Backup bundles persist via a host bind-mount (`./.neo-ai-data/backups` on the `cloud`-profile orchestrator) that survives container rebuilds; off-site copy or a managed-object-storage target is the disaster-recovery layer above that.
The orchestrator consumes model-provider endpoints for summary, dream, and
similar lanes. External provider endpoints are the MVP default. A self-hosted
provider container is a profile variant and should not be coupled to the
orchestrator container.
The local-model profile starts an internal OpenAI-compatible provider runtime
on local-model:11434. It is inactive unless the operator explicitly includes
--profile local-model; merely running the default or cloud profiles does not
switch KB, MC, or the orchestrator away from external providers.
Use this profile in two phases so model provisioning failures are easy to separate from Agent OS startup failures:
```bash
NEO_LOCAL_MODEL_KEEP_ALIVE=-1 \
NEO_LOCAL_MODEL_CONTEXT_LENGTH=262144 \
NEO_LOCAL_MODEL_MEMORY_LIMIT=32g \
docker compose -f ai/deploy/docker-compose.yml --profile local-model up -d local-model
docker compose -f ai/deploy/docker-compose.yml --profile local-model exec local-model ollama pull <chat-model>
docker compose -f ai/deploy/docker-compose.yml --profile local-model exec local-model ollama pull <embedding-model>
```

Then start the Agent OS stack with explicit provider env:
```bash
NEO_MODEL_PROVIDER=openAiCompatible \
NEO_EMBEDDING_PROVIDER=openAiCompatible \
NEO_OPENAI_COMPATIBLE_HOST=http://local-model:11434 \
NEO_OPENAI_COMPATIBLE_MODEL=<chat-model> \
NEO_OPENAI_COMPATIBLE_EMBEDDING_MODEL=<embedding-model> \
NEO_OPENAI_COMPATIBLE_KEEP_ALIVE=-1 \
NEO_OPENAI_COMPATIBLE_REQUIRE_PARALLEL_MODELS=2 \
docker compose -f ai/deploy/docker-compose.yml --profile cloud --profile local-model up --build
```

The requireParallelModels setting is Neo's observable contract for local
providers: the chat and embedding model must fit resident at the same time. The
orchestrator warns when the configured graph provider only reports one loaded
model or is missing either configured model name. For native Ollama deployments,
set the provider process environment to at least the same count:
```bash
OLLAMA_KEEP_ALIVE=-1 \
OLLAMA_CONTEXT_LENGTH=262144 \
OLLAMA_MAX_LOADED_MODELS=2 \
ollama serve
```

For OpenAI-compatible providers, use the provider's own loaded-model cap or
pre-load setting (for example LM Studio's loaded-model/JIT setting) and verify
GET /v1/models lists both configured model ids before running Sandman.
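The GET /v1/models check above can be scripted as a pure function over the parsed response. This is a minimal sketch, assuming the OpenAI-style list shape `{ object: 'list', data: [{ id }] }`; the function name and warning strings are hypothetical, and exact id matching is a simplification (some providers append tags such as `:latest`). Depending on the provider, a listing alone may not prove both models are resident at once.

```javascript
// Hypothetical readiness check over a parsed GET /v1/models payload.
// ASSUMPTION: OpenAI-style list shape { object: 'list', data: [{ id: '...' }, ...] }.
function checkParallelModels(payload, { chatModel, embeddingModel, requireParallelModels = 2 }) {
    const ids = new Set((payload?.data ?? []).map((model) => model.id));
    const warnings = [];
    // Both configured model names must appear verbatim in the listing.
    for (const name of [chatModel, embeddingModel]) {
        if (!ids.has(name)) warnings.push(`model not listed: ${name}`);
    }
    // The provider must report at least the expected number of models.
    if (ids.size < requireParallelModels) {
        warnings.push(`provider lists ${ids.size} model(s); expected at least ${requireParallelModels}`);
    }
    return warnings;
}
```

Against a live provider, the payload could come from `await (await fetch('http://local-model:11434/v1/models')).json()` (Node 18+ global fetch); an empty result array means both configured ids were listed.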
The expected failure signatures are:
- Missing model: provider calls fail with model-not-found / pull-required errors while the `local-model` container healthcheck can still be healthy.
- Provider not started or not on the compose network: KB/MC/orchestrator provider calls fail to connect to `local-model:11434`.
- Resource pressure: the `local-model` container restarts or fails its healthcheck; tune `NEO_LOCAL_MODEL_MEMORY_LIMIT`, `NEO_LOCAL_MODEL_CPU_LIMIT`, or the chosen model before treating Agent OS services as faulty.
Supply these values per service/profile as needed:
| Variable | Target | Purpose |
|---|---|---|
| `NEO_TRANSPORT=sse` | KB, MC | HTTP/SSE transport for deployed MCP servers. |
| `MCP_HTTP_PORT` | KB, MC | Internal listener port. Current baseline: KB 3000, MC 3001. |
| `NEO_PUBLIC_URL` | KB, MC | Canonical public MCP URL used for advertised endpoints and auth callbacks. |
| `NEO_CHROMA_HOST` | KB, MC, Orchestrator | Internal Chroma host, for example chroma. |
| `NEO_CHROMA_PORT` | KB, MC, Orchestrator | Chroma port, normally 8000. |
| `NEO_MEMORY_DB_PATH` | KB, MC, Orchestrator | Shared SQLite graph path or mounted graph-store path. |
| `NEO_AUTH_TRUST_PROXY_IDENTITY=true` | KB, MC | Enables the trusted reverse-proxy identity-header path. |
| `NEO_AUTH_ISSUER_URL`, `NEO_OAUTH_CLIENT_ID`, `NEO_OAUTH_CLIENT_SECRET` | KB, MC | Direct OIDC/OAuth mode inputs when the MCP server handles auth instead of a trusted proxy. |
| `NEO_MAILBOX_DEFAULT_REPLY_POLICY=blocked` | MC | Enables the strict A2A reply policy for multi-tenant deployments. |
| `NEO_AI_DEPLOYMENT_MODE=cloud` | Orchestrator | Selects the cloud maintenance profile. |
| `NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED=false` | Orchestrator | Disables local maintainer checkout sync. |
| `NEO_ORCHESTRATOR_KB_SYNC_ENABLED=false` | Orchestrator | Disables local full-corpus KB sync. |
| `NEO_ORCHESTRATOR_BRIDGE_DAEMON_ENABLED=false` | Orchestrator | Disables desktop wake delivery. |
| `NEO_ORCHESTRATOR_GOLDEN_PATH_REPO_ENRICHMENT_ENABLED=false` | Orchestrator | Disables Neo-maintainer repo enrichment sections. |
| `NEO_ORCHESTRATOR_MLX_ENABLED=false` | Orchestrator | Keeps local MLX supervision disabled. |
| `NEO_ORCHESTRATOR_TENANT_REPO_SYNC_ENABLED` | Orchestrator | Master toggle for the tenant-repo-sync pull lane. Cloud profile default-on when tenantRepos[] is configured; local profile default-off. Set explicitly to override the deployment-profile default. |
| `NEO_ORCHESTRATOR_TENANT_REPO_SYNC_INTERVAL_MS` | Orchestrator | Sweep cadence for the periodic pull lane. Default 1800000 (30 min). |
| `NEO_TENANT_REPO_MIRROR_ROOT` | Orchestrator | Parent directory under which the GitMirror primitive stores `tenant-repos/<tenant>/<repo>` mirrors. Per-repo tenantRepos[].mirrorRoot overrides this Tier-1 default. Canonical compose value: /app/.neo-ai-data. |
| `NEO_MODEL_PROVIDER=openAiCompatible` | MC, Orchestrator | Optional local-model profile opt-in for summary/dream/model-consumer lanes. Leave unset for external-provider defaults. |
| `NEO_EMBEDDING_PROVIDER=openAiCompatible` | KB, MC | Optional local-model profile opt-in for server-side embedding generation. Leave unset when using an external embedding provider endpoint. |
| `NEO_OPENAI_COMPATIBLE_HOST=http://local-model:11434` | KB, MC, Orchestrator | Internal compose-network URL for the local-model service when the optional profile is enabled. |
| `NEO_OPENAI_COMPATIBLE_MODEL`, `NEO_OPENAI_COMPATIBLE_EMBEDDING_MODEL` | KB, MC, Orchestrator | Chat and embedding model names already present in the local-model runtime. |
| `NEO_OPENAI_COMPATIBLE_API_KEY` | KB, MC, Orchestrator | Optional bearer token for OpenAI-compatible providers that require one; normally empty for the local compose service. |
| `NEO_OLLAMA_KEEP_ALIVE`, `NEO_OPENAI_COMPATIBLE_KEEP_ALIVE` | KB, MC, Orchestrator | Provider request keep-alive override. Default -1 keeps the selected local model resident unless the operator explicitly pins a shorter retention window or sets 0 to unload it. |
| `NEO_OLLAMA_REQUIRE_PARALLEL_MODELS`, `NEO_OPENAI_COMPATIBLE_REQUIRE_PARALLEL_MODELS` | KB, MC, Orchestrator | Local-provider residency expectation for chat + embedding coexistence. Default 2; the provider-readiness check warns if the selected graph provider cannot observe that count and both configured model names. |
| `NEO_LOCAL_MODEL_IMAGE`, `NEO_LOCAL_MODEL_KEEP_ALIVE`, `NEO_LOCAL_MODEL_CONTEXT_LENGTH`, `NEO_LOCAL_MODEL_MEMORY_LIMIT`, `NEO_LOCAL_MODEL_CPU_LIMIT` | local-model | Optional image/runtime/resource overrides for the self-hosted provider container. Defaults keep the model resident (-1), request 262144 context, and reserve a 32g memory envelope for dual-resident chat + embedding Gemma-class local deployments unless the operator explicitly tunes them down. |
| `NEO_AUTO_SYNC=false` | KB | Prevents one-shot local KB sync during server startup. |
| `NEO_KB_AUTO_START_DATABASE=false` | KB | Prevents the KB server from starting a local Chroma process. |
| `NEO_MEM_AUTO_START_DATABASE=false` | MC | Prevents the MC server from starting a local Chroma process. |
| `NEO_MEM_AUTO_START_INFERENCE=false` | MC | Prevents the MC server from starting local inference. |
| `NEO_AUTO_SUMMARIZE`, `NEO_AUTO_DREAM`, `NEO_AUTO_GOLDEN_PATH`, `NEO_REAL_TIME_MEMORY_PARSING`, `NEO_AUTO_INGEST_FS` | MC | Local/server startup toggles; leave disabled unless the deployment explicitly owns those daemon behaviors. |
| `NEO_MCP_HEALTHCHECK_URL` | Healthcheck CLI | Optional override for npm run ai:mcp-healthcheck; compose passes explicit internal URLs instead. |
| `NEO_MCP_HEALTHCHECK_IDENTITY` | Healthcheck CLI | Trusted proxy identity value used by the MCP healthcheck probe when proxy-header auth is enabled. |
| `NEO_MCP_HEALTHCHECK_TOKEN_ENV` / `NEO_MCP_HEALTHCHECK_TOKEN` | Healthcheck CLI | Optional bearer-token slot for direct OIDC/OAuth protected MCP healthchecks. |
The top-level AI config template is ai/config.template.mjs.
The cloud-ingestion tenant config guide is
Configuration.
This section is for Neo maintainer machines only. It is not part of a tenant cloud deployment.
The local orchestrator can sync multiple local Neo checkouts through the
primary-dev-sync lane without committing machine-specific paths. Precedence is:

1. `NEO_ORCHESTRATOR_DEV_SYNC_ROOTS`
2. `ai/config.mjs` `orchestrator.devSyncRoots`
3. unset: single owning-checkout behavior
For a durable local setup, edit the gitignored ai/config.mjs file. On a
fresh clone, npm run prepare auto-creates ai/config.mjs as a copy of
ai/config.template.mjs, so it always exists after install — edit the
relevant block (e.g. orchestrator.devSyncRoots) and leave the rest at
template defaults:
```javascript
// In your local ai/config.mjs (gitignored; copy of ai/config.template.mjs):
const defaultConfig = {
    // ... preserved template defaults ...
    orchestrator: {
        // ... preserved fields ...
        devSyncRoots: [
            '/absolute/path/to/neo-gpt/neo',
            '/absolute/path/to/neo-gemini/neo',
            '/absolute/path/to/neo-opus/neo'
        ]
    }
};
```

If the template evolves and your local file falls out of shape-parity, the
prepare script warns with the missing imports/exports. Run
npm run prepare -- --migrate-config to refresh ai/config.mjs from the
template (this drops local edits, so re-apply them afterward).
Then start the existing local orchestrator command:
```bash
npm run ai:orchestrator
```

For one-off process-manager overrides, keep using the env var:

```bash
NEO_ORCHESTRATOR_DEV_SYNC_ROOTS='["/absolute/path/to/neo-gpt/neo","/absolute/path/to/neo-gemini/neo","/absolute/path/to/neo-opus/neo"]' npm run ai:orchestrator
```

Do not add real local clone paths to package.json or
ai/config.template.mjs; the template default remains
orchestrator.devSyncRoots: [].
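The dev-sync precedence (env var, then ai/config.mjs, then the owning checkout) can be sketched as a small resolver. This is a hypothetical illustration, not the orchestrator's code: `resolveDevSyncRoots` and its argument shape are invented, and treating the template's empty `devSyncRoots: []` as "unset" is an assumption.

```javascript
// Hypothetical sketch of the documented dev-sync precedence; not the orchestrator's resolver.
function resolveDevSyncRoots({ env = {}, config = {}, owningCheckout }) {
    // 1. The env var wins; it carries a JSON array of absolute paths.
    const raw = env.NEO_ORCHESTRATOR_DEV_SYNC_ROOTS;
    if (raw !== undefined && raw.trim() !== '') {
        const parsed = JSON.parse(raw);
        if (!Array.isArray(parsed) || !parsed.every((p) => typeof p === 'string')) {
            throw new TypeError('NEO_ORCHESTRATOR_DEV_SYNC_ROOTS must be a JSON array of paths');
        }
        return parsed;
    }
    // 2. Then ai/config.mjs orchestrator.devSyncRoots.
    // ASSUMPTION: the template's empty array counts as unset.
    const fromConfig = config.orchestrator?.devSyncRoots;
    if (Array.isArray(fromConfig) && fromConfig.length > 0) return fromConfig;
    // 3. Unset everywhere: sync only the checkout that owns this orchestrator process.
    return [owningCheckout];
}
```

Validating the env value as a JSON array of strings keeps a mistyped one-off override from silently falling through to the config file.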
Deployed proof uses MCP tool calls, not a direct HTTP /healthcheck route. The
production compose file runs node ./buildScripts/ai/mcpHealthcheck.mjs inside
the KB and MC containers. That script opens a StreamableHTTP client against the
local /mcp endpoint, calls the existing healthcheck tool, and exits non-zero
unless the payload reports the expected status. Operators can run the same probe
manually with npm run ai:mcp-healthcheck.
The Docker readiness contract is:
- Chroma is ready when its TCP listener is reachable.
- KB is ready when `healthcheck` succeeds over `http://127.0.0.1:3000/mcp`.
- MC is ready when `healthcheck` succeeds over `http://127.0.0.1:3001/mcp`.
- The `cloud` orchestrator profile starts only after Chroma, KB, and MC are healthy via compose `condition: service_healthy`.
For public deployed proof, call each server's healthcheck tool over its /mcp
endpoint through the same public URL and auth path used by real agents.
Operator verification anchors:
- `identity.source === "proxy-header"` confirms the reverse proxy is injecting trusted identity headers and the server is reading them.
- `database.topology.mode === "unified"` confirms the shared Chroma topology.
- Provider fields confirm the selected embedding/summary provider profile.
- The Memory Core healthcheck remains the schema authority for MC provider/auth details; see Memory Core.
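The two documented anchors lend themselves to a scripted check. This is a minimal sketch, assuming the healthcheck tool's JSON payload has already been extracted from the MCP tool result (that parsing step and its shape are assumptions and are not shown); `checkOperatorAnchors` and its `expectProxyIdentity` option are hypothetical. Provider fields are deliberately left to the Memory Core healthcheck schema.

```javascript
// Hypothetical check of the documented operator anchors over an already-parsed
// healthcheck payload; provider fields are left to the Memory Core schema.
function checkOperatorAnchors(payload, { expectProxyIdentity = true } = {}) {
    const failures = [];
    const source = payload?.identity?.source;
    // Proxy-auth deployments should see the trusted reverse-proxy identity path.
    if (expectProxyIdentity && source !== 'proxy-header') {
        failures.push(`identity.source is ${JSON.stringify(source)}, expected "proxy-header"`);
    }
    const mode = payload?.database?.topology?.mode;
    // KB and MC should both report the shared Chroma topology.
    if (mode !== 'unified') {
        failures.push(`database.topology.mode is ${JSON.stringify(mode)}, expected "unified"`);
    }
    return failures;
}
```

Set `expectProxyIdentity: false` for direct OIDC deployments, where the proxy-header anchor does not apply.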
For the local Dockerized fixture, run npm run test-integration-unified. The
integration harness builds ai/deploy/docker-compose.test.yml, waits for
Chroma, KB, and MC readiness, then calls the KB and MC healthcheck tools over
/mcp.
Sub D (#11725) extended the proof to the cloud-safe orchestrator profile and
negative local-only behavior assertions.
Tenant KB content enters through the cloud-native ingestion facades, not through
the local kbSync scheduler lane. Use the
Cloud-Native KB Ingestion guide tree for:
- per-tenant identity and visibility rules;
- `ingest_source_files` and bulk CLI hook wiring;
- custom parser/source registration;
- tenant config persistence.
Runnable ingestion examples live in
examples/cloud-deployment/. They are
ingestion-contract demonstrations, not production deployment profiles.
The linear first-run operator path is
Day-0 Cloud Deployment Tutorial.
For operators who want the deployment to refresh tenant content autonomously
(instead of waiting on a tenant push hook), the additive pull mode is documented
in Server-Side Pull Mode.
The cloud profile MUST add a tenant-repo-mirrors named volume mounted at
<NEO_TENANT_REPO_MIRROR_ROOT>/tenant-repos (canonical compose: /app/.neo-ai-data/tenant-repos)
so the GitMirror primitive has a persistent clone target. The env-bound Tier-1 default
NEO_TENANT_REPO_MIRROR_ROOT=/app/.neo-ai-data names the parent of tenant-repos/;
the helper deriveTenantRepoMirrorPath appends the tenant-repos/<tenant>/<repo> segment. Per-repo lastIngestedRev
persistence lives in the orchestrator state dir
(<NEO_AI_ORCHESTRATOR_DIR>/tenant-repo-sync-revisions.json), so it survives a
container restart alongside the rest of the orchestrator state. The mirror cache
itself is reproducible from upstream git — backup is optional, not load-bearing.
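The mirror-path layout can be sketched as a pure function. This is not the real deriveTenantRepoMirrorPath helper, whose signature is not documented here: the name `sketchTenantRepoMirrorPath`, its argument shape, the fallback to /app/.neo-ai-data when nothing is configured, and the segment validation are all assumptions for illustration.

```javascript
// Hypothetical sketch of the mirror-path layout; not the real deriveTenantRepoMirrorPath.
function sketchTenantRepoMirrorPath({ tenant, repo, mirrorRoot, env = {} }) {
    // A per-repo tenantRepos[].mirrorRoot overrides the env-bound Tier-1 default.
    // ASSUMPTION: fall back to the canonical compose value when neither is set.
    const root = mirrorRoot ?? env.NEO_TENANT_REPO_MIRROR_ROOT ?? '/app/.neo-ai-data';
    for (const segment of [tenant, repo]) {
        // Reject empty, separator-bearing, and dot segments so a tenant or repo
        // name cannot escape the tenant-repos/ tree.
        if (!segment || /[\\/]/.test(segment) || segment === '.' || segment === '..') {
            throw new Error(`invalid mirror path segment: ${JSON.stringify(segment)}`);
        }
    }
    return `${root.replace(/\/+$/, '')}/tenant-repos/${tenant}/${repo}`;
}
```

With the canonical compose value, tenant acme and repo web map to /app/.neo-ai-data/tenant-repos/acme/web, which lands inside the tenant-repo-mirrors volume.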
The tenant-repo-sync lane is gated by NEO_ORCHESTRATOR_TENANT_REPO_SYNC_ENABLED
(cloud profile default-on; local maintainer profile default-off) and runs on a
30-minute default cadence (NEO_ORCHESTRATOR_TENANT_REPO_SYNC_INTERVAL_MS).
Manual sweep: node ./ai/scripts/maintenance/syncTenantRepos.mjs.
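The gate and cadence defaults for the tenant-repo-sync lane can be sketched as follows. This is a hypothetical illustration, not the orchestrator's scheduler code: the function name, the `'cloud'` / `'local'` mode strings, and the `tenantRepos` argument shape are assumptions; the default-on-when-configured rule and the 1800000 ms default come from the text above.

```javascript
// Hypothetical sketch of the tenant-repo-sync gate and cadence; not the orchestrator's code.
const DEFAULT_TENANT_REPO_SYNC_INTERVAL_MS = 30 * 60 * 1000; // 1800000 ms = 30 min

function resolveTenantRepoSync(env, { deploymentMode, tenantRepos = [] }) {
    const raw = env.NEO_ORCHESTRATOR_TENANT_REPO_SYNC_ENABLED;
    // An explicit 'true'/'false' overrides the deployment-profile default.
    const explicit = raw === 'true' ? true : raw === 'false' ? false : undefined;
    // Cloud profile: default-on only when tenantRepos[] is configured; local: default-off.
    const profileDefault = deploymentMode === 'cloud' && tenantRepos.length > 0;
    const interval = Number.parseInt(env.NEO_ORCHESTRATOR_TENANT_REPO_SYNC_INTERVAL_MS ?? '', 10);
    return {
        enabled: explicit ?? profileDefault,
        // Fall back to the documented default on a missing or non-positive value.
        intervalMs: Number.isFinite(interval) && interval > 0 ? interval : DEFAULT_TENANT_REPO_SYNC_INTERVAL_MS
    };
}
```

For instance, a cloud deployment with one configured tenant repo and no overrides resolves to `{ enabled: true, intervalMs: 1800000 }`.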
The #11720 deployment-readiness MVP is complete. The items below are retained as traceability anchors, not active implementation gaps:
- #11723 - Sub B production container topology.
- #11724 - Sub C reference compose/profile, ingress, persistence, and provider wiring.
- #11725 - Sub D healthcheck, journey proof, and negative cloud-profile assertions.
- #11728 - Sub F2 day-0 tutorial plus Docker-capable fresh-run validation.
Post-MVP residual work is tracked separately under:
- #11730 - Post-MVP residual architecture after the #11720 closeout.
- #11731 - server-side repo-clone ingestion exploration, only if push-based tenant ingestion proves insufficient.
- #11732 - graph-store evolution beyond the SQLite + mounted-volume MVP baseline. Resolved by ADR 0015: keep SQLite + WAL for now; reopen networked SQL only on graph-specific multi-writer, scale, HA, or storage-platform evidence.
- #11733 - downstream external deployment-pipeline wiring. Delivered post-MVP — see Downstream Pipeline Wiring.
- #11734 - optional local-model runtime container profile.
- #11735 - tenant-source inventory deepening beyond the day-0 proof path.
- #11736 - broader deployment guide/security hardening outside the F1 MVP cleanup; see Security for the first post-MVP hardening ledger.
Related boundary item closed separately:
- #11719 / PR #11748 - Separate narrow Markdown table rendering fix for the old Section 6. This rewrite removes the old table from the cookbook and intentionally does not use #11719 as a close target.
Completed baseline inputs for this cookbook: