Is your feature request related to a problem? Please describe.
The project is deployable today through Docker Compose only. That works for local and single-node demos, but it does not provide a Kubernetes-native deployment path for teams that need declarative releases, configurable persistence, ingress-based routing, secret management, and optional GPU-backed local NVIDIA NIM services.
The current deployment model is split across:
- docker-compose.yml for the app stack: nginx, ui, merchant, psp, apps-sdk, 4 NAT agents, and a one-shot milvus-seeder
- docker-compose.infra.yml for milvus-etcd, milvus-minio, milvus-standalone, and phoenix
- docker-compose-nim.yml for optional self-hosted nemotron-nano and embedqa
This leaves Kubernetes users without a supported way to deploy the system end-to-end while preserving the existing ACP/UCP flows, Apps SDK routing, Milvus-backed recommendation/search path, Phoenix observability, and optional in-cluster NIM inference.
Describe the solution you'd like
Add Helm charts and Kubernetes deployment documentation that can deploy the platform end-to-end.
Scope:
- Core app services: ui, merchant, psp, apps-sdk
- NAT agent services: promotion-agent, post-purchase-agent, recommendation-agent, search-agent
- Supporting services: milvus, etcd, minio, phoenix
- One-shot Milvus initialization job
- Ingress/service routing that preserves current external behavior now handled by nginx
- Optional self-hosted NVIDIA NIM services: nemotron-nano, embedqa
Implementation expectations:
- Add a top-level Helm chart under deploy/helm/retail-agentic-commerce/ with values-driven enablement of optional components
- Use Kubernetes Ingress plus Service resources instead of deploying the current standalone nginx container
- Preserve the current route behavior:
  - / -> ui
  - /api/webhooks/, /api/agents/, /api/proxy/ -> ui
  - /api/ -> merchant
  - /psp/ -> psp
  - /apps-sdk/ -> apps-sdk
- Preserve current service-to-service environment wiring from Compose:
  - merchant -> promotion/post-purchase/recommendation agents
  - apps-sdk -> merchant/psp/recommendation/search
  - ui -> merchant/psp/phoenix and UCP profile URL
  - agents -> phoenix/milvus/NIM endpoints
- Replace Compose health checks with Kubernetes startupProbe, readinessProbe, and livenessProbe
- Convert milvus-seeder into a Kubernetes Job or Helm hook that runs after Milvus is reachable
- Ship values files for:
  - default mode using public NVIDIA API endpoints
  - optional self-hosted NIM mode using in-cluster nemotron-nano and embedqa
- Document secrets/config needed for:
  - NVIDIA_API_KEY
  - NGC_API_KEY
  - MERCHANT_API_KEY
  - PSP_API_KEY
  - WEBHOOK_SECRET
  - NIM base/model overrides
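The route behavior listed above only works if the more specific /api/webhooks/, /api/agents/, and /api/proxy/ prefixes take precedence over the general /api/ prefix. The following is a minimal Python sketch of that precedence rule, assuming the Ingress uses path-element prefix matching where the longest matching prefix wins. The function name resolve_backend and the sample request paths are hypothetical and are not part of the repo; the route table itself is copied from this issue.

```python
# Route table copied from this issue; values are the Compose service names.
ROUTES = {
    "/": "ui",
    "/api/webhooks/": "ui",
    "/api/agents/": "ui",
    "/api/proxy/": "ui",
    "/api/": "merchant",
    "/psp/": "psp",
    "/apps-sdk/": "apps-sdk",
}


def _elements(path):
    """Split a URL path into its non-empty '/'-separated elements."""
    return [part for part in path.split("/") if part]


def resolve_backend(request_path, routes=ROUTES):
    """Return the backend whose prefix matches the most leading path elements.

    Assumption: matching is done element by element and the longest
    matching prefix wins. Query strings are ignored.
    """
    request = _elements(request_path.split("?", 1)[0])
    best_backend, best_len = None, -1
    for prefix, backend in routes.items():
        wanted = _elements(prefix)
        if request[: len(wanted)] == wanted and len(wanted) > best_len:
            best_backend, best_len = backend, len(wanted)
    return best_backend
```

A table like this could serve as a regression check against the rendered Ingress: each smoke path should resolve to the same service it reaches through nginx today.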
Persistence requirements:
- Model the current persistent state explicitly:
  - shared SQLite data currently mounted as acp-data
  - Milvus data
  - MinIO data
  - etcd data
  - Phoenix working directory
- Initial Kubernetes support may assume a storage class that can satisfy the current shared SQLite requirement. If that is not feasible in a target cluster, document it as a deployment prerequisite or limitation in this issue rather than silently redesigning persistence.
Deliverables:
- Helm chart(s) committed to the repo
- Values files for public-NIM and self-hosted-NIM deployment modes
- Kubernetes deployment docs replacing or complementing deploy/docker-deployment.md
- Validation steps with exact helm and kubectl commands
Acceptance criteria:
- helm template succeeds for default and NIM-enabled values
- helm install deploys the core stack successfully on a local Kubernetes target for non-GPU mode
- UI is reachable through ingress and can talk to backend services through the expected routes
- Merchant, PSP, Apps SDK, and all 4 NAT agents expose healthy pods/services
- Recommendation and search agents can reach Milvus
- Phoenix is reachable
- Milvus seeding completes automatically
- Public NIM mode works without in-cluster GPU services
- Self-hosted NIM mode can be enabled through values and includes GPU scheduling/resource configuration for nemotron-nano and embedqa
- Docs include install, upgrade, rollback, and verification commands
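For the criterion that Milvus seeding completes automatically, the seeder has to wait until Milvus accepts connections before it starts. One way to do that is a small TCP wait loop run at the start of the Job (or in an init container). The sketch below is an assumption about how this might look, not the repo's seeder code: wait_for_tcp is a hypothetical helper, and the host name milvus-standalone is the Compose service name, which may differ in the chart. Port 19530 is the Milvus default gRPC port.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=120.0, interval_s=2.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only proves the port is open, not that
            # Milvus is fully ready to serve requests.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A seeder entrypoint could call wait_for_tcp("milvus-standalone", 19530) and exit non-zero on False, so that the Job's retry policy handles a slow Milvus start.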
Suggested verification commands for the eventual implementation:
- helm template deploy/helm/retail-agentic-commerce -f deploy/helm/retail-agentic-commerce/values.yaml
- helm template deploy/helm/retail-agentic-commerce -f deploy/helm/retail-agentic-commerce/values-nim.yaml
- kubectl get pods,svc,ingress
- kubectl logs job/<milvus-seeder-job>
- smoke checks for /, /api/health, /psp/health, /apps-sdk/health
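The smoke checks could be scripted so they run the same way after every install or upgrade. This is a minimal sketch using only the Python standard library. The four paths come from this issue; the function names and the base URL passed in are assumptions and depend on how the ingress is exposed in the target cluster.

```python
import urllib.error
import urllib.request

# Paths taken from the verification list in this issue.
SMOKE_PATHS = ["/", "/api/health", "/psp/health", "/apps-sdk/health"]


def smoke_urls(base_url):
    """Build the full smoke-check URLs for an ingress base URL."""
    base = base_url.rstrip("/")
    return [base + path for path in SMOKE_PATHS]


def check(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def run_smoke_checks(base_url):
    """Check every smoke URL and return the list of URLs that failed."""
    return [url for url in smoke_urls(base_url) if not check(url)]
```

Calling run_smoke_checks with the ingress address (for example an assumed http://localhost on a local cluster) returns an empty list when every route is healthy.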
Describe alternatives you've considered
- Keep Docker Compose as the only supported deployment path: simple, but not suitable for Kubernetes environments
- Use raw Kubernetes manifests instead of Helm: possible, but harder to maintain and configure across environments
- Split this immediately into multiple issues: likely useful later, but a single implementation issue is a better first step as long as scope stays bounded and explicit
Additional context
Relevant repo files and behavior the implementation should mirror:
- deploy/docker-deployment.md
- deploy/local-development.md
- docs/architecture.md
- docker-compose.yml
- docker-compose.infra.yml
- docker-compose-nim.yml
- nginx.conf
- src/merchant/Dockerfile
- src/payment/Dockerfile
- src/apps_sdk/Dockerfile
- src/ui/Dockerfile
- src/agents/Dockerfile
Out of scope:
- Changing ACP/UCP protocol semantics
- Re-architecting service boundaries
- Replacing the current data model as part of this issue
- Production-grade autoscaling or multi-cluster design beyond what is required to run reliably on Kubernetes