| 1 | +# Multi-Backend Routes on NebariApp |
| 2 | + |
| 3 | +**Status:** Draft **Author:** @viniciusdc |
| 4 | +**Created:** 2026-05-19 |
| 5 | + |
| 6 | +This document proposes extending the `NebariApp` routing contract so a |
| 7 | +single app can expose **multiple path-based routes that target different |
| 8 | +backend services** under one hostname, and locks in the boundary that an |
| 9 | +HTTPRoute generated by the operator must only reference services in the |
| 10 | +same namespace as the NebariApp itself. |
| 11 | + |
| 12 | +It tightens the existing `ServiceReference` shape (drops the |
| 13 | +`namespace` field), introduces an optional per-route `backend` override |
| 14 | +on `RouteMatch`, and codifies the "one NebariApp = one hostname" |
| 15 | +constraint that has been implicit so far. |
| 16 | + |
| 17 | +A separate concern — exposing Envoy `BackendTrafficPolicy` knobs for |
| 18 | +streaming/SSE timeouts — is referenced in *Follow-ups* but is not part |
| 19 | +of this design. |
| 20 | + |
| 21 | +## Problem |
| 22 | + |
| 23 | +`NebariApp.spec` today reads like a single hostname pointing at a single |
| 24 | +backend service: |
| 25 | + |
| 26 | +```yaml |
| 27 | +spec: |
| 28 | + hostname: chat.nebari.openteams.ai |
| 29 | + service: |
| 30 | + name: chat-frontend |
| 31 | + port: 80 |
| 32 | + routing: |
| 33 | + routes: |
| 34 | + - pathPrefix: / |
| 35 | +``` |
| 36 | + |
| 37 | +The `routing.routes[]` list lets users *narrow* which paths reach the |
| 38 | +backend, but every route resolves to the same `spec.service`. There is |
| 39 | +no way to say "`/api/v1/*` goes to the API container's service and |
| 40 | +everything else goes to the frontend's service" within a single NebariApp. |
| 41 | + |
| 42 | +Packs that genuinely have two exposed components (frontend + |
| 43 | +backend, app + admin, UI + API) hit one of two unsatisfying paths: |
| 44 | + |
| 45 | +1. **Two NebariApps, two hostnames.** This is what |
| 46 | + [`nebari-chat-pack`](https://github.com/nebari-dev/nebari-chat-pack) |
| 47 | + does — `chat.…` and `ravnar.…` are two NebariApps because the chart |
| 48 | + has no way to fold both into one. The operator then emits two |
| 49 | + HTTPRoutes, two Certificates, two SecurityPolicies — duplicated |
| 50 | + per-app infrastructure where conceptually it's one app. |
| 51 | +2. **Hand-rolled `HTTPRoute` outside the operator.** Loses everything |
| 52 | + the operator provides (TLS, auth, landing-page integration, status |
| 53 | + surfacing) for the sake of fanning out by path. |
| 54 | + |
| 55 | +Both options exist because the operator doesn't model "one hostname, |
| 56 | +multiple backends" — which is the Gateway API's *native* shape. |
| 57 | +`HTTPRoute.spec.rules[]` already supports per-rule `backendRefs[]`. We |
| 58 | +just don't surface it. |
| 59 | + |
| 60 | +## Goals |
| 61 | + |
| 62 | +- Let a single NebariApp expose multiple path-based routes, each |
| 63 | + optionally pointing at a different backend service in the same |
| 64 | + namespace, under one hostname. |
| 65 | +- Keep `spec.service` working unchanged as the default backend for |
| 66 | + routes that don't override — existing NebariApp manifests continue to |
| 67 | + validate and behave identically. |
| 68 | +- Make the same-namespace boundary an explicit, enforced contract: |
| 69 | + the operator-generated HTTPRoute must only `backendRef` services in |
| 70 | + the NebariApp's own namespace. |
| 71 | +- Document "one NebariApp = one hostname" as an intentional constraint, |
| 72 | + not an accident of the current schema. |
| 73 | + |
| 74 | +## Non-goals |
| 75 | + |
| 76 | +- **Multiple hostnames per NebariApp.** Out of scope for this design. |
| 77 | + If a future use case needs it, that's a separate discussion. This |
| 78 | + proposal does *not* let chat-pack keep its two hostnames under one |
| 79 | + NebariApp; that split is a pack-design issue (the pack should either |
| 80 | + consolidate to one hostname with per-route backends, or treat the |
| 81 | + two pieces as genuinely independent packs). |
| 82 | +- **Cross-namespace backends.** Today's `spec.service.namespace` |
| 83 | + permits this but the operator does not create the `ReferenceGrant` |
| 84 | + the Gateway API requires for it to actually work — so the field has |
| 85 | + been silently incomplete. This proposal *removes* it. Tools that need |
| 86 | + to reach into other namespaces should do so via in-cluster DNS |
| 87 | + (`<service>.<namespace>.svc.cluster.local`), not via the operator's |
| 88 | + HTTPRoute. |
| 89 | +- **Weights, header/query matchers, request/response filters.** This |
| 90 | + proposal narrows on per-route backend selection. Anything else |
| 91 | + Gateway API offers on `HTTPRouteRule` stays out until a use case |
| 92 | + demands it. |
| 93 | +- **Envoy `BackendTrafficPolicy` (streaming/SSE timeouts).** Tracked as |
| 94 | + a follow-up; see *Follow-ups* below. The shape of that change is |
| 95 | + independent of this one. |
| 96 | + |
| 97 | +## Design principles |
| 98 | + |
| 99 | +Same principles that govern the rest of the NebariApp contract: |
| 100 | + |
| 101 | +1. **No is temporary, yes is forever.** Per-route `backend` is the only |
| 102 | + field this design adds to `RouteMatch`. Nothing speculative. |
| 103 | +2. **Contract independence.** The CRD shape stays expressible in the |
| 104 | + Gateway API's mechanics without leaking Envoy specifics. A |
| 105 | + per-route `backend` maps directly to an `HTTPRouteRule.backendRefs` |
| 106 | + entry. |
| 107 | +3. **Graceful degradation.** A route with no `backend` falls back to |
| 108 | + `spec.service`. A NebariApp with no `routing.routes` keeps current |
| 109 | + behavior (one rule, single backend, `/` prefix). |
| 110 | +4. **Same-namespace by construction.** The CRD has no field that lets |
| 111 | + a user express a cross-namespace backend, so the operator never has |
| 112 | + to validate or refuse one. |
| 113 | + |
| 114 | +## Proposed contract |
| 115 | + |
| 116 | +### `RouteMatch` gains an optional `backend` |
| 117 | + |
| 118 | +```go |
| 119 | +type RouteMatch struct { |
| 120 | + // PathPrefix specifies the path prefix to match for routing. |
| 121 | + PathPrefix string `json:"pathPrefix"` |
| 122 | + |
| 123 | + // PathType specifies how the path should be matched. |
| 124 | + PathType string `json:"pathType,omitempty"` |
| 125 | + |
| 126 | + // Backend optionally overrides the default backend (spec.service) |
| 127 | + // for this route. When omitted, traffic matching this route is |
| 128 | + // sent to spec.service. The referenced Service must exist in the |
| 129 | + // NebariApp's own namespace. |
| 130 | + // +optional |
| 131 | + Backend *ServiceReference `json:"backend,omitempty"` |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +`ServiceReference` is the same struct used at the top level, minus its |
| 136 | +`Namespace` field (see below). |
| 137 | + |
| 138 | +### `ServiceReference.Namespace` is removed |
| 139 | + |
| 140 | +```go |
| 141 | +type ServiceReference struct { |
| 142 | + // Name is the name of the Kubernetes Service in the NebariApp's |
| 143 | + // own namespace. |
| 144 | + // +kubebuilder:validation:Required |
| 145 | + Name string `json:"name"` |
| 146 | + |
| 147 | + // Port is the port number on the Service to route traffic to. |
| 148 | + // +kubebuilder:validation:Required |
| 149 | + Port int32 `json:"port"` |
| 150 | +} |
| 151 | +``` |
| 152 | + |
| 153 | +The current `Namespace` field is dropped. Reasons: |
| 154 | + |
| 155 | +- A `Namespace` other than the NebariApp's own makes the operator |
| 156 | + render a cross-namespace `BackendObjectReference`, which Gateway API |
| 157 | + honors only when a matching `ReferenceGrant` exists, and the operator |
| 158 | + does not create one. The field has always been a half-feature. |
| 159 | +- The architectural stance is that pack resources live in the pack's |
| 160 | + own namespace. Cross-namespace pod-to-pod talk goes through in-cluster |
| 161 | + DNS, not through an operator-managed public route. |
| 162 | +- All known callers (helm charts under `nebari-dev`) deploy their |
| 163 | + Service into the same namespace as the NebariApp via Argo CD's |
| 164 | + per-application namespace. No internal user is known to rely on the |
| 165 | + field today. |
| 166 | + |
| 167 | +### `spec.service` stays required |
| 168 | + |
| 169 | +Keeping `spec.service` required means: |
| 170 | + |
| 171 | +- Every existing NebariApp manifest continues to validate. |
| 172 | +- Routes without a per-route `backend` have a single, unambiguous |
| 173 | + fallback target — no special-case "I forgot to configure a backend |
| 174 | + anywhere" path to reason about. |
| 175 | +- The simple case (one hostname, one backend, no per-route overrides) |
| 176 | + stays one struct field. |
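
The fallback rule above can be sketched in Go. This is a minimal illustration, not the operator's code: the types are trimmed copies of the proposed shapes, and `resolveBackend` is a hypothetical helper name.

```go
package main

import "fmt"

// ServiceReference mirrors the proposed shape: name and port only.
type ServiceReference struct {
	Name string
	Port int32
}

// RouteMatch mirrors the proposed shape with the optional Backend override.
type RouteMatch struct {
	PathPrefix string
	PathType   string
	Backend    *ServiceReference
}

// resolveBackend (hypothetical helper) returns the route's own backend
// when one is set, and the app-wide default (spec.service) otherwise.
func resolveBackend(route RouteMatch, defaultSvc ServiceReference) ServiceReference {
	if route.Backend != nil {
		return *route.Backend
	}
	return defaultSvc
}

func main() {
	defaultSvc := ServiceReference{Name: "chat-frontend", Port: 80}
	routes := []RouteMatch{
		{PathPrefix: "/api", Backend: &ServiceReference{Name: "ravnar-backend", Port: 8000}},
		{PathPrefix: "/"}, // no override: falls back to defaultSvc
	}
	for _, r := range routes {
		b := resolveBackend(r, defaultSvc)
		fmt.Printf("%s -> %s:%d\n", r.PathPrefix, b.Name, b.Port)
	}
}
```

Because `spec.service` is required, the helper never has to handle a missing default.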
| 177 | + |
| 178 | +### One hostname per NebariApp — codified |
| 179 | + |
| 180 | +The CRD docstring on `spec.hostname` gains an explicit note: |
| 181 | + |
| 182 | +> Each NebariApp exposes exactly one public hostname. Packs that need |
| 183 | +> multiple hostnames must be split into multiple NebariApps. This is an |
| 184 | +> intentional boundary so a NebariApp's TLS, auth, landing-page card, |
| 185 | +> and routing concerns all scope to a single user-visible URL. |
| 186 | + |
| 187 | +No schema change — `hostname` is already a single string — but the |
| 188 | +constraint moves from accidental to documented. |
| 189 | + |
| 190 | +## End-to-end example |
| 191 | + |
| 192 | +What chat-pack would look like with this contract, *if* it chose to |
| 193 | +collapse to one hostname: |
| 194 | + |
| 195 | +```yaml |
| 196 | +apiVersion: reconcilers.nebari.dev/v1 |
| 197 | +kind: NebariApp |
| 198 | +metadata: |
| 199 | + name: nebari-chat |
| 200 | + namespace: nebari-chat |
| 201 | +spec: |
| 202 | + hostname: chat.nebari.openteams.ai |
| 203 | + service: |
| 204 | + name: chat-frontend |
| 205 | + port: 80 |
| 206 | + routing: |
| 207 | + routes: |
| 208 | + - pathPrefix: /api |
| 209 | + backend: |
| 210 | + name: ravnar-backend |
| 211 | + port: 8000 |
| 212 | + - pathPrefix: / |
| 213 | + # no backend → falls back to spec.service (chat-frontend:80) |
| 214 | + auth: |
| 215 | + enabled: true |
| 216 | + provider: keycloak |
| 217 | +``` |
| 218 | + |
| 219 | +This emits a single HTTPRoute with two rules — `/api` → |
| 220 | +`ravnar-backend:8000`, `/` → `chat-frontend:80` — both on hostname |
| 221 | +`chat.nebari.openteams.ai`. One Certificate, one SecurityPolicy, one |
| 222 | +landing-page card. |
| 223 | + |
| 224 | +Whether chat-pack *should* collapse this way is a pack-design choice, |
| 225 | +not a contract issue. The proposal here just unblocks the option. |
| 226 | + |
| 227 | +## Operator changes |
| 228 | + |
| 229 | +Concrete files touched. |
| 230 | + |
| 231 | +### `api/v1/nebariapp_types.go` |
| 232 | + |
| 233 | +- Remove `Namespace` field from `ServiceReference`. |
| 234 | +- Add `Backend *ServiceReference` to `RouteMatch`. |
| 235 | +- Update docstrings on `spec.hostname` and `RoutingConfig` to |
| 236 | + document the one-hostname-per-NebariApp constraint. |
| 237 | +- Tighten the existing comment on `ServiceReference` to "must be in |
| 238 | + the NebariApp's own namespace." |
| 239 | + |
| 240 | +### `internal/controller/reconcilers/core/reconciler.go` |
| 241 | + |
| 242 | +`ValidateService` (around line 120): |
| 243 | + |
| 244 | +- Drop the defaulting branch and set |
| 245 | + `serviceNamespace := nebariApp.Namespace` directly. |
| 246 | +- Extend the function (or add a sibling) to validate per-route |
| 247 | + `backend` references: each must resolve to an existing Service in |
| 248 | + the NebariApp's namespace that exposes the referenced port. |
| 249 | +- Surface validation failures via the existing condition pattern |
| 250 | + (e.g. a `RouteBackendNotFound` reason on the appropriate condition). |
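
A rough sketch of the per-route check, under stated assumptions: the `servicePorts` map stands in for a namespaced Service lookup (the real reconciler would query the API server), and `validateRouteBackends`, the message wording, and the `admin-ui` Service are all hypothetical.

```go
package main

import "fmt"

// Trimmed copies of the proposed types.
type ServiceReference struct {
	Name string
	Port int32
}

type RouteMatch struct {
	PathPrefix string
	Backend    *ServiceReference
}

// validateRouteBackends (hypothetical) returns one message per route whose
// backend does not resolve. servicePorts maps each Service name in the
// NebariApp's namespace to the ports it exposes; it is a stand-in for a
// real Service lookup.
func validateRouteBackends(routes []RouteMatch, servicePorts map[string][]int32) []string {
	var problems []string
	for _, r := range routes {
		if r.Backend == nil {
			continue // falls back to spec.service, which is validated separately
		}
		ports, found := servicePorts[r.Backend.Name]
		if !found {
			problems = append(problems,
				fmt.Sprintf("route %q: Service %q not found", r.PathPrefix, r.Backend.Name))
			continue
		}
		exposed := false
		for _, p := range ports {
			if p == r.Backend.Port {
				exposed = true
				break
			}
		}
		if !exposed {
			problems = append(problems,
				fmt.Sprintf("route %q: Service %q does not expose port %d",
					r.PathPrefix, r.Backend.Name, r.Backend.Port))
		}
	}
	return problems
}

func main() {
	servicePorts := map[string][]int32{"chat-frontend": {80}, "ravnar-backend": {8000}}
	routes := []RouteMatch{
		{PathPrefix: "/api", Backend: &ServiceReference{Name: "ravnar-backend", Port: 9000}},
		{PathPrefix: "/admin", Backend: &ServiceReference{Name: "admin-ui", Port: 80}},
		{PathPrefix: "/"},
	}
	for _, p := range validateRouteBackends(routes, servicePorts) {
		fmt.Println(p)
	}
}
```

Each returned message would map onto a condition reason such as `RouteBackendNotFound`; how the reasons are split is left to implementation.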
| 251 | + |
| 252 | +### `internal/controller/reconcilers/routing/httproute.go` |
| 253 | + |
| 254 | +`buildBackendRefs` (around line 260): |
| 255 | + |
| 256 | +- Stop reading `Spec.Service.Namespace`. |
| 257 | +- Drop the cross-namespace branch at lines 273–278 entirely. |
| 258 | +- Refactor to take a `*ServiceReference` (the route's `backend` if |
| 259 | + set, else `spec.service`) and return that route's `HTTPBackendRef`s. |
| 260 | +- HTTPRoute construction emits one `HTTPRouteRule` per `RouteMatch`, |
| 261 | + each with its own `backendRefs` resolved from the route. |
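
The one-rule-per-route emission could look roughly like the sketch below. It deliberately avoids the Gateway API Go types: `Rule` is a simplified stand-in for an `HTTPRouteRule` with a single backend, and `buildRules` is a hypothetical name, not the existing `buildBackendRefs`.

```go
package main

import "fmt"

// Trimmed copies of the proposed types.
type ServiceReference struct {
	Name string
	Port int32
}

type RouteMatch struct {
	PathPrefix string
	Backend    *ServiceReference
}

// Rule is a simplified stand-in for a Gateway API HTTPRouteRule that
// carries one path match and one backend reference.
type Rule struct {
	PathPrefix string
	Backend    ServiceReference
}

// buildRules (hypothetical) emits one Rule per RouteMatch. With no routes
// configured it keeps the current behavior: a single "/" rule that targets
// the default service.
func buildRules(defaultSvc ServiceReference, routes []RouteMatch) []Rule {
	if len(routes) == 0 {
		return []Rule{{PathPrefix: "/", Backend: defaultSvc}}
	}
	rules := make([]Rule, 0, len(routes))
	for _, r := range routes {
		backend := defaultSvc
		if r.Backend != nil {
			backend = *r.Backend
		}
		rules = append(rules, Rule{PathPrefix: r.PathPrefix, Backend: backend})
	}
	return rules
}

func main() {
	defaultSvc := ServiceReference{Name: "chat-frontend", Port: 80}
	routes := []RouteMatch{
		{PathPrefix: "/api", Backend: &ServiceReference{Name: "ravnar-backend", Port: 8000}},
		{PathPrefix: "/"},
	}
	for _, rule := range buildRules(defaultSvc, routes) {
		fmt.Printf("%s -> %s:%d\n", rule.PathPrefix, rule.Backend.Name, rule.Backend.Port)
	}
}
```

No namespace appears anywhere in the sketch, which is the point of the refactor: every backend is implicitly in the NebariApp's own namespace.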
| 262 | + |
| 263 | +### `config/rbac/` |
| 264 | + |
| 265 | +- The `Services` ClusterRole rule can be narrowed from cluster-scoped |
| 266 | + to namespace-scoped reads now that no NebariApp can legally |
| 267 | + reference a Service outside its own namespace. The exact scoping |
| 268 | + belongs in implementation; the design decision is just "tighten." |
| 269 | + |
| 270 | +### Generated artifacts |
| 271 | + |
| 272 | +- `config/crd/bases/reconcilers.nebari.dev_nebariapps.yaml` regenerates. |
| 273 | +- `docs/api-reference.md` regenerates. |
| 274 | +- The CRD diff will show `service.namespace` removed and |
| 275 | + `routing.routes[].backend` added. |
| 276 | + |
| 277 | +## Validation |
| 278 | + |
| 279 | +- `RouteMatch.backend.name` and `RouteMatch.backend.port` are required |
| 280 | + when `backend` is set (no port-only override — see *Open questions*). |
| 281 | +- The CRD itself cannot enforce "Service exists in the NebariApp's |
| 282 | + namespace" — that check stays in the reconciler's `ValidateService` |
| 283 | + pass. |
| 284 | +- The operator should reject a NebariApp where `routing.routes[]` |
| 285 | + contains duplicate `pathPrefix + pathType` tuples, but that's a |
| 286 | + pre-existing concern not introduced by this design. |
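
For the duplicate-tuple check, a minimal sketch follows. It compares the raw `pathPrefix` and `pathType` values as written; how an omitted `pathType` is defaulted is not specified here, so the sketch treats an empty string as its own value. `duplicateRoutes` and `routeKey` are hypothetical names, and the `PathPrefix` and `Exact` values are illustrative only.

```go
package main

import "fmt"

// RouteMatch is a trimmed copy of the proposed type; Backend is omitted
// because it plays no part in duplicate detection.
type RouteMatch struct {
	PathPrefix string
	PathType   string
}

// routeKey is the tuple that must be unique within routing.routes[].
type routeKey struct {
	prefix   string
	pathType string
}

// duplicateRoutes (hypothetical) returns every route whose
// (pathPrefix, pathType) tuple was already seen earlier in the list.
func duplicateRoutes(routes []RouteMatch) []RouteMatch {
	seen := make(map[routeKey]bool, len(routes))
	var dups []RouteMatch
	for _, r := range routes {
		k := routeKey{prefix: r.PathPrefix, pathType: r.PathType}
		if seen[k] {
			dups = append(dups, r)
			continue
		}
		seen[k] = true
	}
	return dups
}

func main() {
	routes := []RouteMatch{
		{PathPrefix: "/api", PathType: "PathPrefix"},
		{PathPrefix: "/api", PathType: "Exact"},      // same prefix, different type: allowed
		{PathPrefix: "/api", PathType: "PathPrefix"}, // duplicate of the first entry
	}
	for _, d := range duplicateRoutes(routes) {
		fmt.Printf("duplicate route: %s (%s)\n", d.PathPrefix, d.PathType)
	}
}
```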
| 287 | + |
| 288 | +## Backwards compatibility |
| 289 | + |
| 290 | +- **Existing manifests:** any NebariApp that did not set |
| 291 | + `spec.service.namespace` is wholly unaffected — the field's removal |
| 292 | + is invisible. |
| 293 | +- **Manifests that did set `spec.service.namespace`:** the API server |
| 294 | + rejects the field under strict field validation and otherwise prunes |
| 295 | + it silently. There is no known internal user. The release notes must |
| 296 | + call this out so any external user catches it at upgrade time. |
| 297 | +- **The API version stays `v1`.** Field removal would normally argue |
| 298 | + for a version bump, but the project's README explicitly flags the API |
| 299 | + as "may change without notice" during the NIC bring-up phase. Once |
| 300 | + the API is declared stable, this kind of removal should require a |
| 301 | + version bump. |
| 302 | + |
| 303 | +## Migration |
| 304 | + |
| 305 | +For internal callers (the only known callers): |
| 306 | + |
| 307 | +No action needed: Argo CD installs each pack into a single namespace, |
| 308 | +and every surveyed chart already omits `spec.service.namespace`. |
| 309 | + |
| 310 | +For any external caller relying on the field: |
| 311 | + |
| 312 | +1. Move the target Service into the NebariApp's namespace (typical |
| 313 | + case), **or** |
| 314 | +2. Keep the Service where it is and have the workload connect to it |
| 315 | + via in-cluster DNS rather than through the NebariApp's HTTPRoute. |
| 316 | + |
| 317 | +## Follow-ups (not in this design) |
| 318 | + |
| 319 | +- **`routing.streaming` / `BackendTrafficPolicy`.** Envoy's default |
| 320 | + 15s request timeout breaks SSE and long-poll. The next iteration |
| 321 | + should add a per-NebariApp boolean (or small struct) on |
| 322 | + `RoutingConfig` that, when set, makes the operator emit a |
| 323 | + `BackendTrafficPolicy` targeting its own HTTPRoute. Independent of |
| 324 | + this design; its only intersection is that the policy targets the |
| 325 | + one HTTPRoute that this design's multi-rule output still produces — |
| 326 | + the same policy covers all rules. |
| 327 | +- **API version bump to a stable channel.** Once the surface settles, |
| 328 | + promoting from `v1` (currently labelled unstable) to a properly |
| 329 | + stable version is its own piece of work. Field removals like the one |
| 330 | + in this design are the last that can happen before that bump. |
| 331 | + |
| 332 | +## Open questions |
| 333 | + |
| 334 | +- **Should `RouteMatch.backend` support port-only overrides?** The |
| 335 | + current proposal says no — `name` and `port` are both required when |
| 336 | + `backend` is set. Allowing port-only ("same service, different |
| 337 | + port") would save a few characters at the cost of an implicit |
| 338 | + `name = spec.service.name` inheritance. Decision: keep verbose. |
| 339 | +- **Should `publicRoutes` accept per-route backends too?** For |
| 340 | + symmetry, probably yes — `publicRoutes` and `routes` share |
| 341 | + `RouteMatch`, so the field appears on both automatically. Whether |
| 342 | + the operator should *honor* a per-route backend on a public route |
| 343 | + is the actual question. Defaulting to "yes, public routes can have |
| 344 | + their own backend too" matches the rest of the contract; the |
| 345 | + alternative (only `routes` can override) would require splitting |
| 346 | + the type. Decision lean: honor on both; revisit if there's a |
| 347 | + security argument against. |
| 348 | +- **Status surface.** Should `NebariApp.status` expose per-route |
| 349 | + resolution (which backend each route resolved to)? Useful for |
| 350 | + debugging but adds status surface area that the operator must |
| 351 | + maintain. Decision lean: not in this first iteration; users can |
| 352 | + inspect the rendered HTTPRoute directly. |
| 353 | + |
| 354 | +## References |
| 355 | + |
| 356 | +- Gateway API `HTTPRoute`: <https://gateway-api.sigs.k8s.io/api-types/httproute/> |
| 357 | +- `BackendObjectReference` cross-namespace mechanics |
| 358 | + (`ReferenceGrant`): <https://gateway-api.sigs.k8s.io/api-types/referencegrant/> |
| 359 | +- `nebari-chat-pack` chart that motivated this: |
| 360 | + <https://github.com/nebari-dev/nebari-chat-pack/tree/main/helm/nebari-chat> |
| 361 | +- Companion concern (streaming/SSE timeouts): observed in PR #12 of the |
| 362 | + `openteams-ai/nebari.openteams.ai` deployment repo. |