| 1 | +# Multi-Port Routes on NebariApp |
| 2 | + |
| 3 | +**Status:** Draft **Author:** @viniciusdc |
| 4 | +**Created:** 2026-05-19 |
| 5 | + |
| 6 | +This document proposes extending the `NebariApp` routing contract so a |
| 7 | +single app can expose **multiple path-based routes that target different |
| 8 | +ports on the same backend service** under one hostname. It also tightens |
| 9 | +the same-namespace contract by removing `ServiceReference.Namespace`, |
| 10 | +and codifies the "one NebariApp = one hostname = one backend Service" |
| 11 | +boundary that has been implicit so far. |
| 12 | + |
| 13 | +A separate concern — exposing Envoy `BackendTrafficPolicy` knobs for |
| 14 | +streaming/SSE timeouts — is referenced in *Follow-ups* but is not part |
| 15 | +of this design. |
| 16 | + |
| 17 | +> **Note on file name.** This document was originally titled |
| 18 | +> *multi-backend routes* and proposed a per-route `backend: {name, port}` |
| 19 | +> override. Iteration narrowed the scope to *multi-port on a single |
| 20 | +> service*. The filename is kept for URL stability; the content reflects |
| 21 | +> the narrower design. |
| 22 | + |
| 23 | +## Problem |
| 24 | + |
| 25 | +`NebariApp.spec` today says "one hostname → one Service → one port, |
| 26 | +optionally narrowed by path." The `routing.routes[]` list lets users |
| 27 | +filter which paths reach the backend, but every route resolves to the |
| 28 | +same `{spec.service.name, spec.service.port}` tuple. There is no way |
| 29 | +to say "everything under `/api/*` should land on port `8000` and |
| 30 | +everything else on port `80`" within a single NebariApp, even though |
| 31 | +**a single Kubernetes Service can expose multiple ports** and routing |
| 32 | +to different ports per path is a normal Gateway-API pattern. |
| 33 | + |
| 34 | +Concretely: |
| 35 | + |
| 36 | +- Services that expose a UI on one port and an admin or metrics |
| 37 | + endpoint on another can't differentiate routing by port today. |
| 38 | +- Services that expose an HTTP API and a long-poll / SSE endpoint on |
| 39 | + separate ports can't apply path-based selection within one NebariApp. |
| 40 | +- Charts that would naturally collapse their exposure into one Service |
| 41 | + with two ports are forced to either expose only one port (and lose |
| 42 | + the other) or split into two NebariApps (and duplicate TLS, auth, |
| 43 | + landing-page infrastructure). |
| 44 | + |
| 45 | +The fix is small: let a `RouteMatch` carry an optional `port` that |
| 46 | +overrides the default `spec.service.port` for that path. |
| 47 | + |
| 48 | +## Goals |
| 49 | + |
| 50 | +- Let a single NebariApp expose multiple path-based routes under one |
| 51 | + hostname, each optionally targeting a different port on the |
| 52 | + NebariApp's single backend service. |
| 53 | +- Keep `spec.service` working unchanged as the default for routes that |
| 54 | + don't override the port — existing NebariApp manifests continue to |
| 55 | + validate and behave identically. |
| 56 | +- Make the same-namespace boundary an explicit, enforced contract: |
| 57 | + the operator-generated HTTPRoute must only `backendRef` Services in |
| 58 | + the NebariApp's own namespace. |
| 59 | +- Document "one NebariApp = one hostname = one backend Service" as an |
| 60 | + intentional constraint, not an accident of the current schema. |
| 61 | + |
| 62 | +## Non-goals |
| 63 | + |
| 64 | +- **Per-route backend `Service`.** This proposal is intentionally |
| 65 | + narrower than the earlier "per-route `backend: {name, port}`" |
| 66 | + iteration. A NebariApp targets exactly one Service. Use cases that |
| 67 | + genuinely need two Services (e.g. a chart with separate frontend and |
| 68 | + backend Deployments backing separate Services) should either |
| 69 | + consolidate into a single Service exposing multiple ports, or model |
| 70 | + the two halves as two separate NebariApps. Keeping the "one app = |
| 71 | + one Service" boundary keeps TLS, auth, and landing-page concerns |
| 72 | + scoped to one user-visible URL. |
| 73 | +- **Multiple hostnames per NebariApp.** Out of scope. If a future |
| 74 | + use case needs it, that's a separate discussion. |
| 75 | +- **Cross-namespace backends.** Today's `spec.service.namespace` |
| 76 | + permits this but the operator does not create the `ReferenceGrant` |
| 77 | + the Gateway API requires for it to actually work — so the field has |
| 78 | + been silently incomplete. This proposal *removes* it. Tools that |
| 79 | + need to reach into other namespaces should do so via in-cluster DNS |
| 80 | + (`<service>.<namespace>.svc.cluster.local`), not via the operator's |
| 81 | + HTTPRoute. |
| 82 | +- **Weights, header/query matchers, request/response filters.** This |
| 83 | + proposal narrows on per-route port selection. Anything else |
| 84 | + Gateway API offers on `HTTPRouteRule` stays out until a use case |
| 85 | + demands it. |
| 86 | +- **Envoy `BackendTrafficPolicy` (streaming/SSE timeouts).** Tracked as |
| 87 | + a follow-up; see *Follow-ups* below. The shape of that change is |
| 88 | + independent of this one. |
| 89 | + |
| 90 | +## Design principles |
| 91 | + |
| 92 | +Same principles that govern the rest of the NebariApp contract: |
| 93 | + |
| 94 | +1. **No is temporary, yes is forever.** Per-route `port` is the only |
| 95 | + field this design adds to `RouteMatch`. Nothing speculative. |
| 96 | +2. **Contract independence.** The CRD shape stays expressible in the |
| 97 | + Gateway API's mechanics without leaking Envoy specifics. A |
| 98 | + per-route `port` maps directly to the `port` field of an |
| 99 | + `HTTPRouteRule.backendRefs` entry. |
| 100 | +3. **Graceful degradation.** A route with no `port` falls back to |
| 101 | + `spec.service.port`. A NebariApp with no `routing.routes` keeps |
| 102 | + current behavior (one rule, single backend ref, `/` prefix). |
| 103 | +4. **Same-namespace by construction.** The CRD has no field that lets |
| 104 | + a user express a cross-namespace backend, so the operator never has |
| 105 | + to validate or refuse one. |
| 106 | +5. **One Service per NebariApp.** The boundary is intentional: a |
| 107 | + NebariApp's TLS, auth, landing-page card, and routing concerns |
| 108 | + scope to a single backing Service. Use cases that don't fit that |
| 109 | + boundary split into multiple NebariApps. |
| 110 | + |
| 111 | +## Proposed contract |
| 112 | + |
| 113 | +### `RouteMatch` gains an optional `port` |
| 114 | + |
| 115 | +```go |
| 116 | +type RouteMatch struct { |
| 117 | + // PathPrefix specifies the path prefix to match for routing. |
| 118 | + PathPrefix string `json:"pathPrefix"` |
| 119 | + |
| 120 | + // PathType specifies how the path should be matched. |
| 121 | + PathType string `json:"pathType,omitempty"` |
| 122 | + |
| 123 | + // Port optionally overrides the default backend port (spec.service.port) |
| 124 | + // for this route. The port must be exposed by spec.service. When omitted, |
| 125 | + // the route forwards to spec.service.port. This is the only mechanism for |
| 126 | + // path-based port differentiation; per-route backend Services are not |
| 127 | + // supported (see Non-goals). |
| 128 | + // +optional |
| 129 | + // +kubebuilder:validation:Minimum=1 |
| 130 | + // +kubebuilder:validation:Maximum=65535 |
| 131 | + Port *int32 `json:"port,omitempty"` |
| 132 | +} |
| 133 | +``` |
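
The fallback rule for the optional `port` can be sketched as follows. This is a minimal illustration, not the operator's code: `resolvePort` and `int32Ptr` are hypothetical helper names, and `RouteMatch` is trimmed to the two fields the sketch needs.

```go
package main

// RouteMatch is a trimmed, local copy of the struct above so the sketch
// compiles on its own. PathType is omitted because it does not affect
// port resolution.
type RouteMatch struct {
	PathPrefix string
	Port       *int32
}

// resolvePort (hypothetical helper) returns the route's own port when one is
// set and otherwise falls back to the default taken from spec.service.port.
func resolvePort(route RouteMatch, defaultPort int32) int32 {
	if route.Port != nil {
		return *route.Port
	}
	return defaultPort
}

// int32Ptr (hypothetical helper) makes it easy to set the optional field in
// struct literals.
func int32Ptr(v int32) *int32 { return &v }
```

With this shape, `resolvePort(RouteMatch{PathPrefix: "/api", Port: int32Ptr(8000)}, 80)` yields `8000`, while a route without `Port` yields the default `80`. Using a pointer rather than a plain `int32` lets the operator tell "not set" apart from any concrete value.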
| 134 | + |
| 135 | +### `ServiceReference.Namespace` is removed |
| 136 | + |
| 137 | +```go |
| 138 | +type ServiceReference struct { |
| 139 | + // Name is the name of the Kubernetes Service in the NebariApp's |
| 140 | + // own namespace. |
| 141 | + // +kubebuilder:validation:Required |
| 142 | + Name string `json:"name"` |
| 143 | + |
| 144 | + // Port is the default port number on the Service to route traffic to. |
| 145 | + // +kubebuilder:validation:Required |
| 146 | + Port int32 `json:"port"` |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +The current `Namespace` field is dropped. Reasons: |
| 151 | + |
| 152 | +- The operator-generated HTTPRoute would render a `BackendObjectReference` |
| 153 | + with a foreign `Namespace`, which Gateway API requires a |
| 154 | + `ReferenceGrant` to honor — and the operator does not create one. |
| 155 | + The field has always been a half-feature. |
| 156 | +- The architectural stance is that pack resources live in the pack's |
| 157 | + own namespace. Cross-namespace pod-to-pod talk goes through in-cluster |
| 158 | + DNS, not through an operator-managed public route. |
| 159 | +- All known callers (helm charts under `nebari-dev`) deploy their |
| 160 | + Service into the same namespace as the NebariApp via Argo CD's |
| 161 | + per-application namespace. No internal user is known to rely on the |
| 162 | + field today. |
| 163 | + |
| 164 | +### `spec.service` stays required and singular |
| 165 | + |
| 166 | +Keeping `spec.service` as a single, required `ServiceReference` means: |
| 167 | + |
| 168 | +- Every existing NebariApp manifest continues to validate. |
| 169 | +- Every route, with or without a per-route `port` override, resolves |
| 170 | + to `spec.service.name` — no ambiguity about which Service backs a |
| 171 | + route. |
| 172 | +- The simple case (one hostname, one Service, one port, no path-based |
| 173 | + differentiation) stays one struct field. |
| 174 | + |
| 175 | +### One hostname and one Service per NebariApp — codified |
| 176 | + |
| 177 | +The CRD docstring on `spec.hostname` and `spec.service` gains explicit |
| 178 | +language: |
| 179 | + |
| 180 | +> Each NebariApp exposes exactly one public hostname and is backed by |
| 181 | +> exactly one Kubernetes Service. Packs that need to expose multiple |
| 182 | +> hostnames, or that genuinely need to fan out to multiple Services, |
| 183 | +> must be split into multiple NebariApps. This is an intentional |
| 184 | +> boundary so a NebariApp's TLS, auth, landing-page card, and routing |
| 185 | +> concerns all scope to a single user-visible URL backed by a single |
| 186 | +> Service. |
| 187 | + |
| 188 | +No schema change — `hostname` and `service` are already singular — |
| 189 | +but the constraint moves from accidental to documented. |
| 190 | + |
| 191 | +## End-to-end example |
| 192 | + |
| 193 | +A NebariApp whose Service exposes both a UI port and an API port: |
| 194 | + |
| 195 | +```yaml |
| 196 | +apiVersion: reconcilers.nebari.dev/v1 |
| 197 | +kind: NebariApp |
| 198 | +metadata: |
| 199 | + name: my-app |
| 200 | + namespace: my-app |
| 201 | +spec: |
| 202 | + hostname: my-app.example.com |
| 203 | + service: |
| 204 | + name: my-app-svc # one Service exposing two ports below |
| 205 | + port: 80 # default port: UI |
| 206 | + routing: |
| 207 | + routes: |
| 208 | + - pathPrefix: /api |
| 209 | + port: 8000 # this route forwards to my-app-svc:8000 |
| 210 | + - pathPrefix: / # no port → falls back to spec.service.port (80) |
| 211 | + auth: |
| 212 | + enabled: true |
| 213 | + provider: keycloak |
| 214 | +``` |
| 215 | + |
| 216 | +This emits a single HTTPRoute with two rules — `/api` → |
| 217 | +`my-app-svc:8000`, `/` → `my-app-svc:80` — both on hostname |
| 218 | +`my-app.example.com`. One Certificate, one SecurityPolicy, one |
| 219 | +landing-page card, one Service. |
| 220 | + |
| 221 | +The Service is expected to look like: |
| 222 | + |
| 223 | +```yaml |
| 224 | +apiVersion: v1 |
| 225 | +kind: Service |
| 226 | +metadata: |
| 227 | + name: my-app-svc |
| 228 | + namespace: my-app |
| 229 | +spec: |
| 230 | + ports: |
| 231 | + - name: http |
| 232 | + port: 80 |
| 233 | + targetPort: 80 |
| 234 | + - name: api |
| 235 | + port: 8000 |
| 236 | + targetPort: 8000 |
| 237 | + selector: { ... } |
| 238 | +``` |
| 239 | + |
| 240 | +If the route's `port` is not present in `service.spec.ports`, the |
| 241 | +operator marks the NebariApp not-Ready with a clear reason (see |
| 242 | +*Validation* below). |
| 243 | + |
| 244 | +## Operator changes |
| 245 | + |
| 246 | +The concrete files this change touches are listed below. |
| 247 | + |
| 248 | +### `api/v1/nebariapp_types.go` |
| 249 | + |
| 250 | +- Remove `Namespace` field from `ServiceReference`. |
| 251 | +- Add an optional `Port *int32` to `RouteMatch` (`Minimum=1`, |
| 252 | + `Maximum=65535`); when omitted, the route falls back to `spec.service.port`. |
| 253 | +- Update docstrings on `spec.hostname` and `spec.service` to document |
| 254 | + the one-hostname / one-Service constraint. |
| 255 | +- Tighten the existing comment on `ServiceReference` to "must be in |
| 256 | + the NebariApp's own namespace." |
| 257 | + |
| 258 | +### `internal/controller/reconcilers/core/reconciler.go` |
| 259 | + |
| 260 | +`ValidateService`: |
| 261 | + |
| 262 | +- Drop the cross-namespace defaulting branch. |
| 263 | +- Look up `spec.service.name` once in the NebariApp's namespace. |
| 264 | +- Verify `spec.service.port` is exposed by it (current behavior). |
| 265 | +- For each route in `routing.routes[]` and `routing.publicRoutes[]` |
| 266 | + that sets `Port`, verify that port is **also** exposed by the same |
| 267 | + Service. Routes that don't override `Port` inherit the already-validated |
| 268 | + `spec.service.port`. |
| 269 | +- A route Port that the Service doesn't expose surfaces as a clear |
| 270 | + error — same pattern as today's "service does not expose port N", |
| 271 | + prefixed with the route's PathPrefix for diagnosability. |
| 272 | + |
| 273 | +### `internal/controller/reconcilers/routing/httproute.go` |
| 274 | + |
| 275 | +- `buildBackendRefs` is simplified: it always references |
| 276 | + `spec.service.name`, with the port resolved per call (either |
| 277 | + `spec.service.port` or the route's override). |
| 278 | +- `buildHTTPRouteRules` emits one `HTTPRouteRule` per `RouteMatch`, |
| 279 | + each with its own `backendRefs` (resolved port). When |
| 280 | + `routing.routes` is empty, behavior is unchanged: one rule with |
| 281 | + empty matches (so Gateway API applies the `/` default) and |
| 282 | + `spec.service.port` as the backend. |
| 283 | +- `buildPublicHTTPRoute` mirrors the same shape so per-route ports |
| 284 | + work on `publicRoutes[]` too. |
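
The rule-building behavior described in this list can be sketched with simplified stand-in types. This is an illustration under stated assumptions: `rule` and `buildRules` are hypothetical names, and the real implementation would populate Gateway API `HTTPRouteRule` values rather than this flattened struct.

```go
package main

// RouteMatch and ServiceReference are trimmed local copies of the CRD types,
// reduced to the fields this sketch needs.
type RouteMatch struct {
	PathPrefix string
	Port       *int32
}

type ServiceReference struct {
	Name string
	Port int32
}

// rule is a simplified stand-in for one HTTPRouteRule with a single path
// match and a single backend ref. An empty PathPrefix means "no explicit
// match", which leaves the Gateway API default in effect.
type rule struct {
	PathPrefix  string
	BackendName string
	BackendPort int32
}

// buildRules (hypothetical) emits one rule per route. Every rule points at
// the same Service; only the port may differ.
func buildRules(svc ServiceReference, routes []RouteMatch) []rule {
	if len(routes) == 0 {
		// Unchanged behavior: one rule, no matches, default port.
		return []rule{{BackendName: svc.Name, BackendPort: svc.Port}}
	}
	rules := make([]rule, 0, len(routes))
	for _, r := range routes {
		port := svc.Port
		if r.Port != nil {
			port = *r.Port
		}
		rules = append(rules, rule{
			PathPrefix:  r.PathPrefix,
			BackendName: svc.Name,
			BackendPort: port,
		})
	}
	return rules
}
```

Fed the routes from the end-to-end example, this produces two rules, `/api` on port `8000` and `/` on port `80`, both naming `my-app-svc`. Because the Service name is never taken from the route, a cross-Service backend cannot be expressed.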
| 285 | + |
| 286 | +### `config/rbac/` |
| 287 | + |
| 288 | +- The `Services` ClusterRole rule can be narrowed from cluster-scoped |
| 289 | + to namespace-scoped reads now that no NebariApp can legally |
| 290 | + reference a Service outside its own namespace. The exact scoping |
| 291 | + belongs in implementation; the design decision is just "tighten." |
| 292 | + |
| 293 | +### Generated artifacts |
| 294 | + |
| 295 | +- `config/crd/bases/reconcilers.nebari.dev_nebariapps.yaml` regenerates. |
| 296 | +- `docs/api-reference.md` regenerates. |
| 297 | +- The CRD diff will show `service.namespace` removed and |
| 298 | + `routing.routes[].port` added. |
| 299 | + |
| 300 | +## Validation |
| 301 | + |
| 302 | +- `RouteMatch.port` (when set) is validated at the CRD level to be in |
| 303 | + the range `[1, 65535]`. |
| 304 | +- The CRD itself cannot enforce "this port is exposed by spec.service" |
| 305 | + — that check stays in the reconciler's `ValidateService` pass. |
| 306 | +- Failure mode: NebariApp goes not-Ready with reason indicating which |
| 307 | + route's port is missing from the Service, e.g. |
| 308 | + `route "/api": service my-app-svc does not expose port 8000`. |
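
A minimal sketch of the reconciler-side check, assuming the exposed port numbers have already been read from the Service (in the operator they would come from each entry's `Port` in `Service.Spec.Ports`). `validateRoutePorts` is a hypothetical name; the error text follows the format quoted above.

```go
package main

import "fmt"

// RouteMatch is a trimmed local copy of the CRD type.
type RouteMatch struct {
	PathPrefix string
	Port       *int32
}

// validateRoutePorts (hypothetical) checks every overridden route port
// against the ports the Service exposes and returns the first mismatch.
// Routes without an override are skipped because they inherit
// spec.service.port, which is validated separately.
func validateRoutePorts(serviceName string, exposed []int32, routes []RouteMatch) error {
	known := make(map[int32]struct{}, len(exposed))
	for _, p := range exposed {
		known[p] = struct{}{}
	}
	for _, r := range routes {
		if r.Port == nil {
			continue
		}
		if _, ok := known[*r.Port]; !ok {
			return fmt.Errorf("route %q: service %s does not expose port %d",
				r.PathPrefix, serviceName, *r.Port)
		}
	}
	return nil
}
```

The same function could be called once for `routing.routes[]` and once for `routing.publicRoutes[]`, since both lists share the `RouteMatch` type.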
| 309 | + |
| 310 | +## Backwards compatibility |
| 311 | + |
| 312 | +- **Existing manifests:** any NebariApp that did not set |
| 313 | + `spec.service.namespace` is wholly unaffected — the field's removal |
| 314 | + is invisible. |
| 315 | +- **Manifests that did set `spec.service.namespace`:** the new CRD schema no |
| 316 | + longer has the field, so the API server rejects it under strict field |
| 317 | + validation and silently prunes it otherwise. There is no known internal user. |
| 318 | + The release notes / changelog must call this out explicitly so any external user catches it at upgrade time. |
| 319 | +- **The API version stays `v1`.** Field removal would normally argue |
| 320 | + for a version bump, but the project's README explicitly flags the |
| 321 | + API as "may change without notice" during the NIC bring-up phase. |
| 322 | + Once the API is declared stable, this kind of removal should |
| 323 | + require a version bump. |
| 324 | + |
| 325 | +## Migration |
| 326 | + |
| 327 | +For internal callers (the only known callers): |
| 328 | + |
| 329 | +1. None — Argo CD installs each pack into a single namespace, and |
| 330 | + every surveyed chart already omits `spec.service.namespace`. |
| 331 | + |
| 332 | +For any external caller relying on the field: |
| 333 | + |
| 334 | +1. Move the target Service into the NebariApp's namespace (typical |
| 335 | + case), **or** |
| 336 | +2. Keep the Service where it is and have the workload connect to it |
| 337 | + via in-cluster DNS rather than through the NebariApp's HTTPRoute. |
| 338 | + |
| 339 | +## Follow-ups (not in this design) |
| 340 | + |
| 341 | +- **`routing.streaming` / `BackendTrafficPolicy`.** Envoy's default |
| 342 | + 15s request timeout breaks SSE and long-poll. The next iteration |
| 343 | + should add a per-NebariApp boolean (or small struct) on |
| 344 | + `RoutingConfig` that, when set, makes the operator emit a |
| 345 | + `BackendTrafficPolicy` targeting its own HTTPRoute. Independent of |
| 346 | + this design; its only intersection is that the policy targets the |
| 347 | + one HTTPRoute that this design's multi-rule output still produces — |
| 348 | + the same policy covers all rules. |
| 349 | +- **API version bump to a stable channel.** Once the surface settles, |
| 350 | + promoting from `v1` (currently labelled unstable) to a properly |
| 351 | + stable version is its own piece of work. Field removals like the |
| 352 | + one in this design are the last that can happen before that bump. |
| 353 | + |
| 354 | +## Open questions |
| 355 | + |
| 356 | +- **Should `publicRoutes` accept per-route ports too?** For symmetry, |
| 357 | + yes — `publicRoutes` and `routes` share the `RouteMatch` type, so |
| 358 | + the field appears on both automatically. Decision lean: honor on |
| 359 | + both; revisit if there's a security argument against. |
| 360 | +- **Status surface.** Should `NebariApp.status` expose per-route |
| 361 | + resolution (which port each route resolved to)? Useful for |
| 362 | + debugging but adds status surface area that the operator must |
| 363 | + maintain. Decision lean: not in the first iteration of this change; users can |
| 364 | + inspect the rendered HTTPRoute directly. |
| 365 | + |
| 366 | +## References |
| 367 | + |
| 368 | +- Gateway API `HTTPRoute`: <https://gateway-api.sigs.k8s.io/api-types/httproute/> |
| 369 | +- `BackendObjectReference` cross-namespace mechanics |
| 370 | + (`ReferenceGrant`): <https://gateway-api.sigs.k8s.io/api-types/referencegrant/> |
| 371 | +- Companion concern (streaming/SSE timeouts): observed in PR #12 of the |
| 372 | + `openteams-ai/nebari.openteams.ai` deployment repo. |