Commit 7a5ea5b

docs(design): propose multi-backend routes for NebariApp
Adds a design doc proposing an optional per-route backend override on RouteMatch so a single NebariApp can route different path prefixes to different services under one hostname. Tightens the same-namespace contract by removing ServiceReference.Namespace, and codifies the "one NebariApp = one hostname" boundary that has been implicit until now. The streaming/BackendTrafficPolicy concern (Envoy SSE timeouts) is referenced as a follow-up but is intentionally not part of this proposal.
1 parent 843ea87 commit 7a5ea5b

1 file changed

Lines changed: 372 additions & 0 deletions

@@ -0,0 +1,372 @@
# Multi-Port Routes on NebariApp

**Status:** Draft **Author:** @viniciusdc
**Created:** 2026-05-19

This document proposes extending the `NebariApp` routing contract so a
single app can expose **multiple path-based routes that target different
ports on the same backend service** under one hostname. It also tightens
the same-namespace contract by removing `ServiceReference.Namespace`,
and codifies the "one NebariApp = one hostname = one backend Service"
boundary that has been implicit so far.

A separate concern, exposing Envoy Gateway `BackendTrafficPolicy` knobs
for streaming/SSE timeouts, is referenced in *Follow-ups* but is not part
of this design.

> **Note on file name.** This document was originally titled
> *multi-backend routes* and proposed a per-route `backend: {name, port}`
> override. Iteration narrowed the scope to *multi-port on a single
> service*. The filename is kept for URL stability; the content reflects
> the narrower design.

## Problem

`NebariApp.spec` today says "one hostname → one Service → one port,
optionally narrowed by path." The `routing.routes[]` list lets users
filter which paths reach the backend, but every route resolves to the
same `{spec.service.name, spec.service.port}` tuple. There is no way
to say "everything under `/api/*` should land on port `8000` and
everything else on port `80`" within a single NebariApp, even though
**a single Kubernetes Service can expose multiple ports** and routing
to different ports per path is a normal Gateway-API pattern.

Concretely:

- Services that expose a UI on one port and an admin or metrics
  endpoint on another can't differentiate routing by port today.
- Services that expose an HTTP API and a long-poll / SSE endpoint on
  separate ports can't apply path-based selection within one NebariApp.
- Charts that would naturally collapse their exposure into one Service
  with two ports are forced to either expose only one port (and lose
  the other) or split into two NebariApps (and duplicate TLS, auth,
  landing-page infrastructure).

The fix is small: let a `RouteMatch` carry an optional `port` that
overrides the default `spec.service.port` for that path.

## Goals

- Let a single NebariApp expose multiple path-based routes under one
  hostname, each optionally targeting a different port on the
  NebariApp's single backend service.
- Keep `spec.service` working unchanged as the default for routes that
  don't override the port, so existing NebariApp manifests continue to
  validate and behave identically.
- Make the same-namespace boundary an explicit, enforced contract:
  the operator-generated HTTPRoute must only `backendRef` Services in
  the NebariApp's own namespace.
- Document "one NebariApp = one hostname = one backend Service" as an
  intentional constraint, not an accident of the current schema.

## Non-goals

- **Per-route backend `Service`.** This proposal is intentionally
  narrower than the earlier "per-route `backend: {name, port}`"
  iteration. A NebariApp targets exactly one Service. Use cases that
  genuinely need two Services (e.g. a chart with separate frontend and
  backend Deployments backing separate Services) should either
  consolidate into a single Service exposing multiple ports, or model
  the two halves as two separate NebariApps. Keeping the "one app =
  one Service" boundary keeps TLS, auth, and landing-page concerns
  scoped to one user-visible URL.
- **Multiple hostnames per NebariApp.** Out of scope. If a future
  use case needs it, that's a separate discussion.
- **Cross-namespace backends.** Today's `spec.service.namespace`
  permits this, but the operator does not create the `ReferenceGrant`
  that the Gateway API requires for it to actually work, so the field
  has been silently incomplete. This proposal *removes* it. Tools that
  need to reach into other namespaces should do so via in-cluster DNS
  (`<service>.<other-ns>.svc.cluster.local`), not via the operator's
  HTTPRoute.
- **Weights, header/query matchers, request/response filters.** This
  proposal focuses narrowly on per-route port selection. Anything else
  Gateway API offers on `HTTPRouteRule` stays out until a use case
  demands it.
- **Envoy Gateway `BackendTrafficPolicy` (streaming/SSE timeouts).**
  Tracked as a follow-up; see *Follow-ups* below. The shape of that
  change is independent of this one.

## Design principles

The same principles that govern the rest of the NebariApp contract
apply here:

1. **No is temporary, yes is forever.** Per-route `port` is the only
   field this design adds to `RouteMatch`. Nothing speculative.
2. **Contract independence.** The CRD shape stays expressible in the
   Gateway API's mechanics without leaking Envoy specifics. A
   per-route `port` maps directly to the `port` field of an
   `HTTPRouteRule.backendRefs` entry.
3. **Graceful degradation.** A route with no `port` falls back to
   `spec.service.port`. A NebariApp with no `routing.routes` keeps
   current behavior (one rule, single backend ref, `/` prefix).
4. **Same-namespace by construction.** The CRD has no field that lets
   a user express a cross-namespace backend, so the operator never has
   to validate or refuse one.
5. **One Service per NebariApp.** The boundary is intentional: a
   NebariApp's TLS, auth, landing-page card, and routing concerns
   scope to a single backing Service. Use cases that don't fit that
   boundary split into multiple NebariApps.

## Proposed contract

### `RouteMatch` gains an optional `port`

```go
type RouteMatch struct {
	// PathPrefix specifies the path prefix to match for routing.
	PathPrefix string `json:"pathPrefix"`

	// PathType specifies how the path should be matched.
	PathType string `json:"pathType,omitempty"`

	// Port optionally overrides the default backend port (spec.service.port)
	// for this route. The port must be exposed by spec.service. When omitted,
	// the route forwards to spec.service.port. This is the only mechanism for
	// path-based port differentiation; per-route backend Services are not
	// supported (see Non-goals).
	// +optional
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	Port *int32 `json:"port,omitempty"`
}
```
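
The fallback rule stated in the `Port` comment can be sketched as a small helper. This is only an illustration under stated assumptions: `effectivePort` is a hypothetical name, not a function in the operator, and the local `RouteMatch` keeps just the two fields the example needs.

```go
package main

import "fmt"

// RouteMatch is a trimmed-down copy of the proposed type, kept local so the
// example is self-contained.
type RouteMatch struct {
	PathPrefix string
	Port       *int32 // nil means "no override"
}

// effectivePort (hypothetical helper) returns the route's port override when
// one is set, and the default port (spec.service.port) otherwise.
func effectivePort(defaultPort int32, route RouteMatch) int32 {
	if route.Port != nil {
		return *route.Port
	}
	return defaultPort
}

func main() {
	apiPort := int32(8000)
	routes := []RouteMatch{
		{PathPrefix: "/api", Port: &apiPort}, // overrides the default
		{PathPrefix: "/"},                    // falls back to the default
	}
	for _, r := range routes {
		fmt.Printf("%s -> %d\n", r.PathPrefix, effectivePort(80, r))
	}
}
```

Using a pointer for `Port` is what lets "not set" be distinguished from any concrete port number, which is why the fallback is a nil check rather than a comparison against zero.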

### `ServiceReference.Namespace` is removed

```go
type ServiceReference struct {
	// Name is the name of the Kubernetes Service in the NebariApp's
	// own namespace.
	// +kubebuilder:validation:Required
	Name string `json:"name"`

	// Port is the default port number on the Service to route traffic to.
	// +kubebuilder:validation:Required
	Port int32 `json:"port"`
}
```

The current `Namespace` field is dropped. Reasons:

- The operator-generated HTTPRoute would render a `BackendObjectReference`
  with a foreign `Namespace`, which Gateway API requires a
  `ReferenceGrant` to honor, and the operator does not create one.
  The field has always been a half-feature.
- The architectural stance is that pack resources live in the pack's
  own namespace. Cross-namespace pod-to-pod traffic goes through
  in-cluster DNS, not through an operator-managed public route.
- All known callers (helm charts under `nebari-dev`) deploy their
  Service into the same namespace as the NebariApp via Argo CD's
  per-application namespace. No internal user is known to rely on the
  field today.

### `spec.service` stays required and singular

Keeping `spec.service` as a single, required `ServiceReference` means:

- Every existing NebariApp manifest continues to validate.
- Every route, with or without a per-route `port` override, resolves
  to `spec.service.name`, so there is no ambiguity about which Service
  backs a route.
- The simple case (one hostname, one Service, one port, no path-based
  differentiation) stays one struct field.

### One hostname and one Service per NebariApp (codified)

The CRD docstring on `spec.hostname` and `spec.service` gains explicit
language:

> Each NebariApp exposes exactly one public hostname and is backed by
> exactly one Kubernetes Service. Packs that need to expose multiple
> hostnames, or that genuinely need to fan out to multiple Services,
> must be split into multiple NebariApps. This is an intentional
> boundary so a NebariApp's TLS, auth, landing-page card, and routing
> concerns all scope to a single user-visible URL backed by a single
> Service.

No schema change is needed (`hostname` and `service` are already
singular), but the constraint moves from accidental to documented.

## End-to-end example

A NebariApp whose Service exposes both a UI port and an API port:

```yaml
apiVersion: reconcilers.nebari.dev/v1
kind: NebariApp
metadata:
  name: my-app
  namespace: my-app
spec:
  hostname: my-app.example.com
  service:
    name: my-app-svc        # one Service exposing two ports below
    port: 80                # default port: UI
  routing:
    routes:
      - pathPrefix: /api
        port: 8000          # this route forwards to my-app-svc:8000
      - pathPrefix: /       # no port → falls back to spec.service.port (80)
  auth:
    enabled: true
    provider: keycloak
```

This emits a single HTTPRoute with two rules (`/api` →
`my-app-svc:8000`, `/` → `my-app-svc:80`), both on hostname
`my-app.example.com`. One Certificate, one SecurityPolicy, one
landing-page card, one Service.

The Service is expected to look like:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
  namespace: my-app
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: api
      port: 8000
      targetPort: 8000
  selector: { ... }
```

If the route's `port` is not present in `service.spec.ports`, the
operator marks the NebariApp not-Ready with a clear reason (see
*Validation* below).

## Operator changes

The concrete files touched are listed below.

### `api/v1/nebariapp_types.go`

- Remove the `Namespace` field from `ServiceReference`.
- Add `Port *int32` to `RouteMatch` with `Minimum=1`, `Maximum=65535`.
  The field is optional; when omitted, the route falls back to
  `spec.service.port`.
- Update docstrings on `spec.hostname` and `spec.service` to document
  the one-hostname / one-Service constraint.
- Tighten the existing comment on `ServiceReference` to "must be in
  the NebariApp's own namespace."

### `internal/controller/reconcilers/core/reconciler.go`

`ValidateService`:

- Drop the cross-namespace defaulting branch.
- Look up `spec.service.name` once in the NebariApp's namespace.
- Verify `spec.service.port` is exposed by it (current behavior).
- For each route in `routing.routes[]` and `routing.publicRoutes[]`
  that sets `Port`, verify that port is **also** exposed by the same
  Service. Routes that don't override `Port` inherit the already-validated
  `spec.service.port`.
- A route `Port` that the Service doesn't expose surfaces as a clear
  error, following the same pattern as today's "service does not
  expose port N", prefixed with the route's `PathPrefix` for
  diagnosability.
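
The per-route check described in this list could look roughly like the sketch below. It is not the operator's `ValidateService`: `validateRoutePorts` is a hypothetical name, the Service is reduced to a set of exposed port numbers, and the caller is assumed to pass the combined `routes` and `publicRoutes` entries. Collecting every failing route with `errors.Join` (Go 1.20+) is a choice made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// RouteMatch is a trimmed-down local copy of the proposed type.
type RouteMatch struct {
	PathPrefix string
	Port       *int32
}

// validateRoutePorts (hypothetical) checks the default port first, then every
// route that sets an override. exposed holds the port numbers the Service
// declares in spec.ports.
func validateRoutePorts(service string, exposed map[int32]bool, defaultPort int32, routes []RouteMatch) error {
	if !exposed[defaultPort] {
		return fmt.Errorf("service %s does not expose port %d", service, defaultPort)
	}
	var errs []error
	for _, r := range routes {
		if r.Port == nil {
			continue // inherits the default port, which was validated above
		}
		if !exposed[*r.Port] {
			errs = append(errs, fmt.Errorf("route %q: service %s does not expose port %d",
				r.PathPrefix, service, *r.Port))
		}
	}
	return errors.Join(errs...) // nil when no route failed
}

func main() {
	api := int32(8000)
	routes := []RouteMatch{{PathPrefix: "/api", Port: &api}, {PathPrefix: "/"}}
	// The Service here only exposes port 80, so the /api override fails.
	fmt.Println(validateRoutePorts("my-app-svc", map[int32]bool{80: true}, 80, routes))
}
```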

### `internal/controller/reconcilers/routing/httproute.go`

- `buildBackendRefs` is simplified: it always references
  `spec.service.name`, with the port resolved per call (either
  `spec.service.port` or the route's override).
- `buildHTTPRouteRules` emits one `HTTPRouteRule` per `RouteMatch`,
  each with its own `backendRefs` (resolved port). When
  `routing.routes` is empty, behavior is unchanged: one rule with
  empty matches (so Gateway API applies the `/` default) and
  `spec.service.port` as the backend.
- `buildPublicHTTPRoute` mirrors the same shape so per-route ports
  work on `publicRoutes[]` too.
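
As a rough picture of the rule-building behavior described above, the sketch below uses a simplified `rule` struct in place of the Gateway API `HTTPRouteRule` type so that it runs with the standard library only. `sketchRules` and `rule` are hypothetical names; the real `buildHTTPRouteRules` may be structured differently.

```go
package main

import "fmt"

// RouteMatch is a trimmed-down local copy of the proposed type.
type RouteMatch struct {
	PathPrefix string
	Port       *int32
}

// rule stands in for an HTTPRouteRule: an optional path-prefix match plus one
// backend reference. An empty PathPrefix means "no explicit match", which
// Gateway API treats as a "/" prefix match.
type rule struct {
	PathPrefix  string
	BackendName string
	BackendPort int32
}

// sketchRules (hypothetical) emits one rule per route, always pointing at the
// same Service and resolving the port per route.
func sketchRules(service string, defaultPort int32, routes []RouteMatch) []rule {
	if len(routes) == 0 {
		// Unchanged behavior: a single rule with no matches and the default port.
		return []rule{{BackendName: service, BackendPort: defaultPort}}
	}
	rules := make([]rule, 0, len(routes))
	for _, r := range routes {
		port := defaultPort
		if r.Port != nil {
			port = *r.Port
		}
		rules = append(rules, rule{PathPrefix: r.PathPrefix, BackendName: service, BackendPort: port})
	}
	return rules
}

func main() {
	api := int32(8000)
	routes := []RouteMatch{{PathPrefix: "/api", Port: &api}, {PathPrefix: "/"}}
	for _, r := range sketchRules("my-app-svc", 80, routes) {
		fmt.Printf("%s -> %s:%d\n", r.PathPrefix, r.BackendName, r.BackendPort)
	}
}
```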

### `config/rbac/`

- The `Services` ClusterRole rule can be narrowed from cluster-scoped
  to namespace-scoped reads now that no NebariApp can legally
  reference a Service outside its own namespace. The exact scoping
  belongs in implementation; the design decision is just "tighten."

### Generated artifacts

- `config/crd/bases/reconcilers.nebari.dev_nebariapps.yaml` regenerates.
- `docs/api-reference.md` regenerates.
- The CRD diff will show `service.namespace` removed and
  `routing.routes[].port` added.

## Validation

- `RouteMatch.port` (when set) is validated at the CRD level to be in
  the range `[1, 65535]`.
- The CRD itself cannot enforce "this port is exposed by spec.service";
  that check stays in the reconciler's `ValidateService` pass.
- Failure mode: NebariApp goes not-Ready with a reason indicating which
  route's port is missing from the Service, e.g.
  `route "/api": service my-app-svc does not expose port 8000`.

## Backwards compatibility

- **Existing manifests:** any NebariApp that did not set
  `spec.service.namespace` is wholly unaffected; the field's removal
  is invisible.
- **Manifests that did set `spec.service.namespace`:** the field no
  longer exists in the new CRD schema. Clients that request strict
  field validation (the `kubectl` default in recent releases) will
  have the manifest refused; other clients may instead see the field
  pruned, with at most a warning. There is no known internal
  user. The release notes / changelog must call this out explicitly so
  any external user catches it at upgrade time.
- **The API version stays `v1`.** Field removal would normally argue
  for a version bump, but the project's README explicitly flags the
  API as "may change without notice" during the NIC bring-up phase.
  Once the API is declared stable, this kind of removal should
  require a version bump.

## Migration

For internal callers (the only known callers):

1. None: Argo CD installs each pack into a single namespace, and
   every surveyed chart already omits `spec.service.namespace`.

For any external caller relying on the field:

1. Move the target Service into the NebariApp's namespace (typical
   case), **or**
2. Keep the Service where it is and have the workload connect to it
   via in-cluster DNS rather than through the NebariApp's HTTPRoute.
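
For the second option, the in-cluster DNS name follows the usual `<service>.<namespace>.svc.<cluster-domain>` shape. The sketch below assumes the default cluster domain `cluster.local`, which some clusters override; `serviceDNSName`, `metrics-svc`, `other-ns`, and port `9090` are all made-up names for illustration.

```go
package main

import "fmt"

// clusterDomain is assumed to be the Kubernetes default; it is configurable
// per cluster, so treat this constant as an assumption.
const clusterDomain = "cluster.local"

// serviceDNSName (hypothetical helper) builds the fully qualified in-cluster
// DNS name for a Service in a given namespace.
func serviceDNSName(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

func main() {
	// A workload in the NebariApp's namespace would dial this address directly
	// instead of going through the operator-managed HTTPRoute.
	fmt.Printf("http://%s:%d\n", serviceDNSName("metrics-svc", "other-ns"), 9090)
}
```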

## Follow-ups (not in this design)

- **`routing.streaming` / `BackendTrafficPolicy`.** Envoy's default
  15s request timeout breaks SSE and long-poll. The next iteration
  should add a per-NebariApp boolean (or small struct) on
  `RoutingConfig` that, when set, makes the operator emit a
  `BackendTrafficPolicy` targeting its own HTTPRoute. That work is
  independent of this design; the only intersection is that the policy
  targets the one HTTPRoute that this design's multi-rule output still
  produces, so the same policy covers all rules.
- **API version bump to a stable channel.** Once the surface settles,
  promoting from `v1` (currently labelled unstable) to a properly
  stable version is its own piece of work. Field removals like the
  one in this design are the last that can happen before that bump.

## Open questions

- **Should `publicRoutes` accept per-route ports too?** For symmetry,
  yes: `publicRoutes` and `routes` share the `RouteMatch` type, so
  the field appears on both automatically. Decision lean: honor on
  both; revisit if there's a security argument against.
- **Status surface.** Should `NebariApp.status` expose per-route
  resolution (which port each route resolved to)? Useful for
  debugging but adds status surface area that the operator must
  maintain. Decision lean: not in the first iteration of this change;
  users can inspect the rendered HTTPRoute directly.

## References

- Gateway API `HTTPRoute`: <https://gateway-api.sigs.k8s.io/api-types/httproute/>
- `BackendObjectReference` cross-namespace mechanics
  (`ReferenceGrant`): <https://gateway-api.sigs.k8s.io/api-types/referencegrant/>
- Companion concern (streaming/SSE timeouts) PR observed in the
  `openteams-ai/nebari.openteams.ai` deployment repo (PR #12).
