
Commit b7375d9

docs(design): propose multi-backend routes for NebariApp
Adds a design doc proposing an optional per-route backend override on RouteMatch so a single NebariApp can route different path prefixes to different services under one hostname. Tightens the same-namespace contract by removing ServiceReference.Namespace, and codifies the "one NebariApp = one hostname" boundary that has been implicit until now. The streaming/BackendTrafficPolicy concern (Envoy SSE timeouts) is referenced as a follow-up but is intentionally not part of this proposal.
1 parent 843ea87 commit b7375d9

1 file changed

Lines changed: 362 additions & 0 deletions

@@ -0,0 +1,362 @@
# Multi-Backend Routes on NebariApp

**Status:** Draft **Author:** @viniciusdc
**Created:** 2026-05-19

This document proposes extending the `NebariApp` routing contract so a
single app can expose **multiple path-based routes that target different
backend services** under one hostname, and locks the boundary that an
HTTPRoute generated by the operator must only reference services in the
same namespace as the NebariApp itself.

It tightens the existing `ServiceReference` shape (drops the
`namespace` field), introduces an optional per-route `backend` override
on `RouteMatch`, and codifies the "one NebariApp = one hostname"
constraint that has been implicit so far.

A separate concern (exposing Envoy `BackendTrafficPolicy` knobs for
streaming/SSE timeouts) is referenced in *Follow-ups* but is not part
of this design.

## Problem

`NebariApp.spec` today reads like a single hostname pointing at a single
backend service:

```yaml
spec:
  hostname: chat.nebari.openteams.ai
  service:
    name: chat-frontend
    port: 80
  routing:
    routes:
      - pathPrefix: /
```

The `routing.routes[]` list lets users *narrow* which paths reach the
backend, but every route resolves to the same `spec.service`. There is
no way to say "`/api/v1/*` goes to the API container's service and
everything else goes to the frontend's service" within a single NebariApp.

Packs that genuinely have two exposed components (frontend +
backend, app + admin, UI + API) hit one of two unsatisfying paths:

1. **Two NebariApps, two hostnames.** This is what
   [`nebari-chat-pack`](https://github.com/nebari-dev/nebari-chat-pack)
   does: `chat.…` and `ravnar.…` are two NebariApps because the chart
   has no way to fold both into one. The operator then emits two
   HTTPRoutes, two Certificates, two SecurityPolicies, duplicating
   per-app infrastructure for what is conceptually one app.
2. **Hand-rolled `HTTPRoute` outside the operator.** Loses everything
   the operator provides (TLS, auth, landing-page integration, status
   surfacing) for the sake of fanning out by path.

Both options exist because the operator doesn't model "one hostname,
multiple backends", which is the Gateway API's *native* shape.
`HTTPRoute.spec.rules[]` already supports per-rule `backendRefs[]`. We
just don't surface it.

## Goals

- Let a single NebariApp expose multiple path-based routes, each
  optionally pointing at a different backend service in the same
  namespace, under one hostname.
- Keep `spec.service` working unchanged as the default backend for
  routes that don't override, so existing NebariApp manifests continue
  to validate and behave identically.
- Make the same-namespace boundary an explicit, enforced contract:
  the operator-generated HTTPRoute must only `backendRef` services in
  the NebariApp's own namespace.
- Document "one NebariApp = one hostname" as an intentional constraint,
  not an accident of the current schema.

## Non-goals

- **Multiple hostnames per NebariApp.** Out of scope for this design.
  If a future use case needs it, that's a separate discussion. The
  current chat-pack split into two NebariApps would *not* be resolved
  by this proposal; that pack's split is a pack-design issue
  (it should either consolidate to one hostname with per-route
  backends, or treat the two pieces as genuinely independent packs).
- **Cross-namespace backends.** Today's `spec.service.namespace`
  permits this but the operator does not create the `ReferenceGrant`
  the Gateway API requires for it to actually work, so the field has
  been silently incomplete. This proposal *removes* it. Tools that need
  to reach into other namespaces should do so via in-cluster DNS
  (`svc.other-ns.svc.cluster.local`), not via the operator's
  HTTPRoute.
- **Weights, header/query matchers, request/response filters.** This
  proposal narrows on per-route backend selection. Anything else
  Gateway API offers on `HTTPRouteRule` stays out until a use case
  demands it.
- **Envoy `BackendTrafficPolicy` (streaming/SSE timeouts).** Tracked as
  a follow-up; see *Follow-ups* below. The shape of that change is
  independent of this one.

## Design principles

Same principles that govern the rest of the NebariApp contract:

1. **No is temporary, yes is forever.** Per-route `backend` is the only
   field this design adds to `RouteMatch`. Nothing speculative.
2. **Contract independence.** The CRD shape stays expressible in the
   Gateway API's mechanics without leaking Envoy specifics. A
   per-route `backend` maps directly to an `HTTPRouteRule.backendRefs`
   entry.
3. **Graceful degradation.** A route with no `backend` falls back to
   `spec.service`. A NebariApp with no `routing.routes` keeps current
   behavior (one rule, single backend, `/` prefix).
4. **Same-namespace by construction.** The CRD has no field that lets
   a user express a cross-namespace backend, so the operator never has
   to validate or refuse one.

## Proposed contract

### `RouteMatch` gains an optional `backend`

```go
type RouteMatch struct {
	// PathPrefix specifies the path prefix to match for routing.
	PathPrefix string `json:"pathPrefix"`

	// PathType specifies how the path should be matched.
	PathType string `json:"pathType,omitempty"`

	// Backend optionally overrides the default backend (spec.service)
	// for this route. When omitted, traffic matching this route is
	// sent to spec.service. The referenced Service must exist in the
	// NebariApp's own namespace.
	// +optional
	Backend *ServiceReference `json:"backend,omitempty"`
}
```

135+
`ServiceReference` is the same struct used at the top level, minus its
136+
`Namespace` field (see below).
137+
138+
### `ServiceReference.Namespace` is removed
139+
140+
```go
141+
type ServiceReference struct {
142+
// Name is the name of the Kubernetes Service in the NebariApp's
143+
// own namespace.
144+
// +kubebuilder:validation:Required
145+
Name string `json:"name"`
146+
147+
// Port is the port number on the Service to route traffic to.
148+
// +kubebuilder:validation:Required
149+
Port int32 `json:"port"`
150+
}
151+
```
152+
153+
The current `Namespace` field is dropped. Reasons:
154+
155+
- The operator-generated HTTPRoute would render a `BackendObjectReference`
156+
with a foreign `Namespace`, which Gateway API requires a
157+
`ReferenceGrant` to honor — and the operator does not create one.
158+
The field has always been a half-feature.
159+
- The architectural stance is that pack resources live in the pack's
160+
own namespace. Cross-namespace pod-to-pod talk goes through in-cluster
161+
DNS, not through an operator-managed public route.
162+
- All known callers (helm charts under `nebari-dev`) deploy their
163+
Service into the same namespace as the NebariApp via Argo CD's
164+
per-application namespace. No internal user is known to rely on the
165+
field today.
166+
### `spec.service` stays required

Keeping `spec.service` required means:

- Every existing NebariApp manifest continues to validate.
- Routes without a per-route `backend` have a single, unambiguous
  fallback target, with no special-case "I forgot to configure a
  backend anywhere" path to reason about.
- The simple case (one hostname, one backend, no per-route overrides)
  stays one struct field.

### One hostname per NebariApp: codified

The CRD docstring on `spec.hostname` gains an explicit note:

> Each NebariApp exposes exactly one public hostname. Packs that need
> multiple hostnames must be split into multiple NebariApps. This is an
> intentional boundary so a NebariApp's TLS, auth, landing-page card,
> and routing concerns all scope to a single user-visible URL.

No schema change is needed (`hostname` is already a single string), but
the constraint moves from accidental to documented.

## End-to-end example

What chat-pack would look like with this contract, *if* it chose to
collapse to one hostname:

```yaml
apiVersion: reconcilers.nebari.dev/v1
kind: NebariApp
metadata:
  name: nebari-chat
  namespace: nebari-chat
spec:
  hostname: chat.nebari.openteams.ai
  service:
    name: chat-frontend
    port: 80
  routing:
    routes:
      - pathPrefix: /api
        backend:
          name: ravnar-backend
          port: 8000
      - pathPrefix: /
        # no backend → falls back to spec.service (chat-frontend:80)
  auth:
    enabled: true
    provider: keycloak
```

219+
This emits a single HTTPRoute with two rules — `/api` →
220+
`ravnar-backend:8000`, `/` → `chat-frontend:80` — both on hostname
221+
`chat.nebari.openteams.ai`. One Certificate, one SecurityPolicy, one
222+
landing-page card.
223+
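To see why the two rules do not conflict, recall that Gateway API matches `PathPrefix` element-wise and prefers the longest matching prefix, independent of rule order. The sketch below imitates that selection locally for the example above; `matchesPrefix` and `pickBackend` are hypothetical names, and the code is a simplified model of the matching rules, not the Gateway API or Envoy implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPrefix reports whether path matches prefix element-wise:
// "/api" matches "/api" and "/api/v1" but not "/apiary".
func matchesPrefix(path, prefix string) bool {
	p := strings.TrimSuffix(prefix, "/")
	if p == "" { // the "/" prefix matches every path
		return true
	}
	return path == p || strings.HasPrefix(path, p+"/")
}

// pickBackend returns the backend for the longest matching prefix.
// rules maps a path prefix to a "service:port" label.
func pickBackend(path string, rules map[string]string) (string, bool) {
	best, found := "", false
	for prefix := range rules {
		if matchesPrefix(path, prefix) && (!found || len(prefix) > len(best)) {
			best, found = prefix, true
		}
	}
	return rules[best], found
}

func main() {
	rules := map[string]string{
		"/api": "ravnar-backend:8000",
		"/":    "chat-frontend:80",
	}
	for _, path := range []string{"/api/v1/chat", "/api", "/apiary", "/index.html"} {
		backend, _ := pickBackend(path, rules)
		fmt.Printf("%-14s -> %s\n", path, backend)
	}
}
```

Under this model, `/api/v1/chat` and `/api` resolve to `ravnar-backend:8000`, while `/apiary` and `/index.html` fall through to `chat-frontend:80`.
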
Whether chat-pack *should* collapse this way is a pack-design choice,
not a contract issue. The proposal here just unblocks the option.

## Operator changes

Concrete files touched.

### `api/v1/nebariapp_types.go`

- Remove `Namespace` field from `ServiceReference`.
- Add `Backend *ServiceReference` to `RouteMatch`.
- Update docstrings on `spec.hostname` and `RoutingConfig` to
  document the one-hostname-per-NebariApp constraint.
- Tighten the existing comment on `ServiceReference` to "must be in
  the NebariApp's own namespace."

### `internal/controller/reconcilers/core/reconciler.go`

`ValidateService` (around line 120):

- Drop the defaulting branch and set
  `serviceNamespace := nebariApp.Namespace` directly.
- Extend the function (or add a sibling) to validate per-route
  `backend` references: each must resolve to an existing Service in
  the NebariApp's namespace, with the referenced port exposed.
- Surface validation failures via the existing condition pattern
  (e.g. a `RouteBackendNotFound` reason on the appropriate condition).

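A rough sketch of the per-route check, under stated assumptions: `serviceIndex` is a hypothetical in-memory stand-in for looking up Services in the NebariApp's namespace (a real reconciler would query the API server), `validateRouteBackends` is an illustrative name, and the message format is invented for the example. Only the `RouteBackendNotFound` reason comes from this design.

```go
package main

import "fmt"

type ServiceReference struct {
	Name string
	Port int32
}

type RouteMatch struct {
	PathPrefix string
	Backend    *ServiceReference
}

// serviceIndex is a hypothetical stand-in for a namespaced Service lookup:
// service name -> set of exposed port numbers.
type serviceIndex map[string]map[int32]bool

// validateRouteBackends returns one message per route whose backend
// override cannot be resolved in the NebariApp's namespace.
func validateRouteBackends(routes []RouteMatch, services serviceIndex) []string {
	var problems []string
	for _, r := range routes {
		if r.Backend == nil {
			continue // falls back to spec.service, which is validated separately
		}
		ports, ok := services[r.Backend.Name]
		if !ok {
			problems = append(problems, fmt.Sprintf(
				"RouteBackendNotFound: route %q: service %q not found",
				r.PathPrefix, r.Backend.Name))
			continue
		}
		if !ports[r.Backend.Port] {
			problems = append(problems, fmt.Sprintf(
				"RouteBackendNotFound: route %q: service %q does not expose port %d",
				r.PathPrefix, r.Backend.Name, r.Backend.Port))
		}
	}
	return problems
}

func main() {
	services := serviceIndex{"ravnar-backend": {8000: true}}
	routes := []RouteMatch{
		{PathPrefix: "/api", Backend: &ServiceReference{Name: "ravnar-backend", Port: 8000}},
		{PathPrefix: "/admin", Backend: &ServiceReference{Name: "admin", Port: 80}},
		{PathPrefix: "/"},
	}
	for _, p := range validateRouteBackends(routes, services) {
		fmt.Println(p)
	}
}
```

Because the lookup is keyed by name only, there is no way for a route to point outside the namespace being indexed, which matches the "same-namespace by construction" principle.
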
### `internal/controller/reconcilers/routing/httproute.go`

`buildBackendRefs` (around line 260):

- Stop reading `Spec.Service.Namespace`.
- Drop the cross-namespace branch at lines 273–278 entirely.
- Refactor to take a `*ServiceReference` (the route's `backend` if
  set, else `spec.service`) and return that route's `HTTPBackendRef`s.
- HTTPRoute construction emits one `HTTPRouteRule` per `RouteMatch`,
  each with its own `backendRefs` resolved from the route.

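The "one rule per `RouteMatch`" shape might look roughly like the sketch below. It deliberately avoids the real Gateway API Go types: `routeRule` is a simplified stand-in for `HTTPRouteRule`, `buildRules` is a hypothetical function name, and defaulting an empty `PathType` to `PathPrefix` is an assumption rather than documented operator behavior.

```go
package main

import "fmt"

type ServiceReference struct {
	Name string
	Port int32
}

type RouteMatch struct {
	PathPrefix string
	PathType   string
	Backend    *ServiceReference
}

// routeRule is a simplified stand-in for a Gateway API HTTPRouteRule:
// one path match plus the single backend it forwards to.
type routeRule struct {
	PathType  string
	PathValue string
	Backend   ServiceReference
}

// buildRules emits one rule per route. With no routes it keeps the
// current behavior: a single "/" prefix rule on the default backend.
func buildRules(defaultSvc ServiceReference, routes []RouteMatch) []routeRule {
	if len(routes) == 0 {
		return []routeRule{{PathType: "PathPrefix", PathValue: "/", Backend: defaultSvc}}
	}
	rules := make([]routeRule, 0, len(routes))
	for _, r := range routes {
		backend := defaultSvc
		if r.Backend != nil {
			backend = *r.Backend
		}
		pathType := r.PathType
		if pathType == "" {
			pathType = "PathPrefix" // assumed default for this sketch
		}
		rules = append(rules, routeRule{PathType: pathType, PathValue: r.PathPrefix, Backend: backend})
	}
	return rules
}

func main() {
	defaultSvc := ServiceReference{Name: "chat-frontend", Port: 80}
	routes := []RouteMatch{
		{PathPrefix: "/api", Backend: &ServiceReference{Name: "ravnar-backend", Port: 8000}},
		{PathPrefix: "/"},
	}
	for _, rule := range buildRules(defaultSvc, routes) {
		fmt.Printf("%s %s -> %s:%d\n", rule.PathType, rule.PathValue, rule.Backend.Name, rule.Backend.Port)
	}
}
```
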
### `config/rbac/`

- The `Services` ClusterRole rule can be narrowed from cluster-scoped
  to namespace-scoped reads now that no NebariApp can legally
  reference a Service outside its own namespace. The exact scoping
  belongs in implementation; the design decision is just "tighten."

### Generated artifacts

- `config/crd/bases/reconcilers.nebari.dev_nebariapps.yaml` regenerates.
- `docs/api-reference.md` regenerates.
- The CRD diff will show `service.namespace` removed and
  `routing.routes[].backend` added.

## Validation

- `RouteMatch.backend.name` and `RouteMatch.backend.port` are required
  when `backend` is set (no port-only override; see *Open questions*).
- The CRD itself cannot enforce "Service exists in the NebariApp's
  namespace"; that check stays in the reconciler's `ValidateService`
  pass.
- The operator should reject a NebariApp where `routing.routes[]`
  contains duplicate `pathPrefix + pathType` tuples, but that's a
  pre-existing concern not introduced by this design.

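The duplicate-tuple check in the last bullet could be sketched as below. `duplicateRoutes` is a hypothetical helper, and treating an empty `pathType` as `PathPrefix` before comparing is an assumption of this sketch, so that an implicit and an explicit prefix route on the same path count as duplicates.

```go
package main

import "fmt"

// RouteMatch keeps only the two fields that form the uniqueness key.
type RouteMatch struct {
	PathPrefix string
	PathType   string
}

// duplicateRoutes returns every route whose (pathPrefix, pathType)
// tuple was already seen earlier in the list.
func duplicateRoutes(routes []RouteMatch) []RouteMatch {
	seen := make(map[RouteMatch]bool, len(routes))
	var dups []RouteMatch
	for _, r := range routes {
		key := r
		if key.PathType == "" {
			key.PathType = "PathPrefix" // assumed default for this sketch
		}
		if seen[key] {
			dups = append(dups, key)
			continue
		}
		seen[key] = true
	}
	return dups
}

func main() {
	routes := []RouteMatch{
		{PathPrefix: "/api"},
		{PathPrefix: "/api", PathType: "PathPrefix"}, // same tuple as the first
		{PathPrefix: "/api", PathType: "Exact"},      // different type: allowed
		{PathPrefix: "/"},
	}
	for _, d := range duplicateRoutes(routes) {
		fmt.Printf("duplicate route: %s (%s)\n", d.PathPrefix, d.PathType)
	}
}
```

The struct works directly as a map key because both fields are comparable strings; the per-route `backend` is intentionally left out of the key, since two routes with the same match but different backends would be ambiguous.
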
## Backwards compatibility

- **Existing manifests:** any NebariApp that did not set
  `spec.service.namespace` is wholly unaffected; the field's removal
  is invisible.
- **Manifests that did set `spec.service.namespace`:** the API server
  will refuse the field on the new CRD. There is no known internal
  user. The release-notes / changelog must call this out explicitly so
  any external user catches it at upgrade time.
- **The API version stays `v1`.** Field removal would normally argue
  for a version bump, but the project's README explicitly flags the API
  as "may change without notice" during the NIC bring-up phase. Once
  the API is declared stable, this kind of removal should require a
  version bump.

## Migration

For internal callers (the only known callers), no action is needed:
Argo CD installs each pack into a single namespace, and every
surveyed chart already omits `spec.service.namespace`.

For any external caller relying on the field:

1. Move the target Service into the NebariApp's namespace (typical
   case), **or**
2. Keep the Service where it is and have the workload connect to it
   via in-cluster DNS rather than through the NebariApp's HTTPRoute.

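For the second option, the in-cluster DNS name follows the standard Kubernetes `<service>.<namespace>.svc.<cluster-domain>` pattern. A small sketch, assuming the default `cluster.local` cluster domain (clusters can be configured with a different one) and a hypothetical `clusterDNSName` helper:

```go
package main

import "fmt"

// clusterDNSName builds the in-cluster DNS name of a Service.
// It assumes the default cluster domain "cluster.local".
func clusterDNSName(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
}

func main() {
	// Hypothetical Service that stays in another namespace.
	host := clusterDNSName("ravnar-backend", "other-ns")
	fmt.Printf("http://%s:%d\n", host, 8000)
}
```

The workload would then use that address as an ordinary upstream URL, bypassing the NebariApp's HTTPRoute entirely.
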
## Follow-ups (not in this design)

- **`routing.streaming` / `BackendTrafficPolicy`.** Envoy's default
  15s request timeout breaks SSE and long-poll. The next iteration
  should add a per-NebariApp boolean (or small struct) on
  `RoutingConfig` that, when set, makes the operator emit a
  `BackendTrafficPolicy` targeting its own HTTPRoute. Independent of
  this design; its only intersection is that the policy targets the
  one HTTPRoute that this design's multi-rule output still produces,
  so the same policy covers all rules.
- **API version bump to a stable channel.** Once the surface settles,
  promoting from `v1` (currently labelled unstable) to a properly
  stable version is its own piece of work. Field removals like the one
  in this design are the last that can happen before that bump.

## Open questions

- **Should `RouteMatch.backend` support port-only overrides?** The
  current proposal says no: `name` and `port` are both required when
  `backend` is set. Allowing port-only ("same service, different
  port") would save a few characters at the cost of an implicit
  `name = spec.service.name` inheritance. Decision: keep verbose.
- **Should `publicRoutes` accept per-route backends too?** For
  symmetry, probably yes: `publicRoutes` and `routes` share
  `RouteMatch`, so the field appears on both automatically. Whether
  the operator should *honor* a per-route backend on a public route
  is the actual question. Defaulting to "yes, public routes can have
  their own backend too" matches the rest of the contract; the
  alternative (only `routes` can override) would require splitting
  the type. Decision lean: honor on both; revisit if there's a
  security argument against.
- **Status surface.** Should `NebariApp.status` expose per-route
  resolution (which backend each route resolved to)? Useful for
  debugging but adds status surface area that the operator must
  maintain. Decision lean: not in v1 of this change; users can
  inspect the rendered HTTPRoute directly.

## References

- Gateway API `HTTPRoute`: <https://gateway-api.sigs.k8s.io/api-types/httproute/>
- `BackendObjectReference` cross-namespace mechanics
  (`ReferenceGrant`): <https://gateway-api.sigs.k8s.io/api-types/referencegrant/>
- `nebari-chat-pack` chart that motivated this:
  <https://github.com/nebari-dev/nebari-chat-pack/tree/main/helm/nebari-chat>
- Companion concern (streaming/SSE timeouts) PR observed in the
  `openteams-ai/nebari.openteams.ai` deployment repo (PR #12).
