Skip to content

feat(gateway): URL-stable identity-based cluster naming for API-level upstreams#2137

Draft
mehara-rothila wants to merge 2 commits into
wso2:mainfrom
mehara-rothila:feat/api-level-url-stable-clusters
Draft

feat(gateway): URL-stable identity-based cluster naming for API-level upstreams#2137
mehara-rothila wants to merge 2 commits into
wso2:mainfrom
mehara-rothila:feat/api-level-url-stable-clusters

Conversation

@mehara-rothila

Copy link
Copy Markdown
Contributor

Summary

Changes how the gateway names the Envoy clusters for an API's main and sandbox upstreams. Cluster names are now derived from the API's identity, main_<hash> / sandbox_<hash> where the hash is the first 8 bytes of sha256(apiID|env) in hex, instead of from the backend URL (cluster_<scheme>_<sanitized-host>). Both xDS builders (the Envoy xDS translator and the RuntimeDeployConfig transform path) derive the name through one shared helper, so they can never disagree.

Why

With URL-derived names, editing a backend URL renames the cluster: the old cluster is destroyed, a new one is created, and routes are repointed, draining in-flight connections on every URL change. With identity-derived names the cluster name never changes for the lifetime of the API, so a URL edit propagates as an endpoint update on the same cluster and live traffic is not dropped. The name carries no URL information, so host, port, scheme, and path edits are all equally safe.

Design decisions

Identity in, URL out. The hash input is apiID plus env (main or sandbox) only. Anything URL-derived in the name would defeat the purpose, since some class of URL edit would still rename the cluster.

One naming helper for both builders. The gateway has two builders that must agree on cluster names (the main Envoy xDS translator and the policy-side RuntimeDeployConfig transform). Both now call pkg/utils/clusterkey, replacing the two hand-rolled sanitize functions that had to be kept in sync manually.

ClusterKey and EnvoyClusterName unified. In the RDC path the internal cluster key and the Envoy-facing cluster name are now the same string. If they differ, the policy engine's default upstream cluster metadata points at a name Envoy does not know, which surfaces as 503 NoRoute.

Latent collision fixed. Under URL-derived naming, a main and sandbox upstream pointing at the same URL collided into one cluster, and two different APIs sharing a backend URL shared one cluster. With env in the hash input main and sandbox are always distinct, and each API owns its clusters.

Migration notes (one-time effects on existing deployments)

  • Each existing API's main/sandbox clusters are renamed once, on its next configuration regeneration or redeploy. The new cluster and repointed routes are delivered atomically in the same snapshot.
  • Envoy per-cluster connection pools and stats names change once; dashboards or alerts keyed on the old cluster_<scheme>_<host> names in the upstream_cluster label need a one-time update.
  • Total cluster count can increase, since two APIs targeting the same backend URL no longer share a cluster.

What changed

  • pkg/utils/clusterkey (new): APILevel(apiID, env) naming helper plus unit tests.
  • pkg/xds/translator.go: resolveUpstreamCluster takes the API id and derives the hash name; sanitizeClusterName removed.
  • pkg/transform/restapi.go: addUpstreamCluster derives the same hash name; EnvoyClusterName set to the cluster key; sanitizeEnvoyClusterName removed.
  • cmd/controller/main.go: comment documenting that both xDS paths must keep cluster-name derivation in sync (plus an import-order fix in the touched file).
  • gateway/spec/impls/1-basic-gateway-with-controller/data-model.md: cluster-naming section updated from the old URL-sanitized scheme to the identity-derived scheme.
  • Tests updated across both builders, plus new cases proving the contract.

Test coverage

  • Unit: clusterkey tests; translator tests assert the hash name shape and URL-independence (TestResolveUpstreamCluster_DedupSameAPIDifferentURLs, TestResolveUpstreamCluster_MainSandboxNeverCollide); transformer tests assert the name shape on routes and clusters, that a route's default cluster always references a registered cluster (TestRestAPITransformer_APILevelDefaultClusterMatchesRealCluster), and that a URL edit does not rename the cluster (TestRestAPITransformer_APILevelURLStableAcrossURLEdit).
  • Integration: new api-level-url-stable.feature with three scenarios: a main upstream URL update that changes both the host and the path (the host change is what URL-derived naming could not survive), a sandbox upstream URL update, and default cluster resolution under cluster_header routing. The main and sandbox scenarios also assert cluster identity through the Envoy admin endpoint (identity-named clusters present, URL-derived cluster_<scheme>_<host> names absent, checked before and after the update), so these scenarios fail under the previous naming scheme.
  • Verified live on the integration stack: the cluster name survived a backend host change with the endpoint updated in place, and 150 requests fired continuously across a live URL update returned 150x HTTP 200 with zero failures.

Related PRs

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 61e490aa-333e-4ed5-9731-3caaf0f60b3e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mehara-rothila mehara-rothila changed the base branch from feature/operation-level-ep to main June 10, 2026 08:53
@mehara-rothila mehara-rothila force-pushed the feat/api-level-url-stable-clusters branch 4 times, most recently from 812acc1 to 534247e Compare June 10, 2026 15:20
… upstreams

Derive API-level main/sandbox cluster names from sha256(apiID) instead of the backend URL, in both the Envoy xDS translator and the RuntimeDeployConfig transform path. The final name shape is main_<16hex> / sandbox_<16hex>; an API's main and sandbox clusters share the hash fragment (the env prefix provides the distinction), so an operator can pair an API's clusters at a glance in stats and config dumps. Because the name no longer encodes the URL, a host, port, or scheme edit is delivered to Envoy as an update to the same named cluster, which Envoy warms and swaps, instead of removing one cluster name and adding another, and a path-only edit touches just the route rewrite: routes never repoint, name-keyed stats stay continuous, and no request failures were observed while updating under continuous traffic. ClusterKey and EnvoyClusterName are unified to the same string so the policy engine's default upstream cluster always resolves to a real Envoy cluster, and the same API's main and sandbox upstreams can no longer collide into one cluster when they share a URL. Cross-API name uniqueness rests on the 64-bit hash truncation (collision-resistant, not impossible).

The integration test asserts cluster identity through the Envoy admin endpoint: identity-named clusters present, URL-derived names absent, and the exact cluster-name set unchanged across host-changing URL updates (a new capture/compare step pair), so the scenarios fail under the previous naming scheme. Rollout note: on upgrade, each existing API's main/sandbox clusters are renamed once on their next translation, so per-cluster connection pools and upstream_cluster metric labels churn once; dashboards keyed on the old cluster_<scheme>_<host> names need a one-time update. The data-model spec doc is updated to describe the new naming.
@mehara-rothila mehara-rothila force-pushed the feat/api-level-url-stable-clusters branch from 534247e to a99b916 Compare June 10, 2026 16:16
@mehara-rothila

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@mehara-rothila mehara-rothila changed the base branch from main to feature/operation-level-ep June 10, 2026 18:39
@mehara-rothila mehara-rothila changed the base branch from feature/operation-level-ep to main June 12, 2026 02:24
Both xDS builders now build "<env>_<hash>" through clusterkey.APILevelName so the two paths cannot drift. Add known-answer vectors pinning SHA-256[:8].
@mehara-rothila mehara-rothila force-pushed the feat/api-level-url-stable-clusters branch from 5c3aed7 to 840cb13 Compare June 12, 2026 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant