kep: pod port allocation #176
Open
shangsmo wants to merge 16 commits into sgl-project:main from shangsmo:kep-171-pod-port-allocator
- b991126 feature: add port allocation kep
- a9f27c2 Update keps/171-pod-port-allocation/README.md (shangsmo)
- b038fef Update keps/171-pod-port-allocation/README.md (shangsmo)
- b95e17b Update keps/171-pod-port-allocation/README.md (shangsmo)
- b34d447 Update keps/171-pod-port-allocation/README.md
- 1de8e43 Update keps/171-pod-port-allocation/README.md (shangsmo)
- 818be16 feature: add port allocation kep
- f06cdd4 Update keps/171-pod-port-allocation/README.md (shangsmo)
- 5c77a8f Update keps/171-pod-port-allocation/README.md (shangsmo)
- a237139 Update keps/171-pod-port-allocation/README.md (shangsmo)
- bb1662e Update keps/171-pod-port-allocation/README.md
- eb79bb3 Update keps/171-pod-port-allocation/README.md (shangsmo)
- e8313e2 Merge remote-tracking branch 'origin/kep-171-pod-port-allocator' into… (shangsmo)
- 6110903 update kep-171 pod port allocation‘s README.md (shangsmo)
- a907d9f update kep-171 pod port allocation‘s README.md (shangsmo)
- 5c46ddd Merge branch 'sgl-project:main' into kep-171-pod-port-allocator (shangsmo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,294 @@ | ||
| # KEP-171 Pod Port Allocation | ||
| <!-- toc --> | ||
| - [Motivation](#motivation) | ||
| - [Goals](#goals) | ||
| - [Proposal](#proposal) | ||
| - [User Stories](#user-stories) | ||
| - [Story 1](#story-1) | ||
| - [Story 2](#story-2) | ||
| - [Risks and Mitigations](#risks-and-mitigations) | ||
| - [Design Details](#design-details) | ||
| - [API](#api) | ||
| - [Port Allocator Interface](#port-allocator-interface) | ||
| - [Implementation](#implementation) | ||
| - [Test Plan](#test-plan) | ||
| - [Unit Tests](#unit-tests) | ||
| - [Integration tests](#integration-tests) | ||
| - [End to End Tests](#end-to-end-tests) | ||
| <!-- /toc --> | ||
|
|
||
| ## Motivation | ||
| When deploying LLM services using the PD (Prefill-Decode) disaggregated approach, containers often need to use `hostNetwork: true` to leverage RDMA networks. In the hostNetwork scenario, if two replicas of the same Role are scheduled to the same node, port conflicts will occur because both containers are configured with the same port, resulting in only one replica being able to start successfully. | ||
|
|
||
| Therefore, RBG needs to provide a dynamic port allocation solution that can automatically assign different port numbers to replicas during service deployment, thereby improving the deployment density of inference service Pods. | ||
|
|
||
| ## Goals | ||
| - RBG applications should be able to automatically allocate service ports for containers on-demand during deployment, avoiding port conflicts in hostNetwork scenarios. | ||
| - The dynamically assigned ports for each Pod can be injected into the generated Pod through "container environment variables" and "Pod annotations", allowing containers to be aware of their assigned port values. | ||
| - It should be possible to obtain the port values allocated to other Pods within the current Role replicas on-demand, for Pod service discovery purposes. | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### User Stories | ||
|
|
||
| #### Story 1 | ||
| > Deploying inference services in a cluster | ||
|
|
||
| As a cluster administrator, I want to deploy my model service using RBG in the cluster. | ||
| The service requires hostNetwork to be enabled for both the Prefill and Decoder roles. | ||
| However, since there are existing services also using hostNetwork, I need to avoid port conflicts. | ||
|
|
||
|
|
||
| #### Story 2 | ||
| > Scale out Role replicas | ||
|
|
||
| As an LLM service operator, I have noticed that the Decoder Role's load has been very high recently, and I need to scale out more replicas for the Decoder Role. | ||
| If each replica can use different ports, I can complete the deployment with fewer nodes when resources are sufficient. | ||
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| - The dynamic port allocation feature is only supported in *InstanceSet* mode. | ||
|
|
||
| ## Design Details | ||
| When the Role of an RBG resource is InstanceSet, the resource directly associated with the Pod is the Instance resource. | ||
| Therefore, the current design requires dynamic port allocation to be completed within the scope of the Instance. | ||
|
|
||
| ### API | ||
| If users want to configure dynamic ports for a Pod generated by an RBG object, they need to add the following annotation in the corresponding Pod Template of the RBG object: | ||
| ```yaml | ||
| rolebasedgroup.workloads.x-k8s.io/port-allocator: \| | ||
|   { | ||
|     "allocations": [ | ||
|       { | ||
|         "name": "grpc", // Logical name | ||
|         "env": "GRPC_PORT", // Env var to inject | ||
|         "annotationKey": "test/grpc-port", // annotationKey to inject | ||
|         "policy": "Dynamic" // Dynamic (per-pod) or Static (per-role) | ||
|       } | ||
|     ], | ||
|     "references": [ | ||
|       { | ||
|         "env": "LEADER_ADDR_PORT", // Env var to inject | ||
|         "from": "leader.grpc" // Format: "<role_name>.<port_name>" | ||
|       } | ||
|     ] | ||
|   } | ||
| ``` | ||
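| The `//` comments in the example above are explanatory only; JSON does not allow comments, so they must be omitted from a real annotation value. The JSON field definitions in the annotation are as follows: | ||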
| ```go | ||
| type PortPolicy string | ||
|
|
||
| const ( | ||
| // Port is only valid for the current Pod | ||
| Dynamic PortPolicy = "Dynamic" | ||
| // Port is valid for all Pod replicas in the current role | ||
| Static PortPolicy = "Static" | ||
| ) | ||
|
|
||
| type PortAllocatorConfig struct { | ||
| // Allocations specifies the ports to be allocated | ||
| Allocations []PortAllocation `json:"allocations"` | ||
| // References specifies the ports to be referenced from other Pods | ||
| References []PortReference `json:"references"` | ||
| } | ||
|
|
||
| type PortAllocation struct { | ||
| // Not Empty | ||
| // Name specifies the name of the port | ||
| Name string `json:"name"` | ||
| // Not Empty | ||
| // Env specifies the name of the environment variable to be injected into the container | ||
| Env string `json:"env"` | ||
| // AnnotationKey specifies the key of the annotation to be injected into the Pod | ||
| AnnotationKey string `json:"annotationKey"` | ||
| // Not Empty | ||
| // Default is Dynamic | ||
| // Policy specifies the scope of the port | ||
| Policy PortPolicy `json:"policy"` | ||
| } | ||
|
|
||
| type PortReference struct { | ||
| // Not Empty | ||
| // Env specifies the name of the environment variable to be injected into the container | ||
| Env string `json:"env"` | ||
| // Not Empty | ||
| // From specifies the name of the port to be referenced | ||
| From string `json:"from"` | ||
| } | ||
| ``` | ||
| For example, with the following configuration: | ||
| - The leader Pod is allocated one port: | ||
|   - **Dynamic Port** (`LEADER_PORT`): Injected into the container as the `LEADER_PORT` environment variable and annotated as `test/grpc-port` | ||
| - Each worker Pod is allocated two ports and references one port from the leader Pod: | ||
|   - **Dynamic Port** (`WORKER_PORT1`): Unique per Pod, annotated as `test/worker-port1` | ||
|   - **Static Port** (`WORKER_PORT2`): Shared across Pods, annotated as `test/worker-port2` | ||
|   - **Reference Port** (`LEADER_ADDR_PORT`): Points to the leader's `leader-port` | ||
| ```yaml | ||
| apiVersion: workloads.x-k8s.io/v1alpha1 | ||
| kind: Instance | ||
| metadata: | ||
|   name: test | ||
|   namespace: default | ||
| spec: | ||
|   readyPolicy: AllPodReady | ||
|   components: | ||
|     - name: leader | ||
|       size: 1 | ||
|       template: | ||
|         metadata: | ||
|           annotations: | ||
|             rolebasedgroup.workloads.x-k8s.io/port-allocator: \| | ||
|               { | ||
|                 "allocations": [ | ||
|                   { | ||
|                     "name": "leader-port", | ||
|                     "env": "LEADER_PORT", | ||
|                     "annotationKey": "test/grpc-port", | ||
|                     "policy": "Dynamic" | ||
|                   } | ||
|                 ] | ||
|               } | ||
|         spec: | ||
|           containers: | ||
|             - name: nginx | ||
|               image: nginx:1.28.0 | ||
|     - name: worker | ||
|       size: 2 | ||
|       template: | ||
|         metadata: | ||
|           annotations: | ||
|             rolebasedgroup.workloads.x-k8s.io/port-allocator: \| | ||
|               { | ||
|                 "allocations": [ | ||
|                   { | ||
|                     "name": "worker-port1", | ||
|                     "env": "WORKER_PORT1", | ||
|                     "annotationKey": "test/worker-port1", | ||
|                     "policy": "Dynamic" | ||
|                   }, | ||
|                   { | ||
|                     "name": "worker-port2", | ||
|                     "env": "WORKER_PORT2", | ||
|                     "annotationKey": "test/worker-port2", | ||
|                     "policy": "Static" | ||
|                   } | ||
|                 ], | ||
|                 "references": [ | ||
|                   { | ||
|                     "env": "LEADER_ADDR_PORT", | ||
|                     "from": "leader.leader-port" | ||
|                   } | ||
|                 ] | ||
|               } | ||
|         spec: | ||
|           containers: | ||
|             - name: nginx | ||
|               image: nginx:1.28.0 | ||
| ``` | ||
| ### Port Allocator Interface | ||
|
|
||
| To facilitate future extensibility, dynamic port allocation can be defined behind an interface so that different allocation strategies can be implemented. The port allocation strategy interface is defined as follows: | ||
|
|
||
| ```go | ||
| // AllocateStrategy names the allocation strategy, selected by a program startup flag | ||
| type AllocateStrategy string | ||
|
|
||
| type PortAllocator struct { | ||
| strategy AllocateStrategy | ||
| pa PortAllocatorInterface | ||
| client client.Client | ||
| } | ||
|
|
||
| type PortAllocatorInterface interface { | ||
| // Start initializes the port allocator when the program starts | ||
| Start(client client.Client) error | ||
| // Release releases the given port so that it can be allocated again | ||
| Release(port int32) error | ||
| // AllocateBatch allocates num ports and returns the list of allocated port numbers | ||
| AllocateBatch(num int32) ([]int32, error) | ||
| } | ||
|
|
||
| // Singleton pattern, created at program startup based on the port allocation strategy | ||
| var portAllocator *PortAllocator | ||
|
|
||
| // GetPortAllocator returns the port allocator created at program startup | ||
| func GetPortAllocator() PortAllocatorInterface { | ||
| return portAllocator.pa | ||
| } | ||
| ``` | ||
| ### Implementation | ||
|
|
||
| #### Controller Startup Modifications | ||
|
|
||
| - **Controller Initialization**: | ||
| - Parse and validate port allocator configuration flags, including `AllocateStrategy`, `StartPort`, and `PortRange` | ||
| - Instantiate the corresponding port allocator implementation based on the configured `AllocateStrategy` | ||
| - Initialize the port allocator by calling its `Start` method with the Kubernetes client | ||
|
|
||
| #### Instance Reconciler Modifications for `Dynamic` Policy Ports | ||
|
|
||
| For ports with `Dynamic` policy, the Instance reconciler manages port allocation on a per-Pod basis: | ||
|
|
||
| - **Pod Creation**: | ||
| 1. Parse the port allocator annotation to extract `Dynamic` policy port configurations | ||
| 2. Call the port allocator's `AllocateBatch` method to obtain the required number of ports | ||
| 3. Create or update a ConfigMap named `instance-<instance-name>-ports` to store allocated ports: | ||
| - Data keys follow the format: `<pod-name>.<port-name>` (e.g., `worker-0.grpc-port: "30001"`) | ||
| - The ConfigMap is owned by the Instance resource for automatic cleanup | ||
| 4. Inject allocated ports into the Pod specification: | ||
| - Add the environment variables named by the `env` field to containers, using `valueFrom.configMapKeyRef` to reference the ConfigMap | ||
| - Add the Pod annotations named by the `annotationKey` field, with the port value taken from the ConfigMap | ||
| - Mount the ConfigMap for cross-pod service discovery if references are configured | ||
|
|
||
| - **Pod Update**: | ||
| 1. Detect changes in port allocation requirements by comparing current and updated pod templates | ||
| 2. Allocate new ports via `AllocateBatch` for additional or modified port requirements | ||
| 3. Update the ConfigMap with new port allocations | ||
| 4. Update the Pod spec to reflect the new ConfigMap references: | ||
| - Update environment variable references (`valueFrom.configMapKeyRef`) to point to the new keys in ConfigMap | ||
| - Update Pod annotations with new port values from ConfigMap | ||
| 5. Release obsolete ports via the `Release` method after the updated Pod becomes ready | ||
| 6. Clean up removed port entries from the ConfigMap | ||
|
|
||
| - **Pod Deletion**: | ||
| 1. Retrieve allocated port information from the ConfigMap for the deleted Pod | ||
| 2. Release all ports associated with the deleted Pod via the `Release` method | ||
| 3. Remove corresponding entries from the ConfigMap | ||
| 4. Delete the ConfigMap if no ports remain (optional cleanup policy) | ||
|
|
||
| #### InstanceSet Reconciler Modifications for `Static` Policy Ports | ||
|
|
||
| For ports with `Static` policy, the InstanceSet reconciler manages port allocation at the Role level: | ||
|
|
||
| - **Instance Creation**: | ||
| 1. Parse the port allocator annotation to extract `Static` policy port configurations | ||
| 2. Call the port allocator's `AllocateBatch` method to obtain one port per port definition (shared across all replicas) | ||
| 3. Create or update a ConfigMap named `instanceset-<instanceset-name>-ports` to store static port allocations: | ||
| - Data keys follow the format: `<instance-name>.<port-name>` (e.g., `test.worker-port2: "30002"`) | ||
| - The ConfigMap is owned by the InstanceSet resource for automatic cleanup | ||
| 4. Inject the ConfigMap reference into the Pod template: | ||
| - Add a volume referencing the ConfigMap to the Pod spec | ||
|
|
||
| - **Instance Update**: | ||
| 1. Detect changes in static port requirements between old and new pod templates | ||
| 2. Allocate new ports via `AllocateBatch` for new or modified port requirements | ||
| 3. Update the ConfigMap with new static port allocations | ||
| 4. Update the Pod template volume references to the ConfigMap | ||
| 5. Release obsolete ports via the `Release` method once all Pods using the old template are terminated | ||
|
|
||
| - **Instance Deletion**: | ||
| 1. Remove the corresponding key from the ConfigMap for the deleted instance | ||
| 2. If all instances in the InstanceSet are deleted, release all ports occupied by the InstanceSet via the `Release` method and delete the ConfigMap | ||
| ### Test Plan | ||
|
|
||
| #### Unit Tests | ||
|
|
||
|
|
||
| #### Integration tests | ||
|
|
||
|
|
||
| #### End to End Tests | ||
|
|
||
|
|
||
|
Comment on lines +282 to +292
The Test Plan section is currently empty. For a feature of this nature, it's crucial to outline a comprehensive testing strategy. Please detail the planned unit, integration, and end-to-end tests. For example:
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| title: Pod Port Allocation | ||
| kep-number: 171 | ||
| authors: | ||
| - "@shangsimou" | ||
| status: provisional | ||
| creation-date: 2026-03-03 | ||
| reviewers: | ||
| - "@cheyang" | ||
| approvers: | ||
|
|
||
| stage: alpha | ||
|
|
||
| latest-milestone: "v0.7.0" | ||
|
|
||
| milestone: | ||
|   alpha: "v0.7.0" |
The proposed design uses a global singleton pattern for the `portAllocator`. This introduces global state, which can make testing difficult and the system harder to reason about. Consider using dependency injection instead. The `PortAllocator` could be instantiated in `main.go` and passed down to the reconcilers that need it. This would make the dependencies explicit and improve testability.
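One way the dependency-injection suggestion could look is sketched below. This is only an illustration under stated assumptions, not the PR's implementation: `rangeAllocator`, `newRangeAllocator`, `InstanceReconciler`, `NewInstanceReconciler`, and `allocateFor` are hypothetical names, the port range is made up, and `Start(client)` is left out of the interface so that the example has no Kubernetes dependency.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// PortAllocatorInterface mirrors the KEP's interface, minus Start(client),
// which is omitted here (an assumption) to keep the sketch dependency-free.
type PortAllocatorInterface interface {
	AllocateBatch(num int32) ([]int32, error)
	Release(port int32) error
}

// rangeAllocator is a hypothetical in-memory allocator over [start, start+size).
type rangeAllocator struct {
	mu    sync.Mutex
	start int32
	size  int32
	used  map[int32]bool
}

func newRangeAllocator(start, size int32) *rangeAllocator {
	return &rangeAllocator{start: start, size: size, used: make(map[int32]bool)}
}

// AllocateBatch reserves num free ports, or reserves nothing and returns an error.
func (a *rangeAllocator) AllocateBatch(num int32) ([]int32, error) {
	if num < 0 {
		return nil, errors.New("num must be non-negative")
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	ports := make([]int32, 0, num)
	for p := a.start; p < a.start+a.size && int32(len(ports)) < num; p++ {
		if !a.used[p] {
			ports = append(ports, p)
		}
	}
	if int32(len(ports)) < num {
		return nil, fmt.Errorf("only %d of %d requested ports are free", len(ports), num)
	}
	for _, p := range ports {
		a.used[p] = true
	}
	return ports, nil
}

// Release returns a previously allocated port to the pool.
func (a *rangeAllocator) Release(port int32) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.used[port] {
		return fmt.Errorf("port %d is not allocated", port)
	}
	delete(a.used, port)
	return nil
}

// InstanceReconciler receives its allocator through the constructor
// instead of reading a package-level singleton.
type InstanceReconciler struct {
	ports PortAllocatorInterface
}

func NewInstanceReconciler(ports PortAllocatorInterface) *InstanceReconciler {
	return &InstanceReconciler{ports: ports}
}

// allocateFor maps each requested port name to a freshly allocated port.
func (r *InstanceReconciler) allocateFor(names []string) (map[string]int32, error) {
	ports, err := r.ports.AllocateBatch(int32(len(names)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]int32, len(names))
	for i, name := range names {
		out[name] = ports[i]
	}
	return out, nil
}

func main() {
	// In the real controller this wiring would live in main.go.
	alloc := newRangeAllocator(30000, 100)
	r := NewInstanceReconciler(alloc)
	got, err := r.allocateFor([]string{"worker-port1", "worker-port2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got["worker-port1"], got["worker-port2"]) // prints "30000 30001"
}
```

Because the reconciler depends only on the interface, a unit test can pass in a small allocator like the one above, or a stub that always fails, without touching any global state.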