diff --git a/keps/171-pod-port-allocation/README.md b/keps/171-pod-port-allocation/README.md
new file mode 100644
index 00000000..36770284
--- /dev/null
+++ b/keps/171-pod-port-allocation/README.md
@@ -0,0 +1,294 @@
+# KEP-171 Pod Port Allocation
+
+- [Motivation](#motivation)
+- [Goals](#goals)
+- [Proposal](#proposal)
+  - [User Stories](#user-stories)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API](#api)
+  - [Port Allocator Interface](#port-allocator-interface)
+  - [Implementation](#implementation)
+  - [Test Plan](#test-plan)
+    - [Unit Tests](#unit-tests)
+    - [Integration tests](#integration-tests)
+    - [End to End Tests](#end-to-end-tests)
+
+
+## Motivation
+When LLM services are deployed with PD (Prefill-Decode) disaggregation, containers often need `hostNetwork: true` to use RDMA networks. With hostNetwork, if two replicas of the same Role are scheduled onto the same node, their ports conflict because both containers are configured with the same port, and only one replica can start successfully.
+
+Therefore, RBG needs a dynamic port allocation mechanism that automatically assigns distinct ports to replicas at deployment time, so that more inference service Pods can be packed onto each node.
+
+## Goals
+- RBG applications can automatically allocate service ports for containers on demand at deployment time, avoiding port conflicts in hostNetwork scenarios.
+- The ports dynamically assigned to each Pod are injected into the generated Pod as container environment variables and Pod annotations, so that containers know their assigned port values.
+- A Pod can obtain, on demand, the ports allocated to other Pods within the same Role replica, for service discovery.
+
+## Proposal
+
+### User Stories
+
+#### Story 1
+> Deploying inference services in a cluster
+
+As a cluster administrator, I want to deploy my model service in the cluster using RBG.
+The service requires hostNetwork to be enabled for both the Prefill and Decode roles.
+However, other services in the cluster also use hostNetwork, so I need to avoid port conflicts with them.
+
+
+#### Story 2
+> Scaling out Role replicas
+
+As an LLM service operator, I have noticed that the load on the Decode role has been very high recently, and I need to scale out more Decode replicas.
+If each replica can use a different port, multiple replicas can share a node, so I can complete the scale-out with fewer nodes when node resources are sufficient.
+
+### Risks and Mitigations
+
+- The dynamic port allocation feature is only supported in *InstanceSet* mode.
+
+## Design Details
+When a Role of an RBG resource runs in InstanceSet mode, the resource directly associated with its Pods is the Instance.
+Therefore, this design performs dynamic port allocation within the scope of an Instance.
+
+### API
+To configure dynamic ports for the Pods generated by an RBG object, users add the following annotation to the corresponding Pod template of the RBG object:
+```yaml
+rolebasedgroup.workloads.x-k8s.io/port-allocator: |
+  {
+    "allocations": [
+      {
+        "name": "grpc",                      // Logical name
+        "env": "GRPC_PORT",                  // Env var to inject
+        "annotationKey": "test/grpc-port",   // Annotation key to inject
+        "policy": "Dynamic"                  // Dynamic (per-Pod) or Static (per-Role)
+      }
+    ],
+    "references": [
+      {
+        "env": "LEADER_ADDR_PORT",           // Env var to inject
+        "from": "leader.grpc"                // Format: "."
+      }
+    ]
+  }
+```
+The `//` comments above are explanatory only. JSON does not allow comments, so they must be omitted from a real annotation value.
+
+The JSON fields of the annotation are defined as follows:
+```go
+// PortPolicy determines the scope of an allocated port.
+type PortPolicy string
+
+const (
+	// Dynamic: the port is valid only for the current Pod.
+	Dynamic PortPolicy = "Dynamic"
+	// Static: the port is shared by all Pod replicas of the current Role.
+	Static PortPolicy = "Static"
+)
+
+type PortAllocatorConfig struct {
+	// Allocations specifies the ports to be allocated.
+	Allocations []PortAllocation `json:"allocations"`
+	// References specifies ports of other Pods to be referenced. Optional.
+	References []PortReference `json:"references,omitempty"`
+}
+
+type PortAllocation struct {
+	// Name is the logical name of the port. Required.
+	Name string `json:"name"`
+	// Env is the name of the environment variable injected into the container. Required.
+	Env string `json:"env"`
+	// AnnotationKey is the key of the annotation injected into the Pod. Optional.
+	AnnotationKey string `json:"annotationKey,omitempty"`
+	// Policy specifies the scope of the port. Optional; defaults to Dynamic.
+	Policy PortPolicy `json:"policy,omitempty"`
+}
+
+type PortReference struct {
+	// Env is the name of the environment variable injected into the container. Required.
+	Env string `json:"env"`
+	// From names the referenced port, e.g. "leader.grpc". Required.
+	From string `json:"from"`
+}
+```
+For example, with the following configuration:
+- The leader Pod is allocated one port:
+  - Port 1: injected into the container as the `LEADER_PORT` environment variable and annotated as `test/grpc-port`.
+- Each worker Pod receives two allocated ports and one port referenced from the leader Pod:
+  - **Dynamic port** (`WORKER_PORT1`): unique per Pod, annotated as `test/worker-port1`
+  - **Static port** (`WORKER_PORT2`): shared by all worker Pods, annotated as `test/worker-port2`
+  - **Reference port** (`LEADER_ADDR_PORT`): the leader's Port 1 (`leader.leader-port`)
+```yaml
+apiVersion: workloads.x-k8s.io/v1alpha1
+kind: Instance
+metadata:
+  name: test
+  namespace: default
+spec:
+  readyPolicy: AllPodReady
+  components:
+    - name: leader
+      size: 1
+      template:
+        metadata:
+          annotations:
+            rolebasedgroup.workloads.x-k8s.io/port-allocator: |
+              {
+                "allocations": [
+                  {
+                    "name": "leader-port",
+                    "env": "LEADER_PORT",
+                    "annotationKey": "test/grpc-port",
+                    "policy": "Dynamic"
+                  }
+                ]
+              }
+        spec:
+          containers:
+            - name: nginx
+              image: nginx:1.28.0
+    - name: worker
+      size: 2
+      template:
+        metadata:
+          annotations:
+            rolebasedgroup.workloads.x-k8s.io/port-allocator: |
+              {
+                "allocations": [
+                  {
+                    "name": "worker-port1",
+                    "env": "WORKER_PORT1",
+                    "annotationKey": "test/worker-port1",
+                    "policy": "Dynamic"
+                  },
+                  {
+                    "name": "worker-port2",
+                    "env": "WORKER_PORT2",
+                    "annotationKey": "test/worker-port2",
+                    "policy": "Static"
+                  }
+                ],
+                "references": [
+                  {
+                    "env": "LEADER_ADDR_PORT",
+                    "from": "leader.leader-port"
+                  }
+                ]
+              }
+        spec:
+          containers:
+            - name: nginx
+              image: nginx:1.28.0
+```
+### Port Allocator Interface
+
+To facilitate future extensibility, dynamic port allocation is defined behind an interface so that different allocation strategies can be implemented.
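+
+As an illustration of one such strategy, the following is a minimal sketch of a range-based allocator that implements the `AllocateBatch` and `Release` methods of the interface defined below. It assumes a contiguous pool of `PortRange` ports starting at `StartPort` (the startup flags listed under Implementation) and keeps allocations only in memory. The `rangeAllocator` type and the `newRangeAllocator` constructor are hypothetical, and `Start` is omitted.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// rangeAllocator is a hypothetical strategy: it hands out ports from the
+// half-open range [start, start+size) and tracks allocations in memory only.
+type rangeAllocator struct {
+	mu    sync.Mutex
+	start int32
+	size  int32
+	used  map[int32]bool
+	next  int32 // offset at which the next scan begins
+}
+
+func newRangeAllocator(start, size int32) *rangeAllocator {
+	return &rangeAllocator{start: start, size: size, used: make(map[int32]bool)}
+}
+
+// AllocateBatch reserves num distinct free ports. It is all-or-nothing:
+// if fewer than num ports are free, nothing is reserved.
+func (a *rangeAllocator) AllocateBatch(num int32) ([]int32, error) {
+	a.mu.Lock()
+	defer a.mu.Unlock()
+	free := a.size - int32(len(a.used))
+	if num < 0 || num > free {
+		return nil, fmt.Errorf("cannot allocate %d ports: %d free", num, free)
+	}
+	ports := make([]int32, 0, num)
+	for i := int32(0); i < a.size && int32(len(ports)) < num; i++ {
+		p := a.start + (a.next+i)%a.size
+		if !a.used[p] {
+			a.used[p] = true
+			ports = append(ports, p)
+		}
+	}
+	if len(ports) > 0 {
+		// Resume after the last port handed out, so recently released
+		// ports are reused only after the scan wraps around.
+		a.next = (ports[len(ports)-1] - a.start + 1) % a.size
+	}
+	return ports, nil
+}
+
+// Release returns a previously allocated port to the pool.
+func (a *rangeAllocator) Release(port int32) error {
+	a.mu.Lock()
+	defer a.mu.Unlock()
+	if !a.used[port] {
+		return fmt.Errorf("port %d is not allocated", port)
+	}
+	delete(a.used, port)
+	return nil
+}
+
+func main() {
+	a := newRangeAllocator(30000, 4)
+	first, _ := a.AllocateBatch(3)
+	fmt.Println(first) // [30000 30001 30002]
+	_ = a.Release(30001)
+	second, err := a.AllocateBatch(2)
+	fmt.Println(second, err) // [30003 30001] <nil>
+}
+```
+
+The all-or-nothing batch means a failed allocation never leaves ports partially reserved. The sketch's state lives only in memory, so a real strategy would likely need `Start` to rebuild its set of in-use ports after a controller restart, for example from the existing port ConfigMaps.
+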
+The port allocation strategy interface is defined as follows:
+
+```go
+import "sigs.k8s.io/controller-runtime/pkg/client"
+
+// AllocateStrategy names a port allocation strategy; it is selected by a controller startup flag.
+type AllocateStrategy string
+
+// PortAllocator holds the strategy implementation selected at startup.
+type PortAllocator struct {
+	strategy AllocateStrategy
+	pa       PortAllocatorInterface
+	client   client.Client
+}
+
+type PortAllocatorInterface interface {
+	// Start initializes the port allocator when the controller starts.
+	Start(client client.Client) error
+	// Release returns the given port to the pool of free ports.
+	Release(port int32) error
+	// AllocateBatch allocates num ports and returns the allocated port numbers.
+	AllocateBatch(num int32) ([]int32, error)
+}
+
+// portAllocator is a singleton created at controller startup according to the configured strategy.
+var portAllocator *PortAllocator
+
+// GetPortAllocator returns the strategy implementation of the singleton port allocator,
+// or nil if the allocator has not been initialized yet.
+func GetPortAllocator() PortAllocatorInterface {
+	if portAllocator == nil {
+		return nil
+	}
+	return portAllocator.pa
+}
+```
+### Implementation
+
+#### Controller Startup Modifications
+
+- **Controller Initialization**:
+  - Parse and validate the port allocator configuration flags, including `AllocateStrategy`, `StartPort`, and `PortRange`
+  - Instantiate the port allocator implementation that corresponds to the configured `AllocateStrategy`
+  - Initialize the port allocator by calling its `Start` method with the Kubernetes client
+
+#### Instance Reconciler Modifications for `Dynamic` Policy Ports
+
+For ports with the `Dynamic` policy, the Instance reconciler manages port allocation per Pod:
+
+- **Pod Creation**:
+  1. Parse the port allocator annotation to extract the `Dynamic` policy port configurations
+  2. Call the port allocator's `AllocateBatch` method to obtain the required number of ports
+  3. Create or update a ConfigMap named `instance--ports` to store the allocated ports:
+     - Data keys follow the format `.` (e.g., `worker-0.grpc-port: "30001"`)
+     - The ConfigMap is owned by the Instance resource, so it is cleaned up automatically
+  4. Inject the allocated ports into the Pod specification:
+     - Add the environment variables named by the `env` field to the containers, using `valueFrom.configMapKeyRef` to reference the ConfigMap
+     - Add the Pod annotations named by the `annotationKey` field, with the port values from the ConfigMap
+     - Mount the ConfigMap for cross-Pod service discovery if references are configured
+
+- **Pod Update**:
+  1. Detect changes in port allocation requirements by comparing the current and updated Pod templates
+  2. Allocate new ports via `AllocateBatch` for added or modified port requirements
+  3. Update the ConfigMap with the new port allocations
+  4. Update the Pod spec to reflect the new ConfigMap references:
+     - Update the environment variable references (`valueFrom.configMapKeyRef`) to point to the new keys in the ConfigMap
+     - Update the Pod annotations with the new port values from the ConfigMap
+  5. Release obsolete ports via the `Release` method after the updated Pod becomes ready
+  6. Remove the obsolete port entries from the ConfigMap
+
+- **Pod Deletion**:
+  1. Retrieve the ports allocated to the deleted Pod from the ConfigMap
+  2. Release all ports associated with the deleted Pod via the `Release` method
+  3. Remove the corresponding entries from the ConfigMap
+  4. Delete the ConfigMap if no ports remain (optional cleanup policy)
+
+#### InstanceSet Reconciler Modifications for `Static` Policy Ports
+
+For ports with the `Static` policy, the InstanceSet reconciler manages port allocation at the Role level:
+
+- **Instance Creation**:
+  1. Parse the port allocator annotation to extract the `Static` policy port configurations
+  2. Call the port allocator's `AllocateBatch` method to obtain one port per port definition (shared across all replicas)
+  3. Create or update a ConfigMap named `instanceset--ports` to store the static port allocations:
+     - Data keys follow the format `.` (e.g., `test.worker-port2: "30002"`)
+     - The ConfigMap is owned by the InstanceSet resource, so it is cleaned up automatically
+  4. Inject the ConfigMap reference into the Pod template:
+     - Add a volume referencing the ConfigMap to the Pod spec
+
+- **Instance Update**:
+  1. Detect changes in static port requirements between the old and new Pod templates
+  2. Allocate new ports via `AllocateBatch` for added or modified port requirements
+  3. Update the ConfigMap with the new static port allocations
+  4. Update the Pod template's volume references to the ConfigMap
+  5. Release obsolete ports via the `Release` method once all Pods using the old template have terminated
+
+- **Instance Deletion**:
+  1. Remove the deleted Instance's key from the ConfigMap
+  2. If all Instances in the InstanceSet have been deleted, release all ports held by the InstanceSet via the `Release` method and delete the ConfigMap
+
+### Test Plan
+
+#### Unit Tests
+
+
+#### Integration tests
+
+
+#### End to End Tests
+
+
+
+
diff --git a/keps/171-pod-port-allocation/kep.yaml b/keps/171-pod-port-allocation/kep.yaml
new file mode 100644
index 00000000..bdbef5ca
--- /dev/null
+++ b/keps/171-pod-port-allocation/kep.yaml
@@ -0,0 +1,16 @@
+title: Pod Port Allocation
+kep-number: 171
+authors:
+  - "@shangsimou"
+status: provisional
+creation-date: 2026-03-03
+reviewers:
+  - "@cheyang"
+approvers:
+
+stage: alpha
+
+latest-milestone: "v0.7.0"
+
+milestone:
+  alpha: "v0.7.0"