294 changes: 294 additions & 0 deletions keps/171-pod-port-allocation/README.md
@@ -0,0 +1,294 @@
# KEP-171 Pod Port Allocation
<!-- toc -->
- [Motivation](#motivation)
- [Goals](#goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API](#api)
  - [Port Allocator Interface](#port-allocator-interface)
  - [Implementation](#implementation)
  - [Test Plan](#test-plan)
    - [Unit Tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [End to End Tests](#end-to-end-tests)
<!-- /toc -->

## Motivation
When deploying LLM services with the PD (Prefill-Decode) disaggregated approach, containers often need `hostNetwork: true` to use RDMA networks. With hostNetwork, if two replicas of the same Role are scheduled to the same node, they conflict because both containers are configured with the same port, and only one replica can start successfully.

Therefore, RBG needs to provide a dynamic port allocation solution that can automatically assign different port numbers to replicas during service deployment, thereby improving the deployment density of inference service Pods.

## Goals
- RBG applications should be able to automatically allocate service ports for containers on demand during deployment, avoiding port conflicts in hostNetwork scenarios.
- The ports dynamically assigned to each Pod can be injected into the generated Pod through container environment variables and Pod annotations, so that containers know their assigned port values.
- It should be possible to obtain, on demand, the port values allocated to other Pods among the current Role's replicas, for Pod service discovery.

## Proposal

### User Stories

#### Story 1
> Deploying inference services in a cluster

As a cluster administrator, I want to deploy my model service using RBG in the cluster.
The service requires hostNetwork to be enabled for both the Prefill and Decoder roles.
However, since there are existing services also using hostNetwork, I need to avoid port conflicts.


#### Story 2
> Scale out Role replicas

As an LLM service operator, I have noticed that the load on the Decoder Role has been very high recently, and I need to scale out more replicas of it.
If each replica can use different ports, I can complete the deployment with fewer nodes when node resources are sufficient.

### Risks and Mitigations

- The dynamic port allocation feature is only supported in *InstanceSet* mode.

## Design Details
When the Role of an RBG resource is InstanceSet, the resource directly associated with the Pod is the Instance resource.
Therefore, the current design requires dynamic port allocation to be completed within the scope of the Instance.

### API
If users want to configure dynamic ports for a Pod generated by an RBG object, they need to add the following annotation in the corresponding Pod Template of the RBG object:
```yaml
rolebasedgroup.workloads.x-k8s.io/port-allocator: |
  {
    "allocations": [
      {
        "name": "grpc",                    // Logical name
        "env": "GRPC_PORT",                // Env var to inject
        "annotationKey": "test/grpc-port", // annotationKey to inject
        "policy": "Dynamic"                // Dynamic (per-pod) or Static (per-role)
      }
    ],
    "references": [
      {
        "env": "LEADER_ADDR_PORT",         // Env var to inject
        "from": "leader.grpc"              // Format: "<role_name>.<port_name>"
      }
    ]
  }
```
The JSON field definitions in the annotation are as follows:
```go
type PortPolicy string

const (
	// Dynamic means the port is only valid for the current Pod.
	Dynamic PortPolicy = "Dynamic"
	// Static means the port is valid for all Pod replicas in the current role.
	Static PortPolicy = "Static"
)

type PortAllocatorConfig struct {
	// Allocations specifies the ports to be allocated.
	Allocations []PortAllocation `json:"allocations"`
	// References specifies the ports to be referenced from other Pods.
	References []PortReference `json:"references"`
}

type PortAllocation struct {
	// Name specifies the name of the port. Must not be empty.
	Name string `json:"name"`
	// Env specifies the name of the environment variable to be injected
	// into the container. Must not be empty.
	Env string `json:"env"`
	// AnnotationKey specifies the key of the annotation to be injected into the Pod.
	AnnotationKey string `json:"annotationKey"`
	// Policy specifies the scope of the port. Must not be empty; the default is Dynamic.
	Policy PortPolicy `json:"policy"`
}

type PortReference struct {
	// Env specifies the name of the environment variable to be injected
	// into the container. Must not be empty.
	Env string `json:"env"`
	// From specifies the name of the port to be referenced. Must not be empty.
	From string `json:"from"`
}
```
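As a rough illustration of how the annotation value might be decoded and checked against the "Not Empty" and default rules above, here is a minimal sketch. `ParsePortAllocatorConfig` is a hypothetical helper name, not part of the proposal, and the validation rules are only those implied by the field comments. Note that `encoding/json` rejects `//` comments, so the comments in the annotation example are explanatory only and would have to be removed from a real annotation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type PortPolicy string

const (
	Dynamic PortPolicy = "Dynamic"
	Static  PortPolicy = "Static"
)

type PortAllocation struct {
	Name          string     `json:"name"`
	Env           string     `json:"env"`
	AnnotationKey string     `json:"annotationKey"`
	Policy        PortPolicy `json:"policy"`
}

type PortReference struct {
	Env  string `json:"env"`
	From string `json:"from"`
}

type PortAllocatorConfig struct {
	Allocations []PortAllocation `json:"allocations"`
	References  []PortReference  `json:"references"`
}

// ParsePortAllocatorConfig (hypothetical) decodes the annotation value,
// defaults an empty policy to Dynamic, and rejects empty required fields.
func ParsePortAllocatorConfig(raw string) (*PortAllocatorConfig, error) {
	var cfg PortAllocatorConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, fmt.Errorf("decode port-allocator annotation: %w", err)
	}
	for i := range cfg.Allocations {
		a := &cfg.Allocations[i]
		if a.Name == "" || a.Env == "" {
			return nil, errors.New("allocation requires non-empty name and env")
		}
		if a.Policy == "" {
			a.Policy = Dynamic
		}
		if a.Policy != Dynamic && a.Policy != Static {
			return nil, fmt.Errorf("allocation %q: unknown policy %q", a.Name, a.Policy)
		}
	}
	for _, r := range cfg.References {
		// "from" must look like "<role_name>.<port_name>".
		role, port, ok := strings.Cut(r.From, ".")
		if r.Env == "" || !ok || role == "" || port == "" {
			return nil, fmt.Errorf("reference %q: from must be <role_name>.<port_name>", r.Env)
		}
	}
	return &cfg, nil
}

func main() {
	raw := `{"allocations":[{"name":"grpc","env":"GRPC_PORT","annotationKey":"test/grpc-port"}],
	         "references":[{"env":"LEADER_ADDR_PORT","from":"leader.grpc"}]}`
	cfg, err := ParsePortAllocatorConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Allocations[0].Policy) // the omitted policy was defaulted to Dynamic
}
```

Rejecting a malformed annotation before any port is allocated keeps the reconciler from reserving ports that it cannot inject.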
For example, when using the following configuration:
- The leader Pod is allocated one port:
  - Port 1: injected into the container as the `LEADER_PORT` environment variable and annotated as `test/grpc-port`
- Each worker Pod is allocated two ports and references one port from the leader Pod:
  - **Dynamic Port** (`WORKER_PORT1`): unique per Pod, annotated as `test/worker-port1`
  - **Static Port** (`WORKER_PORT2`): shared across Pods, annotated as `test/worker-port2`
  - **Reference Port** (`LEADER_ADDR_PORT`): points to the leader's Port 1 (`leader.leader-port`)
```yaml
apiVersion: workloads.x-k8s.io/v1alpha1
kind: Instance
metadata:
  name: test
  namespace: default
spec:
  readyPolicy: AllPodReady
  components:
    - name: leader
      size: 1
      template:
        metadata:
          annotations:
            rolebasedgroup.workloads.x-k8s.io/port-allocator: |
              {
                "allocations": [
                  {
                    "name": "leader-port",
                    "env": "LEADER_PORT",
                    "annotationKey": "test/grpc-port",
                    "policy": "Dynamic"
                  }
                ]
              }
        spec:
          containers:
            - name: nginx
              image: nginx:1.28.0
    - name: worker
      size: 2
      template:
        metadata:
          annotations:
            rolebasedgroup.workloads.x-k8s.io/port-allocator: |
              {
                "allocations": [
                  {
                    "name": "worker-port1",
                    "env": "WORKER_PORT1",
                    "annotationKey": "test/worker-port1",
                    "policy": "Dynamic"
                  },
                  {
                    "name": "worker-port2",
                    "env": "WORKER_PORT2",
                    "annotationKey": "test/worker-port2",
                    "policy": "Static"
                  }
                ],
                "references": [
                  {
                    "env": "LEADER_ADDR_PORT",
                    "from": "leader.leader-port"
                  }
                ]
              }
        spec:
          containers:
            - name: nginx
              image: nginx:1.28.0
```
### Port Allocator Interface

To facilitate future extensibility, an interface can be defined for dynamic port allocation, with different port allocation strategies implemented. The port allocation strategy interface is defined as follows:

```go
// AllocateStrategy selects the port allocation strategy; it is set by a program startup flag.
// The client package is assumed to be sigs.k8s.io/controller-runtime/pkg/client.
type AllocateStrategy string

type PortAllocator struct {
	strategy AllocateStrategy
	pa       PortAllocatorInterface
	client   client.Client
}

type PortAllocatorInterface interface {
	// Start initializes the port allocator when the program starts.
	Start(client client.Client) error
	// Release releases the given port.
	Release(port int32) error
	// AllocateBatch allocates num ports and returns the allocated port numbers.
	AllocateBatch(num int32) ([]int32, error)
}

// portAllocator is a singleton, created at program startup based on the port allocation strategy.
var portAllocator *PortAllocator

// GetPortAllocator returns the allocator implementation held by the singleton.
func GetPortAllocator() PortAllocatorInterface {
	return portAllocator.pa
}
```

Comment on lines +211 to +217 (high)

The proposed design uses a global singleton pattern for the `portAllocator`. This introduces global state, which can make testing difficult and the system harder to reason about. Consider using dependency injection instead. The `PortAllocator` could be instantiated in `main.go` and passed down to the reconcilers that need it. This would make the dependencies explicit and improve testability.

### Implementation

#### Controller Startup Modifications

- **Controller Initialization**:
  - Parse and validate port allocator configuration flags, including `AllocateStrategy`, `StartPort`, and `PortRange`
  - Instantiate the corresponding port allocator implementation based on the configured `AllocateStrategy`
  - Initialize the port allocator by calling its `Start` method with the Kubernetes client
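
To make the `AllocateBatch`/`Release` contract concrete, the following is a minimal in-memory sketch of one possible strategy. The type `rangeAllocator` and its constructor are hypothetical, and the sketch assumes that `PortRange` is a count of ports starting at `StartPort`; the proposal does not define either. The `Start` method, which would rebuild state through the Kubernetes client, is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// rangeAllocator (hypothetical) hands out ports from
// [startPort, startPort+portRange) and tracks them in memory only.
type rangeAllocator struct {
	mu        sync.Mutex
	startPort int32
	portRange int32
	used      map[int32]bool
}

func newRangeAllocator(startPort, portRange int32) *rangeAllocator {
	return &rangeAllocator{startPort: startPort, portRange: portRange, used: map[int32]bool{}}
}

// AllocateBatch reserves num free ports, or reserves nothing and returns
// an error if fewer than num ports are free.
func (a *rangeAllocator) AllocateBatch(num int32) ([]int32, error) {
	if num < 0 {
		return nil, errors.New("num must not be negative")
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	ports := make([]int32, 0, num)
	for p := a.startPort; p < a.startPort+a.portRange && int32(len(ports)) < num; p++ {
		if !a.used[p] {
			ports = append(ports, p)
		}
	}
	if int32(len(ports)) < num {
		return nil, errors.New("port range exhausted")
	}
	for _, p := range ports {
		a.used[p] = true
	}
	return ports, nil
}

// Release returns a previously allocated port to the pool.
func (a *rangeAllocator) Release(port int32) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.used[port] {
		return fmt.Errorf("port %d is not allocated", port)
	}
	delete(a.used, port)
	return nil
}

func main() {
	alloc := newRangeAllocator(30000, 3)
	ports, err := alloc.AllocateBatch(2)
	fmt.Println(ports, err) // [30000 30001] <nil>
	_, err = alloc.AllocateBatch(2)
	fmt.Println(err) // port range exhausted
}
```

Making the batch all-or-nothing means a Pod that needs several ports never ends up holding a partial reservation when the range runs out.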

#### Instance Reconciler Modifications for `Dynamic` Policy Ports

For ports with `Dynamic` policy, the Instance reconciler manages port allocation on a per-Pod basis:

- **Pod Creation**:
  1. Parse the port allocator annotation to extract `Dynamic` policy port configurations
  2. Call the port allocator's `AllocateBatch` method to obtain the required number of ports
  3. Create or update a ConfigMap named `instance-<instance-name>-ports` to store allocated ports:
     - Data keys follow the format: `<pod-name>.<port-name>` (e.g., `worker-0.grpc-port: "30001"`)
     - The ConfigMap is owned by the Instance resource for automatic cleanup
  4. Inject allocated ports into the Pod specification:
     - Add the environment variables named by the `env` field to containers, using `valueFrom.configMapKeyRef` to reference the ConfigMap
     - Add Pod annotations as specified by the `annotationKey` field with the port value from the ConfigMap
     - Mount the ConfigMap for cross-pod service discovery if references are configured

- **Pod Update**:
  1. Detect changes in port allocation requirements by comparing current and updated pod templates
  2. Allocate new ports via `AllocateBatch` for additional or modified port requirements
  3. Update the ConfigMap with new port allocations
  4. Update the Pod spec to reflect the new ConfigMap references:
     - Update environment variable references (`valueFrom.configMapKeyRef`) to point to the new keys in the ConfigMap
     - Update Pod annotations with new port values from the ConfigMap
  5. Release obsolete ports via the `Release` method after the updated Pod becomes ready
  6. Clean up removed port entries from the ConfigMap

- **Pod Deletion**:
  1. Retrieve allocated port information from the ConfigMap for the deleted Pod
  2. Release all ports associated with the deleted Pod via the `Release` method
  3. Remove corresponding entries from the ConfigMap
  4. Delete the ConfigMap if no ports remain (optional cleanup policy)
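
The ConfigMap bookkeeping in the three flows above can be sketched with plain maps. All three helpers below are hypothetical names, and the `map[string]string` stands in for the ConfigMap's data field; the real reconciler would read and write it through the Kubernetes client. The update diff compares port names only, so a port whose policy changes under the same name would need extra handling.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// podPortEntries (hypothetical) builds ConfigMap data for one Pod,
// keyed as "<pod-name>.<port-name>".
func podPortEntries(podName string, portNames []string, ports []int32) (map[string]string, error) {
	if len(portNames) != len(ports) {
		return nil, fmt.Errorf("need %d ports, got %d", len(portNames), len(ports))
	}
	data := make(map[string]string, len(ports))
	for i, name := range portNames {
		data[podName+"."+name] = strconv.Itoa(int(ports[i]))
	}
	return data, nil
}

// diffPortNames (hypothetical) reports which port names need a new
// allocation and which are obsolete after a template update.
func diffPortNames(oldNames, newNames []string) (added, removed []string) {
	oldSet := make(map[string]bool, len(oldNames))
	for _, n := range oldNames {
		oldSet[n] = true
	}
	newSet := make(map[string]bool, len(newNames))
	for _, n := range newNames {
		if !oldSet[n] && !newSet[n] {
			added = append(added, n)
		}
		newSet[n] = true
	}
	for _, n := range oldNames {
		if !newSet[n] {
			removed = append(removed, n)
		}
	}
	return added, removed
}

// releasePodPorts (hypothetical) removes every entry of the deleted Pod
// from data and returns the ports to pass to Release, in ascending order.
// Caveat: a Pod name that is a dotted prefix of another would also match.
func releasePodPorts(data map[string]string, podName string) ([]int32, error) {
	prefix := podName + "."
	var keys []string
	var ports []int32
	for key, value := range data {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		p, err := strconv.ParseInt(value, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("entry %q: %w", key, err)
		}
		keys = append(keys, key)
		ports = append(ports, int32(p))
	}
	for _, key := range keys {
		delete(data, key)
	}
	sort.Slice(ports, func(i, j int) bool { return ports[i] < ports[j] })
	return ports, nil
}

func main() {
	data, _ := podPortEntries("worker-0", []string{"grpc-port"}, []int32{30001})
	fmt.Println(data) // map[worker-0.grpc-port:30001]
	added, removed := diffPortNames([]string{"grpc-port"}, []string{"grpc-port", "http-port"})
	fmt.Println(added, removed) // [http-port] []
	ports, _ := releasePodPorts(data, "worker-0")
	fmt.Println(ports, len(data)) // [30001] 0
}
```

`releasePodPorts` parses every matching entry before deleting any, so a corrupt value leaves the data untouched and the reconcile can be retried.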

#### InstanceSet Reconciler Modifications for `Static` Policy Ports

For ports with `Static` policy, the InstanceSet reconciler manages port allocation at the Role level:

- **Instance Creation**:
  1. Parse the port allocator annotation to extract `Static` policy port configurations
  2. Call the port allocator's `AllocateBatch` method to obtain one port per port definition (shared across all replicas)
  3. Create or update a ConfigMap named `instanceset-<instanceset-name>-ports` to store static port allocations:
     - Data keys follow the format: `<instance-name>.<port-name>` (e.g., `test.worker-port2: "30002"`)
     - The ConfigMap is owned by the InstanceSet resource for automatic cleanup
  4. Inject the ConfigMap reference into the Pod template:
     - Add a volume referencing the ConfigMap to the Pod spec

- **Instance Update**:
  1. Detect changes in static port requirements between old and new pod templates
  2. Allocate new ports via `AllocateBatch` for new or modified port requirements
  3. Update the ConfigMap with new static port allocations
  4. Update the Pod template volume references to the ConfigMap
  5. Release obsolete ports via the `Release` method once all Pods using the old template are terminated

- **Instance Deletion**:
  1. Remove the corresponding key from the ConfigMap for the deleted instance
  2. If all instances in the InstanceSet are deleted, release all ports occupied by the InstanceSet via the `Release` method and delete the ConfigMap
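
One way to keep a `Static` port stable across reconciles is to allocate only for keys that are not yet present in the ConfigMap data. The sketch below assumes that behaviour; the proposal does not spell it out. `ensureStaticPorts` and `batchAllocator` are hypothetical names, and the map again stands in for the ConfigMap's data field.

```go
package main

import (
	"fmt"
	"strconv"
)

// batchAllocator (hypothetical) matches the shape of AllocateBatch.
type batchAllocator func(num int32) ([]int32, error)

// ensureStaticPorts (hypothetical) allocates a port for every
// "<instance-name>.<port-name>" key missing from data and leaves
// existing entries untouched, so repeated reconciles reuse the same port.
func ensureStaticPorts(data map[string]string, instanceName string, portNames []string, allocate batchAllocator) error {
	var missing []string
	for _, name := range portNames {
		if _, ok := data[instanceName+"."+name]; !ok {
			missing = append(missing, name)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	ports, err := allocate(int32(len(missing)))
	if err != nil {
		return err
	}
	if len(ports) != len(missing) {
		return fmt.Errorf("need %d ports, got %d", len(missing), len(ports))
	}
	for i, name := range missing {
		data[instanceName+"."+name] = strconv.Itoa(int(ports[i]))
	}
	return nil
}

func main() {
	next := int32(30002)
	allocate := func(num int32) ([]int32, error) {
		ports := make([]int32, num)
		for i := range ports {
			ports[i] = next
			next++
		}
		return ports, nil
	}
	data := map[string]string{}
	for i := 0; i < 2; i++ { // the second reconcile must not allocate again
		if err := ensureStaticPorts(data, "test", []string{"worker-port2"}, allocate); err != nil {
			panic(err)
		}
	}
	fmt.Println(data) // map[test.worker-port2:30002]
}
```

Passing the allocator in as a function keeps the helper testable without a cluster or a process-wide allocator instance.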
### Test Plan

#### Unit Tests


#### Integration tests


#### End to End Tests


Comment on lines +282 to +292 (high)

The Test Plan section is currently empty. For a feature of this nature, it's crucial to outline a comprehensive testing strategy. Please detail the planned unit, integration, and end-to-end tests. For example:

- Unit Tests: Cover the port allocation logic, including edge cases like port range exhaustion.
- Integration Tests: Verify that the controller correctly injects ports into Pods and releases them upon deletion.
- End-to-End Tests: Test the full workflow in a cluster, including scenarios with `hostNetwork: true` and multiple replicas on the same node.

16 changes: 16 additions & 0 deletions keps/171-pod-port-allocation/kep.yaml
@@ -0,0 +1,16 @@
title: Pod Port Allocation
kep-number: 171
authors:
- "@shangsimou"
status: provisional
creation-date: 2026-03-03
reviewers:
- "@cheyang"
approvers:

stage: alpha

latest-milestone: "v0.7.0"

milestone:
  alpha: "v0.7.0"