
Commit a2ad0ff

Merge pull request #170 from willie-yao/improve-docs
Rename demo examples and add per-example documentation
2 parents a538bc8 + 95ed732 commit a2ad0ff

14 files changed

Lines changed: 280 additions & 118 deletions

README.md

Lines changed: 42 additions & 40 deletions
@@ -244,43 +244,40 @@ Next, deploy four example apps that demonstrate how `ResourceClaim`s,
 `ResourceClaimTemplate`s, and custom `GpuConfig` objects can be used to
 select and configure resources in various ways:
 ```bash
-kubectl apply --filename=demo/gpu-test{1,2,3,4,5}.yaml
+kubectl apply --filename=demo/basic-resourceclaimtemplate.yaml \
+--filename=demo/basic-multiple-requests.yaml \
+--filename=demo/basic-shared-claim-across-containers.yaml \
+--filename=demo/basic-shared-claim-across-pods.yaml \
+--filename=demo/basic-resourceclaim-opaque-config.yaml
 ```
 
 And verify that they are coming up successfully:
 ```console
 $ kubectl get pod -A
-NAMESPACE NAME READY STATUS RESTARTS AGE
+NAMESPACE NAME READY STATUS RESTARTS AGE
 ...
-gpu-test1 pod0 0/1 Pending 0 2s
-gpu-test1 pod1 0/1 Pending 0 2s
-gpu-test2 pod0 0/2 Pending 0 2s
-gpu-test3 pod0 0/1 ContainerCreating 0 2s
-gpu-test3 pod1 0/1 ContainerCreating 0 2s
-gpu-test4 pod0 0/1 Pending 0 2s
-gpu-test5 pod0 0/4 Pending 0 2s
+basic-resourceclaimtemplate pod0 0/1 Pending 0 2s
+basic-resourceclaimtemplate pod1 0/1 Pending 0 2s
+basic-multiple-requests pod0 0/2 Pending 0 2s
+basic-shared-claim-across-containers pod0 0/1 ContainerCreating 0 2s
+basic-shared-claim-across-containers pod1 0/1 ContainerCreating 0 2s
+basic-shared-claim-across-pods pod0 0/1 Pending 0 2s
+basic-resourceclaim-opaque-config pod0 0/4 Pending 0 2s
 ...
 ```
 
-Use your favorite editor to look through each of the `gpu-test{1,2,3,4,5}.yaml`
-files and see what they are doing. The semantics of each match the figure
-below:
-
-![Demo Apps Figure](demo/demo-apps.png?raw=true "Semantics of the applications requesting resources from the example DRA resource driver.")
+Use your favorite editor to look through each of the `basic-*.yaml`
+files and see what they are doing.
 
 Then dump the logs of each app to verify that GPUs were allocated to them
 according to these semantics:
 ```bash
-for example in $(seq 1 5); do \
-echo "gpu-test${example}:"
-for pod in $(kubectl get pod -n gpu-test${example} --output=jsonpath='{.items[*].metadata.name}'); do \
-for ctr in $(kubectl get pod -n gpu-test${example} ${pod} -o jsonpath='{.spec.containers[*].name}'); do \
+for ns in basic-resourceclaimtemplate basic-multiple-requests basic-shared-claim-across-containers basic-shared-claim-across-pods basic-resourceclaim-opaque-config; do \
+echo "${ns}:"
+for pod in $(kubectl get pod -n ${ns} --output=jsonpath='{.items[*].metadata.name}'); do \
+for ctr in $(kubectl get pod -n ${ns} ${pod} -o jsonpath='{.spec.containers[*].name}'); do \
 echo "${pod} ${ctr}:"
-if [ "${example}" -lt 3 ]; then
-kubectl logs -n gpu-test${example} ${pod} -c ${ctr}| grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"
-else
-kubectl logs -n gpu-test${example} ${pod} -c ${ctr}| grep -E "GPU_DEVICE_[0-9]+" | grep -v "RESOURCE_CLAIM"
-fi
+kubectl logs -n ${ns} ${pod} -c ${ctr}| grep -E "GPU_DEVICE_[0-9]+" | grep -v "RESOURCE_CLAIM"
 done
 done
 echo ""
@@ -289,18 +286,18 @@ done
 
 This should produce output similar to the following:
 ```bash
-gpu-test1:
+basic-resourceclaimtemplate:
 pod0 ctr0:
 declare -x GPU_DEVICE_6="gpu-6"
 pod1 ctr0:
 declare -x GPU_DEVICE_7="gpu-7"
 
-gpu-test2:
+basic-multiple-requests:
 pod0 ctr0:
 declare -x GPU_DEVICE_0="gpu-0"
 declare -x GPU_DEVICE_1="gpu-1"
 
-gpu-test3:
+basic-shared-claim-across-containers:
 pod0 ctr0:
 declare -x GPU_DEVICE_2="gpu-2"
 declare -x GPU_DEVICE_2_SHARING_STRATEGY="TimeSlicing"
@@ -310,7 +307,7 @@ declare -x GPU_DEVICE_2="gpu-2"
 declare -x GPU_DEVICE_2_SHARING_STRATEGY="TimeSlicing"
 declare -x GPU_DEVICE_2_TIMESLICE_INTERVAL="Default"
 
-gpu-test4:
+basic-shared-claim-across-pods:
 pod0 ctr0:
 declare -x GPU_DEVICE_3="gpu-3"
 declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
@@ -320,7 +317,7 @@ declare -x GPU_DEVICE_3="gpu-3"
 declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
 declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
 
-gpu-test5:
+basic-resourceclaim-opaque-config:
 pod0 ts-ctr0:
 declare -x GPU_DEVICE_4="gpu-4"
 declare -x GPU_DEVICE_4_SHARING_STRATEGY="TimeSlicing"
@@ -353,14 +350,14 @@ This example driver includes support for the [DRA AdminAccess feature](https://k
 
 #### Usage Example
 
-See `demo/gpu-test7.yaml` for a complete example. Key points:
+See `demo/admin-access.yaml` for a complete example. Key points:
 
 1. **Namespace**: Must have the `resource.kubernetes.io/admin-access` label set to create ResourceClaimTemplate and ResourceClaim with `adminAccess: true` for Kubernetes v1.34+.
 ```yaml
 apiVersion: v1
 kind: Namespace
 metadata:
-name: gpu-test7
+name: admin-access
 labels:
 resource.kubernetes.io/admin-access: "true"
 ```
@@ -399,22 +396,27 @@ This demonstration shows the end-to-end flow of the DRA AdminAccess feature. In
 Once you have verified everything is running correctly, delete all of the
 example apps:
 ```bash
-kubectl delete --wait=false --filename=demo/gpu-test{1,2,3,4,5,7}.yaml
+kubectl delete --wait=false --filename=demo/basic-resourceclaimtemplate.yaml \
+--filename=demo/basic-multiple-requests.yaml \
+--filename=demo/basic-shared-claim-across-containers.yaml \
+--filename=demo/basic-shared-claim-across-pods.yaml \
+--filename=demo/basic-resourceclaim-opaque-config.yaml \
+--filename=demo/admin-access.yaml
 ```
 
 And wait for them to terminate:
 ```console
 $ kubectl get pod -A
-NAMESPACE NAME READY STATUS RESTARTS AGE
+NAMESPACE NAME READY STATUS RESTARTS AGE
 ...
-gpu-test1 pod0 1/1 Terminating 0 31m
-gpu-test1 pod1 1/1 Terminating 0 31m
-gpu-test2 pod0 2/2 Terminating 0 31m
-gpu-test3 pod0 1/1 Terminating 0 31m
-gpu-test3 pod1 1/1 Terminating 0 31m
-gpu-test4 pod0 1/1 Terminating 0 31m
-gpu-test5 pod0 4/4 Terminating 0 31m
-gpu-test7 pod0 1/1 Terminating 0 31m
+basic-resourceclaimtemplate pod0 1/1 Terminating 0 31m
+basic-resourceclaimtemplate pod1 1/1 Terminating 0 31m
+basic-multiple-requests pod0 2/2 Terminating 0 31m
+basic-shared-claim-across-containers pod0 1/1 Terminating 0 31m
+basic-shared-claim-across-containers pod1 1/1 Terminating 0 31m
+basic-shared-claim-across-pods pod0 1/1 Terminating 0 31m
+basic-resourceclaim-opaque-config pod0 4/4 Terminating 0 31m
+admin-access pod0 1/1 Terminating 0 31m
 ...
 ```
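
The updated README spells out the five `basic-*` file names three times (apply, log dump, delete). Judging from the files shown in this commit, each example creates a namespace named after its file, so one option is to derive both lists from the `demo/` directory. The Bash sketch below assumes that naming convention holds for every `demo/basic-*.yaml`; the function names `basic_example_namespaces`, `basic_example_file_args` and `dump_gpu_env` are hypothetical and not part of the repository. `admin-access.yaml` does not match the `basic-*` glob and still needs its own `--filename`.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): derive demo namespaces from file names.
# Assumes each demo/basic-*.yaml creates a namespace whose name equals the
# file name without the .yaml suffix. Does not handle paths with spaces.

# Print one namespace per line for every basic-*.yaml in a directory.
basic_example_namespaces() {
  local dir="${1:-demo}" f name
  for f in "$dir"/basic-*.yaml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    name="${f##*/}"           # strip the directory
    echo "${name%.yaml}"      # strip the extension
  done
}

# Print one --filename=<dir>/<ns>.yaml flag per example, for kubectl apply/delete.
basic_example_file_args() {
  local dir="${1:-demo}" ns
  for ns in $(basic_example_namespaces "$dir"); do
    printf '%s\n' "--filename=${dir}/${ns}.yaml"
  done
}

# Same log filter as the README loop, iterating over the derived namespaces.
dump_gpu_env() {
  local dir="${1:-demo}" ns pod ctr
  for ns in $(basic_example_namespaces "$dir"); do
    echo "${ns}:"
    for pod in $(kubectl get pod -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
      for ctr in $(kubectl get pod -n "$ns" "$pod" -o jsonpath='{.spec.containers[*].name}'); do
        echo "${pod} ${ctr}:"
        kubectl logs -n "$ns" "$pod" -c "$ctr" | grep -E "GPU_DEVICE_[0-9]+" | grep -v "RESOURCE_CLAIM"
      done
    done
    echo ""
  done
}

# Usage (from the repository root, after sourcing this file):
#   kubectl apply $(basic_example_file_args demo)
#   dump_gpu_env demo
#   kubectl delete --wait=false $(basic_example_file_args demo) --filename=demo/admin-access.yaml
```

The trade-off is that the README's explicit list is easier to copy and paste, while the glob picks up new `basic-*` examples without further edits.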

demo/README.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Demo Examples
+
+This directory contains example workloads that demonstrate different ways to
+request and configure devices using Dynamic Resource Allocation (DRA).
+
+Examples prefixed with `basic-` are a good starting point for
+learning about DRA.
+
+Each example file has detailed comments at the top explaining what it
+demonstrates, what output to expect, and the driver and cluster requirements.
+
+## Running Examples
+
+Each example can be run individually:
+
+```bash
+kubectl apply -f demo/<example-name>.yaml
+```
+
+To clean up:
+
+```bash
+kubectl delete -f demo/<example-name>.yaml
+```
+
+## Notes
+
+- The default Helm chart configures **8 GPUs** per node, which is enough to run
+several examples simultaneously.
+- Each example creates its own namespace, so examples don't interfere with
+each other's resource names.
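
The notes above say the default chart exposes 8 GPUs per node, and each example header carries a `GPUs:` line under "Driver requirements". A rough pre-flight check could sum those lines before running several examples at once. This Bash sketch assumes a single node, assumes the header line looks like the ones in this commit (`#   GPUs: 2`), and reports non-numeric values such as admin-access's "all available on a Node" instead of summing them; `example_gpu_count` and `examples_fit` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): compare the summed "GPUs:" header values of
# demo examples with a per-node GPU budget (8 with the default Helm chart).

# Print the GPU count from an example's header, or "all" when the value is not
# a number. Fails when the file has no "GPUs:" header line.
example_gpu_count() {
  local line re='GPUs:[[:space:]]*([0-9]+)'
  line="$(grep -m 1 -E '^#[[:space:]]*GPUs:' "$1")" || return 1
  if [[ "$line" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "all"
  fi
}

# Succeed when the numeric counts of the given files fit in the budget.
# Usage: examples_fit <budget> <file>...
examples_fit() {
  local budget="$1" total=0 n f
  shift
  for f in "$@"; do
    if ! n="$(example_gpu_count "$f")"; then
      echo "no GPUs: header in $f" >&2
      return 2
    fi
    if [ "$n" = "all" ]; then
      echo "skipping $f (GPUs: value is not a number)" >&2
      continue
    fi
    total=$((total + n))
  done
  echo "total=$total budget=$budget"
  [ "$total" -le "$budget" ]
}

# Example (from the repository root):
#   examples_fit 8 demo/basic-*.yaml
```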
Lines changed: 31 additions & 7 deletions
@@ -1,22 +1,46 @@
-# One Namespace with admin access label
-# One pod with one container requesting all GPUs with admin access
-# This demo shows the DRA admin access feature with DRA_ADMIN_ACCESS environment variable
+# Example: DRA Admin Access
+#
+# One namespace with admin access label.
+# One pod with one container requesting all GPUs with admin access.
+# This demo shows the DRA admin access feature with DRA_ADMIN_ACCESS
+# environment variable.
+#
+# Key requirements:
+# - The namespace must have the label:
+# resource.kubernetes.io/admin-access: "true"
+# - The request must set adminAccess: true
+# - "allocationMode: All" is used here to access all available GPUs on a Node.
+# Admins typically require access to all devices on a node to perform
+# maintenance or monitoring.
+#
+# Expected: The container has DRA_ADMIN_ACCESS=true and GPU_DEVICE env vars
+# for all available GPUs. Check with:
+# kubectl logs -n admin-access pod0 -c ctr0 | grep DRA_ADMIN_ACCESS
+# kubectl logs -n admin-access pod0 -c ctr0 | grep GPU_DEVICE
+#
+# Driver requirements:
+# Profile: gpu
+# GPUs: all available on a Node (uses allocationMode: All)
+#
+# Cluster requirements:
+# Kubernetes 1.34+
+# Feature gate: DRAAdminAccess
 
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
-name: gpu-test7
+name: admin-access
 labels:
 resource.kubernetes.io/admin-access: "true"
 ---
 apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
-namespace: gpu-test7
+namespace: admin-access
 name: multiple-gpus-admin
 spec:
-spec:
+spec:
 devices:
 requests:
 - name: admin-gpu
@@ -29,7 +53,7 @@ spec:
 apiVersion: v1
 kind: Pod
 metadata:
-namespace: gpu-test7
+namespace: admin-access
 name: pod0
 spec:
 containers:
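
The header of the admin-access example suggests two `grep` commands for checking the pod. A small Bash sketch can fold them into one pass/fail check that reads the container log on stdin. It assumes the log prints variables in the `declare -x NAME="value"` form seen in the README output, that the flag appears as `DRA_ADMIN_ACCESS="true"` (or unquoted), and that the caller passes the expected GPU count, for example 8 on a node with the chart's default configuration; `check_admin_access_logs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify admin-access container logs on stdin.
# Succeeds when DRA_ADMIN_ACCESS is true and exactly <want> distinct
# GPU_DEVICE_<n> variables are present.
check_admin_access_logs() {
  local want="$1" logs count
  logs="$(cat)"
  if ! grep -qE 'DRA_ADMIN_ACCESS="?true"?' <<<"$logs"; then
    echo "DRA_ADMIN_ACCESS is not true" >&2
    return 1
  fi
  # Count base variables only: GPU_DEVICE_2= but not GPU_DEVICE_2_SHARING_STRATEGY=.
  count="$(grep -oE 'GPU_DEVICE_[0-9]+=' <<<"$logs" | sort -u | wc -l | tr -d ' ')"
  echo "gpu_devices=$count"
  [ "$count" -eq "$want" ]
}

# Example, assuming the node exposes the chart's default 8 GPUs:
#   kubectl logs -n admin-access pod0 -c ctr0 | check_admin_access_logs 8
```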
Lines changed: 18 additions & 5 deletions
@@ -1,17 +1,30 @@
-# One pod, one container
-# Asking for 2 distinct GPUs
+# Example: One Pod, Two GPUs
+#
+# One pod, one container.
+# Asking for 2 distinct GPUs.
+#
+# Expected: The container gets 2 different GPUs. Check with:
+# kubectl logs -n basic-multiple-requests pod0 -c ctr0 | grep GPU_DEVICE
+# The container should have 2 GPU_DEVICE env vars with distinct GPU IDs.
+#
+# Driver requirements:
+# Profile: gpu
+# GPUs: 2
+#
+# Cluster requirements:
+# Kubernetes 1.34+
 
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
-name: gpu-test2
+name: basic-multiple-requests
 
 ---
 apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
-namespace: gpu-test2
+namespace: basic-multiple-requests
 name: multiple-gpus
 spec:
 spec:
@@ -28,7 +41,7 @@ spec:
 apiVersion: v1
 kind: Pod
 metadata:
-namespace: gpu-test2
+namespace: basic-multiple-requests
 name: pod0
 labels:
 app: pod
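
The new header for this example says the container should see 2 `GPU_DEVICE` variables with distinct GPU IDs. A minimal Bash sketch of that check reads `kubectl logs` output on stdin, assuming the `declare -x GPU_DEVICE_<n>="<id>"` format shown in the README output; `check_distinct_gpus` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed when the logs on stdin contain exactly
# <want> GPU_DEVICE_<n>="<id>" assignments and all <id> values differ.
check_distinct_gpus() {
  local want="$1" ids total unique
  # Keep only the quoted value of each base GPU_DEVICE_<n> variable.
  ids="$(grep -oE 'GPU_DEVICE_[0-9]+="[^"]*"' | sed -E 's/^[^=]*="([^"]*)"$/\1/')"
  total="$(printf '%s\n' "$ids" | grep -c .)"
  unique="$(printf '%s\n' "$ids" | grep . | sort -u | grep -c .)"
  echo "gpu_devices=$total distinct_ids=$unique"
  [ "$total" -eq "$want" ] && [ "$unique" -eq "$total" ]
}

# Example:
#   kubectl logs -n basic-multiple-requests pod0 -c ctr0 | check_distinct_gpus 2
```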

demo/gpu-test5.yaml renamed to demo/basic-resourceclaim-opaque-config.yaml

Lines changed: 25 additions & 5 deletions
@@ -1,17 +1,37 @@
-# One pod, 1 container
-# Run as deployment with 1 replica
+# Example: GPU Sharing Strategies (TimeSlicing + SpacePartitioning)
+#
+# One pod, four containers, two GPUs with custom GpuConfig:
+#
+# - ts-gpu: Configured with TimeSlicing (interval: Long). Two containers
+# (ts-ctr0, ts-ctr1) share this GPU by taking turns.
+#
+# - sp-gpu: Configured with SpacePartitioning (partitionCount: 10). Two
+# containers (sp-ctr0, sp-ctr1) each get a partition of this GPU.
+#
+# Expected: ts-ctr0 and ts-ctr1 share one GPU with SHARING_STRATEGY=TimeSlicing
+# and TIMESLICE_INTERVAL=Long. sp-ctr0 and sp-ctr1 share a different GPU with
+# SHARING_STRATEGY=SpacePartitioning and PARTITION_COUNT=10. Check with:
+# kubectl logs -n basic-resourceclaim-opaque-config pod0 -c ts-ctr0 | grep GPU_DEVICE
+# kubectl logs -n basic-resourceclaim-opaque-config pod0 -c sp-ctr0 | grep GPU_DEVICE
+#
+# Driver requirements:
+# Profile: gpu
+# GPUs: 2
+#
+# Cluster requirements:
+# Kubernetes 1.34+
 
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
-name: gpu-test5
+name: basic-resourceclaim-opaque-config
 
 ---
 apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
-namespace: gpu-test5
+namespace: basic-resourceclaim-opaque-config
 name: multiple-gpus
 spec:
 spec:
@@ -49,7 +69,7 @@ spec:
 apiVersion: v1
 kind: Pod
 metadata:
-namespace: gpu-test5
+namespace: basic-resourceclaim-opaque-config
 name: pod0
 spec:
 containers:
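
The header of this example describes two container pairs that should each report one shared GPU with a given sharing strategy, with different GPUs across the pairs. The Bash sketch below parses the `GPU_DEVICE_<n>` and `GPU_DEVICE_<n>_SHARING_STRATEGY` variables seen in the README output and compares two containers' logs. It only checks the GPU ID and the strategy, not the interval or partition-count variables; `gpu_sharing_summary` and `same_shared_gpu` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): summarize and compare GPU sharing settings.
# Assumes log lines such as
#   declare -x GPU_DEVICE_4="gpu-4"
#   declare -x GPU_DEVICE_4_SHARING_STRATEGY="TimeSlicing"

# Print "<gpu-id> <strategy>" for each GPU_DEVICE_<n> in the logs on stdin,
# using "none" when no SHARING_STRATEGY variable is present.
gpu_sharing_summary() {
  local logs n gpu strategy
  logs="$(cat)"
  for n in $(grep -oE 'GPU_DEVICE_[0-9]+=' <<<"$logs" | grep -oE '[0-9]+' | sort -un); do
    gpu="$(sed -nE "s/.*GPU_DEVICE_${n}=\"([^\"]*)\".*/\1/p" <<<"$logs" | head -n 1)"
    strategy="$(sed -nE "s/.*GPU_DEVICE_${n}_SHARING_STRATEGY=\"([^\"]*)\".*/\1/p" <<<"$logs" | head -n 1)"
    echo "${gpu} ${strategy:-none}"
  done
}

# Succeed if two containers' logs report the same single GPU with the given
# strategy. Usage: same_shared_gpu <strategy> "<logs A>" "<logs B>"
same_shared_gpu() {
  local want="$1" a b
  a="$(gpu_sharing_summary <<<"$2")"
  b="$(gpu_sharing_summary <<<"$3")"
  [ -n "$a" ] && [ "$a" = "$b" ] && [ "${a#* }" = "$want" ]
}

# Example:
#   ns=basic-resourceclaim-opaque-config
#   ts0="$(kubectl logs -n "$ns" pod0 -c ts-ctr0)"; ts1="$(kubectl logs -n "$ns" pod0 -c ts-ctr1)"
#   sp0="$(kubectl logs -n "$ns" pod0 -c sp-ctr0)"; sp1="$(kubectl logs -n "$ns" pod0 -c sp-ctr1)"
#   same_shared_gpu TimeSlicing "$ts0" "$ts1" && same_shared_gpu SpacePartitioning "$sp0" "$sp1"
#   ts_gpu="$(gpu_sharing_summary <<<"$ts0")"; sp_gpu="$(gpu_sharing_summary <<<"$sp0")"
#   [ "${ts_gpu%% *}" != "${sp_gpu%% *}" ]   # the two pairs use different GPUs
```

Because `same_shared_gpu` compares the whole summary, a container that reports more than one GPU fails the check, which matches the "one GPU per pair" expectation in the header.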
