Skip to content

Rename demo examples and add per-example documentation#170

Merged
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
willie-yao:improve-docs
Apr 28, 2026
Merged

Rename demo examples and add per-example documentation#170
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
willie-yao:improve-docs

Conversation

@willie-yao
Copy link
Copy Markdown
Contributor

Fixes #144

This PR renames demos and keeps documentation for each demo in the yaml file itself. The main README is modified to run through the first 3 examples and a separate README under /demo gives an overview of the rest of the demos. Also the default number of GPUs was changed to 2 to more accurately reflect real world scenarios, with the trade-off of having to clean up after running each demo.

@k8s-ci-robot k8s-ci-robot requested a review from elezar April 24, 2026 17:36
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 24, 2026
@k8s-ci-robot k8s-ci-robot requested a review from klueska April 24, 2026 17:36
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 24, 2026
@willie-yao
Copy link
Copy Markdown
Contributor Author

/assign @nojnhuh

@nojnhuh
Copy link
Copy Markdown
Contributor

nojnhuh commented Apr 24, 2026

@alaypatel07 Do the DRA scalability tests use any of these demo manifests? Or just the driver itself and the tests define their own Pods, ResourceClaims, etc.?

@alaypatel07
Copy link
Copy Markdown

The tests define their own pods.

Comment thread demo/admin-access.yaml
# This demo shows the DRA admin access feature with DRA_ADMIN_ACCESS
# environment variable.
#
# Key requirements:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes sense to organize these examples by "profile"? i.e. demo/admin-access.yaml -> demo/gpu/admin-access.yaml

As we add more profiles, that would help users know how to install the driver to run this particular example. On the other hand, more generic features like admin access could work with any profile, so it might become harder to find than in a flat list like this.

Let's not change that right now, but something to consider later.

Comment thread demo/admin-access.yaml
Comment on lines +14 to +17
# Expected: The container has DRA_ADMIN_ACCESS=true and GPU_DEVICE env vars
# for all available GPUs. Check with:
# kubectl logs -n admin-access pod0 -c ctr0 | grep DRA_ADMIN_ACCESS
# kubectl logs -n admin-access pod0 -c ctr0 | grep GPU_DEVICE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not something we need to do now, but it would be awesome if we could make comments like this drive the e2e tests as well as serve as human-readable documentation. Then we wouldn't need to update the test logic at all when adding an example.

Comment thread demo/admin-access.yaml Outdated
# - 1 ResourceClaimTemplate (multiple-gpus-admin)
# - 1 Pod (pod0) with 1 container
#
# GPUs required: all available (uses allocationMode: All)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we organize this in a new section titled something like "driver requirements" and also include the profile required? Right now only the "gpu" profile is implemented but we'll be adding more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added then removed the profile requirement. Was that intentional?

If we still want to include that, we can fold this "GPUs required" note under the "Driver requirements" section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops I thought it was redundant and forgot that you mentioned it in the comment! I'll add it back

Comment thread demo/admin-access.yaml Outdated
# - 1 ResourceClaimTemplate (multiple-gpus-admin)
# - 1 Pod (pod0) with 1 container
#
# GPUs required: all available (uses allocationMode: All)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a new section for cluster requirements? For example, Kubernetes version and feature gates. I'll go through and comment on each example. This one needs Kubernetes 1.34+ and the DRAAdminAccess feature.

Comment thread demo/cel-selector.yaml
# One pod, one container
# Uses CEL expression selectors instead of matching on a DeviceClass
# Selects a GPU with a specific model and at least 4Gi of memory
# Example: CEL Expression Selectors
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one requires Kubernetes 1.34+, no additional features.

Comment thread demo/README.md Outdated
Comment thread demo/test-admin-access.sh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR: I forgot about this script when reviewing #153 and now it's no longer called. Could you check to make sure these are all tested and port anything missing as part of #171?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think this was ever part of the e2e script and is only referenced in the README, so nothing to do here for #171.

Comment thread demo/basic-shared-claim-across-containers.yaml
Comment thread demo/basic-resourceclaimtemplate.yaml
Comment thread README.md Outdated
gpu-test4 pod0 0/1 Pending 0 2s
gpu-test5 pod0 0/4 Pending 0 2s
...
Clean up before the next example:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we bump the default to 8 devices, then we wouldn't need to clean up in between each example. I've found it valuable before to have multiple examples running side-by-side at the same time to understand what's different between them. ResourceClaim status in particular is one of the more interesting parts that only exists at runtime and not in any of the manifests.

@pohly pohly moved this from 🆕 New to 🏗 In progress in Dynamic Resource Allocation Apr 27, 2026
Comment thread demo/README.md Outdated
Comment thread README.md Outdated
kind: List
metadata:
resourceVersion: ""
```bash
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to keep the README changes scoped to only renaming the examples? I'm happy to incorporate drive-by improvements, but let's at least keep those in separate commits.

Comment thread demo/basic-resourceclaim-opaque-config.yaml
Comment thread demo/basic-shared-claim-across-containers.yaml
Comment thread demo/basic-resourceclaimtemplate.yaml
Comment thread demo/basic-multiple-requests.yaml
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 27, 2026
Copy link
Copy Markdown
Contributor

@nojnhuh nojnhuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go aside from renaming one more example.

Comment thread README.md Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's time to remove this since it uses the old names. It would be nice if each example had a README with its own Mermaid diagram or something. Not worth blocking this PR though IMO. I opened #176 to track that separately.

Comment thread README.md Outdated
kubectl apply --filename=demo/basic-resourceclaimtemplate.yaml \
--filename=demo/basic-multiple-requests.yaml \
--filename=demo/basic-shared-claim-across-containers.yaml \
--filename=demo/shared-claim-across-pods.yaml \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to make this a basic- one too.

Comment thread demo/test-admin-access.sh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think this was ever part of the e2e script and is only referenced in the README, so nothing to do here for #171.

…entation

Signed-off-by: William Yao <william2000yao@gmail.com>
Copy link
Copy Markdown
Contributor

@nojnhuh nojnhuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: 3639fe2f601d766acf1db61d38e49cbc4768cf35

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nojnhuh, willie-yao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2026
@k8s-ci-robot k8s-ci-robot merged commit a2ad0ff into kubernetes-sigs:main Apr 28, 2026
6 checks passed
@pohly pohly moved this from 🏗 In progress to ✅ Done in Dynamic Resource Allocation Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Restructure examples and demo to support more examples

6 participants