@@ -465,3 +465,222 @@ oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge \
465465` ` `
466466
467467Until the secret is populated, the router keeps serving the installer default; after issuance, HAProxy reload picks up the Let’s Encrypt chain.
468+
469+ ## Day-2: Add GPU Node
470+
471+ === "Download"
472+
473+     ```shell
474+     curl -L -O {{ page.canonical_url }}ign-gpu-worker-0.rcc
475+     ```
476+
477+ === "ign-gpu-worker-0.rcc"
478+
479+     ```json
480+     --8<-- "content/cluster-installation/stackit/ign-gpu-worker-0.rcc"
481+     ```
482+
483+ ```shell
484+ stackit server create \
485+   --assume-yes \
486+   --availability-zone eu01-1 \
487+   --machine-type n2.14d.g1 \
488+   --name "cluster-a-gpu-worker-0" \
489+   --boot-volume-source-type image \
490+   --boot-volume-source-id 6055861d-6641-4a45-b00e-fcfb250d65e6 \
491+   --boot-volume-delete-on-termination \
492+   --boot-volume-size 120 \
493+   --network-id 459afb3e-54fa-45d4-a972-ae39ec370761 \
494+   --user-data @<(butane -d . -r "ign-gpu-worker-0.rcc")
495+ ```
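Because the Ignition config is rendered inside a process substitution, a Butane failure does not stop `stackit server create`, and the server could boot with empty user data. One option is to render to a file first and pass that file instead. The following is a minimal sketch, not part of the original procedure: the helper name `render_ignition` and the output file name are hypothetical, and passing a regular file as `--user-data @<file>` is an assumption derived from the `@<(...)` form used above.

```shell
# Hypothetical helper: render a Butane config to an Ignition file and fail
# early if Butane reports an error or writes nothing.
# Arguments: Butane source file, Ignition output file.
render_ignition() {
  src="$1"
  out="$2"
  # Same Butane flags as in the command above: -d . sets the directory for
  # local file references, -r emits raw Ignition.
  butane -d . -r "$src" > "$out" || return 1
  # Reject an empty result.
  [ -s "$out" ]
}
```

Usage could look like `render_ignition ign-gpu-worker-0.rcc ign-gpu-worker-0.ign && stackit server create ... --user-data @ign-gpu-worker-0.ign`, so the server is only created when rendering succeeded.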
496+
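497+ Wait for the new node's certificate signing requests (CSRs) to appear, then approve them. A node usually files a client CSR first and a serving CSR afterwards, so the command may need to be run more than once:
498+
499+ ```shell
500+ export KUBECONFIG="$PWD/conf/auth/kubeconfig"
501+ oc get csr | awk '/Pending/{print $1}' | xargs oc adm certificate approve
502+ ```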
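The approval step can be wrapped in a small loop so that CSRs arriving later are also picked up. This is a minimal sketch under stated assumptions: the helper names `pending_csrs` and `approve_pending_csrs`, the default round count, and the pause length are illustrative, and `xargs -r` (skip the command on empty input) assumes GNU xargs or a compatible implementation.

```shell
# Hypothetical helper: read `oc get csr` output on stdin and print the names
# of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Hypothetical helper: run several approval rounds with a pause in between.
# Arguments (both assumptions): number of rounds, seconds between rounds.
approve_pending_csrs() {
  rounds="${1:-3}"
  pause="${2:-30}"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    # -r: do not call `oc adm certificate approve` when nothing is pending
    oc get csr | pending_csrs | xargs -r oc adm certificate approve
    i=$((i + 1))
    if [ "$i" -lt "$rounds" ]; then
      sleep "$pause"
    fi
  done
  return 0
}
```

For example, `approve_pending_csrs 3 30` followed by `oc get nodes` should eventually show the GPU worker as `Ready`.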
503+
504+ Install the NVIDIA GPU Operator as described in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html), then verify GPU access with a test pod:
505+
506+ ```shell
507+ oc create -f - <<EOF
508+ apiVersion: v1
509+ kind: Pod
510+ metadata:
511+   name: nvidia-smi
512+ spec:
513+   containers:
514+     - image: registry.redhat.io/rhai/base-image-cuda-13.0-rhel9:3.3.1-1775076057
515+       name: nvidia-smi
516+       command: [ nvidia-smi ]
517+       resources:
518+         limits:
519+           nvidia.com/gpu: 1
520+         requests:
521+           nvidia.com/gpu: 1
522+ EOF
523+
524+ $ oc logs nvidia-smi
525+ Tue May 5 13:27:54 2026
526+ +-----------------------------------------------------------------------------------------+
527+ | NVIDIA-SMI 580.126.20             Driver Version: 580.126.20     CUDA Version: 13.0     |
528+ +-----------------------------------------+------------------------+----------------------+
529+ | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
530+ | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
531+ |                                         |                        |               MIG M. |
532+ |=========================================+========================+======================|
533+ |   0  NVIDIA L40S                    On  |   00000000:05:00.0 Off |                    0 |
534+ | N/A   29C    P8             36W /  350W |       0MiB /  46068MiB |      0%      Default |
535+ |                                         |                        |                  N/A |
536+ +-----------------------------------------+------------------------+----------------------+
537+
538+ +-----------------------------------------------------------------------------------------+
539+ | Processes:                                                                              |
540+ |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
541+ |        ID   ID                                                               Usage      |
542+ |=========================================================================================|
543+ |  No running processes found                                                             |
544+ +-----------------------------------------------------------------------------------------+
545+ ```
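If the test pod stays `Pending`, a likely cause is that the node does not yet advertise the `nvidia.com/gpu` resource. The sketch below lists nodes that do; the helper name `gpu_nodes` is hypothetical, and the escaped-dot `custom-columns` expression is an assumption about how the installed `oc` version resolves the resource name.

```shell
# Hypothetical helper: list nodes with at least one allocatable
# nvidia.com/gpu. Nodes without the resource print "<none>" in the second
# column and are filtered out.
gpu_nodes() {
  oc get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' |
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }'
}
```

An empty result suggests that the GPU Operator has not finished rolling out its driver and device-plugin pods on the node.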
546+
547+ ## Day-2: Deploy Cloud Controller Manager (CCM)
548+
549+ [Upstream documentation](https://github.com/stackitcloud/cloud-provider-stackit/blob/main/docs/deployment.md)
550+
551+ * Create a service account at STACKIT called `ccm-and-csi`.
552+ * Create a service account key and download the JSON file.
553+ * Assign the editor role for the entire project to the service account.
554+
555+ Deployment steps:
556+
557+ ```shell
558+ oc create secret generic -n kube-system stackit-cloud-secret --from-file=sa_key.json=<service account json>
559+ ```
560+
561+ === "Download"
562+
563+     ```shell
564+     curl -L -O {{ page.canonical_url }}cloud.yaml
565+     ```
566+
567+ === "cloud.yaml"
568+
569+     ```yaml
570+     --8<-- "content/cluster-installation/stackit/cloud.yaml"
571+     ```
572+
573+ Adjust `cloud.yaml` and store it in a ConfigMap:
574+
575+ ```shell
576+ oc create configmap -n kube-system stackit-cloud-config --from-file=cloud.yaml
577+ ```
578+
579+ Deploy the cloud controller manager:
580+
581+ ```shell
582+ oc apply -f https://raw.githubusercontent.com/stackitcloud/cloud-provider-stackit/refs/heads/main/deploy/cloud-controller-manager/rbac.yaml
583+ oc apply -f https://github.com/stackitcloud/cloud-provider-stackit/raw/refs/heads/main/deploy/cloud-controller-manager/service.yaml
584+ ```
585+
586+ === "Apply"
587+
588+     ```shell
589+     oc apply -f {{ page.canonical_url }}ccm-and-csi-deployment.yaml
590+     ```
591+
592+ === "ccm-and-csi-deployment.yaml"
593+
594+     ```yaml
595+     --8<-- "content/cluster-installation/stackit/ccm-and-csi-deployment.yaml"
596+     ```
597+
598+ ???+ failure
599+
600+     ```
601+     starting Controller
602+     I0507 12:55:45.723462 1 serving.go:411] Generated self-signed cert in-memory
603+     W0507 12:55:45.723531 1 client_config.go:683] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
604+     panic: runtime error: invalid memory address or nil pointer dereference
605+     [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4fe685]
606+     ```
607+
608+ **Not investigated further, because the CSI driver appears to work without the CCM.**
609+
610+ ## Day-2: Container Storage Interface (CSI) Driver
611+
612+ [Upstream documentation](https://github.com/stackitcloud/cloud-provider-stackit/blob/main/docs/csi-driver.md)
613+
614+ * Create a service account at STACKIT called `ccm-and-csi` (already done in "Day-2: Deploy Cloud Controller Manager (CCM)").
615+ * Create a service account key and download the JSON file (already done in the same section).
616+ * Assign the editor role for the entire project to the service account (already done in the same section).
617+
618+ Deploy the CSI driver into the namespace/project `stackit-csi-driver`:
619+
620+ ```shell
621+ oc new-project stackit-csi-driver
622+ ```
623+
624+ ```shell
625+ oc create secret generic -n stackit-csi-driver stackit-cloud-secret --from-file=sa_key.json=<service account json>
626+ ```
627+
628+ === "Download"
629+
630+     ```shell
631+     curl -L -O {{ page.canonical_url }}cloud.yaml
632+     ```
633+
634+ === "cloud.yaml"
635+
636+     ```yaml
637+     --8<-- "content/cluster-installation/stackit/cloud.yaml"
638+     ```
639+
640+ Adjust `cloud.yaml` and store it in a ConfigMap:
641+
642+ ```shell
643+ oc create configmap stackit-cloud-config --from-file=cloud.yaml
644+ ```
645+
646+ Allow the CSI node components to run privileged. They appear to need only hostPath and hostNetwork access, so it is recommended to pick or create a more precise security context constraint (SCC).
647+
648+ ```shell
649+ oc adm policy add-scc-to-user privileged -z csi-stackit-node-sa
650+ ```
651+
652+ Download and apply `kustomization.yaml` to deploy the CSI driver into the chosen namespace with the proper image URL:
653+
654+ === "Download"
655+
656+     ```shell
657+     curl -L -O {{ page.canonical_url }}kustomization.yaml
658+     ```
659+
660+ === "kustomization.yaml"
661+
662+     ```yaml
663+     --8<-- "content/cluster-installation/stackit/kustomization.yaml"
664+     ```
665+
666+ ```shell
667+ oc apply -k .
668+ ```
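Before applying, the rendered manifests can be inspected to confirm that the namespace override took effect everywhere. This is a minimal sketch: the helper name `rendered_namespaces` is hypothetical, and it simply collects every `namespace:` value in the output of `oc kustomize`, including those in role binding subjects.

```shell
# Hypothetical helper: render the kustomization in a directory (default ".")
# and print each distinct namespace value found in the rendered YAML.
rendered_namespaces() {
  oc kustomize "${1:-.}" | awk '$1 == "namespace:" { print $2 }' | sort -u
}
```

If the result contains anything other than `stackit-csi-driver`, some resource or subject would still point at a different namespace.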
669+
670+ ### Let's try storage
671+
672+ ```shell
673+ oc new-project storage-test
674+ ```
675+
676+ === "Apply"
677+
678+     ```shell
679+     oc apply -f {{ page.canonical_url }}lets-try-storage.yaml
680+     ```
681+
682+ === "lets-try-storage.yaml"
683+
684+     ```yaml
685+     --8<-- "content/cluster-installation/stackit/lets-try-storage.yaml"
686+     ```
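To check whether the test actually provisioned a volume, the claim status can be inspected. The sketch below assumes the manifest creates at least one PersistentVolumeClaim in `storage-test`; the helper name `unbound_pvcs` is hypothetical.

```shell
# Hypothetical helper: print name and status of every PVC in a namespace
# (default "storage-test") whose STATUS column is not "Bound".
unbound_pvcs() {
  oc get pvc -n "${1:-storage-test}" --no-headers |
    awk '$2 != "Bound" { print $1, $2 }'
}
```

An empty result means all claims are bound. A claim may legitimately stay `Pending` until a pod consumes it if the storage class uses the `WaitForFirstConsumer` volume binding mode.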