[MWG-1605] feat: add cube and gpu support via templates#351

Open
lucasl0st wants to merge 10 commits into ionos-cloud:main from lucasl0st:gpus

Conversation

lucasl0st commented Apr 1, 2026

What is the purpose of this pull request/Why do we need it?

We need kubernetes nodes with GPUs.

Description of changes:

GPU servers, like cubes, are provisioned from templates.
I therefore added support for templates. Since CUBE is just another template-based server type, I added support for it alongside GPU.
I also added an e2e test that uses cubes, because testing with GPUs requires an image with UEFI support and would be too expensive.

Checklist:

  • Documentation updated
  • Unit Tests added
  • E2E Tests added
  • Includes emojis

@lucasl0st (Author)

@jriedel-ionos I want to add an additional e2e test for cubes. Could you please set the variable IONOSCLOUD_CUBE_TEMPLATE_ID to 72e73b81-8551-4e74-b398-fc63b39994af (the smallest cube, XS)?


// when using templates (cubes or gpu servers) we cannot delete the boot volume
// the whole server must be deleted at once
if !deleteVolumes && bootVolumeID != nil && (server.Properties != nil && server.Properties.TemplateUuid == nil) {

@lucasl0st (Author)

I am not 100% sure that this change works as expected.
When using templates, you must delete the whole server, including the boot volume, at once; you cannot detach or delete the boot volume by itself.

But we also don't want to delete the volumes attached for PVCs.
In testing I noticed that CAPI (or CAPIC, I am not sure which) waits until all PVCs are detached, so I could perform a node rebuild/deletion without losing the PVC volumes. I am not sure whether this is guaranteed, though.
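
To make the condition under discussion easier to reason about, here is a minimal sketch of the same decision as a standalone function. The function name and parameters are hypothetical; the real code reads these values from the server object returned by the SDK and additionally checks that `server.Properties` is not nil.

```go
package main

import "fmt"

// deleteBootVolumeSeparately reports whether the boot volume should be deleted
// in its own API call before the server is deleted.
// Hypothetical helper: it mirrors the condition in the diff, but takes the
// template UUID directly instead of reading it from server.Properties.
func deleteBootVolumeSeparately(deleteVolumes bool, bootVolumeID, templateUUID *string) bool {
	// Template-backed servers (cubes, GPU servers) have a non-nil template UUID.
	// Their boot volume cannot be detached or deleted on its own, so it is
	// left to be removed together with the server.
	return !deleteVolumes && bootVolumeID != nil && templateUUID == nil
}

func main() {
	bootVol := "boot-volume-id" // placeholder value
	tmpl := "template-uuid"     // placeholder value

	// Regular server: the boot volume is deleted separately.
	fmt.Println(deleteBootVolumeSeparately(false, &bootVol, nil))
	// Template-backed server: the separate boot volume deletion is skipped.
	fmt.Println(deleteBootVolumeSeparately(false, &bootVol, &tmpl))
}
```

The sketch does not answer the open question about PVC volumes; it only isolates the boot volume decision.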

@lpape-ionos (Contributor)

Note to reviewers: I have this running on our team's sandbox here: https://github.com/ionos-cloud/mwg-deployment/tree/main/projects/sandbox-cluster/capi/templates

Copilot AI left a comment

Pull request overview

This PR adds support for provisioning IONOS Cloud CUBE and GPU Kubernetes nodes via server templates, including new clusterctl templates, CRD/schema updates, and tests (with e2e coverage using CUBE as a cheaper proxy for GPU template behavior).

Changes:

  • Add templateID plus new server/disk types (CUBE/GPU, DAS) to the API types and CRDs, including validation rules.
  • Update server reconciliation to set template-backed server properties correctly and handle template-specific boot volume constraints.
  • Add new cluster templates (cube/gpu) and extend e2e coverage with a CUBE flavor test.
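
As an illustration of what a templateID validation rule might express, here is a minimal Go sketch. The type and function names are hypothetical stand-ins, not the PR's actual API types, and the rule that templateID must be empty for non-template server types is an assumption; the PR's real rules live in the validation annotations on the API types and in the CRD schema.

```go
package main

import "fmt"

// ServerType is an illustrative stand-in for the server type enum.
type ServerType string

const (
	ServerTypeEnterprise ServerType = "ENTERPRISE" // assumed non-template type
	ServerTypeCube       ServerType = "CUBE"
	ServerTypeGPU        ServerType = "GPU"
)

// MachineSpec is a hypothetical, trimmed-down machine spec.
type MachineSpec struct {
	Type       ServerType
	TemplateID string
}

// usesTemplate reports whether the server type is provisioned from a template.
func usesTemplate(t ServerType) bool {
	return t == ServerTypeCube || t == ServerTypeGPU
}

// validateTemplateID requires a templateID for CUBE/GPU and, as an
// assumption of this sketch, rejects it for all other server types.
func validateTemplateID(spec MachineSpec) error {
	switch {
	case usesTemplate(spec.Type) && spec.TemplateID == "":
		return fmt.Errorf("templateID is required for server type %s", spec.Type)
	case !usesTemplate(spec.Type) && spec.TemplateID != "":
		return fmt.Errorf("templateID must not be set for server type %s", spec.Type)
	default:
		return nil
	}
}

func main() {
	specs := []MachineSpec{
		{Type: ServerTypeCube, TemplateID: "72e73b81-8551-4e74-b398-fc63b39994af"},
		{Type: ServerTypeGPU},
		{Type: ServerTypeEnterprise, TemplateID: "unexpected"},
	}
	for _, s := range specs {
		fmt.Printf("%-10s templateID=%q -> %v\n", s.Type, s.TemplateID, validateTemplateID(s))
	}
}
```

Expressing the rule in the CRD schema rather than in controller code means invalid combinations are rejected at admission time, before any reconciliation runs.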

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| internal/service/cloud/server.go | Applies template-specific server/boot-volume property rules and skips boot-volume deletion for template-backed servers. |
| internal/service/cloud/server_test.go | Adds unit tests for CUBE/GPU template provisioning and template deletion behavior. |
| api/v1alpha1/ionoscloudmachine_types.go | Introduces templateID, new ServerType values (CUBE/GPU), and DAS disk type + validation annotations. |
| api/v1alpha1/ionoscloudmachine_types_test.go | Adds/extends validation tests for new server types and templateID rules. |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudmachines.yaml | Updates CRD schema/enum/validations for template-backed server types and DAS. |
| config/crd/bases/infrastructure.cluster.x-k8s.io_ionoscloudmachinetemplates.yaml | Same as above for machine templates CRD. |
| templates/cluster-template-cube.yaml | Adds clusterctl flavor template for CUBE servers using templateID (and DAS). |
| templates/cluster-template-gpu.yaml | Adds clusterctl flavor template for GPU servers using templateID. |
| test/e2e/data/infrastructure-ionoscloud/cluster-template-cube.yaml | Adds e2e cluster template for the cube flavor. |
| test/e2e/config/ionoscloud.yaml | Registers the new e2e template and adds IONOSCLOUD_CUBE_TEMPLATE_ID variable. |
| test/e2e/capic_test.go | Adds an e2e QuickStartSpec covering the cube flavor. |
| .github/workflows/e2e.yaml | Plumbs IONOSCLOUD_CUBE_TEMPLATE_ID into the e2e workflow environment. |
| docs/quickstart.md | Documents new server types and the new cube/gpu templates and variables. |
| docs/custom-image.md | Documents EFI/UEFI requirements for GPU usage and updated build guidance. |
| envfile.example | Adds example env vars for cube/gpu template IDs. |
| go.mod, go.sum | Bumps github.com/ionos-cloud/sdk-go/v6 to v6.3.6. |

Comment thread api/v1alpha1/ionoscloudmachine_types.go
lucasl0st (Author) commented Apr 2, 2026

TODO: I need to check this:

E0402 15:17:44.288900       1 controller.go:324] "Reconciler error" err=<
	error in step ReconcileIPFailover: failed to patch LAN 1: request to Cloud API has failed: 422 Unprocessable Entity {
	  "httpStatus" : 422,
	  "messages" : [ {
	    "errorCode" : "345",
	    "message" : "[(root).properties.ipFailover] NICs of a Cube instance are not allowed to be added to an IP Failover setup"
	  } ]
	}

Edit: this essentially means that cubes should not be used as control plane nodes.
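
One way to surface this constraint early, instead of hitting the 422 during reconciliation, would be a guard like the following sketch. The function and error names are hypothetical, and whether the provider should reject such machines or merely skip the failover step is a design choice this thread does not settle. Only CUBE is covered, because that is the only server type the API error mentions.

```go
package main

import (
	"errors"
	"fmt"
)

// errCubeInFailover is a hypothetical sentinel error for this sketch.
var errCubeInFailover = errors.New("NICs of CUBE servers cannot be added to an IP failover setup")

// checkFailoverEligibility returns an error if a machine that needs IP
// failover (for example a control plane node) uses the CUBE server type.
func checkFailoverEligibility(serverType string, needsIPFailover bool) error {
	if needsIPFailover && serverType == "CUBE" {
		return errCubeInFailover
	}
	return nil
}

func main() {
	// A cube used where IP failover is required is rejected up front.
	fmt.Println(checkFailoverEligibility("CUBE", true))
	// A cube used as a worker without failover passes the check.
	fmt.Println(checkFailoverEligibility("CUBE", false))
}
```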

lpape-ionos added a commit that referenced this pull request Apr 7, 2026
For #351 I need the cube template id added to the e2e tests. For the tests to run I need to merge this to main because I am coming from a fork.
sonarqubecloud Bot commented Apr 8, 2026

@mspoeri mspoeri changed the title feat: add cube and gpu support via templates [MWG-1605] feat: add cube and gpu support via templates Apr 15, 2026
3 participants