Feature/kube native upgrade #14

Draft
nekwar wants to merge 4 commits into main from feature/kube-native-upgrade

@nekwar nekwar commented May 26, 2026

Ansible: automated MKE post-install configuration

Adds a post-install Ansible workflow that configures an MKE3 cluster without SSH. All operations run on the Ansible control node (localhost) via the MKE REST API and the Kubernetes API, so the workflow is safe to run after SSH is disabled.

What's added

mke-client-bundle-playbook.yml / tasks/mke-client-bundle-tasks.yml
Fetches the MKE3 client bundle via REST API, extracts kube.yml, cleans up the ZIP. Auth and download both have retries.
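
For reviewers who want to see the flow outside Ansible, here is a minimal Python sketch of the same sequence: log in, download the bundle, extract kube.yml, retry on transient failures. It is not the task file from this PR. The JSON login payload (`username`, `password`), the `auth_token` response field, and the Bearer header are assumptions based on how the MKE API is commonly called; the helper names are hypothetical. The sketch unpacks the ZIP from memory, so there is no file to delete afterwards.

```python
import io
import json
import time
import zipfile
from pathlib import Path
from urllib import request


def with_retries(fn, attempts=5, delay=2.0):
    """Call fn() until it succeeds or `attempts` (>= 1) tries have failed."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error


def login(mke_url, username, password, context=None):
    """POST /auth/login and return the token (field name is an assumption)."""
    body = json.dumps({"username": username, "password": password}).encode()
    req = request.Request(
        f"{mke_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, context=context) as resp:
        return json.load(resp)["auth_token"]


def download_bundle(mke_url, token, context=None):
    """GET /api/clientbundle and return the raw ZIP bytes."""
    req = request.Request(
        f"{mke_url}/api/clientbundle",
        headers={"Authorization": f"Bearer {token}"},
    )
    with request.urlopen(req, context=context) as resp:
        return resp.read()


def extract_bundle(zip_bytes, dest_dir):
    """Unpack the bundle into dest_dir and return the path to kube.yml."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest)
    return dest / "kube.yml"
```

A caller would chain these as `token = with_retries(lambda: login(url, user, password))`, then `extract_bundle(with_retries(lambda: download_bundle(url, token)), "mke-bundle")`.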

tasks/mke-upgrade-controller-tasks.yml
Installs mke-upgrade-controller (branch feature/mke3-upgrade-support) using the bundle kubeconfig. When deploy_suc: true, also deploys System Upgrade Controller with:

  • Node affinity patch (control-plane/master nodes, Linux only)
  • SYSTEM_UPGRADE_JOB_ACTIVE_DEADLINE_SECONDS=3600 ConfigMap patch
  • MKE privilege grants for the SUC service account (pod-security attributes, all-node scheduling via config-toml and grants API)
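
To make the first two patches concrete, the sketch below builds them as plain Python dictionaries. It is an illustration, not the content of the task file: the node label keys (`node-role.kubernetes.io/control-plane`, `node-role.kubernetes.io/master`, `kubernetes.io/os`) are the well-known Kubernetes labels and are assumed here, and the function names are hypothetical. Because selector terms are ORed while the expressions inside one term are ANDed, the Linux requirement is repeated in each term.

```python
import json

# Assumed label keys; the PR does not list the exact labels it matches on.
LINUX = {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]}
ROLE_LABELS = [
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
]


def affinity_patch(role_labels=ROLE_LABELS):
    """Deployment patch: (control-plane OR master) AND linux."""
    # One term per role label; terms are ORed by the scheduler.
    terms = [
        {"matchExpressions": [{"key": label, "operator": "Exists"}, LINUX]}
        for label in role_labels
    ]
    node_affinity = {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": terms
        }
    }
    return {
        "spec": {
            "template": {"spec": {"affinity": {"nodeAffinity": node_affinity}}}
        }
    }


def deadline_patch(seconds=3600):
    """ConfigMap patch; ConfigMap data values must be strings."""
    return {"data": {"SYSTEM_UPGRADE_JOB_ACTIVE_DEADLINE_SECONDS": str(seconds)}}


if __name__ == "__main__":
    print(json.dumps(affinity_patch(), indent=2))
    print(json.dumps(deadline_patch()))
```

Either dictionary could be written to a file and applied as a merge patch with `kubectl patch`, using the bundle kubeconfig; the exact mechanism used by the tasks in this PR may differ.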

mke-post-install-playbook.yml
Standalone playbook wrapping the above. Imported by mke-install-playbook.yml as its final step; can also be run independently after SSH is disabled.

New vars (vars/common-vars.yml)

| Variable | Default | Purpose |
| --- | --- | --- |
| deploy_suc | false | Deploy SUC before mke-upgrade-controller |
| suc_service_account | system-upgrade:system-upgrade | SA granted privileged pod-security attributes |

nekwar added 4 commits May 26, 2026 15:37
Adds a standalone playbook and task file to obtain the MKE3 client
bundle via the REST API (/auth/login → /api/clientbundle), extract
it locally, and delete the ZIP.

- mke-client-bundle-playbook.yml: targets managers to inherit mke_url
  from inventory all.vars; saves bundle to playbook_dir/mke-bundle
- tasks/mke-client-bundle-tasks.yml: auth with retries, download with
  retries, unzip, cleanup; all tasks delegate to localhost
- .gitignore: exclude ansible/mke-bundle/ (contains TLS keys)

Adds tasks to install mke-upgrade-controller via the client bundle
kubeconfig, with optional System Upgrade Controller (SUC) deployment
and the MKE privilege configuration SUC requires to operate.

mke-upgrade-controller-tasks.yml:
- fetches the client bundle (reuses mke-client-bundle-tasks.yml)
- conditionally deploys SUC from latest release manifests
- patches the SUC deployment with control-plane node affinity
- conditionally runs SUC privilege grants
- applies mke-upgrade-controller static manifests from
  feature/mke3-upgrade-support branch

suc-priv-grant-tasks.yml:
- GET /api/ucp/config-toml, merge, PUT — sets priv-attribute allowlists
  and enable_admin_ucp_scheduling = true in [scheduling_configuration]
- PUT /collectionGrants/authenticated/swarm/scheduler — grants Scheduler
  role to all authenticated users on the root swarm collection
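
As a rough illustration of the grant call, the path segments above read as subject, collection, and role. The Python sketch below only constructs the request object; it does not send it. The Bearer header and the absence of a request body are assumptions, and the function name is hypothetical.

```python
from urllib import request


def scheduler_grant_request(mke_url, token, subject="authenticated",
                            collection="swarm", role="scheduler"):
    """Build (but do not send) the PUT request for the collection grant."""
    base = mke_url.rstrip("/")
    url = f"{base}/collectionGrants/{subject}/{collection}/{role}"
    return request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
```

Passing the result to `urllib.request.urlopen` would issue the call.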

tasks/helpers/suc_priv_grant.py:
- idempotent TOML merger: priv_attributes arrays in [cluster_config]
  and enable_admin_ucp_scheduling bool in [scheduling_configuration]
- accepts SA as CLI arg; all six priv attributes always granted
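
The idempotent merge can be pictured with the sketch below, which works on an already-parsed config dictionary. It is not the helper from this PR: TOML parsing and serialization are left out, and the two array key names (`ALLOWED_KEY`, `ACCOUNTS_KEY`) are assumed stand-ins for the real priv_attributes keys. Only `cluster_config`, `scheduling_configuration`, and `enable_admin_ucp_scheduling` come from the commit message.

```python
import copy

# Hypothetical key names, used for illustration only.
ALLOWED_KEY = "priv_attributes_allowed_for_service_accounts"
ACCOUNTS_KEY = "priv_attributes_service_accounts"


def merge_unique(existing, additions):
    """Append items not already present, preserving the existing order."""
    merged = list(existing)
    for item in additions:
        if item not in merged:
            merged.append(item)
    return merged


def merge_suc_privileges(config, service_account, attributes):
    """Return a copy of config with the SUC grants merged in."""
    result = copy.deepcopy(config)  # never mutate the caller's config
    cluster = result.setdefault("cluster_config", {})
    cluster[ALLOWED_KEY] = merge_unique(cluster.get(ALLOWED_KEY, []), attributes)
    cluster[ACCOUNTS_KEY] = merge_unique(
        cluster.get(ACCOUNTS_KEY, []), [service_account]
    )
    scheduling = result.setdefault("scheduling_configuration", {})
    scheduling["enable_admin_ucp_scheduling"] = True
    return result
```

Applying the function to its own output leaves the config unchanged, which is what makes a GET, merge, PUT cycle safe to repeat.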

vars/common-vars.yml:
- deploy_suc (bool, default false)
- suc_service_account (default: system-upgrade:system-upgrade)

Moves the mke-upgrade-controller installation play out of
mke-install-playbook.yml into a dedicated mke-post-install-playbook.yml.

Motivation: all post-install operations run on localhost via the MKE
REST API and kube API — no SSH to cluster nodes required. The standalone
playbook can be re-run independently after SSH has been disabled without
needing to repeat the full installation sequence.

mke-install-playbook.yml delegates to the new playbook via
import_playbook as its final step.

Sets SYSTEM_UPGRADE_JOB_ACTIVE_DEADLINE_SECONDS=3600 on the
default-controller-env ConfigMap in the system-upgrade namespace.

The default 900s deadline is insufficient for OS upgrade jobs which
require pulling a ~1.4GB image and running bootc switch per node.
@nekwar nekwar requested a review from james-nesbitt May 26, 2026 12:11
@nekwar nekwar marked this pull request as draft May 26, 2026 12:12