AUTOSCALE-558: Expose KubeletConfig on OpenShiftEC2Nodeclass as structured fields + preserveunknown/overflow#8192

Merged
openshift-merge-bot[bot] merged 4 commits into
openshift:mainfrom
jkyros:autoscale-558-kubeletconfig-overflow
May 12, 2026
Conversation

@jkyros
Member

@jkyros jkyros commented Apr 9, 2026

What this PR does / why we need it:

  • Exposes spec.Kubelet on OpenShiftEC2NodeClass as a set of structured fields (the ones Karpenter needs for scheduling/bin packing) and preserves unknown fields
  • Reconciles the structured fields to Karpenter's ec2nodeclass so it can use them
  • Preserves the unstructured fields and sends them on to ignition so they make it to the node

Which issue(s) this PR fixes:

Fixes
AUTOSCALE-558

Special notes for your reviewer:

  • CEL expressions can't see inside the unstructured fields 😞
  • This tries to give us the approximate behavior we wanted from our sync discussion
  • The API Guidelines for OpenShift APIs want the bools to be enums, but that's going to be a weird corner if the karpenter-specific bools are enums and the rest aren't. I left them as bools and excluded them from the linter; I will adjust it however you want

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • NodeClass now accepts detailed kubelet configuration and maps selected fields to provisioned nodes; per-NodeClass kubelet ConfigMaps, a cluster taint ConfigMap, finalizers, and NodePool config reference selection are reconciled. Added helpers to generate Karpenter taint manifests and label/name helpers. Added privileged checker Pod manifest for node kubelet validation.
  • Tests

    • New unit and e2e tests covering JSON marshal/unmarshal, unknown-field preservation, mapping to upstream config, ConfigMap lifecycle, finalizers, manifest validation, and runtime kubelet checks.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 9, 2026

@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.


@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Introduces structured kubelet configuration: a new KubeletConfiguration API type with custom JSON marshal/unmarshal, IsZero semantics, and an optional kubelet field on OpenshiftEC2NodeClassSpec. Adds helpers/constants to generate a Karpenter taint KubeletConfig manifest. Controllers now reconcile a global Karpenter taint ConfigMap and per-NodeClass kubelet ConfigMaps (create/update/delete and finalizer lifecycle). Reconciliation copies selected kubelet fields into upstream EC2NodeClass Kubelet. Extensive unit and e2e tests validate JSON overflow preservation, ConfigMap lifecycle, manifest contents, and node-level kubelet propagation. Lint exclusions updated for kubelet fields.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant OSE2NC as OpenshiftEC2NodeClass
    participant KarpIgnition as KarpenterIgnitionController
    participant KubeletCM as Kubelet ConfigMap\n(Management Cluster)
    participant EC2NC as EC2NodeClass\n(Karpenter)
    participant NP as NodePool\n(Karpenter)
    participant Node as Provisioned Node

    User->>OSE2NC: Create/Update with spec.kubelet
    OSE2NC->>KarpIgnition: Notify controller of change

    alt spec.kubelet is set
        KarpIgnition->>KarpIgnition: Add kubeletConfigFinalizer
        KarpIgnition->>KarpIgnition: Merge user kubelet config + base taints
        KarpIgnition->>KubeletCM: Create/Update per-NodeClass ConfigMap (data["config"]=KubeletConfig YAML)
    else spec.kubelet is unset
        KarpIgnition->>KubeletCM: Delete per-NodeClass ConfigMap
        KarpIgnition->>KarpIgnition: Remove kubeletConfigFinalizer if present
    end

    User->>NP: Create NodePool referencing OpenshiftEC2NodeClass
    NP->>EC2NC: Populate EC2NodeClass.Kubelet via KarpenterKubeletConfiguration()
    NP->>Node: Provision node referencing Kubelet ConfigMap
    Node->>KubeletCM: Read and apply kubelet config
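The two branches of the first diagram (kubelet set versus unset) can be sketched in Go as follows. This is an in-memory illustration only, not the controller's code: the `nodeClass` and `configMaps` types, the finalizer string, and the `kubelet-<name>` naming scheme are all hypothetical, and the "merge base taints" step is omitted.

```go
package main

import "fmt"

// Placeholder value: the real finalizer string is not shown in this thread.
const kubeletConfigFinalizer = "example.test/kubelet-config"

// nodeClass is a hypothetical, trimmed-down stand-in for OpenshiftEC2NodeClass.
type nodeClass struct {
	Name       string
	Kubelet    map[string]string // nil models an unset spec.kubelet
	Finalizers []string
}

// configMaps stands in for the management cluster: ConfigMap name -> data["config"].
type configMaps map[string]string

// reconcileKubeletConfigMap follows the alt/else branches of the diagram.
func reconcileKubeletConfigMap(nc *nodeClass, cms configMaps) {
	name := "kubelet-" + nc.Name // hypothetical naming scheme
	if nc.Kubelet != nil {
		if !contains(nc.Finalizers, kubeletConfigFinalizer) {
			nc.Finalizers = append(nc.Finalizers, kubeletConfigFinalizer)
		}
		// Placeholder for the rendered KubeletConfig YAML.
		cms[name] = fmt.Sprint(nc.Kubelet)
		return
	}
	delete(cms, name)
	nc.Finalizers = without(nc.Finalizers, kubeletConfigFinalizer)
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func without(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	nc := &nodeClass{Name: "default", Kubelet: map[string]string{"maxPods": "110"}}
	cms := configMaps{}
	reconcileKubeletConfigMap(nc, cms)
	fmt.Println(len(cms), len(nc.Finalizers))
	nc.Kubelet = nil
	reconcileKubeletConfigMap(nc, cms)
	fmt.Println(len(cms), len(nc.Finalizers))
}
```

Calling the function twice with the same input leaves a single finalizer and a single ConfigMap entry, which is the idempotence a reconcile loop relies on.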
sequenceDiagram
    actor Operator
    participant KarController as KarpenterController
    participant MgrCM as Management ConfigMap\n(karpenter taint)
    participant Support as support/karpenter helpers

    Operator->>KarController: Reconcile loop
    KarController->>Support: KarpenterTaintConfigManifest()
    Support-->>KarController: YAML manifest
    KarController->>MgrCM: Create/Update global taint ConfigMap (data["config"]=manifest)
    MgrCM-->>KarController: Created/Updated or Error
🚥 Pre-merge checks | ✅ 7 | ❌ 5

❌ Failed checks (5 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Test code manually patches finalizers instead of calling Reconcile(), verifying state mutation but not controller logic execution. | Refactor finalizer tests to call r.Reconcile() directly instead of manually manipulating finalizers to verify the controller's own logic is exercised. |
| Microshift Test Compatibility | ⚠️ Warning | The e2e test TestKarpenter uses MicroShift-unavailable APIs (Karpenter NodePool, AWS EC2NodeClass) and features with no protection mechanisms to prevent execution on MicroShift clusters. | Add [Skipped:MicroShift] label to test name or wrap with exutil.IsMicroShiftCluster() check that calls g.Skip() to prevent execution on MicroShift. |
| Single Node Openshift (Sno) Test Compatibility | ⚠️ Warning | The testKubeletPropagation e2e test lacks SNO protection and will fail on Single Node OpenShift clusters. | Add [Skipped:SingleReplicaTopology] label to test or use exutil.IsSingleNode() check with g.Skip(). |
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | Pod manifest specifies 'image: alpine' without registry prefix, requiring external pull from Docker Hub in disconnected environments; test lacks IP family detection for IPv6-only clusters. | Use image from internal registry, add IP family detection with GetIPAddressFamily(), or add [Skipped:Disconnected] tag for disconnected cluster scenarios. |

✅ Passed checks (7 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly summarizes the main change: exposing KubeletConfig on OpenShiftEC2NodeClass as structured fields with overflow preservation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test function names referenced in the PR (TestReconcileTaintConfigMap, TestCreateInMemoryNodePool, TestReconcileKubeletConfigMap, TestReconcileDeletedNodeClass) are stable, deterministic, and contain no dynamic values like timestamps, UUIDs, or generated identifiers. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces karpenter integration without topology-aware scheduling constraints. Changes include kubelet config API types, controller reconciliation logic, and e2e tests with no affinity rules or control-plane node assumptions. |
| Ote Binary Stdout Contract | ✅ Passed | Module-level variable initializer contains panic(err) in IIFE, but panic writes to stderr, not stdout. Embedded YAML is valid, so panic condition should never execute. Code does not emit non-JSON content to stdout. |

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 9, 2026
@codecov

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 78.13953% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.65%. Comparing base (bded456) to head (194f1a3).
⚠️ Report is 46 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../karpenterignition/karpenterignition_controller.go | 73.60% | 22 Missing and 11 partials ⚠️ |
| ...ator/controllers/karpenter/karpenter_controller.go | 66.66% | 5 Missing and 1 partial ⚠️ |
| support/karpenter/karpenter.go | 80.76% | 4 Missing and 1 partial ⚠️ |
| ...r-operator/controllers/nodeclass/karpenter_util.go | 95.23% | 1 Missing and 1 partial ⚠️ |
| ...trollers/hostedcluster/hostedcluster_controller.go | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8192      +/-   ##
==========================================
+ Coverage   37.53%   37.65%   +0.11%     
==========================================
  Files         751      751              
  Lines       92026    92206     +180     
==========================================
+ Hits        34544    34717     +173     
+ Misses      54841    54834       -7     
- Partials     2641     2655      +14     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ft-operator/controllers/hostedcluster/karpenter.go | 75.49% <100.00%> (+12.16%) ⬆️ |
| .../controllers/nodeclass/ec2_nodeclass_controller.go | 53.72% <100.00%> (+0.08%) ⬆️ |
| ...trollers/hostedcluster/hostedcluster_controller.go | 43.23% <0.00%> (ø) |
| ...r-operator/controllers/nodeclass/karpenter_util.go | 85.71% <95.23%> (+3.36%) ⬆️ |
| support/karpenter/karpenter.go | 74.50% <80.76%> (+6.50%) ⬆️ |
| ...ator/controllers/karpenter/karpenter_controller.go | 28.77% <66.66%> (+2.00%) ⬆️ |
| .../karpenterignition/karpenterignition_controller.go | 64.94% <73.60%> (+2.27%) ⬆️ |

... and 2 files with indirect coverage changes

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 32.80% <80.76%> (+0.04%) ⬆️ |
| cpo-hostedcontrolplane | 36.78% <ø> (+0.01%) ⬆️ |
| cpo-other | 37.84% <ø> (+0.08%) ⬆️ |
| hypershift-operator | 47.99% <66.66%> (+0.05%) ⬆️ |
| other | 28.70% <77.95%> (+0.92%) ⬆️ |


@jkyros
Member Author

jkyros commented Apr 9, 2026

/test e2e-aws-autonode

@jkyros
Member Author

jkyros commented Apr 9, 2026

Heyyy that is super cool, I don't have to have my claude watch and root cause test failures anymore
/test e2e-aws-autonode

@jkyros
Member Author

jkyros commented Apr 10, 2026

KubeletConfig passed, teardown failure. One more time
/test e2e-aws-autonode

Comment thread api/karpenter/v1beta1/karpenter_types.go Outdated
Comment thread api/.golangci.yml Outdated
// +kubebuilder:object:generate=false
// +kubebuilder:pruning:PreserveUnknownFields
// +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
type KubeletConfiguration struct {
Member

this will need test coverage once d448ab4 merges

Member Author

I can either come back afterwards and add it, or we can wait for that to merge and then do this one, and I can add it here. I'll set a test up based on your branch in the meantime.

Member Author

Added envtest test coverage for KubeletConfiguration
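For readers less familiar with CEL, the XValidation rule quoted at the top of this thread reads as follows in plain Go. This is a sketch for illustration, not code from the PR; the `*int32` field types are an assumption (they match upstream kubelet), and `ptr` is a helper introduced only for the example.

```go
package main

import "fmt"

// imageGCThresholdsValid mirrors the CEL rule
//   !has(high) || !has(low) || high > low
// An unset field (nil) always passes; when both are set,
// the high threshold must be strictly greater than the low one.
func imageGCThresholdsValid(high, low *int32) bool {
	if high == nil || low == nil {
		return true
	}
	return *high > *low
}

// ptr is a small helper for building optional values in the example.
func ptr(v int32) *int32 { return &v }

func main() {
	fmt.Println(imageGCThresholdsValid(ptr(85), ptr(80)))
	fmt.Println(imageGCThresholdsValid(ptr(50), ptr(80)))
	fmt.Println(imageGCThresholdsValid(nil, ptr(80)))
}
```

Equal values are rejected because the rule uses a strict comparison.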

@jkyros
Member Author

jkyros commented Apr 10, 2026

/test e2e-aws-autonode

@maxcao13
Member

tests are just taking too long i think 😂

We can see TestKarpenter hitting the 2 hour mark which is when it stops.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2026
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from c11822a to 4d6300e Compare April 15, 2026 07:03
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 15, 2026

@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Summary by CodeRabbit

Release Notes

  • New Features

  • Added kubelet configuration field to NodeClass specifications with support for image garbage collection thresholds, eviction policies, and resource reservation settings that are applied to provisioned nodes.

  • Tests

  • Added comprehensive tests for kubelet configuration lifecycle management, YAML serialization, and end-to-end validation of kubelet settings on nodes.

@jkyros
Member Author

jkyros commented Apr 15, 2026

Rebased, fixed review feedback issues, and swapped out that map[string]json.RawMessage for a runtime.RawExtension so we're out of the business of maintaining that bespoke DeepCopy function.

Now that we're parallel, let's see what we get
/test e2e-aws-autonode

@@ -0,0 +1,147 @@
apiVersion: apiextensions.k8s.io/v1
Member

nit: it would be nice to populate Expected: for these cases

Member Author

Thanks, populated expected for these


// Overflow holds additional kubelet configuration fields not explicitly defined above.
// These fields are preserved during serialization and deserialization, allowing arbitrary
// kubelet configuration to pass through to the node's ignition/MachineConfig.
Member

nit: maybe mention that overflow fields bypass all CRD validation, so invalid values will manifest as node bootstrap failures (kubelet crash loop), not admission errors.

Member Author

Updated language to mention the kubelet crash loop

Comment thread api/AGENTS.md Outdated

When graduating a field from overflow to a typed struct field:

- **Match upstream Karpenter's field name and JSON tag exactly.** Our structured fields must use the
Member

@enxebre enxebre May 8, 2026

what happens to overflow fields set in existing configs if upstream Karpenter introduces support for one and we want to promote it, but upstream Karpenter chooses to differ from upstream kubelet in how the semantics are exposed?

Contributor

I think we actually want to match kubelet. Users today will set kubelet fields which end up in our overflow, and we want to maintain compatibility with those. If Karpenter decides to change the field, that would break our upgrades if we made the field structured and kept it aligned to Karpenter, so we would have to align to kubelet, then translate to Karpenter (so that it can then translate back to kubelet)

Member Author

I agree. Match kubelet. I changed that one weird EvictionSoftGracePeriod Karpenter time.Duration back to a string (so we're matching upstream Kube not Karpenter, it will serialize the same) and updated the AGENTS, etc language to match this.

// +kubebuilder:validation:MinProperties=1
// +optional
KubeReserved map[string]string `json:"kubeReserved,omitempty"`
// evictionHard is a map of signal names to quantities that defines hard eviction thresholds.
Member

do we need to validate that evictionHard and evictionSoft don't clash on any values?

Member Author

@jkyros jkyros May 8, 2026

Added as much CEL as we can to validate this.

They can be either quantities or percentages (yay!) and that makes it difficult to compare since CEL doesn't know node capacity. But we can compare percentages to percentages and values to values, so I added what ended up being kind of a gnarly expression that compares them if it can and fails open if they are mix/match quantity/percentage.

EDIT: Nope. Too gnarly. quantity() and compareTo() are too expensive. And apparently...I can't put max lengths on the map key strings to help without type aliasing them so I have somewhere to attach. Yuck. (unless we want to hack on the CRD like adjust-cel.sh does or something? I don't think we do )

I'm going to turn into Joel here with the "ugh, this API sucks" 😛

Yeah I think (because contents can be either type) we either have to hack on the CRD or type alias it to pass budget and still validate.

Member Author

Added separate type definition for the strings so we can limit their length to 64 inside the map, CEL budget is okay now.

Contributor

@JoelSpeed JoelSpeed left a comment

On each of the maps, could we add a maximum limit too? Seems like the realistic user case is ~4/5 entries in each map, what if we limited to 32 to give plenty of buffer but bring estimated CEL costs down?
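The two bounds discussed in this thread (a length limit on the map's strings via a named type, and a cap on entries per map) might look roughly like this. It is a sketch under assumptions: the names `EvictionValue` and `ThresholdsSketch` are hypothetical, the named type is applied to map values here, and the limit of 32 is the reviewer's suggestion rather than a confirmed final value. `withinLimits` only restates in Go what the markers ask the API server to enforce.

```go
package main

import "fmt"

// EvictionValue is a hypothetical named string type. Giving the string its
// own type provides a place to attach a length marker, which in turn lets
// the CEL cost estimator assume bounded input.
// +kubebuilder:validation:MaxLength=64
type EvictionValue string

// ThresholdsSketch is a hypothetical fragment of a kubelet config type.
type ThresholdsSketch struct {
	// +kubebuilder:validation:MinProperties=1
	// +kubebuilder:validation:MaxProperties=32
	// +optional
	EvictionHard map[string]EvictionValue `json:"evictionHard,omitempty"`
}

// withinLimits restates the marker bounds as a runtime check.
func withinLimits(m map[string]EvictionValue) bool {
	if len(m) > 32 {
		return false
	}
	for _, v := range m {
		if len(v) > 64 {
			return false
		}
	}
	return true
}

func main() {
	t := ThresholdsSketch{EvictionHard: map[string]EvictionValue{"memory.available": "100Mi"}}
	fmt.Println(withinLimits(t.EvictionHard))
}
```

Bounding both the entry count and the string length shrinks the worst-case cost the API server estimates for any rule that iterates over the map.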

Comment thread api/karpenter/v1/kubelet_config.go Outdated
Comment on lines +36 to +37
// When graduating new fields from overflow to typed fields, match upstream Karpenter's
// field names and types exactly. See api/AGENTS.md "KubeletConfiguration Field Graduation"
Contributor

Is it karpenter, or kubelet that we must match?

Member

relates to #8192 (comment)

Member Author

Matching Kubelet, adjusted accordingly.

jkyros added 2 commits May 8, 2026 17:15
…eClass

Add KubeletConfiguration type with custom JSON marshal/unmarshal to support
both explicitly typed fields (maxPods, systemReserved, eviction thresholds,
etc.) and arbitrary overflow fields that pass through to the node's kubelet
config via MachineConfig.

Typed fields get CEL validation at admission time (range checks,
cross-field rules like imageGCHigh > imageGCLow and evictionSoft >=
evictionHard, key-set validation on maps). Overflow fields bypass CRD
validation entirely — invalid values surface as kubelet crash loops at
node bootstrap, not admission errors.

The overflow mechanism uses a runtime.RawExtension with json:"-" and
custom MarshalJSON/UnmarshalJSON to split known fields into the struct
and unknown fields into overflow. On marshal, structured fields win
over overflow on conflict.

Field types match upstream kubelet (k8s.io/kubelet/config/v1beta1) as
the primary compatibility target, with upstream Karpenter as a secondary
reference. The one deliberate deviation is evictionMaxPodGracePeriod
(*int32 vs kubelet's int32) required by the API linter since 0 is a
valid value with no Minimum constraint.

Signed-off-by: John Kyros <jkyros@redhat.com>
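The overflow mechanism described in this commit message can be illustrated with a standard-library-only sketch. It is not the PR's implementation: `KubeletSketch` is a hypothetical single-field stand-in, a `map[string]json.RawMessage` replaces `runtime.RawExtension` so the example has no external dependencies, and `roundTrip` exists only to demonstrate the behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KubeletSketch is a hypothetical stand-in for KubeletConfiguration with one
// typed field. Overflow is hidden from the default encoder via json:"-".
type KubeletSketch struct {
	MaxPods  *int32                     `json:"maxPods,omitempty"`
	Overflow map[string]json.RawMessage `json:"-"`
}

// typed has the same fields but no methods, so encoding it does not recurse
// back into the custom MarshalJSON/UnmarshalJSON below.
type typed KubeletSketch

// knownFields lists the JSON keys owned by typed struct fields.
var knownFields = map[string]bool{"maxPods": true}

// UnmarshalJSON decodes known keys into the struct and keeps the rest raw.
func (k *KubeletSketch) UnmarshalJSON(data []byte) error {
	var t typed
	if err := json.Unmarshal(data, &t); err != nil {
		return err
	}
	var all map[string]json.RawMessage
	if err := json.Unmarshal(data, &all); err != nil {
		return err
	}
	t.Overflow = nil
	for key, raw := range all {
		if knownFields[key] {
			continue
		}
		if t.Overflow == nil {
			t.Overflow = map[string]json.RawMessage{}
		}
		t.Overflow[key] = raw
	}
	*k = KubeletSketch(t)
	return nil
}

// MarshalJSON writes overflow first, then lets structured fields overwrite
// any duplicate key, so structured values win on conflict.
func (k KubeletSketch) MarshalJSON() ([]byte, error) {
	out := map[string]json.RawMessage{}
	for key, raw := range k.Overflow {
		out[key] = raw
	}
	structured, err := json.Marshal(typed(k))
	if err != nil {
		return nil, err
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(structured, &fields); err != nil {
		return nil, err
	}
	for key, raw := range fields {
		out[key] = raw
	}
	return json.Marshal(out)
}

// roundTrip decodes and re-encodes a document to show that unknown keys survive.
func roundTrip(in string) (string, error) {
	var k KubeletSketch
	if err := json.Unmarshal([]byte(in), &k); err != nil {
		return "", err
	}
	out, err := json.Marshal(k)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"maxPods":110,"cpuManagerPolicy":"static"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Here `cpuManagerPolicy` has no typed field, so it lands in overflow on decode and is written back out on encode.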
…d mapping

Wire the KubeletConfiguration from OpenshiftEC2NodeClass through the
karpenter-operator's ignition controller to inject kubelet settings into
node ignition via MachineConfig.

Add type mapping function (karpenterKubeletConfigurationFromNodeClassSpec)
that converts our kubelet types to upstream Karpenter types, including a
string-to-Duration conversion for evictionSoftGracePeriod where our API
matches kubelet's map[string]string but Karpenter uses map[string]metav1.Duration.

Move taint ConfigMap creation from the hypershift-operator to the
karpenter-operator for centralized management.

Signed-off-by: John Kyros <jkyros@redhat.com>
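The string-to-Duration conversion mentioned above can be sketched as below. This is an illustration, not the body of karpenterKubeletConfigurationFromNodeClassSpec: `toDurations` is a hypothetical helper, and `time.Duration` stands in for `metav1.Duration` (which wraps it) so that the example needs only the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// toDurations converts a kubelet-style map of duration strings, such as
// {"memory.available": "1m30s"}, into typed durations. Any value that
// time.ParseDuration rejects aborts the conversion with an error.
func toDurations(in map[string]string) (map[string]time.Duration, error) {
	if in == nil {
		return nil, nil
	}
	out := make(map[string]time.Duration, len(in))
	for signal, raw := range in {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return nil, fmt.Errorf("evictionSoftGracePeriod[%q]: %w", signal, err)
		}
		out[signal] = d
	}
	return out, nil
}

func main() {
	got, err := toDurations(map[string]string{"memory.available": "1m30s"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got["memory.available"])
}
```

Keeping the API field as a string means an unparseable value is only detected at conversion time unless admission validation also checks the format.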
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from a2887fe to 31ee3c0 Compare May 8, 2026 22:53
@jkyros
Member Author

jkyros commented May 8, 2026

envtest...cancelled itself? hmmm
/retest

Add envtest suites for CEL validation rules including threshold
ordering, field promotion ratcheting, and generated ratcheting tests.
Add e2e test infrastructure for verifying kubelet config on nodes.

Signed-off-by: John Kyros <jkyros@redhat.com>
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 31ee3c0 to 0a5d587 Compare May 9, 2026 02:07
@jkyros
Member Author

jkyros commented May 9, 2026

/test e2e-aws-autonode

@jkyros
Member Author

jkyros commented May 11, 2026

/retest-required

The relationship between the limits for soft and hard eviction
thresholds was not validated in the upstream kubernetes API but we
desired to validate it downstream. This just makes sure the soft
threshold fires before the hard threshold.

Signed-off-by: John Kyros <jkyros@redhat.com>
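One reading of "the soft threshold fires before the hard threshold" is sketched here in Go rather than CEL. It is not the rule that was merged: `softFiresFirst` and `parsePercent` are hypothetical names, only percentage values are compared, and everything else (plain quantities such as "100Mi", or a percentage paired with a quantity) is skipped, which corresponds to the "fails open" behavior described in the review thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePercent returns the numeric part of a value such as "10%".
// ok is false for anything that is not a percentage, e.g. "100Mi".
func parsePercent(v string) (pct float64, ok bool) {
	if !strings.HasSuffix(v, "%") {
		return 0, false
	}
	f, err := strconv.ParseFloat(strings.TrimSuffix(v, "%"), 64)
	if err != nil {
		return 0, false
	}
	return f, true
}

// softFiresFirst reports whether, for every signal present in both maps
// with percentage values, the soft threshold is at or above the hard one.
// As the available resource shrinks, a higher threshold is crossed first.
func softFiresFirst(soft, hard map[string]string) bool {
	for signal, s := range soft {
		h, present := hard[signal]
		if !present {
			continue
		}
		sp, sok := parsePercent(s)
		hp, hok := parsePercent(h)
		if !sok || !hok {
			continue // not comparable in this sketch: fail open
		}
		if sp < hp {
			return false
		}
	}
	return true
}

func main() {
	soft := map[string]string{"memory.available": "10%"}
	hard := map[string]string{"memory.available": "5%"}
	fmt.Println(softFiresFirst(soft, hard))
}
```

A fuller version would also compare quantity to quantity, which needs a quantity parser and is left out here.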
@jkyros
Member Author

jkyros commented May 12, 2026

I wonder if I can...
/test e2e-aks
/test e2e-aks-4-22
/test e2e-aws
/test e2e-aws-4-22
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@jkyros
Member Author

jkyros commented May 12, 2026

Infra

ERRO[2026-05-12T00:57:31Z] Some steps failed:
ERRO[2026-05-12T00:57:31Z]
  * could not run steps: step [release:n3minor] failed: failed to get CLI image: unable to extract the 'cli' image from the release image, pod produced no output
INFO[2026-05-12T00:57:31Z] Reporting job state 'failed' with reason 'executing_graph:step_failed:importing_release'

/test e2e-aws

@jkyros
Member Author

jkyros commented May 12, 2026

/test e2e-aks-4-22

@JoelSpeed
Contributor

/approve

For api

@enxebre
Member

enxebre commented May 12, 2026

/approve

@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, jkyros, JoelSpeed

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
@JoelSpeed
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@openshift-merge-bot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@jkyros
Member Author

jkyros commented May 12, 2026

/verified by e2e-tests

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 12, 2026
@openshift-ci-robot

@jkyros: This PR has been marked as verified by e2e-tests.

@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2026

@jkyros: all tests passed!

@openshift-merge-bot openshift-merge-bot Bot merged commit 4341d0c into openshift:main May 12, 2026
52 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/ci-tooling Indicates the PR includes changes for CI or tooling area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
