Skip to content

feat: add CEL validation rules to MemberCluster taint API types#478

Draft
Yetkin Timocin (ytimocin) wants to merge 1 commit into
kubefleet-dev:mainfrom
ytimocin:feat/cel-validation-membercluster
Draft

feat: add CEL validation rules to MemberCluster taint API types#478
Yetkin Timocin (ytimocin) wants to merge 1 commit into
kubefleet-dev:mainfrom
ytimocin:feat/cel-validation-membercluster

Conversation

@ytimocin
Copy link
Copy Markdown
Collaborator

@ytimocin Yetkin Timocin (ytimocin) commented Mar 4, 2026

Description of your changes

Adds CRD CEL validation for MemberCluster taints in both API versions (v1 and v1beta1), replacing the webhook-based taint validation with declarative CRD-level validation.

What changed

  • API type updates:
    • Added CEL rules on MemberClusterSpec.Taints and Taint in:
      • apis/cluster/v1/membercluster_types.go
      • apis/cluster/v1beta1/membercluster_types.go
    • Rules cover:
      • taint key name segment format and max length
      • taint key prefix DNS-subdomain format and max length
      • taint value label-value format and max length
      • taint uniqueness by (key, value, effect)
    • Added MaxLength, MinLength, and MaxItems constraints to bound CEL cost
  • Regenerated CRD schema:
    • config/crd/bases/cluster.kubernetes-fleet.io_memberclusters.yaml
  • Removed webhook taint validation:
    • pkg/webhook/membercluster/membercluster_validating_webhook.go — CREATE/UPDATE now returns Allowed since CEL handles validation before the webhook is reached. DELETE path (ServiceExport check) is unchanged.
    • pkg/utils/validator/membercluster.go — Deleted (dead code)
    • pkg/utils/validator/membercluster_test.go — Deleted (dead code)
  • Added integration validation coverage:
    • test/apis/cluster/v1/api_validation_integration_test.go
    • test/apis/cluster/v1beta1/api_validation_integration_test.go
    • Comprehensive valid/invalid taint CEL cases for both versions.
  • Updated e2e assertion:
    • test/e2e/webhook_test.go
    • Adjusted expected error message to match CEL validation output.

Why

  • Enforces invalid taints earlier at CRD admission time (schema validation runs before webhooks).
  • Removes runtime dependency on the webhook for taint validation — if the webhook is unavailable, taint validation still works.
  • Simplifies webhook code by removing redundant validation logic.

Breaking change note

Error responses for invalid taints have changed:

  • Before: Webhook returned 403 Forbidden with messages from k8s.io/apimachinery/pkg/util/validation (e.g., "name part must consist of...")
  • After: CRD validation returns 422 Unprocessable Entity with CEL rule messages (e.g., "taint key name segment must consist of...")

If external tooling parses taint validation error messages or status codes, it will need to be updated.

How has this code been tested

  • make reviewable passes clean
  • Integration tests:
    • go test ./test/apis/cluster/v1 -run TestAPIs -count=1 -- -ginkgo.focus="MemberCluster taint CEL validation" -ginkgo.v
    • go test ./test/apis/cluster/v1beta1 -run TestAPIs -count=1 -- -ginkgo.focus="MemberCluster taint CEL validation" -ginkgo.v

Special notes for your reviewer

  • CEL validation runs at schema validation time, before validating webhooks. Invalid taints are rejected before reaching the webhook.
  • The webhook DELETE path is preserved — it performs cross-object validation (checking InternalServiceExport resources) that cannot be expressed in CEL.
  • Estimated total CEL cost is ~4.4M out of the 10M per-object budget (~56% headroom).

I have:

  • Run make reviewable to ensure this PR is ready for review.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CRD-level CEL validations for MemberCluster taints (v1 + v1beta1) so invalid taints are rejected at schema validation time, aligning with existing webhook validation for defense-in-depth.

Changes:

  • Added CEL rules and length constraints to MemberClusterSpec.Taints / Taint in both API versions (format, prefix/name/value limits, and uniqueness by (key,value,effect)).
  • Regenerated the MemberCluster CRD schema to include the new validations.
  • Expanded validation coverage via API integration tests (v1 + v1beta1), updated validator unit tests, and adjusted an e2e assertion to match CEL error output.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
apis/cluster/v1/membercluster_types.go Adds CEL validations + length constraints for taints in v1 API types.
apis/cluster/v1beta1/membercluster_types.go Adds the same CEL validations + constraints for v1beta1 API types.
config/crd/bases/cluster.kubernetes-fleet.io_memberclusters.yaml CRD schema regeneration to include taint CEL validations and max/min length constraints.
test/apis/cluster/v1/api_validation_integration_test.go Adds integration coverage for valid/invalid taint CEL cases (v1).
test/apis/cluster/v1beta1/api_validation_integration_test.go Adds integration coverage for valid/invalid taint CEL cases (v1beta1).
pkg/utils/validator/membercluster_test.go Adds a unit test covering “same key+effect, different value” as valid (uniqueness is by key+value+effect).
test/e2e/webhook_test.go Updates expected error text to match CEL/CRD validation output.

Comment thread test/e2e/webhook_test.go Outdated
var statusErr *k8sErrors.StatusError
g.Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Update MC call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{})))
g.Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character"))
g.Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("taint key name segment must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character"))
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion uses MatchRegexp with a literal string that contains regex metacharacters (notably the unescaped "."), which can make the test pass for unintended messages. Consider using ContainSubstring for this check, or wrapping the expected message with regexp.QuoteMeta() before passing it to MatchRegexp to ensure the regex matches the literal text.

Copilot uses AI. Check for mistakes.
@ytimocin Yetkin Timocin (ytimocin) force-pushed the feat/cel-validation-membercluster branch 2 times, most recently from d486583 to d20fcc6 Compare March 6, 2026 21:14
@ytimocin Yetkin Timocin (ytimocin) force-pushed the feat/cel-validation-membercluster branch 5 times, most recently from ca856e2 to f395770 Compare March 31, 2026 23:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment on lines +80 to 83
// Taint validation is now handled by CRD-level CEL validation rules.
// CREATE/UPDATE requests with invalid taints are rejected at schema
// validation time before reaching this webhook.
return admission.Allowed("Member cluster has valid fields")
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The webhook now short-circuits CREATE/UPDATE, but the validating webhook is still configured (in pkg/webhook/webhook.go) with FailurePolicy=Fail and Operations={Create,Update,Delete} for MemberCluster. That means webhook unavailability will still block MemberCluster CREATE/UPDATE even though taint validation moved to CEL, which undermines the stated goal of removing the webhook dependency for taint validation. Consider updating the webhook configuration to only target DELETE (or set FailurePolicy=Ignore for CREATE/UPDATE) now that only DELETE performs cross-object validation.

Copilot uses AI. Check for mistakes.
Comment on lines +453 to +457
It("should allow creating MemberCluster with taints having same key and effect but different values", func() {
mc := &clusterv1.MemberCluster{
ObjectMeta: metav1.ObjectMeta{
Name: "mc-v1-taint-same-key-effect",
},
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec is a valid (allowed) case but is placed under the "invalid cases" Context, which makes the test suite harder to understand/maintain. Move this It block into the "valid cases" Context (or rename/split the Contexts to match the actual expectations).

Copilot uses AI. Check for mistakes.
Comment on lines +453 to +473
It("should allow creating MemberCluster with taints having same key and effect but different values", func() {
mc := &clusterv1beta1.MemberCluster{
ObjectMeta: metav1.ObjectMeta{
Name: "mc-taint-same-key-effect",
},
Spec: clusterv1beta1.MemberClusterSpec{
Identity: rbacv1.Subject{
Name: "fleet-member-agent-cluster-1",
Kind: "ServiceAccount",
Namespace: "fleet-system",
APIGroup: "",
},
Taints: []clusterv1beta1.Taint{
{Key: "key1", Value: "value1", Effect: "NoSchedule"},
{Key: "key1", Value: "value2", Effect: "NoSchedule"},
},
},
}
By("expecting success of CREATE with same key+effect but different values")
Expect(hubClient.Create(ctx, mc)).Should(Succeed())
Expect(hubClient.Delete(ctx, mc)).Should(Succeed())
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec is a valid (allowed) case but is placed under the "invalid cases" Context, which makes the test suite harder to understand/maintain. Move this It block into the "valid cases" Context (or rename/split the Contexts to match the actual expectations).

Suggested change
It("should allow creating MemberCluster with taints having same key and effect but different values", func() {
mc := &clusterv1beta1.MemberCluster{
ObjectMeta: metav1.ObjectMeta{
Name: "mc-taint-same-key-effect",
},
Spec: clusterv1beta1.MemberClusterSpec{
Identity: rbacv1.Subject{
Name: "fleet-member-agent-cluster-1",
Kind: "ServiceAccount",
Namespace: "fleet-system",
APIGroup: "",
},
Taints: []clusterv1beta1.Taint{
{Key: "key1", Value: "value1", Effect: "NoSchedule"},
{Key: "key1", Value: "value2", Effect: "NoSchedule"},
},
},
}
By("expecting success of CREATE with same key+effect but different values")
Expect(hubClient.Create(ctx, mc)).Should(Succeed())
Expect(hubClient.Delete(ctx, mc)).Should(Succeed())
Context("Test MemberCluster API validation - valid taint cases", func() {
It("should allow creating MemberCluster with taints having same key and effect but different values", func() {
mc := &clusterv1beta1.MemberCluster{
ObjectMeta: metav1.ObjectMeta{
Name: "mc-taint-same-key-effect",
},
Spec: clusterv1beta1.MemberClusterSpec{
Identity: rbacv1.Subject{
Name: "fleet-member-agent-cluster-1",
Kind: "ServiceAccount",
Namespace: "fleet-system",
APIGroup: "",
},
Taints: []clusterv1beta1.Taint{
{Key: "key1", Value: "value1", Effect: "NoSchedule"},
{Key: "key1", Value: "value2", Effect: "NoSchedule"},
},
},
}
By("expecting success of CREATE with same key+effect but different values")
Expect(hubClient.Create(ctx, mc)).Should(Succeed())
Expect(hubClient.Delete(ctx, mc)).Should(Succeed())
})

Copilot uses AI. Check for mistakes.
@ytimocin Yetkin Timocin (ytimocin) force-pushed the feat/cel-validation-membercluster branch from f395770 to 9d79ddb Compare April 1, 2026 00:08
Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>
@ytimocin Yetkin Timocin (ytimocin) force-pushed the feat/cel-validation-membercluster branch from 9d79ddb to 4229c8e Compare April 9, 2026 16:55

// Taint attached to MemberCluster has the "effect" on
// any ClusterResourcePlacement that does not tolerate the Taint.
// +kubebuilder:validation:XValidation:rule="(self.key.contains('/') ? self.key.substring(self.key.indexOf('/') + 1) : self.key).matches('^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$')",message="taint key name segment must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CEL validation for name and label seem too complex ? Currently we depend on validation functions provided by k8s which guarantee that we are following established conventions once this CEL validation is merged we would also need to maintain consistency between our validation and k8s (very unlikely but still something to keep in mind)

@ytimocin Yetkin Timocin (ytimocin) marked this pull request as draft April 29, 2026 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: improve reliability by moving away from using webhook

3 participants