diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 000000000..bed9eb46b
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,177 @@
+# Overview
+
+This repo contains a collection of Kubernetes Custom Resource Definitions and their controllers. It is mostly written in Go and uses the Kubernetes client-go and controller-runtime libraries.
+It is a monorepo, so all the code lives in a single repository, divided into packages, each with its own purpose.
+The main idea is that we are creating a multi-cluster application management solution that allows users to manage multiple Kubernetes clusters from a single control plane that we call the "hub cluster".
+
+## General Rules
+
+- Use @terminal when answering questions about Git.
+- If you're waiting for my confirmation ("OK"), proceed without further prompting.
+- Follow the [Uber Go Style Guide](https://github.com/uber-go/guide/blob/master/style.md) if possible.
+- Favor using the standard library over third-party libraries.
+- Run goimports on save.
+- Run golint and go vet to check for errors.
+- Run go mod tidy whenever dependencies change.
+
+## Terminology
+- **Fleet**: A conceptual term referring to a collection of clusters.
+- **Member Cluster**: A Kubernetes cluster that is part of a fleet.
+- **Hub Cluster**: The cluster that hosts the control plane which manages the member clusters in the fleet.
+- **Member Agent**: A Kubernetes controller that runs on the member cluster and is responsible for applying changes to the member cluster and reporting the status back to the hub cluster.
+- **Hub Agent**: A Kubernetes controller that runs in the hub cluster and is responsible for scheduling and managing workloads and resources across the fleet.
+
+## Repository directory structure
+
+- The `apis/` folder contains all Golang structs from which CRDs are built.
+  - CRDs are grouped by the group name and version they belong to.
+- The `charts/` folder contains the helm charts for the member and hub agent.
+  - `charts/member-agent` folder contains the helm chart for the member agent.
+  - `charts/hub-agent` folder contains the helm chart for the hub agent.
+- The `cmd/` folder contains the entry points for the member and hub agent.
+  - `cmd/member-agent` folder contains the entry point for the member agent.
+  - `cmd/hub-agent` folder contains the entry point for the hub agent.
+- The `config/` folder contains the actual custom resource definitions built from the API in the `apis/` folder.
+  - `config/crd/bases` folder contains the CRDs for the member and hub agent.
+- The `docker/` folder contains the Dockerfiles for the member and hub agent.
+- The `examples/` folder contains various YAML files as examples for each CRD.
+- The `hack/` folder contains various scripts and tools for the project.
+- The `pkg/` folder contains the libraries for the member and hub agent.
+  - `pkg/authtoken` folder contains the authentication sidecar code, which has a provider model.
+  - `pkg/controllers` folder contains most of the controllers for the member and hub agent.
+    - in most cases, each subfolder contains the controller for the resource of the same name.
+  - `pkg/metrics` folder contains all the metrics definitions.
+  - `pkg/propertyprovider` folder contains the property provider code, which is used to get the properties of a member cluster.
+  - `pkg/resourcewatcher` folder contains the resource watcher code, which is used to watch for Kubernetes resource changes in the hub cluster.
+  - `pkg/scheduler` folder contains the scheduler code, which is used to schedule workloads across the fleet.
+  - `pkg/utils` folder contains shared utility code that provides common functions for the controllers in the member and hub agent.
+  - `pkg/webhook` folder contains the webhook code, which is used to validate and mutate the custom resources.
+- The `test/` folder contains the tests for the member and hub agent.
+  - `test/apis` - The tests for the CRDs.
+  - `test/upgrade` - The upgrade tests, which verify compatibility between versions.
+  - `test/e2e` - The end-to-end tests for the member and hub agent.
+  - `test/integration` - The integration tests for the v1alpha1 member and hub agent.
+  - `test/scheduler` - The integration tests for the scheduler.
+  - `test/utils` - Shared utility code that provides common functions for tests.
+- The `tools/` folder contains client-side tools for helping manage the fleet.
+- The `Makefile` is used to build the member and hub agent.
+- The `go.mod` file declares the dependencies for the member and hub agent.
+- The `go.sum` file records the checksums of those dependencies.
+
+## Testing Rules
+
+- Unit test file names should always end with `_test.go`, and the files should live in the same directory as the code they test.
+  - Unit tests are normally written in a table-driven style.
+  - Use `go test -v ./...` to run all tests under a directory.
+  - Run the tests from the packages that are modified and verify they pass.
+  - Share the analysis as to why a test is failing and propose a fix.
+- Integration test file names should end with `_integration_test.go`; the files can be in the same directory or under the `test` directory.
+  - Integration tests are normally written in a Ginkgo style.
+- E2E tests are all under the `test/e2e` directory.
+  - E2E tests are written in a Ginkgo style.
+  - E2E tests are run using `make e2e-tests` against 3 kind clusters created by the scripts in the `test/e2e` directory.
+  - E2E tests are cleaned up using `make clean-e2e-tests`.
+- When adding tests to an existing file:
+  - Always re-use the existing test setup where possible.
+  - Only add imports if absolutely needed.
+  - Add tests to an existing Context where it makes sense.
+  - When adding new Ginkgo-style tests, always add them under a new Context.
+
+## Domain Knowledge
+
+Use the files in `.github/.copilot/domain_knowledge/**/*` as the source of truth when it comes to domain knowledge. These files provide the context in which the current solution operates. This folder contains information like entity relationships, workflows, and ubiquitous language. As the understanding of the domain grows, take the opportunity to update these files as needed.
+
+## Specification Files
+
+Use specifications from the `.github/.copilot/specifications` folder. Each folder under `specifications` groups similar specifications together. Always ask the user which specifications best apply to the current conversation context if you're not sure.
+
+Use the `.github/.copilot/specifications/.template.md` file as a template for specification structure.
+
+  examples:
+  ```text
+  ├── application_architecture
+  │   ├── main.spec.md
+  │   └── specific-feature.spec.md
+  ├── database
+  │   └── main.spec.md
+  ├── observability
+  │   └── main.spec.md
+  └── testing
+      └── main.spec.md
+  ```
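+
+For illustration, a minimal spec file might be laid out as follows; the authoritative section headings come from `.template.md`, so treat the ones here as placeholders:
+
+  ```text
+  # Testing: Main Specification
+
+  ## Scope
+  What this specification covers and when it applies.
+
+  ## Rules
+  - Placeholder rule 1
+  - Placeholder rule 2
+
+  ## Version
+  v1
+  ```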
+
+## Breadcrumb Protocol
+
+A breadcrumb is a collaborative scratch pad that allows the user and the agent to get aligned on context. When working on tasks in this repository, follow this collaborative documentation workflow to create a clear trail of decisions and implementations:
+
+1. At the start of each new task, ask me for a breadcrumb file name if you can't determine a suitable one.
+
+2. Create the breadcrumb file in the `${REPO}/.github/.copilot/breadcrumbs` folder using the format: `yyyy-mm-dd-HHMM-{title}.md` (*year-month-date-current_time_in-24hr_format-{title}.md* using UTC timezone)
+
+3. Structure the breadcrumb file with these required sections:
+   - **Requirements**: Clear list of what needs to be implemented.
+   - **Additional comments from user**: Any additional input from the user during the conversation.
+   - **Plan**: Strategy and technical plan before implementation.
+   - **Decisions**: Why specific implementation choices were made.
+   - **Implementation Details**: Code snippets with explanations for key files.
+   - **Changes Made**: Summary of files modified and how they changed.
+   - **Before/After Comparison**: Highlighting the improvements.
+   - **References**: List of referred material like domain knowledge files, specification files, and URLs, plus a summary of what each was used for. If there is a version in the domain knowledge or in the specifications, record that version in the breadcrumb.
+
+4. Workflow rules:
+   - Update the breadcrumb **BEFORE** making any code changes.
+   - **Get explicit approval** on the plan before implementation.
+   - Update the breadcrumb **AFTER completing each significant change**.
+   - Keep the breadcrumb as our single source of truth, as it contains the most recent information.
+
+5. Ask me to verify the plan with: "Are you happy with this implementation plan?" before proceeding with code changes.
+
+6. Reference related breadcrumbs when a task builds on previous work.
+
+7. Before concluding, ensure the breadcrumb file properly documents the entire process, including any course corrections or challenges encountered.
+
+This practice creates a trail of decision points that documents our thought process while building features in this solution, making pull request review for the current change easier to follow as well.
+
+### Plan Structure Guidelines
+- When creating a plan, organize it into numbered phases (e.g., "Phase 1: Setup Dependencies"); see the sketch after the "Following Plans" section below.
+- Break down each phase into specific tasks with numeric identifiers (e.g., "Task 1.1: Add Dependencies").
+- Include a detailed checklist at the end of the document that maps to all phases and tasks.
+- Plans should always look up related information in the `domain_knowledge` folder and any applicable specification files.
+- Mark tasks as `- [ ]` for pending tasks and `- [x]` for completed tasks.
+- Start all planning tasks as unchecked, and update them to checked as implementation proceeds.
+- Each planning task should have clear success criteria.
+- End the plan with success criteria that define when the implementation is complete.
+- Plans should start with writing unit tests when possible, so those tests can guide the implementation; the same goes for UI tests when it makes sense.
+- If the domain knowledge has changed, update the related files in the `domain_knowledge` folder.
+- If specifications have changed, update the related files in the `specifications` folder.
+
+### Following Plans
+- When coding, you need to follow the plan phases and check off the tasks as they are completed.
+- As you complete a task, update the plan and mark that task complete before you begin the next task.
+- Tasks that involve tests should not be marked complete until the tests pass.
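+
+For instance, a plan and its checklist in a breadcrumb might be sketched as follows (phase and task names are placeholders):
+
+  ```text
+  ## Plan
+
+  Phase 1: Write Unit Tests
+  - Task 1.1: Add table-driven tests for the new behavior. Success: tests compile and fail for the expected reason.
+
+  Phase 2: Implement the Change
+  - Task 2.1: Implement the change. Success: all tests pass.
+
+  ## Checklist
+  - [x] Task 1.1: Add table-driven tests
+  - [ ] Task 2.1: Implement the change
+  ```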
+
+### Example Usage
+
+**User**: I need to implement JWT authentication for our Express API.
+
+**Agent**: I'll help with implementing JWT authentication for your Express API. Let me create a breadcrumb file to track our implementation. What would you like to name this breadcrumb?
+
+**User**: Use "jwt-auth-implementation"
+
+**Agent**: I've created the breadcrumb file `2025-04-08-1430-jwt-auth-implementation.md`. Are you happy with this implementation plan?
+
+[high level plan with tasks here]
+
+**User**: Looks good, but let's use 30-minute expiration for tokens.
+
+**Agent**: I've updated the breadcrumb with your specification. Are you happy with this implementation plan?
+
+[high level plan with tasks here]
+
+**User**: Yes, please proceed.
+
+**Agent**: I'll implement the JWT authentication now. Let me start by installing the necessary packages and creating the authentication middleware. I've updated the breadcrumb with implementation details. Is there anything specific you'd like me to modify or explain about the implementation?
+
+**User**: This looks good.
+
+**Agent**: I've updated the breadcrumb with the latest understanding.
\ No newline at end of file
diff --git a/.github/workflows/upgrade.yml b/.github/workflows/upgrade.yml
index 2059813a0..f2d5b140a 100644
--- a/.github/workflows/upgrade.yml
+++ b/.github/workflows/upgrade.yml
@@ -56,22 +56,84 @@ jobs:
       - name: Set up Ginkgo CLI
         run: |
           go install github.com/onsi/ginkgo/v2/ginkgo@v2.19.1
+
+      - name: Travel back in time to the before upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
-      - name: Prepare the fleet
-        run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}" chmod +x setup.sh && ./setup.sh 3 && cd -
+          echo "PREVIOUS_COMMIT=$PREVIOUS_COMMIT" >> "$GITHUB_ENV"
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the state tracked by the last known tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            GIT_TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
+
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+          fi
+
+          git checkout $GIT_TAG
+          echo "Checked out source code at $GIT_TAG."
+
+      - name: Prepare the fleet using the before upgrade version
+        run: cd test/upgrade && chmod +x setup.sh && ./setup.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
           HUB_SERVER_URL: 'https://172.19.0.2:6443'
+
+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
       - name: Run the Before suite
         run: cd test/upgrade/before && ginkgo -v -p . && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
+
+      - name: Travel back in time to the after upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the current state."
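+            # Editor note: with no tag given, HEAD is already at the baseline
+            # commit that triggered this workflow, so there is nothing to
+            # check out here.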
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            git checkout $GIT_TAG
+            echo "Checked out source code at $GIT_TAG."
+          fi
-      - name: Upgrade the Fleet hub agent
-        run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}" chmod +x upgrade.sh && UPGRADE_HUB_SIDE=true ./upgrade.sh 3 && cd -
+      - name: Upgrade the Fleet hub agent to the after upgrade version
+        run: cd test/upgrade && chmod +x upgrade.sh && UPGRADE_HUB_SIDE=true ./upgrade.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
+
+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
       - name: Run the After suite
         run: cd test/upgrade/after && ginkgo -v -p . && cd -
@@ -98,22 +160,84 @@ jobs:
       - name: Set up Ginkgo CLI
         run: |
           go install github.com/onsi/ginkgo/v2/ginkgo@v2.19.1
+
+      - name: Travel back in time to the before upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
+
+          echo "PREVIOUS_COMMIT=$PREVIOUS_COMMIT" >> "$GITHUB_ENV"
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the state tracked by the last known tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            GIT_TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
+
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+          fi
+
+          git checkout $GIT_TAG
+          echo "Checked out source code at $GIT_TAG."
       - name: Prepare the fleet
-        run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}" chmod +x setup.sh && ./setup.sh 3 && cd -
+        run: cd test/upgrade && chmod +x setup.sh && ./setup.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
           HUB_SERVER_URL: 'https://172.19.0.2:6443'
+
+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
       - name: Run the Before suite
         run: cd test/upgrade/before && ginkgo -v -p . && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
+
+      - name: Travel back in time to the after upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the current state."
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            git checkout $GIT_TAG
+            echo "Checked out source code at $GIT_TAG."
+          fi
-      - name: Upgrade the Fleet member agent
-        run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}" chmod +x upgrade.sh && UPGRADE_MEMBER_SIDE=true ./upgrade.sh 3 && cd -
+      - name: Upgrade the Fleet member agent to the after upgrade version
+        run: cd test/upgrade && chmod +x upgrade.sh && UPGRADE_MEMBER_SIDE=true ./upgrade.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
+
+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
       - name: Run the After suite
         run: cd test/upgrade/after && ginkgo -v -p . && cd -
@@ -141,22 +265,84 @@ jobs:
         run: |
           go install github.com/onsi/ginkgo/v2/ginkgo@v2.19.1

+      - name: Travel back in time to the before upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
+
+          echo "PREVIOUS_COMMIT=$PREVIOUS_COMMIT" >> "$GITHUB_ENV"
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the state tracked by the last known tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            GIT_TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
+
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+          fi
+
+          git checkout $GIT_TAG
+          echo "Checked out source code at $GIT_TAG."
+
       - name: Prepare the fleet
-        run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.beforeTagOrCommit }}" chmod +x setup.sh && ./setup.sh 3 && cd -
+        run: cd test/upgrade && chmod +x setup.sh && ./setup.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'
           HUB_SERVER_URL: 'https://172.19.0.2:6443'

+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
+
       - name: Run the Before suite
         run: cd test/upgrade/before && ginkgo -v -p . && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'

+      - name: Travel back in time to the after upgrade version
+        run: |
+          GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}"
+          PREVIOUS_BRANCH=$(git branch --show-current)
+          PREVIOUS_COMMIT=$(git rev-parse HEAD)
+          echo "Currently at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
+
+          if [ -z "${GIT_TAG}" ]
+          then
+            echo "No tag is specified; go back to the current state."
+          else
+            echo "A tag is specified; go back to the state tracked by the specified tag."
+            echo "Fetch all tags..."
+
+            git fetch --all
+            git checkout $GIT_TAG
+            echo "Checked out source code at $GIT_TAG."
+          fi
+
       - name: Upgrade all Fleet agents
         run: cd test/upgrade && GIT_TAG="${{ github.event.inputs.afterTagOrCommit }}" chmod +x upgrade.sh && UPGRADE_HUB_SIDE=true UPGRADE_MEMBER_SIDE=true ./upgrade.sh 3 && cd -
         env:
           KUBECONFIG: '/home/runner/.kube/config'

+      - name: Travel to the current state
+        # Note: Fleet always uses the version compatibility test suite from the
+        # baseline commit, i.e., the commit that triggers the workflow.
+        run: |
+          echo "Returning to the current state..."
+          git checkout $PREVIOUS_COMMIT
+          echo "Checked out source code at $PREVIOUS_COMMIT."
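+      # Editor note: all three jobs follow the same time-travel pattern: pin
+      # the baseline commit in $GITHUB_ENV, check out the before/after version
+      # for setup and upgrade, then return to the baseline commit so the
+      # compatibility suites always run from the commit that triggered the
+      # workflow.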
+ - name: Run the After suite run: cd test/upgrade/after && ginkgo -v -p . && cd - env: diff --git a/Makefile b/Makefile index d8c3438d4..fa83cadf9 100644 --- a/Makefile +++ b/Makefile @@ -141,7 +141,7 @@ local-unit-test: $(ENVTEST) ## Run tests. export CGO_ENABLED=1 && \ export KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path)" && \ go test ./pkg/controllers/workv1alpha1 -race -coverprofile=ut-coverage.xml -covermode=atomic -v && \ - go test `go list ./pkg/... ./cmd/... | grep -v pkg/controllers/workv1alpha1` -race -coverprofile=ut-coverage.xml -covermode=atomic -v + go test `go list ./pkg/... ./cmd/... | grep -v pkg/controllers/workv1alpha1` -race -coverpkg=./... -coverprofile=ut-coverage.xml -covermode=atomic -v .PHONY: integration-test integration-test: $(ENVTEST) ## Run tests. diff --git a/apis/placement/v1beta1/stageupdate_types.go b/apis/placement/v1beta1/stageupdate_types.go index f7a58b300..ac216c4ef 100644 --- a/apis/placement/v1beta1/stageupdate_types.go +++ b/apis/placement/v1beta1/stageupdate_types.go @@ -143,6 +143,8 @@ type StageConfig struct { // Each task is executed in parallel and there cannot be more than one task of the same type. // +kubebuilder:validation:MaxItems=2 // +kubebuilder:validation:Optional + // +kubebuilder:validation:XValidation:rule="!self.exists(e, e.type == 'Approval' && has(e.waitTime))",message="AfterStageTaskType is Approval, waitTime is not allowed" + // +kubebuilder:validation:XValidation:rule="!self.exists(e, e.type == 'TimedWait' && !has(e.waitTime))",message="AfterStageTaskType is TimedWait, waitTime is required" AfterStageTasks []AfterStageTask `json:"afterStageTasks,omitempty"` } @@ -157,7 +159,7 @@ type AfterStageTask struct { // +kubebuilder:validation:Pattern="^0|([0-9]+(\\.[0-9]+)?(s|m|h))+$" // +kubebuilder:validation:Type=string // +kubebuilder:validation:Optional - WaitTime metav1.Duration `json:"waitTime,omitempty"` + WaitTime *metav1.Duration `json:"waitTime,omitempty"` } // StagedUpdateRunStatus defines the observed state of the ClusterStagedUpdateRun. 
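To make the new validation rules concrete: under the CEL expressions added above, an `Approval` task must omit `waitTime`, while a `TimedWait` task must set it. A minimal sketch of a conforming `afterStageTasks` list (field names taken from the types above; the surrounding strategy fields are elided):

```yaml
afterStageTasks:
  - type: Approval   # waitTime must not be set for Approval tasks
  - type: TimedWait
    waitTime: 5m     # waitTime is required for TimedWait tasks
```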
diff --git a/charts/hub-agent/templates/deployment.yaml b/charts/hub-agent/templates/deployment.yaml index 5fef42640..4d53f3d3c 100644 --- a/charts/hub-agent/templates/deployment.yaml +++ b/charts/hub-agent/templates/deployment.yaml @@ -33,6 +33,8 @@ spec: - --enable-cluster-inventory-apis={{ .Values.enableClusterInventoryAPI }} - --enable-staged-update-run-apis={{ .Values.enableStagedUpdateRunAPIs }} - --enable-eviction-apis={{ .Values.enableEvictionAPIs}} + - --enable-pprof={{ .Values.enablePprof }} + - --pprof-port={{ .Values.pprofPort }} - --max-concurrent-cluster-placement={{ .Values.MaxConcurrentClusterPlacement }} - --concurrent-resource-change-syncs={{ .Values.ConcurrentResourceChangeSyncs }} - --log_file_max_size={{ .Values.logFileMaxSize }} @@ -48,6 +50,11 @@ spec: - name: healthz containerPort: 8081 protocol: TCP + {{- if .Values.enablePprof }} + - containerPort: {{ .Values.pprofPort }} + name: pprof + protocol: TCP + {{- end }} livenessProbe: httpGet: path: /healthz diff --git a/charts/hub-agent/values.yaml b/charts/hub-agent/values.yaml index 845fa1c93..8d3dfe291 100644 --- a/charts/hub-agent/values.yaml +++ b/charts/hub-agent/values.yaml @@ -39,6 +39,9 @@ enableClusterInventoryAPI: true enableStagedUpdateRunAPIs: true enableEvictionAPIs: true +enablePprof: true +pprofPort: 6065 + hubAPIQPS: 250 hubAPIBurst: 1000 MaxConcurrentClusterPlacement: 100 diff --git a/charts/member-agent/templates/deployment.yaml b/charts/member-agent/templates/deployment.yaml index 41751d93c..6b302b801 100644 --- a/charts/member-agent/templates/deployment.yaml +++ b/charts/member-agent/templates/deployment.yaml @@ -34,6 +34,9 @@ spec: - -add_dir_header - --enable-v1alpha1-apis={{ .Values.enableV1Alpha1APIs }} - --enable-v1beta1-apis={{ .Values.enableV1Beta1APIs }} + - --enable-pprof={{ .Values.enablePprof }} + - --pprof-port={{ .Values.pprofPort }} + - --hub-pprof-port={{ .Values.hubPprofPort }} {{- if .Values.propertyProvider }} - --property-provider={{ .Values.propertyProvider }} {{- end }} @@ -75,6 +78,14 @@ spec: - containerPort: 8091 name: memberhealthz protocol: TCP + {{- if .Values.enablePprof }} + - containerPort: {{ .Values.pprofPort }} + name: memberpprof + protocol: TCP + - containerPort: {{ .Values.hubPprofPort }} + name: hubpprof + protocol: TCP + {{- end }} livenessProbe: httpGet: path: /healthz diff --git a/charts/member-agent/values.yaml b/charts/member-agent/values.yaml index 741050636..0bb28cc5d 100644 --- a/charts/member-agent/values.yaml +++ b/charts/member-agent/values.yaml @@ -61,3 +61,7 @@ useCAAuth: false enableV1Alpha1APIs: true enableV1Beta1APIs: false + +enablePprof: true +pprofPort: 6065 +hubPprofPort: 6066 diff --git a/cmd/hubagent/main.go b/cmd/hubagent/main.go index becd0559f..b94c5f526 100644 --- a/cmd/hubagent/main.go +++ b/cmd/hubagent/main.go @@ -18,6 +18,7 @@ package main import ( "flag" + "fmt" "math" "os" "strings" @@ -113,7 +114,7 @@ func main() { config := ctrl.GetConfigOrDie() config.QPS, config.Burst = float32(opts.HubQPS), opts.HubBurst - mgr, err := ctrl.NewManager(config, ctrl.Options{ + mgrOpts := ctrl.Options{ Scheme: scheme, Cache: cache.Options{ SyncPeriod: &opts.ResyncPeriod.Duration, @@ -130,7 +131,11 @@ func main() { Port: FleetWebhookPort, CertDir: FleetWebhookCertDir, }), - }) + } + if opts.EnablePprof { + mgrOpts.PprofBindAddress = fmt.Sprintf(":%d", opts.PprofPort) + } + mgr, err := ctrl.NewManager(config, mgrOpts) if err != nil { klog.ErrorS(err, "unable to start controller manager.") exitWithErrorFunc() diff --git 
a/cmd/hubagent/options/options.go b/cmd/hubagent/options/options.go index c6e1d214a..ba1e38a5e 100644 --- a/cmd/hubagent/options/options.go +++ b/cmd/hubagent/options/options.go @@ -98,6 +98,10 @@ type Options struct { EnableStagedUpdateRunAPIs bool // EnableEvictionAPIs enables to agents to watch the eviction and placement disruption budget CRs. EnableEvictionAPIs bool + // EnablePprof enables the pprof profiling. + EnablePprof bool + // PprofPort is the port for pprof profiling. + PprofPort int // DenyModifyMemberClusterLabels indicates if the member cluster labels cannot be modified by groups (excluding system:masters) DenyModifyMemberClusterLabels bool } @@ -117,6 +121,8 @@ func NewOptions() *Options { EnableV1Alpha1APIs: false, EnableClusterInventoryAPIs: true, EnableStagedUpdateRunAPIs: true, + EnablePprof: false, + PprofPort: 6065, } } @@ -160,6 +166,8 @@ func (o *Options) AddFlags(flags *flag.FlagSet) { flags.DurationVar(&o.ForceDeleteWaitTime.Duration, "force-delete-wait-time", 15*time.Minute, "The duration the hub agent waits before force deleting a member cluster.") flags.BoolVar(&o.EnableStagedUpdateRunAPIs, "enable-staged-update-run-apis", true, "If set, the agents will watch for the ClusterStagedUpdateRun APIs.") flags.BoolVar(&o.EnableEvictionAPIs, "enable-eviction-apis", true, "If set, the agents will watch for the Eviction and PlacementDisruptionBudget APIs.") + flags.BoolVar(&o.EnablePprof, "enable-pprof", false, "If set, the pprof profiling is enabled.") + flags.IntVar(&o.PprofPort, "pprof-port", 6065, "The port for pprof profiling.") flags.BoolVar(&o.DenyModifyMemberClusterLabels, "deny-modify-member-cluster-labels", false, "If set, users not in the system:masters cannot modify member cluster labels.") o.RateLimiterOpts.AddFlags(flags) diff --git a/cmd/memberagent/main.go b/cmd/memberagent/main.go index 43aed3d39..ceeed0d39 100644 --- a/cmd/memberagent/main.go +++ b/cmd/memberagent/main.go @@ -91,6 +91,9 @@ var ( driftDetectionInterval = flag.Int("drift-detection-interval", 15, "The interval in seconds between attempts to detect configuration drifts in the cluster.") watchWorkWithPriorityQueue = flag.Bool("enable-watch-work-with-priority-queue", false, "If set, the apply_work controller will watch/reconcile work objects that are created new or have recent updates") watchWorkReconcileAgeMinutes = flag.Int("watch-work-reconcile-age", 60, "maximum age (in minutes) of work objects for apply_work controller to watch/reconcile") + enablePprof = flag.Bool("enable-pprof", false, "enable pprof profiling") + pprofPort = flag.Int("pprof-port", 6065, "port for pprof profiling") + hubPprofPort = flag.Int("hub-pprof-port", 6066, "port for hub pprof profiling") ) func init() { @@ -185,6 +188,10 @@ func main() { LeaderElectionID: "136224848560.member.fleet.azure.com", } //+kubebuilder:scaffold:builder + if *enablePprof { + memberOpts.PprofBindAddress = fmt.Sprintf(":%d", *pprofPort) + hubOpts.PprofBindAddress = fmt.Sprintf(":%d", *hubPprofPort) + } if err := Start(ctrl.SetupSignalHandler(), hubConfig, memberConfig, hubOpts, memberOpts); err != nil { klog.ErrorS(err, "Failed to start the controllers for the member agent") diff --git a/codecov.yml b/codecov.yml new file mode 100644 index 000000000..da8332f71 --- /dev/null +++ b/codecov.yml @@ -0,0 +1,4 @@ +ignore: + - "apis/**/*" + - "cmd/**/*" + - "test/**/*" diff --git a/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdateruns.yaml b/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdateruns.yaml index 
7fb164af9..e7c6ec68c 100644 --- a/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdateruns.yaml +++ b/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdateruns.yaml @@ -1814,6 +1814,13 @@ spec: type: object maxItems: 2 type: array + x-kubernetes-validations: + - message: AfterStageTaskType is Approval, waitTime is not + allowed + rule: '!self.exists(e, e.type == ''Approval'' && has(e.waitTime))' + - message: AfterStageTaskType is TimedWait, waitTime is + required + rule: '!self.exists(e, e.type == ''TimedWait'' && !has(e.waitTime))' labelSelector: description: |- LabelSelector is a label query over all the joined member clusters. Clusters matching the query are selected diff --git a/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdatestrategies.yaml b/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdatestrategies.yaml index f1dc847c1..338ad2aad 100644 --- a/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdatestrategies.yaml +++ b/config/crd/bases/placement.kubernetes-fleet.io_clusterstagedupdatestrategies.yaml @@ -218,6 +218,11 @@ spec: type: object maxItems: 2 type: array + x-kubernetes-validations: + - message: AfterStageTaskType is Approval, waitTime is not allowed + rule: '!self.exists(e, e.type == ''Approval'' && has(e.waitTime))' + - message: AfterStageTaskType is TimedWait, waitTime is required + rule: '!self.exists(e, e.type == ''TimedWait'' && !has(e.waitTime))' labelSelector: description: |- LabelSelector is a label query over all the joined member clusters. Clusters matching the query are selected diff --git a/pkg/controllers/clusterresourceplacementeviction/controller_intergration_test.go b/pkg/controllers/clusterresourceplacementeviction/controller_intergration_test.go index 1f8e6e093..419a56b48 100644 --- a/pkg/controllers/clusterresourceplacementeviction/controller_intergration_test.go +++ b/pkg/controllers/clusterresourceplacementeviction/controller_intergration_test.go @@ -481,6 +481,55 @@ var _ = Describe("Test ClusterResourcePlacementEviction Controller", func() { checkEvictionCompleteMetric(customRegistry, "true", "true") }) }) + + It("Invalid Eviction Blocked - PickFixed CRP, invalid eviction denied - No PDB specified", func() { + crpName := fmt.Sprintf(crpNameTemplate, GinkgoParallelProcess()) + + // Create the CRP. + By("Create ClusterResourcePlacement", func() { + crp := placementv1beta1.ClusterResourcePlacement{ + ObjectMeta: metav1.ObjectMeta{ + Name: crpName, + }, + Spec: placementv1beta1.ClusterResourcePlacementSpec{ + Policy: &placementv1beta1.PlacementPolicy{ + PlacementType: placementv1beta1.PickFixedPlacementType, + ClusterNames: []string{"test-cluster-1"}, + }, + ResourceSelectors: []placementv1beta1.ClusterResourceSelector{ + { + Group: "", + Kind: "Namespace", + Version: "v1", + Name: "test-ns", + }, + }, + }, + } + Expect(k8sClient.Create(ctx, &crp)).Should(Succeed()) + // ensure CRP exists. 
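+			// Editor note: the Eventually block below retries the Get until the
+			// newly created CRP is visible to the test client, guarding against
+			// cache lag.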
+ Eventually(func() error { + return k8sClient.Get(ctx, types.NamespacedName{Name: crp.Name}, &crp) + }, eventuallyDuration, eventuallyInterval).Should(Succeed()) + }) + + By("Create ClusterResourcePlacementEviction", func() { + eviction := buildTestEviction(evictionName, crpName, "test-cluster") + Expect(k8sClient.Create(ctx, eviction)).Should(Succeed()) + }) + + By("Check eviction status", func() { + evictionStatusUpdatedActual := testutilseviction.StatusUpdatedActual( + ctx, k8sClient, evictionName, + &testutilseviction.IsValidEviction{IsValid: false, Msg: condition.EvictionInvalidPickFixedCRPMessage}, + nil) + Eventually(evictionStatusUpdatedActual, eventuallyDuration, eventuallyInterval).Should(Succeed()) + }) + + By("Ensure eviction complete metric was emitted", func() { + checkEvictionCompleteMetric(customRegistry, "false", "true") + }) + }) }) func buildTestPickNCRP(crpName string, clusterCount int32) placementv1beta1.ClusterResourcePlacement { diff --git a/pkg/controllers/updaterun/controller.go b/pkg/controllers/updaterun/controller.go index a4fc7b1eb..d07987e4e 100644 --- a/pkg/controllers/updaterun/controller.go +++ b/pkg/controllers/updaterun/controller.go @@ -79,6 +79,9 @@ func (r *Reconciler) Reconcile(ctx context.Context, req runtime.Request) (runtim } runObjRef := klog.KObj(&updateRun) + // Remove waitTime from the updateRun status for AfterStageTask for type Approval. + removeWaitTimeFromUpdateRunStatus(&updateRun) + // Handle the deletion of the clusterStagedUpdateRun. if !updateRun.DeletionTimestamp.IsZero() { klog.V(2).InfoS("The clusterStagedUpdateRun is being deleted", "clusterStagedUpdateRun", runObjRef) @@ -268,17 +271,22 @@ func (r *Reconciler) SetupWithManager(mgr runtime.Manager) error { Named("clusterresource-stagedupdaterun-controller"). For(&placementv1beta1.ClusterStagedUpdateRun{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})). Watches(&placementv1beta1.ClusterApprovalRequest{}, &handler.Funcs{ - // We only care about when an approval request is approved. + // We watch for ClusterApprovalRequest to be approved. UpdateFunc: func(ctx context.Context, e event.UpdateEvent, q workqueue.TypedRateLimitingInterface[reconcile.Request]) { klog.V(2).InfoS("Handling a clusterApprovalRequest update event", "clusterApprovalRequest", klog.KObj(e.ObjectNew)) - handleClusterApprovalRequest(e.ObjectOld, e.ObjectNew, q) + handleClusterApprovalRequestUpdate(e.ObjectOld, e.ObjectNew, q) + }, + // We watch for ClusterApprovalRequest deletion events to recreate it ASAP. + DeleteFunc: func(ctx context.Context, e event.DeleteEvent, q workqueue.TypedRateLimitingInterface[reconcile.Request]) { + klog.V(2).InfoS("Handling a clusterApprovalRequest delete event", "clusterApprovalRequest", klog.KObj(e.Object)) + handleClusterApprovalRequestDelete(e.Object, q) }, }).Complete(r) } -// handleClusterApprovalRequest finds the ClusterStagedUpdateRun creating the ClusterApprovalRequest, +// handleClusterApprovalRequestUpdate finds the ClusterStagedUpdateRun creating the ClusterApprovalRequest, // and enqueues it to the ClusterStagedUpdateRun controller queue only when the approved condition is changed. 
-func handleClusterApprovalRequest(oldObj, newObj client.Object, q workqueue.TypedRateLimitingInterface[reconcile.Request]) { +func handleClusterApprovalRequestUpdate(oldObj, newObj client.Object, q workqueue.TypedRateLimitingInterface[reconcile.Request]) { oldAppReq, ok := oldObj.(*placementv1beta1.ClusterApprovalRequest) if !ok { klog.V(2).ErrorS(controller.NewUnexpectedBehaviorError(fmt.Errorf("cannot cast runtime object to ClusterApprovalRequest")), @@ -312,6 +320,34 @@ func handleClusterApprovalRequest(oldObj, newObj client.Object, q workqueue.Type }) } +// handleClusterApprovalRequestDelete finds the ClusterStagedUpdateRun creating the ClusterApprovalRequest, +// and enqueues it to the ClusterStagedUpdateRun controller queue when the ClusterApprovalRequest is deleted. +func handleClusterApprovalRequestDelete(obj client.Object, q workqueue.TypedRateLimitingInterface[reconcile.Request]) { + appReq, ok := obj.(*placementv1beta1.ClusterApprovalRequest) + if !ok { + klog.V(2).ErrorS(controller.NewUnexpectedBehaviorError(fmt.Errorf("cannot cast runtime object to ClusterApprovalRequest")), + "Invalid object type", "object", klog.KObj(obj)) + return + } + isApproved := meta.FindStatusCondition(appReq.Status.Conditions, string(placementv1beta1.ApprovalRequestConditionApproved)) + approvalAccepted := condition.IsConditionStatusTrue(meta.FindStatusCondition(appReq.Status.Conditions, string(placementv1beta1.ApprovalRequestConditionApprovalAccepted)), appReq.Generation) + if isApproved != nil && approvalAccepted { + klog.V(2).InfoS("The approval request has been approved and accepted, ignore queueing for delete event", "clusterApprovalRequest", klog.KObj(appReq)) + return + } + + updateRun := appReq.Spec.TargetUpdateRun + if len(updateRun) == 0 { + klog.V(2).ErrorS(controller.NewUnexpectedBehaviorError(fmt.Errorf("TargetUpdateRun field in ClusterApprovalRequest is empty")), + "Invalid clusterApprovalRequest", "clusterApprovalRequest", klog.KObj(appReq)) + return + } + // enqueue to the updaterun controller queue. + q.Add(reconcile.Request{ + NamespacedName: types.NamespacedName{Name: appReq.Spec.TargetUpdateRun}, + }) +} + // emitUpdateRunStatusMetric emits the update run status metric based on status conditions in the updateRun. func emitUpdateRunStatusMetric(updateRun *placementv1beta1.ClusterStagedUpdateRun) { generation := updateRun.Generation @@ -341,3 +377,16 @@ func emitUpdateRunStatusMetric(updateRun *placementv1beta1.ClusterStagedUpdateRu // We should rarely reach here, it can only happen when updating updateRun status fails. klog.V(2).InfoS("There's no valid status condition on updateRun, status updating failed possibly", "updateRun", klog.KObj(updateRun)) } + +func removeWaitTimeFromUpdateRunStatus(updateRun *placementv1beta1.ClusterStagedUpdateRun) { + // Remove waitTime from the updateRun status for AfterStageTask for type Approval. 
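+	// Editor note: WaitTime is now a *metav1.Duration and the new CEL rules
+	// forbid setting it on Approval tasks, so clearing it below keeps
+	// previously persisted statuses consistent with that validation.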
+ if updateRun.Status.StagedUpdateStrategySnapshot != nil { + for i := range updateRun.Status.StagedUpdateStrategySnapshot.Stages { + for j := range updateRun.Status.StagedUpdateStrategySnapshot.Stages[i].AfterStageTasks { + if updateRun.Status.StagedUpdateStrategySnapshot.Stages[i].AfterStageTasks[j].Type == placementv1beta1.AfterStageTaskTypeApproval { + updateRun.Status.StagedUpdateStrategySnapshot.Stages[i].AfterStageTasks[j].WaitTime = nil + } + } + } + } +} diff --git a/pkg/controllers/updaterun/controller_integration_test.go b/pkg/controllers/updaterun/controller_integration_test.go index 38b2d2880..d0d555202 100644 --- a/pkg/controllers/updaterun/controller_integration_test.go +++ b/pkg/controllers/updaterun/controller_integration_test.go @@ -49,8 +49,6 @@ import ( ) const ( - // timeout is the maximum wait time for Eventually - timeout = time.Second * 10 // interval is the time to wait between retries for Eventually and Consistently interval = time.Millisecond * 250 // duration is the time to duration to check for Consistently @@ -67,6 +65,11 @@ const ( testResourceSnapshotIndex = "0" ) +var ( + // timeout is the maximum wait time for Eventually + timeout = time.Second * 10 +) + var ( testUpdateRunName string testCRPName string @@ -466,7 +469,7 @@ func generateTestClusterStagedUpdateStrategy() *placementv1beta1.ClusterStagedUp AfterStageTasks: []placementv1beta1.AfterStageTask{ { Type: placementv1beta1.AfterStageTaskTypeTimedWait, - WaitTime: metav1.Duration{ + WaitTime: &metav1.Duration{ Duration: time.Second * 4, }, }, @@ -490,7 +493,7 @@ func generateTestClusterStagedUpdateStrategy() *placementv1beta1.ClusterStagedUp }, { Type: placementv1beta1.AfterStageTaskTypeTimedWait, - WaitTime: metav1.Duration{ + WaitTime: &metav1.Duration{ Duration: time.Second * 4, }, }, diff --git a/pkg/controllers/updaterun/controller_test.go b/pkg/controllers/updaterun/controller_test.go index 6522e1188..251871741 100644 --- a/pkg/controllers/updaterun/controller_test.go +++ b/pkg/controllers/updaterun/controller_test.go @@ -18,7 +18,9 @@ package updaterun import ( "testing" + "time" + "github.com/google/go-cmp/cmp" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/client-go/util/workqueue" "sigs.k8s.io/controller-runtime/pkg/client" @@ -28,7 +30,7 @@ import ( placementv1beta1 "go.goms.io/fleet/apis/placement/v1beta1" ) -func TestHandleClusterApprovalRequest(t *testing.T) { +func TestHandleClusterApprovalRequestUpdate(t *testing.T) { tests := map[string]struct { oldObj client.Object newObj client.Object @@ -299,16 +301,301 @@ func TestHandleClusterApprovalRequest(t *testing.T) { t.Run(name, func(t *testing.T) { queue := &controllertest.Queue{TypedInterface: workqueue.NewTypedRateLimitingQueue[reconcile.Request]( workqueue.DefaultTypedItemBasedRateLimiter[reconcile.Request]())} - handleClusterApprovalRequest(tt.oldObj, tt.newObj, queue) + handleClusterApprovalRequestUpdate(tt.oldObj, tt.newObj, queue) if got := queue.Len() != 0; got != tt.shouldEnqueue { - t.Fatalf("handleClusterApprovalRequest() shouldEnqueue test `%s` got %t, want %t", name, got, tt.shouldEnqueue) + t.Fatalf("handleClusterApprovalRequest() shouldEnqueue got %t, want %t", got, tt.shouldEnqueue) } if tt.shouldEnqueue { req, _ := queue.TypedInterface.Get() if req.Name != tt.queuedName { - t.Fatalf("handleClusterApprovalRequest() queuedName test `%s` got %s, want %s", name, req.Name, tt.queuedName) + t.Fatalf("handleClusterApprovalRequest() queuedName got %s, want %s", req.Name, tt.queuedName) } } }) } } + +func 
TestHandleClusterApprovalRequestDelete(t *testing.T) {
+	tests := map[string]struct {
+		obj           client.Object
+		shouldEnqueue bool
+		queuedName    string
+	}{
+		"it should not enqueue anything if the obj is not a ClusterApprovalRequest": {
+			obj:           &placementv1beta1.ClusterStagedUpdateRun{},
+			shouldEnqueue: false,
+		},
+		"it should not enqueue anything if targetUpdateRun in spec is empty": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "",
+				},
+			},
+			shouldEnqueue: false,
+		},
+		"it should enqueue the targetUpdateRun, if ClusterApprovalRequest has neither Approved/ApprovalAccepted status set": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				ObjectMeta: metav1.ObjectMeta{
+					Generation: 1,
+				},
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "test-update-run",
+				},
+			},
+			shouldEnqueue: true,
+			queuedName:    "test-update-run",
+		},
+		"it should enqueue the targetUpdateRun, if ClusterApprovalRequest has only Approved status set to true": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				ObjectMeta: metav1.ObjectMeta{
+					Generation: 1,
+				},
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "test-update-run",
+				},
+				Status: placementv1beta1.ApprovalRequestStatus{
+					Conditions: []metav1.Condition{
+						{
+							Status:             metav1.ConditionTrue,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApproved),
+							ObservedGeneration: 1,
+						},
+					},
+				},
+			},
+			shouldEnqueue: true,
+			queuedName:    "test-update-run",
+		},
+		"it should enqueue the targetUpdateRun, if ClusterApprovalRequest has only Approved status set to false": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				ObjectMeta: metav1.ObjectMeta{
+					Generation: 1,
+				},
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "test-update-run",
+				},
+				Status: placementv1beta1.ApprovalRequestStatus{
+					Conditions: []metav1.Condition{
+						{
+							Status:             metav1.ConditionFalse,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApproved),
+							ObservedGeneration: 1,
+						},
+					},
+				},
+			},
+			shouldEnqueue: true,
+			queuedName:    "test-update-run",
+		},
+		"it should not enqueue updateRun, if ClusterApprovalRequest has Approved set to false, ApprovalAccepted status set to true": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				ObjectMeta: metav1.ObjectMeta{
+					Generation: 1,
+				},
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "test-update-run",
+				},
+				Status: placementv1beta1.ApprovalRequestStatus{
+					Conditions: []metav1.Condition{
+						{
+							Status:             metav1.ConditionFalse,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApproved),
+							ObservedGeneration: 1,
+						},
+						{
+							Status:             metav1.ConditionTrue,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApprovalAccepted),
+							ObservedGeneration: 1,
+						},
+					},
+				},
+			},
+			shouldEnqueue: false,
+		},
+		"it should not enqueue updateRun, if ClusterApprovalRequest has Approved, ApprovalAccepted status set to true": {
+			obj: &placementv1beta1.ClusterApprovalRequest{
+				ObjectMeta: metav1.ObjectMeta{
+					Generation: 1,
+				},
+				Spec: placementv1beta1.ApprovalRequestSpec{
+					TargetUpdateRun: "test-update-run",
+				},
+				Status: placementv1beta1.ApprovalRequestStatus{
+					Conditions: []metav1.Condition{
+						{
+							Status:             metav1.ConditionTrue,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApproved),
+							ObservedGeneration: 1,
+						},
+						{
+							Status:             metav1.ConditionTrue,
+							Type:               string(placementv1beta1.ApprovalRequestConditionApprovalAccepted),
+							ObservedGeneration: 1,
+						},
+					},
+				},
+			},
+			shouldEnqueue: false,
+		},
+	}
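+
+	// Editor note: each case builds a fresh rate-limiting queue inside its
+	// subtest, so enqueue results cannot leak between cases.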
+ + for name, tt := range tests { + t.Run(name, func(t *testing.T) { + queue := &controllertest.Queue{TypedInterface: workqueue.NewTypedRateLimitingQueue[reconcile.Request]( + workqueue.DefaultTypedItemBasedRateLimiter[reconcile.Request]())} + handleClusterApprovalRequestDelete(tt.obj, queue) + if got := queue.Len() != 0; got != tt.shouldEnqueue { + t.Fatalf("handleClusterApprovalRequestDelete() shouldEnqueue got %t, want %t", got, tt.shouldEnqueue) + } + if tt.shouldEnqueue { + req, _ := queue.TypedInterface.Get() + if req.Name != tt.queuedName { + t.Fatalf("handleClusterApprovalRequestDelete() queuedName got %s, want %s", req.Name, tt.queuedName) + } + } + }) + } +} + +func TestRemoveWaitTimeFromUpdateRunStatus(t *testing.T) { + waitTime := metav1.Duration{Duration: 5 * time.Minute} + tests := map[string]struct { + inputUpdateRun *placementv1beta1.ClusterStagedUpdateRun + wantUpdateRun *placementv1beta1.ClusterStagedUpdateRun + }{ + "should handle empty stages": { + inputUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{}, + }, + }, + }, + wantUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{}, + }, + }, + }, + }, + "should handle nil StagedUpdateStrategySnapshot": { + inputUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: nil, + }, + }, + wantUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: nil, + }, + }, + }, + "should remove waitTime from Approval tasks only": { + inputUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + WaitTime: &waitTime, + }, + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &waitTime, + }, + }, + }, + }, + }, + }, + }, + wantUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &waitTime, + }, + }, + }, + }, + }, + }, + }, + }, + "should handle multiple stages": { + inputUpdateRun: &placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + WaitTime: &waitTime, + }, + }, + }, + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &waitTime, + }, + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + WaitTime: &waitTime, + }, + }, + }, + }, + }, + }, + }, + wantUpdateRun: 
&placementv1beta1.ClusterStagedUpdateRun{ + Status: placementv1beta1.StagedUpdateRunStatus{ + StagedUpdateStrategySnapshot: &placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + }, + }, + { + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &waitTime, + }, + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + }, + }, + }, + }, + }, + }, + }, + } + + for name, tt := range tests { + t.Run(name, func(t *testing.T) { + removeWaitTimeFromUpdateRunStatus(tt.inputUpdateRun) + if diff := cmp.Diff(tt.wantUpdateRun, tt.inputUpdateRun); diff != "" { + t.Errorf("removeWaitTimeFromUpdateRunStatus() mismatch (-want +got):\n%s", diff) + } + }) + } +} diff --git a/pkg/controllers/updaterun/execution.go b/pkg/controllers/updaterun/execution.go index aec1a4ba9..3c1fce510 100644 --- a/pkg/controllers/updaterun/execution.go +++ b/pkg/controllers/updaterun/execution.go @@ -318,6 +318,11 @@ func (r *Reconciler) checkAfterStageTasksStatus(ctx context.Context, updatingSta klog.V(2).InfoS("The after stage wait task has completed", "stage", updatingStage.Name, "clusterStagedUpdateRun", updateRunRef) } case placementv1beta1.AfterStageTaskTypeApproval: + afterStageTaskApproved := condition.IsConditionStatusTrue(meta.FindStatusCondition(updatingStageStatus.AfterStageTaskStatus[i].Conditions, string(placementv1beta1.AfterStageTaskConditionApprovalRequestApproved)), updateRun.Generation) + if afterStageTaskApproved { + // The afterStageTask has been approved. + continue + } // Check if the approval request has been created. approvalRequest := placementv1beta1.ClusterApprovalRequest{ ObjectMeta: metav1.ObjectMeta{ diff --git a/pkg/controllers/updaterun/execution_integration_test.go b/pkg/controllers/updaterun/execution_integration_test.go index f908438a4..392008d94 100644 --- a/pkg/controllers/updaterun/execution_integration_test.go +++ b/pkg/controllers/updaterun/execution_integration_test.go @@ -32,6 +32,7 @@ import ( "k8s.io/apimachinery/pkg/api/meta" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/types" + "k8s.io/utils/ptr" clusterv1beta1 "go.goms.io/fleet/apis/cluster/v1beta1" placementv1alpha1 "go.goms.io/fleet/apis/placement/v1alpha1" @@ -305,7 +306,6 @@ var _ = Describe("UpdateRun execution tests", func() { It("Should complete the 1st stage after wait time passed and approval request approved and move on to the 2nd stage", func() { By("Validating the approvalRequest has been created") - approvalRequest := &placementv1beta1.ClusterApprovalRequest{} wantApprovalRequest := &placementv1beta1.ClusterApprovalRequest{ ObjectMeta: metav1.ObjectMeta{ Name: updateRun.Status.StagesStatus[0].AfterStageTaskStatus[1].ApprovalRequestName, @@ -320,22 +320,10 @@ var _ = Describe("UpdateRun execution tests", func() { TargetStage: updateRun.Status.StagesStatus[0].StageName, }, } - Eventually(func() error { - if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { - return err - } - if diff := cmp.Diff(wantApprovalRequest.Spec, approvalRequest.Spec); diff != "" { - return fmt.Errorf("approvalRequest has different spec (-want +got):\n%s", diff) - } - if diff := cmp.Diff(wantApprovalRequest.Labels, approvalRequest.Labels); diff != "" { - return fmt.Errorf("approvalRequest has different labels (-want +got):\n%s", diff) - } 
- return nil - }, timeout, interval).Should(Succeed(), "failed to validate the approvalRequest") + validateApprovalRequestCreated(wantApprovalRequest) By("Approving the approvalRequest") - meta.SetStatusCondition(&approvalRequest.Status.Conditions, generateTrueCondition(approvalRequest, placementv1beta1.ApprovalRequestConditionApproved)) - Expect(k8sClient.Status().Update(ctx, approvalRequest)).Should(Succeed(), "failed to update the approvalRequest status") + approveClusterApprovalRequest(ctx, wantApprovalRequest.Name) By("Validating both after stage tasks have completed and 2nd stage has started") // Timedwait afterStageTask completed. @@ -469,7 +457,6 @@ var _ = Describe("UpdateRun execution tests", func() { It("Should complete the 2nd stage after both after stage tasks are completed and move on to the delete stage", func() { By("Validating the approvalRequest has been created") - approvalRequest := &placementv1beta1.ClusterApprovalRequest{} wantApprovalRequest := &placementv1beta1.ClusterApprovalRequest{ ObjectMeta: metav1.ObjectMeta{ Name: updateRun.Status.StagesStatus[1].AfterStageTaskStatus[0].ApprovalRequestName, @@ -484,22 +471,10 @@ var _ = Describe("UpdateRun execution tests", func() { TargetStage: updateRun.Status.StagesStatus[1].StageName, }, } - Eventually(func() error { - if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { - return err - } - if diff := cmp.Diff(wantApprovalRequest.Spec, approvalRequest.Spec); diff != "" { - return fmt.Errorf("approvalRequest has different spec (-want +got):\n%s", diff) - } - if diff := cmp.Diff(wantApprovalRequest.Labels, approvalRequest.Labels); diff != "" { - return fmt.Errorf("approvalRequest has different labels (-want +got):\n%s", diff) - } - return nil - }, timeout, interval).Should(Succeed(), "failed to validate the approvalRequest") + validateApprovalRequestCreated(wantApprovalRequest) By("Approving the approvalRequest") - meta.SetStatusCondition(&approvalRequest.Status.Conditions, generateTrueCondition(approvalRequest, placementv1beta1.ApprovalRequestConditionApproved)) - Expect(k8sClient.Status().Update(ctx, approvalRequest)).Should(Succeed(), "failed to update the approvalRequest status") + approveClusterApprovalRequest(ctx, wantApprovalRequest.Name) By("Validating the 2nd stage has completed and the delete stage has started") wantStatus.StagesStatus[1].AfterStageTaskStatus[0].Conditions = append(wantStatus.StagesStatus[1].AfterStageTaskStatus[0].Conditions, @@ -531,7 +506,8 @@ var _ = Describe("UpdateRun execution tests", func() { By("Validating the approvalRequest has ApprovalAccepted status") Eventually(func() (bool, error) { - if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { + var approvalRequest placementv1beta1.ClusterApprovalRequest + if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, &approvalRequest); err != nil { return false, err } return condition.IsConditionStatusTrue(meta.FindStatusCondition(approvalRequest.Status.Conditions, string(placementv1beta1.ApprovalRequestConditionApprovalAccepted)), approvalRequest.Generation), nil @@ -813,7 +789,6 @@ var _ = Describe("UpdateRun execution tests", func() { It("Should complete the 2nd stage after the after stage task is completed and move on to the delete stage", func() { By("Validating the approvalRequest has been created") - approvalRequest := &placementv1beta1.ClusterApprovalRequest{} wantApprovalRequest := 
&placementv1beta1.ClusterApprovalRequest{ ObjectMeta: metav1.ObjectMeta{ Name: updateRun.Status.StagesStatus[1].AfterStageTaskStatus[0].ApprovalRequestName, @@ -828,22 +803,10 @@ var _ = Describe("UpdateRun execution tests", func() { TargetStage: updateRun.Status.StagesStatus[1].StageName, }, } - Eventually(func() error { - if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { - return err - } - if diff := cmp.Diff(wantApprovalRequest.Spec, approvalRequest.Spec); diff != "" { - return fmt.Errorf("approvalRequest has different spec (-want +got):\n%s", diff) - } - if diff := cmp.Diff(wantApprovalRequest.Labels, approvalRequest.Labels); diff != "" { - return fmt.Errorf("approvalRequest has different labels (-want +got):\n%s", diff) - } - return nil - }, timeout, interval).Should(Succeed(), "failed to validate the approvalRequest") + validateApprovalRequestCreated(wantApprovalRequest) By("Approving the approvalRequest") - meta.SetStatusCondition(&approvalRequest.Status.Conditions, generateTrueCondition(approvalRequest, placementv1beta1.ApprovalRequestConditionApproved)) - Expect(k8sClient.Status().Update(ctx, approvalRequest)).Should(Succeed(), "failed to update the approvalRequest status") + approveClusterApprovalRequest(ctx, wantApprovalRequest.Name) By("Validating the 2nd stage has completed and the delete stage has started") wantStatus.StagesStatus[1].AfterStageTaskStatus[0].Conditions = append(wantStatus.StagesStatus[1].AfterStageTaskStatus[0].Conditions, @@ -863,7 +826,8 @@ var _ = Describe("UpdateRun execution tests", func() { By("Validating the approvalRequest has ApprovalAccepted status") Eventually(func() (bool, error) { - if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { + var approvalRequest placementv1beta1.ClusterApprovalRequest + if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, &approvalRequest); err != nil { return false, err } return condition.IsConditionStatusTrue(meta.FindStatusCondition(approvalRequest.Status.Conditions, string(placementv1beta1.ApprovalRequestConditionApprovalAccepted)), approvalRequest.Generation), nil @@ -972,6 +936,356 @@ var _ = Describe("UpdateRun execution tests", func() { }) }) +var _ = Describe("UpdateRun execution tests - delete ClusterApprovalRequest, don't recreate", func() { + var updateRun *placementv1beta1.ClusterStagedUpdateRun + var crp *placementv1beta1.ClusterResourcePlacement + var policySnapshot *placementv1beta1.ClusterSchedulingPolicySnapshot + var updateStrategy *placementv1beta1.ClusterStagedUpdateStrategy + var resourceBindings []*placementv1beta1.ClusterResourceBinding + var targetClusters []*clusterv1beta1.MemberCluster + var resourceSnapshot *placementv1beta1.ClusterResourceSnapshot + var wantStatus *placementv1beta1.StagedUpdateRunStatus + + BeforeEach(OncePerOrdered, func() { + testUpdateRunName = "updaterun-" + utils.RandStr() + testCRPName = "crp-" + utils.RandStr() + testResourceSnapshotName = testCRPName + "-" + testResourceSnapshotIndex + "-snapshot" + testUpdateStrategyName = "updatestrategy-" + utils.RandStr() + testCROName = "cro-" + utils.RandStr() + updateRunNamespacedName = types.NamespacedName{Name: testUpdateRunName} + + updateRun = generateTestClusterStagedUpdateRun() + crp = generateTestClusterResourcePlacement() + policySnapshot = &placementv1beta1.ClusterSchedulingPolicySnapshot{ + ObjectMeta: metav1.ObjectMeta{ + Name: 
fmt.Sprintf(placementv1beta1.PolicySnapshotNameFmt, testCRPName, 1), + Labels: map[string]string{ + "kubernetes-fleet.io/parent-CRP": testCRPName, + "kubernetes-fleet.io/is-latest-snapshot": "true", + "kubernetes-fleet.io/policy-index": strconv.Itoa(1), + }, + Annotations: map[string]string{ + "kubernetes-fleet.io/number-of-clusters": strconv.Itoa(3), + }, + }, + Spec: placementv1beta1.SchedulingPolicySnapshotSpec{ + Policy: &placementv1beta1.PlacementPolicy{ + PlacementType: placementv1beta1.PickNPlacementType, + }, + PolicyHash: []byte("hash"), + }, + } + updateStrategy = &placementv1beta1.ClusterStagedUpdateStrategy{ + ObjectMeta: metav1.ObjectMeta{ + Name: testUpdateStrategyName, + }, + Spec: placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + Name: "stage1", + LabelSelector: &metav1.LabelSelector{ + MatchLabels: map[string]string{ + "group": "prod", + "region": "eastus", + }, + }, + SortingLabelKey: ptr.To("index"), + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &metav1.Duration{ + // Set a large wait time to approve, delete the approval request + // and trigger an update run reconcile after time elapses. + Duration: time.Second * 90, + }, + }, + }, + }, + }, + }, + } + + resourceBindings = make([]*placementv1beta1.ClusterResourceBinding, 3) + targetClusters = make([]*clusterv1beta1.MemberCluster, 3) + region := regionEastus + for i := range targetClusters { + targetClusters[i] = generateTestMemberCluster(i, "cluster-"+strconv.Itoa(i), map[string]string{"group": "prod", "region": region}) + resourceBindings[i] = generateTestClusterResourceBinding(policySnapshot.Name, targetClusters[i].Name, placementv1beta1.BindingStateScheduled) + } + + var err error + testNamespace, err = json.Marshal(corev1.Namespace{ + TypeMeta: metav1.TypeMeta{ + APIVersion: "v1", + Kind: "Namespace", + }, + ObjectMeta: metav1.ObjectMeta{ + Name: "test-namespace", + Labels: map[string]string{ + "fleet.azure.com/name": "test-namespace", + }, + }, + }) + Expect(err).To(Succeed()) + resourceSnapshot = generateTestClusterResourceSnapshot() + + // Set smaller wait time for testing + stageUpdatingWaitTime = time.Second * 3 + clusterUpdatingWaitTime = time.Second * 2 + + By("Creating a new clusterResourcePlacement") + Expect(k8sClient.Create(ctx, crp)).To(Succeed()) + + By("Creating scheduling policy snapshot") + Expect(k8sClient.Create(ctx, policySnapshot)).To(Succeed()) + + By("Setting the latest policy snapshot condition as fully scheduled") + meta.SetStatusCondition(&policySnapshot.Status.Conditions, metav1.Condition{ + Type: string(placementv1beta1.PolicySnapshotScheduled), + Status: metav1.ConditionTrue, + ObservedGeneration: policySnapshot.Generation, + Reason: "scheduled", + }) + Expect(k8sClient.Status().Update(ctx, policySnapshot)).Should(Succeed(), "failed to update the policy snapshot condition") + + By("Creating the member clusters") + for _, cluster := range targetClusters { + Expect(k8sClient.Create(ctx, cluster)).To(Succeed()) + } + + By("Creating a bunch of ClusterResourceBindings") + for _, binding := range resourceBindings { + Expect(k8sClient.Create(ctx, binding)).To(Succeed()) + } + + By("Creating a clusterStagedUpdateStrategy") + Expect(k8sClient.Create(ctx, updateStrategy)).To(Succeed()) + + By("Creating a new resource snapshot") + Expect(k8sClient.Create(ctx, resourceSnapshot)).To(Succeed()) + }) + + 
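+ // Note: the 90-second TimedWait above intentionally dominates the test clock. The approval task is created, approved, and deleted while the wait is still pending, which is what lets the specs below observe that an approved-then-deleted ClusterApprovalRequest is not recreated.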
AfterEach(OncePerOrdered, func() { + By("Deleting the clusterStagedUpdateRun") + Expect(k8sClient.Delete(ctx, updateRun)).Should(Succeed()) + updateRun = nil + + By("Deleting the clusterResourcePlacement") + Expect(k8sClient.Delete(ctx, crp)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + crp = nil + + By("Deleting the clusterSchedulingPolicySnapshot") + Expect(k8sClient.Delete(ctx, policySnapshot)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + policySnapshot = nil + + By("Deleting the clusterResourceBindings") + for _, binding := range resourceBindings { + Expect(k8sClient.Delete(ctx, binding)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + } + resourceBindings = nil + + By("Deleting the member clusters") + for _, cluster := range targetClusters { + Expect(k8sClient.Delete(ctx, cluster)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + } + targetClusters = nil + + By("Deleting the clusterStagedUpdateStrategy") + Expect(k8sClient.Delete(ctx, updateStrategy)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + updateStrategy = nil + + By("Deleting the clusterResourceSnapshot") + Expect(k8sClient.Delete(ctx, resourceSnapshot)).Should(SatisfyAny(Succeed(), utils.NotFoundMatcher{})) + resourceSnapshot = nil + }) + + Context("Cluster staged update run should update clusters one by one - strategy with double afterStageTasks", Ordered, func() { + BeforeAll(func() { + By("Creating a new clusterStagedUpdateRun") + Expect(k8sClient.Create(ctx, updateRun)).To(Succeed()) + + By("Validating the initialization succeeded and the execution started") + status := &placementv1beta1.StagedUpdateRunStatus{ + PolicySnapshotIndexUsed: policySnapshot.Labels[placementv1beta1.PolicyIndexLabel], + PolicyObservedClusterCount: 3, + ApplyStrategy: crp.Spec.Strategy.ApplyStrategy.DeepCopy(), + StagedUpdateStrategySnapshot: &updateStrategy.Spec, + StagesStatus: []placementv1beta1.StageUpdatingStatus{ + { + StageName: "stage1", + Clusters: []placementv1beta1.ClusterUpdatingStatus{ + {ClusterName: "cluster-0"}, + {ClusterName: "cluster-1"}, + {ClusterName: "cluster-2"}, + }, + }, + }, + DeletionStageStatus: &placementv1beta1.StageUpdatingStatus{ + StageName: "kubernetes-fleet.io/deleteStage", + Clusters: []placementv1beta1.ClusterUpdatingStatus{}, + }, + Conditions: []metav1.Condition{ + // initialization should succeed! 
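+ // (generateExecutionStartedStatus below layers the execution-started conditions on top of this initialized status.)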
+ generateTrueCondition(updateRun, placementv1beta1.StagedUpdateRunConditionInitialized), + }, + } + for i := range status.StagesStatus { + var tasks []placementv1beta1.AfterStageTaskStatus + for _, task := range updateStrategy.Spec.Stages[i].AfterStageTasks { + taskStatus := placementv1beta1.AfterStageTaskStatus{Type: task.Type} + if task.Type == placementv1beta1.AfterStageTaskTypeApproval { + taskStatus.ApprovalRequestName = updateRun.Name + "-" + status.StagesStatus[i].StageName + } + tasks = append(tasks, taskStatus) + } + status.StagesStatus[i].AfterStageTaskStatus = tasks + } + initialized := status + wantStatus = generateExecutionStartedStatus(updateRun, initialized) + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + }) + + It("Should mark the 1st cluster in the 1st stage as succeeded after marking the binding available", func() { + By("Validating the 1st clusterResourceBinding is updated to Bound") + binding := resourceBindings[0] // cluster-0 + validateBindingState(ctx, binding, resourceSnapshot.Name, updateRun, 0) + + By("Updating the 1st clusterResourceBinding to Available") + meta.SetStatusCondition(&binding.Status.Conditions, generateTrueCondition(binding, placementv1beta1.ResourceBindingAvailable)) + Expect(k8sClient.Status().Update(ctx, binding)).Should(Succeed(), "failed to update the binding status") + + By("Validating the 1st cluster has succeeded and 2nd cluster has started") + wantStatus.StagesStatus[0].Clusters[0].Conditions = append(wantStatus.StagesStatus[0].Clusters[0].Conditions, generateTrueCondition(updateRun, placementv1beta1.ClusterUpdatingConditionSucceeded)) + wantStatus.StagesStatus[0].Clusters[1].Conditions = append(wantStatus.StagesStatus[0].Clusters[1].Conditions, generateTrueCondition(updateRun, placementv1beta1.ClusterUpdatingConditionStarted)) + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + + By("Validating the 1st stage has startTime set") + Expect(updateRun.Status.StagesStatus[0].StartTime).ShouldNot(BeNil()) + }) + + It("Should mark the 2nd cluster in the 1st stage as succeeded after marking the binding available", func() { + By("Validating the 2nd clusterResourceBinding is updated to Bound") + binding := resourceBindings[1] // cluster-1 + validateBindingState(ctx, binding, resourceSnapshot.Name, updateRun, 0) + + By("Updating the 2nd clusterResourceBinding to Available") + meta.SetStatusCondition(&binding.Status.Conditions, generateTrueCondition(binding, placementv1beta1.ResourceBindingAvailable)) + Expect(k8sClient.Status().Update(ctx, binding)).Should(Succeed(), "failed to update the binding status") + + By("Validating the 2nd cluster has succeeded and 3rd cluster has started") + wantStatus.StagesStatus[0].Clusters[1].Conditions = append(wantStatus.StagesStatus[0].Clusters[1].Conditions, generateTrueCondition(updateRun, placementv1beta1.ClusterUpdatingConditionSucceeded)) + wantStatus.StagesStatus[0].Clusters[2].Conditions = append(wantStatus.StagesStatus[0].Clusters[2].Conditions, generateTrueCondition(updateRun, placementv1beta1.ClusterUpdatingConditionStarted)) + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + }) + + It("Should mark the 3rd cluster in the 1st stage as succeeded after marking the binding available", func() { + By("Validating the 3rd clusterResourceBinding is updated to Bound") + binding := resourceBindings[2] // cluster-2 + validateBindingState(ctx, binding, resourceSnapshot.Name, updateRun, 0) + + By("Updating the 3rd clusterResourceBinding to Available") + 
meta.SetStatusCondition(&binding.Status.Conditions, generateTrueCondition(binding, placementv1beta1.ResourceBindingAvailable)) + Expect(k8sClient.Status().Update(ctx, binding)).Should(Succeed(), "failed to update the binding status") + + By("Validating the 3rd cluster has succeeded and the stage is waiting for the after-stage tasks") + wantStatus.StagesStatus[0].Clusters[2].Conditions = append(wantStatus.StagesStatus[0].Clusters[2].Conditions, generateTrueCondition(updateRun, placementv1beta1.ClusterUpdatingConditionSucceeded)) + wantStatus.StagesStatus[0].Conditions[0] = generateFalseCondition(updateRun, placementv1beta1.StageUpdatingConditionProgressing) // The progressing condition now becomes false with waiting reason. + wantStatus.StagesStatus[0].AfterStageTaskStatus[0].Conditions = append(wantStatus.StagesStatus[0].AfterStageTaskStatus[0].Conditions, + generateTrueCondition(updateRun, placementv1beta1.AfterStageTaskConditionApprovalRequestCreated)) + wantStatus.Conditions[1] = generateFalseCondition(updateRun, placementv1beta1.StagedUpdateRunConditionProgressing) + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + }) + + It("Should complete the 1st stage after wait time passed and approval request approved", func() { + By("Validating the approvalRequest has been created") + approvalRequest := &placementv1beta1.ClusterApprovalRequest{} + wantApprovalRequest := &placementv1beta1.ClusterApprovalRequest{ + ObjectMeta: metav1.ObjectMeta{ + Name: updateRun.Status.StagesStatus[0].AfterStageTaskStatus[0].ApprovalRequestName, + Labels: map[string]string{ + placementv1beta1.TargetUpdatingStageNameLabel: updateRun.Status.StagesStatus[0].StageName, + placementv1beta1.TargetUpdateRunLabel: updateRun.Name, + placementv1beta1.IsLatestUpdateRunApprovalLabel: "true", + }, + }, + Spec: placementv1beta1.ApprovalRequestSpec{ + TargetUpdateRun: updateRun.Name, + TargetStage: updateRun.Status.StagesStatus[0].StageName, + }, + } + validateApprovalRequestCreated(wantApprovalRequest) + + By("Deleting the approvalRequest") + Expect(k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest)).Should(Succeed()) + Expect(k8sClient.Delete(ctx, approvalRequest)).Should(Succeed()) + + By("Validating the approvalRequest has been recreated immediately") + validateApprovalRequestCreated(wantApprovalRequest) + + By("Approving the approvalRequest") + approveClusterApprovalRequest(ctx, wantApprovalRequest.Name) + + By("Check the updateRun status") + wantStatus.StagesStatus[0].AfterStageTaskStatus[0].Conditions = append(wantStatus.StagesStatus[0].AfterStageTaskStatus[0].Conditions, + generateTrueCondition(updateRun, placementv1beta1.AfterStageTaskConditionApprovalRequestApproved)) + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + + By("Deleting the approvalRequest") + Expect(k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest)).Should(Succeed()) + Expect(k8sClient.Delete(ctx, approvalRequest)).Should(Succeed(), "failed to delete the approvalRequest") + + By("Validating the approvalRequest has not been recreated") + Eventually(func() bool { + return apierrors.IsNotFound(k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest)) + }, timeout, interval).Should(BeTrue(), "failed to ensure the approvalRequest is not recreated") + Consistently(func() bool { + return apierrors.IsNotFound(k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest)) + }, timeout, 
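+ // No member agent runs in this suite, so the test stands in for it by flipping the binding's Available condition directly.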
interval).Should(BeTrue(), "failed to ensure the approvalRequest is not recreated") + + By("Check the updateRun status to ensure the waitTime elapsed condition is not set") + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + + By("Validating the 1st stage has completed") + wantStatus.StagesStatus[0].AfterStageTaskStatus[1].Conditions = append(wantStatus.StagesStatus[0].AfterStageTaskStatus[1].Conditions, + generateTrueCondition(updateRun, placementv1beta1.AfterStageTaskConditionWaitTimeElapsed)) + wantStatus.StagesStatus[0].Conditions[0] = generateFalseProgressingCondition(updateRun, placementv1beta1.StageUpdatingConditionProgressing, true) + wantStatus.StagesStatus[0].Conditions = append(wantStatus.StagesStatus[0].Conditions, generateTrueCondition(updateRun, placementv1beta1.StageUpdatingConditionSucceeded)) + // Mark the deletion stage progressing condition as false with succeeded reason and add succeeded condition. + wantStatus.DeletionStageStatus.Conditions = append(wantStatus.DeletionStageStatus.Conditions, generateFalseProgressingCondition(updateRun, placementv1beta1.StageUpdatingConditionProgressing, true)) + wantStatus.DeletionStageStatus.Conditions = append(wantStatus.DeletionStageStatus.Conditions, generateTrueCondition(updateRun, placementv1beta1.StageUpdatingConditionSucceeded)) + // Mark updateRun progressing condition as false with succeeded reason and add succeeded condition. + wantStatus.Conditions[1] = generateFalseProgressingCondition(updateRun, placementv1beta1.StagedUpdateRunConditionProgressing, true) + wantStatus.Conditions = append(wantStatus.Conditions, generateTrueCondition(updateRun, placementv1beta1.StagedUpdateRunConditionSucceeded)) + // Need to have a longer wait time for the test to pass, because of the long wait time specified in the update strategy. + timeout = time.Second * 90 + validateClusterStagedUpdateRunStatus(ctx, updateRun, wantStatus, "") + // Reset the timeout to the default value. 
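+ // (timeout is shared suite state; leaving it at 90s would slow every later Eventually/Consistently check.)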
+ timeout = time.Second * 10 + + By("Validating the 1st stage has endTime set") + Expect(updateRun.Status.StagesStatus[0].EndTime).ShouldNot(BeNil()) + + By("Validating the waitTime after stage task only completes after the wait time") + waitStartTime := meta.FindStatusCondition(updateRun.Status.StagesStatus[0].Conditions, string(placementv1beta1.StageUpdatingConditionProgressing)).LastTransitionTime.Time + waitEndTime := meta.FindStatusCondition(updateRun.Status.StagesStatus[0].AfterStageTaskStatus[1].Conditions, string(placementv1beta1.AfterStageTaskConditionWaitTimeElapsed)).LastTransitionTime.Time + Expect(waitStartTime.Add(updateStrategy.Spec.Stages[0].AfterStageTasks[1].WaitTime.Duration).After(waitEndTime)).Should(BeFalse(), + fmt.Sprintf("waitEndTime %v did not pass waitStartTime %v long enough, want at least %v", waitEndTime, waitStartTime, updateStrategy.Spec.Stages[0].AfterStageTasks[1].WaitTime.Duration)) + + By("Validating the creation time of the approval request is before the complete time of the timedwait task") + approvalCreateTime := meta.FindStatusCondition(updateRun.Status.StagesStatus[0].AfterStageTaskStatus[0].Conditions, string(placementv1beta1.AfterStageTaskConditionApprovalRequestCreated)).LastTransitionTime.Time + Expect(approvalCreateTime.Before(waitEndTime)).Should(BeTrue()) + + By("Validating the approvalRequest has not been recreated") + Consistently(func() bool { + return apierrors.IsNotFound(k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest)) + }, timeout, interval).Should(BeTrue(), "failed to ensure the approvalRequest is not recreated") + }) + }) +}) + func validateBindingState(ctx context.Context, binding *placementv1beta1.ClusterResourceBinding, resourceSnapshotName string, updateRun *placementv1beta1.ClusterStagedUpdateRun, stage int) { Eventually(func() error { if err := k8sClient.Get(ctx, types.NamespacedName{Name: binding.Name}, binding); err != nil { @@ -1001,3 +1315,30 @@ func validateBindingState(ctx context.Context, binding *placementv1beta1.Cluster return nil }, timeout, interval).Should(Succeed(), "failed to validate the binding state") } + +func approveClusterApprovalRequest(ctx context.Context, approvalRequestName string) { + Eventually(func() error { + var approvalRequest placementv1beta1.ClusterApprovalRequest + if err := k8sClient.Get(ctx, types.NamespacedName{Name: approvalRequestName}, &approvalRequest); err != nil { + return err + } + meta.SetStatusCondition(&approvalRequest.Status.Conditions, generateTrueCondition(&approvalRequest, placementv1beta1.ApprovalRequestConditionApproved)) + return k8sClient.Status().Update(ctx, &approvalRequest) + }, timeout, interval).Should(Succeed(), "failed to approve the approvalRequest") +} + +func validateApprovalRequestCreated(wantApprovalRequest *placementv1beta1.ClusterApprovalRequest) { + approvalRequest := &placementv1beta1.ClusterApprovalRequest{} + Eventually(func() error { + if err := k8sClient.Get(ctx, types.NamespacedName{Name: wantApprovalRequest.Name}, approvalRequest); err != nil { + return err + } + if diff := cmp.Diff(wantApprovalRequest.Spec, approvalRequest.Spec); diff != "" { + return fmt.Errorf("approvalRequest has different spec (-want +got):\n%s", diff) + } + if diff := cmp.Diff(wantApprovalRequest.Labels, approvalRequest.Labels); diff != "" { + return fmt.Errorf("approvalRequest has different labels (-want +got):\n%s", diff) + } + return nil + }, timeout, interval).Should(Succeed(), "failed to validate the approvalRequest") +} diff --git 
a/pkg/controllers/updaterun/initialization.go b/pkg/controllers/updaterun/initialization.go index 29b8fef04..56e2791ba 100644 --- a/pkg/controllers/updaterun/initialization.go +++ b/pkg/controllers/updaterun/initialization.go @@ -250,8 +250,11 @@ func (r *Reconciler) generateStagesByStrategy( // other err can be retried. return controller.NewAPIServerError(true, err) } + // This won't change even if the stagedUpdateStrategy changes or is deleted after the updateRun is initialized. updateRun.Status.StagedUpdateStrategySnapshot = &updateStrategy.Spec + // Remove waitTime from the updateRun status for AfterStageTasks of type Approval. + removeWaitTimeFromUpdateRunStatus(updateRun) // Compute the update stages. if err := r.computeRunStageStatus(ctx, scheduledBindings, updateRun); err != nil { @@ -404,8 +407,11 @@ func validateAfterStageTask(tasks []placementv1beta1.AfterStageTask) error { } for i, task := range tasks { if task.Type == placementv1beta1.AfterStageTaskTypeTimedWait { + if task.WaitTime == nil { + return fmt.Errorf("task %d of type TimedWait has wait duration set to nil", i) + } if task.WaitTime.Duration <= 0 { - return fmt.Errorf("task %d has wait duration <= 0", i) + return fmt.Errorf("task %d of type TimedWait has wait duration <= 0", i) } } } diff --git a/pkg/controllers/updaterun/initialization_integration_test.go b/pkg/controllers/updaterun/initialization_integration_test.go index 9fc6405e9..6b5d6d8bc 100644 --- a/pkg/controllers/updaterun/initialization_integration_test.go +++ b/pkg/controllers/updaterun/initialization_integration_test.go @@ -554,8 +554,8 @@ var _ = Describe("Updaterun initialization tests", func() { It("Should fail to initialize if any after stage task has 2 same tasks", func() { By("Creating a clusterStagedUpdateStrategy with 2 same after stage tasks") updateStrategy.Spec.Stages[0].AfterStageTasks = []placementv1beta1.AfterStageTask{ - {Type: placementv1beta1.AfterStageTaskTypeTimedWait}, - {Type: placementv1beta1.AfterStageTaskTypeTimedWait}, + {Type: placementv1beta1.AfterStageTaskTypeTimedWait, WaitTime: &metav1.Duration{Duration: time.Second * 1}}, + {Type: placementv1beta1.AfterStageTaskTypeTimedWait, WaitTime: &metav1.Duration{Duration: time.Second * 1}}, } Expect(k8sClient.Create(ctx, updateStrategy)).To(Succeed()) @@ -569,7 +569,7 @@ var _ = Describe("Updaterun initialization tests", func() { It("Should fail to initialize if the wait time is not valid", func() { By("Creating a clusterStagedUpdateStrategy with invalid wait time duration") updateStrategy.Spec.Stages[0].AfterStageTasks = []placementv1beta1.AfterStageTask{ - {Type: placementv1beta1.AfterStageTaskTypeTimedWait, WaitTime: metav1.Duration{Duration: time.Second * 0}}, + {Type: placementv1beta1.AfterStageTaskTypeTimedWait, WaitTime: &metav1.Duration{Duration: time.Second * 0}}, } Expect(k8sClient.Create(ctx, updateStrategy)).To(Succeed()) diff --git a/pkg/controllers/updaterun/initialization_test.go b/pkg/controllers/updaterun/initialization_test.go new file mode 100644 index 000000000..0471afb84 --- /dev/null +++ b/pkg/controllers/updaterun/initialization_test.go @@ -0,0 +1,87 @@ +package updaterun + +import ( + "testing" + "time" + + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/utils/ptr" + + "go.goms.io/fleet/apis/placement/v1beta1" +) + +func TestValidateAfterStageTask(t *testing.T) { + tests := []struct { + name string + task []v1beta1.AfterStageTask + wantErr bool + errMsg string + }{ + { + name: "valid AfterTasks", + task: []v1beta1.AfterStageTask{ + { + Type: 
v1beta1.AfterStageTaskTypeApproval, + }, + { + Type: v1beta1.AfterStageTaskTypeTimedWait, + WaitTime: ptr.To(metav1.Duration{Duration: 5 * time.Minute}), + }, + }, + wantErr: false, + }, + { + name: "invalid AfterTasks, same type of tasks", + task: []v1beta1.AfterStageTask{ + { + Type: v1beta1.AfterStageTaskTypeTimedWait, + WaitTime: ptr.To(metav1.Duration{Duration: 1 * time.Minute}), + }, + { + Type: v1beta1.AfterStageTaskTypeTimedWait, + WaitTime: ptr.To(metav1.Duration{Duration: 5 * time.Minute}), + }, + }, + wantErr: true, + errMsg: "afterStageTasks cannot have two tasks of the same type: TimedWait", + }, + { + name: "invalid AfterTasks, with nil duration for TimedWait", + task: []v1beta1.AfterStageTask{ + { + Type: v1beta1.AfterStageTaskTypeTimedWait, + }, + }, + wantErr: true, + errMsg: "task 0 of type TimedWait has wait duration set to nil", + }, + { + name: "invalid AfterTasks, with zero duration for TimedWait", + task: []v1beta1.AfterStageTask{ + { + Type: v1beta1.AfterStageTaskTypeTimedWait, + WaitTime: ptr.To(metav1.Duration{Duration: 0 * time.Minute}), + }, + }, + wantErr: true, + errMsg: "task 0 of type TimedWait has wait duration <= 0", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + err := validateAfterStageTask(tt.task) + if tt.wantErr { + if err == nil { + t.Errorf("validateAfterStageTask() error = nil, wantErr %v", tt.wantErr) + return + } + if err.Error() != tt.errMsg { + t.Errorf("validateAfterStageTask() error = %v, wantErr %v", err, tt.errMsg) + } + } else if err != nil { + t.Errorf("validateAfterStageTask() error = %v, wantErr %v", err, tt.wantErr) + } + }) + } +} diff --git a/pkg/utils/common.go b/pkg/utils/common.go index d4d649dff..1c595032d 100644 --- a/pkg/utils/common.go +++ b/pkg/utils/common.go @@ -171,6 +171,12 @@ var ( Kind: placementv1beta1.ClusterResourcePlacementKind, } + ClusterResourcePlacementDisruptionBudgetMetaGVK = metav1.GroupVersionKind{ + Group: placementv1beta1.GroupVersion.Group, + Version: placementv1beta1.GroupVersion.Version, + Kind: placementv1beta1.ClusterResourcePlacementDisruptionBudgetKind, + } + ClusterResourcePlacementEvictionMetaGVK = metav1.GroupVersionKind{ Group: placementv1beta1.GroupVersion.Group, Version: placementv1beta1.GroupVersion.Version, diff --git a/pkg/utils/validator/clusterresourceplacementdisruptionbudget.go b/pkg/utils/validator/clusterresourceplacementdisruptionbudget.go new file mode 100644 index 000000000..78495f33f --- /dev/null +++ b/pkg/utils/validator/clusterresourceplacementdisruptionbudget.go @@ -0,0 +1,41 @@ +/* + Copyright 2025 The KubeFleet Authors. + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +*/ + +// Package validator provides utils to validate ClusterResourcePlacementDisruptionBudget resources. 
+package validator + +import ( + "fmt" + + "k8s.io/apimachinery/pkg/util/errors" + "k8s.io/apimachinery/pkg/util/intstr" + + fleetv1beta1 "go.goms.io/fleet/apis/placement/v1beta1" +) + +// ValidateClusterResourcePlacementDisruptionBudget validates cluster resource placement disruption budget fields based on crp placement type and returns error. +func ValidateClusterResourcePlacementDisruptionBudget(db *fleetv1beta1.ClusterResourcePlacementDisruptionBudget, crp *fleetv1beta1.ClusterResourcePlacement) error { + allErr := make([]error, 0) + + // Check ClusterResourcePlacementDisruptionBudget fields if CRP is PickAll placement type + if crp.Spec.Policy == nil || crp.Spec.Policy.PlacementType == fleetv1beta1.PickAllPlacementType { + if db.Spec.MaxUnavailable != nil { + allErr = append(allErr, fmt.Errorf("cluster resource placement policy type PickAll is not supported with any specified max unavailable %v", db.Spec.MaxUnavailable)) + } + if db.Spec.MinAvailable != nil && db.Spec.MinAvailable.Type == intstr.String { + allErr = append(allErr, fmt.Errorf("cluster resource placement policy type PickAll is not supported with min available as a percentage %v", db.Spec.MinAvailable)) + } + } + + return errors.NewAggregate(allErr) +} diff --git a/pkg/webhook/add_handler.go b/pkg/webhook/add_handler.go index fac12abd1..5d1844a3b 100644 --- a/pkg/webhook/add_handler.go +++ b/pkg/webhook/add_handler.go @@ -3,6 +3,7 @@ package webhook import ( "go.goms.io/fleet/pkg/webhook/clusterresourceoverride" "go.goms.io/fleet/pkg/webhook/clusterresourceplacement" + "go.goms.io/fleet/pkg/webhook/clusterresourceplacementeviction" "go.goms.io/fleet/pkg/webhook/fleetresourcehandler" "go.goms.io/fleet/pkg/webhook/membercluster" "go.goms.io/fleet/pkg/webhook/pod" @@ -21,4 +22,5 @@ func init() { AddToManagerFuncs = append(AddToManagerFuncs, membercluster.Add) AddToManagerFuncs = append(AddToManagerFuncs, clusterresourceoverride.Add) AddToManagerFuncs = append(AddToManagerFuncs, resourceoverride.Add) + AddToManagerFuncs = append(AddToManagerFuncs, clusterresourceplacementeviction.Add) } diff --git a/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook.go b/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook.go new file mode 100644 index 000000000..ae2e71a7a --- /dev/null +++ b/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook.go @@ -0,0 +1,78 @@ +/* + Copyright 2025 The KubeFleet Authors. + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +*/ + +// Package clusterresourceplacementdisruptionbudget provides a validating webhook for the clusterresourceplacementdisruptionbudget custom resource in the KubeFleet API group. 
+package clusterresourceplacementdisruptionbudget + +import ( + "context" + "fmt" + "net/http" + + k8serrors "k8s.io/apimachinery/pkg/api/errors" + "k8s.io/apimachinery/pkg/types" + "k8s.io/klog/v2" + "sigs.k8s.io/controller-runtime/pkg/client" + "sigs.k8s.io/controller-runtime/pkg/manager" + "sigs.k8s.io/controller-runtime/pkg/webhook" + "sigs.k8s.io/controller-runtime/pkg/webhook/admission" + + fleetv1beta1 "go.goms.io/fleet/apis/placement/v1beta1" + "go.goms.io/fleet/pkg/utils" + "go.goms.io/fleet/pkg/utils/validator" +) + +var ( + // ValidationPath is the webhook service path which admission requests are routed to for validating clusterresourceplacementdisruptionbudget resources. + ValidationPath = fmt.Sprintf(utils.ValidationPathFmt, fleetv1beta1.GroupVersion.Group, fleetv1beta1.GroupVersion.Version, "clusterresourceplacementdisruptionbudget") +) + +type clusterResourcePlacementDisruptionBudgetValidator struct { + client client.Client + decoder webhook.AdmissionDecoder +} + +// Add registers the validating webhook for the ClusterResourcePlacementDisruptionBudget custom resource. +func Add(mgr manager.Manager) error { + hookServer := mgr.GetWebhookServer() + hookServer.Register(ValidationPath, &webhook.Admission{Handler: &clusterResourcePlacementDisruptionBudgetValidator{mgr.GetClient(), admission.NewDecoder(mgr.GetScheme())}}) + return nil +} + +// Handle clusterResourcePlacementDisruptionBudgetValidator checks to see if the disruption budget is valid. +func (v *clusterResourcePlacementDisruptionBudgetValidator) Handle(ctx context.Context, req admission.Request) admission.Response { + var db fleetv1beta1.ClusterResourcePlacementDisruptionBudget + klog.V(2).InfoS("Validating webhook handling cluster resource placement disruption budget", "operation", req.Operation, "clusterResourcePlacementDisruptionBudget", req.Name) + if err := v.decoder.Decode(req, &db); err != nil { + klog.ErrorS(err, "Failed to decode cluster resource placement disruption budget object for validating fields", "userName", req.UserInfo.Username, "groups", req.UserInfo.Groups, "clusterResourcePlacementDisruptionBudget", req.Name) + return admission.Errored(http.StatusBadRequest, err) + } + + // Get the corresponding ClusterResourcePlacement object + var crp fleetv1beta1.ClusterResourcePlacement + if err := v.client.Get(ctx, types.NamespacedName{Name: db.Name}, &crp); err != nil { + if k8serrors.IsNotFound(err) { + klog.V(2).InfoS("The corresponding ClusterResourcePlacement object does not exist", "clusterResourcePlacementDisruptionBudget", db.Name, "clusterResourcePlacement", db.Name) + return admission.Allowed("Associated clusterResourcePlacement object for clusterResourcePlacementDisruptionBudget is not found") + } + return admission.Errored(http.StatusBadRequest, fmt.Errorf("failed to get clusterResourcePlacement %s for clusterResourcePlacementDisruptionBudget %s: %w", db.Name, db.Name, err)) + } + + if err := validator.ValidateClusterResourcePlacementDisruptionBudget(&db, &crp); err != nil { + klog.V(2).ErrorS(err, "ClusterResourcePlacementDisruptionBudget has invalid fields, request is denied", "operation", req.Operation, "clusterResourcePlacementDisruptionBudget", db.Name) + return admission.Denied(err.Error()) + } + + klog.V(2).InfoS("ClusterResourcePlacementDisruptionBudget has valid fields", "clusterResourcePlacementDisruptionBudget", db.Name) + return admission.Allowed("clusterResourcePlacementDisruptionBudget has valid fields") +} diff --git 
a/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook_test.go b/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook_test.go new file mode 100644 index 000000000..c7908d1db --- /dev/null +++ b/pkg/webhook/clusterresourceplacementdisruptionbudget/clusterresourceplacementdisruptionbudget_validating_webhook_test.go @@ -0,0 +1,315 @@ +/* +Copyright 2025 The KubeFleet Authors. +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package clusterresourceplacementdisruptionbudget + +import ( + "context" + "encoding/json" + "fmt" + "testing" + + "github.com/google/go-cmp/cmp" + "github.com/stretchr/testify/assert" + admissionv1 "k8s.io/api/admission/v1" + authenticationv1 "k8s.io/api/authentication/v1" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/runtime" + "k8s.io/apimachinery/pkg/util/intstr" + "k8s.io/utils/ptr" + "sigs.k8s.io/controller-runtime/pkg/client" + "sigs.k8s.io/controller-runtime/pkg/client/fake" + "sigs.k8s.io/controller-runtime/pkg/webhook/admission" + + placementv1beta1 "go.goms.io/fleet/apis/placement/v1beta1" + "go.goms.io/fleet/pkg/utils" +) + +func TestHandle(t *testing.T) { + validCRPDBObject := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pick-all-crp", + }, + Spec: placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: nil, + MinAvailable: nil, + }, + } + validCRPDBObjectPickNCRP := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "crp-pickn", + }, + Spec: placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: &intstr.IntOrString{ + Type: intstr.Int, + IntVal: 1, + }, + MinAvailable: &intstr.IntOrString{ + Type: intstr.String, + StrVal: "50%", + }, + }, + } + invalidCRPDBObjectMinAvailablePercentage := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pick-all-crp", + }, + Spec: placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: nil, + MinAvailable: &intstr.IntOrString{ + Type: intstr.String, + StrVal: "50%", + }, + }, + } + invalidCRPDBObjectMaxAvailablePercentage := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pick-all-crp", + }, + Spec: placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: &intstr.IntOrString{ + Type: intstr.String, + StrVal: "50%", + }, + MinAvailable: nil, + }, + } + invalidCRPDBObjectMaxAvailableInteger := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pick-all-crp", + }, + Spec: placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: &intstr.IntOrString{ + Type: intstr.Int, + IntVal: 2, + }, + MinAvailable: nil, + }, + } + validCRPDBObjectCRPNotFound := &placementv1beta1.ClusterResourcePlacementDisruptionBudget{ + ObjectMeta: metav1.ObjectMeta{ + Name: "does-not-exist", + }, + Spec: 
placementv1beta1.PlacementDisruptionBudgetSpec{ + MaxUnavailable: nil, + MinAvailable: nil, + }, + } + + validCRPDBObjectBytes, err := json.Marshal(validCRPDBObject) + assert.Nil(t, err) + validCRPDBObjectPickNCRPBytes, err := json.Marshal(validCRPDBObjectPickNCRP) + assert.Nil(t, err) + invalidCRPDBObjectMinAvailablePercentageBytes, err := json.Marshal(invalidCRPDBObjectMinAvailablePercentage) + assert.Nil(t, err) + invalidCRPDBObjectMaxAvailablePercentageBytes, err := json.Marshal(invalidCRPDBObjectMaxAvailablePercentage) + assert.Nil(t, err) + invalidCRPDBObjectMaxAvailableIntegerBytes, err := json.Marshal(invalidCRPDBObjectMaxAvailableInteger) + assert.Nil(t, err) + validCRPDBObjectCRPNotFoundBytes, err := json.Marshal(validCRPDBObjectCRPNotFound) + assert.Nil(t, err) + + validCRP := &placementv1beta1.ClusterResourcePlacement{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pick-all-crp", + }, + Spec: placementv1beta1.ClusterResourcePlacementSpec{ + ResourceSelectors: []placementv1beta1.ClusterResourceSelector{}, + Policy: &placementv1beta1.PlacementPolicy{ + PlacementType: placementv1beta1.PickAllPlacementType, + }, + }, + } + validCRPPickN := &placementv1beta1.ClusterResourcePlacement{ + ObjectMeta: metav1.ObjectMeta{ + Name: "crp-pickn", + }, + Spec: placementv1beta1.ClusterResourcePlacementSpec{ + ResourceSelectors: []placementv1beta1.ClusterResourceSelector{}, + Policy: &placementv1beta1.PlacementPolicy{ + PlacementType: placementv1beta1.PickNPlacementType, + NumberOfClusters: ptr.To(int32(1)), + }, + }, + } + invalidCRPPickFixed := &placementv1beta1.ClusterResourcePlacement{ + ObjectMeta: metav1.ObjectMeta{ + Name: "crp-pickfixed", + }, + Spec: placementv1beta1.ClusterResourcePlacementSpec{ + ResourceSelectors: []placementv1beta1.ClusterResourceSelector{}, + Policy: &placementv1beta1.PlacementPolicy{ + PlacementType: placementv1beta1.PickFixedPlacementType, + ClusterNames: []string{"cluster1", "cluster2"}, + }, + }, + } + + objects := []client.Object{validCRP, validCRPPickN, invalidCRPPickFixed} + scheme := runtime.NewScheme() + err = placementv1beta1.AddToScheme(scheme) + assert.Nil(t, err) + decoder := admission.NewDecoder(scheme) + assert.Nil(t, err) + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(objects...). 
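+ // Seeding the fake client with the three CRPs above lets Handle resolve each budget's owning placement by name.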
+ Build() + + testCases := map[string]struct { + req admission.Request + resourceValidator clusterResourcePlacementDisruptionBudgetValidator + wantResponse admission.Response + }{ + "allow CRPDB create": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "pick-all-crp", + Object: runtime.RawExtension{ + Raw: validCRPDBObjectBytes, + Object: validCRPDBObject, + }, + UserInfo: authenticationv1.UserInfo{ + Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Allowed("clusterResourcePlacementDisruptionBudget has valid fields"), + }, + "allow CRPDB create - PickN CRP": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "crp-pickn", + Object: runtime.RawExtension{ + Raw: validCRPDBObjectPickNCRPBytes, + Object: validCRPDBObjectPickNCRP, + }, + UserInfo: authenticationv1.UserInfo{ + Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Allowed("clusterResourcePlacementDisruptionBudget has valid fields"), + }, + "deny CRPDB create - MinAvailable as percentage": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "pick-all-crp", + Object: runtime.RawExtension{ + Raw: invalidCRPDBObjectMinAvailablePercentageBytes, + Object: invalidCRPDBObjectMinAvailablePercentage, + }, + UserInfo: authenticationv1.UserInfo{ + Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Denied(fmt.Sprintf("cluster resource placement policy type PickAll is not supported with min available as a percentage %v", invalidCRPDBObjectMinAvailablePercentage.Spec.MinAvailable)), + }, + "deny CRPDB create - MaxUnavailable as percentage": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "pick-all-crp", + Object: runtime.RawExtension{ + Raw: invalidCRPDBObjectMaxAvailablePercentageBytes, + Object: invalidCRPDBObjectMaxAvailablePercentage, + }, + UserInfo: authenticationv1.UserInfo{ + Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Denied(fmt.Sprintf("cluster resource placement policy type PickAll is not supported with any specified max unavailable %v", invalidCRPDBObjectMaxAvailablePercentage.Spec.MaxUnavailable)), + }, + "deny CRPDB create - MaxUnavailable as integer": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "pick-all-crp", + Object: runtime.RawExtension{ + Raw: invalidCRPDBObjectMaxAvailableIntegerBytes, + Object: invalidCRPDBObjectMaxAvailableInteger, + }, + UserInfo: authenticationv1.UserInfo{ + 
Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Denied(fmt.Sprintf("cluster resource placement policy type PickAll is not supported with any specified max unavailable %v", invalidCRPDBObjectMaxAvailableInteger.Spec.MaxUnavailable)), + }, + "allow CRPDB create - CRP not found": { + req: admission.Request{ + AdmissionRequest: admissionv1.AdmissionRequest{ + Name: "does-not-exist", + Object: runtime.RawExtension{ + Raw: validCRPDBObjectCRPNotFoundBytes, + Object: validCRPDBObjectCRPNotFound, + }, + UserInfo: authenticationv1.UserInfo{ + Username: "test-user", + Groups: []string{"system:masters"}, + }, + RequestKind: &utils.ClusterResourcePlacementDisruptionBudgetMetaGVK, + Operation: admissionv1.Create, + }, + }, + resourceValidator: clusterResourcePlacementDisruptionBudgetValidator{ + decoder: decoder, + client: fakeClient, + }, + wantResponse: admission.Allowed("Associated clusterResourcePlacement object for clusterResourcePlacementDisruptionBudget is not found"), + }, + } + for testName, testCase := range testCases { + t.Run(testName, func(t *testing.T) { + gotResult := testCase.resourceValidator.Handle(context.Background(), testCase.req) + if diff := cmp.Diff(testCase.wantResponse, gotResult); diff != "" { + t.Errorf("ClusterResourcePlacementDisruptionBudgetValidator Handle() mismatch (-want +got):\n%s", diff) + } + }) + } +} diff --git a/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook.go b/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook.go index 38c3018d5..aa5ce3d18 100644 --- a/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook.go +++ b/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook.go @@ -53,7 +53,7 @@ func Add(mgr manager.Manager) error { return nil } -// Handle clusterResourcePlacementEvictionValidator checks to see if resource override is valid. +// Handle clusterResourcePlacementEvictionValidator checks to see if the eviction is valid. 
func (v *clusterResourcePlacementEvictionValidator) Handle(ctx context.Context, req admission.Request) admission.Response { var crpe fleetv1beta1.ClusterResourcePlacementEviction klog.V(2).InfoS("Validating webhook handling cluster resource placement eviction", "operation", req.Operation, "clusterResourcePlacementEviction", req.Name) @@ -67,7 +67,7 @@ func (v *clusterResourcePlacementEvictionValidator) Handle(ctx context.Context, if err := v.client.Get(ctx, types.NamespacedName{Name: crpe.Spec.PlacementName}, &crp); err != nil { if k8serrors.IsNotFound(err) { klog.V(2).InfoS(condition.EvictionInvalidMissingCRPMessage, "clusterResourcePlacementEviction", crpe.Name, "clusterResourcePlacement", crpe.Spec.PlacementName) - return admission.Denied(err.Error()) + return admission.Allowed("Associated clusterResourcePlacement object for clusterResourcePlacementEviction is not found") } return admission.Errored(http.StatusBadRequest, fmt.Errorf("failed to get clusterResourcePlacement %s for clusterResourcePlacementEviction %s: %w", crpe.Spec.PlacementName, crpe.Name, err)) } diff --git a/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook_test.go b/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook_test.go index dcc6d3f37..352ac47a7 100644 --- a/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook_test.go +++ b/pkg/webhook/clusterresourceplacementeviction/clusterresourceplacementeviction_validating_webhook_test.go @@ -45,7 +45,7 @@ func TestHandle(t *testing.T) { PlacementName: "test-crp", }, } - invalidCRPEObjectInvalidPlacementName := &placementv1beta1.ClusterResourcePlacementEviction{ + validCRPEObjectPlacementNameNotFound := &placementv1beta1.ClusterResourcePlacementEviction{ ObjectMeta: metav1.ObjectMeta{ Name: "test-crpe", }, @@ -72,7 +72,7 @@ func TestHandle(t *testing.T) { validCRPEObjectBytes, err := json.Marshal(validCRPEObject) assert.Nil(t, err) - invalidCRPEObjectInvalidPlacementNameBytes, err := json.Marshal(invalidCRPEObjectInvalidPlacementName) + validCRPEObjectPlacementNameNotFoundBytes, err := json.Marshal(validCRPEObjectPlacementNameNotFound) assert.Nil(t, err) invalidCRPEObjectCRPDeletingBytes, err := json.Marshal(invalidCRPEObjectCRPDeleting) assert.Nil(t, err) @@ -156,13 +156,13 @@ func TestHandle(t *testing.T) { }, wantResponse: admission.Allowed("clusterResourcePlacementEviction has valid fields"), }, - "deny CRPE create - invalid CRPE object": { + "allow CRPE create - CRPE object with not found CRP": { req: admission.Request{ AdmissionRequest: admissionv1.AdmissionRequest{ Name: "test-crpe", Object: runtime.RawExtension{ - Raw: invalidCRPEObjectInvalidPlacementNameBytes, - Object: invalidCRPEObjectInvalidPlacementName, + Raw: validCRPEObjectPlacementNameNotFoundBytes, + Object: validCRPEObjectPlacementNameNotFound, }, UserInfo: authenticationv1.UserInfo{ Username: "test-user", @@ -176,7 +176,7 @@ func TestHandle(t *testing.T) { decoder: decoder, client: fakeClient, }, - wantResponse: admission.Denied("clusterresourceplacements.placement.kubernetes-fleet.io \"does-not-exist\" not found"), + wantResponse: admission.Allowed("Associated clusterResourcePlacement object for clusterResourcePlacementEviction is not found"), }, "deny CRPE create - CRP is deleting": { req: admission.Request{ diff --git a/pkg/webhook/webhook.go b/pkg/webhook/webhook.go index cda433597..e4eab298e 100644 --- a/pkg/webhook/webhook.go +++ b/pkg/webhook/webhook.go @@ -59,6 
+59,7 @@ import ( "go.goms.io/fleet/cmd/hubagent/options" "go.goms.io/fleet/pkg/webhook/clusterresourceoverride" "go.goms.io/fleet/pkg/webhook/clusterresourceplacement" + "go.goms.io/fleet/pkg/webhook/clusterresourceplacementeviction" "go.goms.io/fleet/pkg/webhook/fleetresourcehandler" "go.goms.io/fleet/pkg/webhook/membercluster" "go.goms.io/fleet/pkg/webhook/pod" @@ -113,6 +114,7 @@ const ( podResourceName = "pods" clusterResourceOverrideName = "clusterresourceoverrides" resourceOverrideName = "resourceoverrides" + evictionName = "clusterresourceplacementevictions" ) var ( @@ -362,6 +364,22 @@ func (w *Config) buildFleetValidatingWebhooks() []admv1.ValidatingWebhook { }, TimeoutSeconds: longWebhookTimeout, }, + { + Name: "fleet.clusterresourceplacementeviction.validating", + ClientConfig: w.createClientConfig(clusterresourceplacementeviction.ValidationPath), + FailurePolicy: &failFailurePolicy, + SideEffects: &sideEffortsNone, + AdmissionReviewVersions: admissionReviewVersions, + Rules: []admv1.RuleWithOperations{ + { + Operations: []admv1.OperationType{ + admv1.Create, + }, + Rule: createRule([]string{placementv1beta1.GroupVersion.Group}, []string{placementv1beta1.GroupVersion.Version}, []string{evictionName}, &clusterScope), + }, + }, + TimeoutSeconds: longWebhookTimeout, + }, } return webHooks diff --git a/pkg/webhook/webhook_test.go b/pkg/webhook/webhook_test.go index dbfd5da00..1a90dbb2c 100644 --- a/pkg/webhook/webhook_test.go +++ b/pkg/webhook/webhook_test.go @@ -25,7 +25,7 @@ func TestBuildFleetValidatingWebhooks(t *testing.T) { serviceURL: "test-url", clientConnectionType: &url, }, - wantLength: 7, + wantLength: 8, }, } diff --git a/test/apis/placement/v1beta1/api_validation_integration_test.go b/test/apis/placement/v1beta1/api_validation_integration_test.go index 2c30fc34d..9bdc7d7b6 100644 --- a/test/apis/placement/v1beta1/api_validation_integration_test.go +++ b/test/apis/placement/v1beta1/api_validation_integration_test.go @@ -372,7 +372,7 @@ var _ = Describe("Test placement v1beta1 API validation", func() { }, { Type: placementv1beta1.AfterStageTaskTypeTimedWait, - WaitTime: metav1.Duration{Duration: time.Second * 10}, + WaitTime: &metav1.Duration{Duration: time.Second * 10}, }, }, }, @@ -380,8 +380,8 @@ var _ = Describe("Test placement v1beta1 API validation", func() { }, } Expect(hubClient.Create(ctx, &strategy)).Should(Succeed()) - Expect(strategy.Spec.Stages[0].AfterStageTasks[0].WaitTime).Should(Equal(metav1.Duration{Duration: 0})) - Expect(strategy.Spec.Stages[0].AfterStageTasks[1].WaitTime).Should(Equal(metav1.Duration{Duration: time.Second * 10})) + Expect(strategy.Spec.Stages[0].AfterStageTasks[0].WaitTime).Should(BeNil()) + Expect(strategy.Spec.Stages[0].AfterStageTasks[1].WaitTime).Should(Equal(&metav1.Duration{Duration: time.Second * 10})) Expect(hubClient.Delete(ctx, &strategy)).Should(Succeed()) }) }) @@ -460,7 +460,7 @@ var _ = Describe("Test placement v1beta1 API validation", func() { }, { Type: placementv1beta1.AfterStageTaskTypeTimedWait, - WaitTime: metav1.Duration{Duration: time.Second * 10}, + WaitTime: &metav1.Duration{Duration: time.Second * 10}, }, }, }, @@ -472,6 +472,121 @@ var _ = Describe("Test placement v1beta1 API validation", func() { Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Create updateRunStrategy call produced error %s. 
Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{}))) Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("Too many: 3: must have at most 2 items")) }) + + It("Should deny creation of ClusterStagedUpdateStrategy with AfterStageTask of type Approval with waitTime specified", func() { + strategy := placementv1beta1.ClusterStagedUpdateStrategy{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf(updateRunStrategyNameTemplate, GinkgoParallelProcess()), + }, + Spec: placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + Name: fmt.Sprintf(updateRunStageNameTemplate, GinkgoParallelProcess(), 1), + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &metav1.Duration{Duration: time.Minute * 30}, + }, + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + WaitTime: &metav1.Duration{Duration: time.Minute * 10}, + }, + }, + }, + }, + }, + } + err := hubClient.Create(ctx, &strategy) + var statusErr *k8sErrors.StatusError + Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Create updateRunStrategy call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{}))) + Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("AfterStageTaskType is Approval, waitTime is not allowed")) + }) + + It("Should deny update of ClusterStagedUpdateStrategy when adding waitTime to AfterStageTask of type Approval", func() { + strategy := placementv1beta1.ClusterStagedUpdateStrategy{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf(updateRunStrategyNameTemplate, GinkgoParallelProcess()), + }, + Spec: placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + Name: fmt.Sprintf(updateRunStageNameTemplate, GinkgoParallelProcess(), 1), + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + }, + }, + }, + }, + } + Expect(hubClient.Create(ctx, &strategy)).Should(Succeed()) + + strategy.Spec.Stages[0].AfterStageTasks[0].WaitTime = &metav1.Duration{Duration: time.Minute * 10} + err := hubClient.Update(ctx, &strategy) + var statusErr *k8sErrors.StatusError + Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Update ClusterStagedUpdateStrategy call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{}))) + Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("AfterStageTaskType is Approval, waitTime is not allowed")) + + Expect(hubClient.Delete(ctx, &strategy)).Should(Succeed()) + }) + + It("Should deny creation of ClusterStagedUpdateStrategy with AfterStageTask of type TimedWait with waitTime not specified", func() { + strategy := placementv1beta1.ClusterStagedUpdateStrategy{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf(updateRunStrategyNameTemplate, GinkgoParallelProcess()), + }, + Spec: placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + Name: fmt.Sprintf(updateRunStageNameTemplate, GinkgoParallelProcess(), 1), + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + }, + }, + }, + }, + }, + } + err := hubClient.Create(ctx, &strategy) + var statusErr *k8sErrors.StatusError + Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Create updateRunStrategy call produced error %s. 
Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{}))) + Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("AfterStageTaskType is TimedWait, waitTime is required")) + }) + + It("Should deny update of ClusterStagedUpdateStrategy when removing waitTime from AfterStageTask of type TimedWait", func() { + strategy := placementv1beta1.ClusterStagedUpdateStrategy{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf(updateRunStrategyNameTemplate, GinkgoParallelProcess()), + }, + Spec: placementv1beta1.StagedUpdateStrategySpec{ + Stages: []placementv1beta1.StageConfig{ + { + Name: fmt.Sprintf(updateRunStageNameTemplate, GinkgoParallelProcess(), 1), + AfterStageTasks: []placementv1beta1.AfterStageTask{ + { + Type: placementv1beta1.AfterStageTaskTypeTimedWait, + WaitTime: &metav1.Duration{Duration: time.Minute * 10}, + }, + { + Type: placementv1beta1.AfterStageTaskTypeApproval, + }, + }, + }, + }, + }, + } + Expect(hubClient.Create(ctx, &strategy)).Should(Succeed()) + + strategy.Spec.Stages[0].AfterStageTasks[0].WaitTime = nil + err := hubClient.Update(ctx, &strategy) + var statusErr *k8sErrors.StatusError + Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Update ClusterStagedUpdateStrategy call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{}))) + Expect(statusErr.ErrStatus.Message).Should(MatchRegexp("AfterStageTaskType is TimedWait, waitTime is required")) + + Expect(hubClient.Delete(ctx, &strategy)).Should(Succeed()) + }) }) Context("Test ClusterApprovalRequest API validation - valid cases", func() { diff --git a/test/e2e/placement_eviction_test.go b/test/e2e/placement_eviction_test.go index d93ad71d9..74cc3010a 100644 --- a/test/e2e/placement_eviction_test.go +++ b/test/e2e/placement_eviction_test.go @@ -31,74 +31,6 @@ import ( testutilseviction "go.goms.io/fleet/test/utils/eviction" ) -var _ = Describe("ClusterResourcePlacement eviction of bound binding - PickFixed CRP, invalid eviction denied - No PDB specified", Ordered, func() { - crpName := fmt.Sprintf(crpNameTemplate, GinkgoParallelProcess()) - crpEvictionName := fmt.Sprintf(crpEvictionNameTemplate, GinkgoParallelProcess()) - - BeforeAll(func() { - By("creating work resources") - createWorkResources() - - // Create the CRP. - crp := &placementv1beta1.ClusterResourcePlacement{ - ObjectMeta: metav1.ObjectMeta{ - Name: crpName, - // Add a custom finalizer; this would allow us to better observe - // the behavior of the controllers. 
- Finalizers: []string{customDeletionBlockerFinalizer}, - }, - Spec: placementv1beta1.ClusterResourcePlacementSpec{ - Policy: &placementv1beta1.PlacementPolicy{ - PlacementType: placementv1beta1.PickFixedPlacementType, - ClusterNames: allMemberClusterNames, - }, - ResourceSelectors: workResourceSelector(), - }, - } - Expect(hubClient.Create(ctx, crp)).To(Succeed(), "Failed to create CRP %s", crpName) - }) - - AfterAll(func() { - ensureCRPEvictionDeleted(crpEvictionName) - ensureCRPAndRelatedResourcesDeleted(crpName, allMemberClusters) - }) - - It("should update cluster resource placement status as expected", func() { - crpStatusUpdatedActual := crpStatusUpdatedActual(workResourceIdentifiers(), allMemberClusterNames, nil, "0") - Eventually(crpStatusUpdatedActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to update cluster resource placement status as expected") - }) - - It("should place resources on the all available member clusters", checkIfPlacedWorkResourcesOnAllMemberClusters) - - It("create cluster resource placement eviction targeting member cluster 1", func() { - crpe := &placementv1beta1.ClusterResourcePlacementEviction{ - ObjectMeta: metav1.ObjectMeta{ - Name: crpEvictionName, - }, - Spec: placementv1beta1.PlacementEvictionSpec{ - PlacementName: crpName, - ClusterName: memberCluster1EastProdName, - }, - } - Expect(hubClient.Create(ctx, crpe)).To(Succeed(), "Failed to create CRP eviction %s", crpe.Name) - }) - - It("should update cluster resource placement eviction status as expected", func() { - crpEvictionStatusUpdatedActual := testutilseviction.StatusUpdatedActual( - ctx, hubClient, crpEvictionName, - &testutilseviction.IsValidEviction{IsValid: false, Msg: condition.EvictionInvalidPickFixedCRPMessage}, - nil) - Eventually(crpEvictionStatusUpdatedActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to update cluster resource placement eviction status as expected") - }) - - It("should ensure cluster resource placement status is unchanged", func() { - crpStatusUpdatedActual := crpStatusUpdatedActual(workResourceIdentifiers(), allMemberClusterNames, nil, "0") - Eventually(crpStatusUpdatedActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to update cluster resource placement status as expected") - }) - - It("should still place resources on the all available member clusters", checkIfPlacedWorkResourcesOnAllMemberClusters) -}) - var _ = Describe("ClusterResourcePlacement eviction of bound binding, taint cluster before eviction - No PDB specified", Ordered, Serial, func() { crpName := fmt.Sprintf(crpNameTemplate, GinkgoParallelProcess()) crpEvictionName := fmt.Sprintf(crpEvictionNameTemplate, GinkgoParallelProcess()) diff --git a/test/e2e/updaterun_test.go b/test/e2e/updaterun_test.go index cd97f3cab..678eff88f 100644 --- a/test/e2e/updaterun_test.go +++ b/test/e2e/updaterun_test.go @@ -540,7 +540,7 @@ func createStagedUpdateStrategySucceed(strategyName string) *placementv1beta1.Cl }, { Type: placementv1beta1.AfterStageTaskTypeTimedWait, - WaitTime: metav1.Duration{ + WaitTime: &metav1.Duration{ Duration: time.Second * 5, }, }, diff --git a/test/e2e/webhook_test.go b/test/e2e/webhook_test.go index 70ea8d168..d86ab9214 100644 --- a/test/e2e/webhook_test.go +++ b/test/e2e/webhook_test.go @@ -1291,3 +1291,78 @@ var _ = Describe("webhook tests for ResourceOverride UPDATE operations", Ordered }, testutils.PollTimeout, testutils.PollInterval).Should(Succeed()) }) }) + +var _ = Describe("webhook tests for 
+	crpName := fmt.Sprintf(crpNameTemplate, GinkgoParallelProcess())
+	crpeName := fmt.Sprintf(crpEvictionNameTemplate, GinkgoParallelProcess())
+
+	AfterEach(func() {
+		By("deleting CRP")
+		cleanupCRP(crpName)
+	})
+
+	It("should deny create on CRPE when the referenced CRP is being deleted", func() {
+		// Create the CRP with a finalizer so that it stays in a deleting state.
+		crp := &placementv1beta1.ClusterResourcePlacement{
+			ObjectMeta: metav1.ObjectMeta{
+				Name:       crpName,
+				Finalizers: []string{"example.com/finalizer"},
+			},
+			Spec: placementv1beta1.ClusterResourcePlacementSpec{
+				ResourceSelectors: workResourceSelector(),
+				Policy: &placementv1beta1.PlacementPolicy{
+					PlacementType: placementv1beta1.PickAllPlacementType,
+				},
+			},
+		}
+		Expect(hubClient.Create(ctx, crp)).Should(Succeed(), "Failed to create CRP %s", crpName)
+		// Delete the CRP so that it carries a deletion timestamp for the check.
+		Expect(hubClient.Delete(ctx, crp)).Should(Succeed(), "Failed to delete CRP %s", crpName)
+
+		// Create the CRPE.
+		crpe := &placementv1beta1.ClusterResourcePlacementEviction{
+			ObjectMeta: metav1.ObjectMeta{
+				Name: crpeName,
+			},
+			Spec: placementv1beta1.PlacementEvictionSpec{
+				PlacementName: crpName,
+			},
+		}
+		By(fmt.Sprintf("expecting denial of CREATE eviction %s", crpeName))
+		err := hubClient.Create(ctx, crpe)
+		var statusErr *k8sErrors.StatusError
+		Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Create CRPE call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{})))
+		Expect(statusErr.Status().Message).Should(MatchRegexp(fmt.Sprintf("cluster resource placement %s is being deleted", crpName)))
+	})
+
+	It("should deny create on CRPE when the referenced CRP uses the PickFixed placement type", func() {
+		crp := &placementv1beta1.ClusterResourcePlacement{
+			ObjectMeta: metav1.ObjectMeta{
+				Name: crpName,
+			},
+			Spec: placementv1beta1.ClusterResourcePlacementSpec{
+				ResourceSelectors: workResourceSelector(),
+				Policy: &placementv1beta1.PlacementPolicy{
+					PlacementType: placementv1beta1.PickFixedPlacementType,
+					ClusterNames:  []string{"cluster1", "cluster2"},
+				},
+			},
+		}
+		Expect(hubClient.Create(ctx, crp)).Should(Succeed(), "Failed to create CRP %s", crpName)
+
+		// Create the CRPE.
+		crpe := &placementv1beta1.ClusterResourcePlacementEviction{
+			ObjectMeta: metav1.ObjectMeta{
+				Name: crpeName,
+			},
+			Spec: placementv1beta1.PlacementEvictionSpec{
+				PlacementName: crpName,
+			},
+		}
+		By(fmt.Sprintf("expecting denial of CREATE eviction %s", crpeName))
+		err := hubClient.Create(ctx, crpe)
+		var statusErr *k8sErrors.StatusError
+		Expect(errors.As(err, &statusErr)).To(BeTrue(), fmt.Sprintf("Create CRPE call produced error %s. Error type wanted is %s.", reflect.TypeOf(err), reflect.TypeOf(&k8sErrors.StatusError{})))
+		Expect(statusErr.Status().Message).Should(MatchRegexp("cluster resource placement policy type PickFixed is not supported"))
+	})
+})
diff --git a/test/upgrade/setup.sh b/test/upgrade/setup.sh
index daf501e5b..b33ecfd7a 100755
--- a/test/upgrade/setup.sh
+++ b/test/upgrade/setup.sh
@@ -1,5 +1,14 @@
 #!/usr/bin/env bash
 
+# Note: this script sets up the pre-upgrade environment for the version
+# compatibility test, building the Fleet agent images **from the current commit**.
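+#
+# A minimal usage sketch (an assumed invocation for illustration; the image
+# variables are the optional overrides exported further below in this script):
+#
+#   cd test/upgrade
+#   ./setup.sh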
+
 set -o errexit
 set -o nounset
 set -o pipefail
@@ -24,26 +27,6 @@ export OUTPUT_TYPE="${OUTPUT_TYPE:-type=docker}"
 export HUB_AGENT_IMAGE="${HUB_AGENT_IMAGE:-hub-agent}"
 export MEMBER_AGENT_IMAGE="${MEMBER_AGENT_IMAGE:-member-agent}"
 export REFRESH_TOKEN_IMAGE="${REFRESH_TOKEN_IMAGE:-refresh-token}"
-export GIT_TAG="${GIT_TAG:-}"
-
-PREVIOUS_BRANCH=""
-PREVIOUS_COMMIT=""
-if [ -z "${GIT_TAG}" ]; then
-    echo "No tag is specified; use the latest tag."
-
-    PREVIOUS_BRANCH=$(git branch --show-current)
-    PREVIOUS_COMMIT=$(git rev-parse HEAD)
-    echo "Current at branch $PREVIOUS_BRANCH, commit $PREVIOUS_COMMIT."
-
-    echo "Fetch all tags..."
-    git fetch --all
-    GIT_TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
-    git checkout $GIT_TAG
-    echo "Checked out source code at $GIT_TAG."
-
-    echo "Switch back to the root directory to avoid consistency issues."
-    cd ../..
-fi
 
 # Build the Fleet agent images.
 echo "Building and the Fleet agent images..."
@@ -52,16 +35,6 @@ TAG=$IMAGE_TAG make docker-build-hub-agent
 TAG=$IMAGE_TAG make docker-build-member-agent
 TAG=$IMAGE_TAG make docker-build-refresh-token
 
-# Restore to the previous branch. This must be done immediately after the image building to avoid
-# consistency issues.
-if [ -n "$PREVIOUS_COMMIT" ]; then
-    git checkout $PREVIOUS_COMMIT
-    echo "Checked out source code at $PREVIOUS_COMMIT."
-fi
-
-echo "Switch back to the test/upgrade directory to avoid consistency issues."
-cd test/upgrade
-
 # Create the kind clusters.
 echo "Creating the kind clusters..."
@@ -154,3 +127,5 @@ do
         --set enableV1Alpha1APIs=false \
         --set enableV1Beta1APIs=true
 done
+
+echo "Setup for the pre-upgrade environment has been completed."
diff --git a/test/upgrade/upgrade.sh b/test/upgrade/upgrade.sh
index 1f9bfc9c6..d3fd1d584 100755
--- a/test/upgrade/upgrade.sh
+++ b/test/upgrade/upgrade.sh
@@ -19,7 +19,6 @@ export OUTPUT_TYPE="${OUTPUT_TYPE:-type=docker}"
 export HUB_AGENT_IMAGE="${HUB_AGENT_IMAGE:-hub-agent}"
 export MEMBER_AGENT_IMAGE="${MEMBER_AGENT_IMAGE:-member-agent}"
 export REFRESH_TOKEN_IMAGE="${REFRESH_TOKEN_IMAGE:-refresh-token}"
-export GIT_TAG="${GIT_TAG:-}"
 export UPGRADE_HUB_SIDE="${UPGRADE_HUB_SIDE:-}"
 export UPGRADE_MEMBER_SIDE="${UPGRADE_MEMBER_SIDE:-}"
@@ -28,18 +27,6 @@ if [ -z "$UPGRADE_HUB_SIDE" ] && [ -z "$UPGRADE_MEMBER_SIDE" ]; then
     exit 1
 fi
 
-PREVIOUS_BRANCH=""
-if [ -n "${GIT_TAG}" ]; then
-    echo "A tag ($GIT_TAG) has been specified; re-build image using the given tag."
-
-    PREVIOUS_BRANCH=$(git branch --show-current)
-
-    echo "Fetch all tags..."
-    git fetch --all
-    git checkout $GIT_TAG
-    echo "Checked out source code at $GIT_TAG."
-fi
-
 # Build the Fleet agent images.
 echo "Building and the Fleet agent images..."
@@ -103,9 +90,7 @@ if [ -n "$UPGRADE_MEMBER_SIDE" ]; then
         --set enableV1Beta1APIs=true
     done
 fi
-
-# Restore to the previous branch.
-if [ -n "$PREVIOUS_BRANCH" ]; then
-    git checkout $PREVIOUS_BRANCH
-    echo "Checked out source code at $PREVIOUS_BRANCH."
-fi
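+
+# Assumed usage (an illustrative sketch, not part of the original script):
+# run test/upgrade/setup.sh first, then pick at least one side to upgrade, e.g.:
+#   UPGRADE_HUB_SIDE=true UPGRADE_MEMBER_SIDE=true ./upgrade.sh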