[AKS] aks-preview: add --enable/--disable-control-plane-metrics#9855
davidkydd wants to merge 23 commits into
Surface the new first-class API property azureMonitorProfile.metrics.controlPlane.enabled (API version 2026-02-02-preview) so users can opt in/out of Azure Monitor managed Prometheus control plane metrics (kube-apiserver, etcd, etc.) without relying on the AFEC-gated preview.
- Add --enable-control-plane-metrics on `az aks create` and `az aks update`, plus --disable-control-plane-metrics on `az aks update`.
- Validate that --enable-control-plane-metrics requires Azure Monitor metrics to be enabled (either already on the cluster or via --enable-azure-monitor-metrics in the same command), and that enable and disable cannot be combined.
- Wire the flags into the create (set_up_azure_monitor_profile) and update (update_azure_monitor_profile) decorator paths.
- Bump extension to 20.0.0b7 and add HISTORY entry.
Pull request overview
This PR updates the aks-preview extension to surface the new azureMonitorProfile.metrics.controlPlane.enabled API property (2026-02-02-preview) by adding CLI flags to enable/disable Azure Monitor managed Prometheus control plane metrics collection for AKS clusters.
Changes:
- Added `--enable-control-plane-metrics` to `az aks create` / `az aks update`, and `--disable-control-plane-metrics` to `az aks update`.
- Implemented decorator-context validation and wiring to set `azure_monitor_profile.metrics.control_plane.enabled` during create/update.
- Bumped extension version to `20.0.0b7` and added release notes entry.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/aks-preview/setup.py | Bumps extension version to 20.0.0b7. |
| src/aks-preview/HISTORY.rst | Adds release notes entry for the new flags. |
| src/aks-preview/azext_aks_preview/managed_cluster_decorator.py | Adds new context getters/validation and wires control plane metrics into create/update Azure Monitor profile flows. |
| src/aks-preview/azext_aks_preview/custom.py | Adds new parameters to aks_create / aks_update function signatures so they flow into raw parameters. |
| src/aks-preview/azext_aks_preview/_params.py | Registers the new CLI arguments for aks create and aks update. |
| src/aks-preview/azext_aks_preview/_help.py | Documents the new CLI flags in command help. |
    # Enable control plane metrics if requested.
    if self.context.get_enable_control_plane_metrics():
        mc.azure_monitor_profile.metrics.control_plane = (
            self.models.ManagedClusterAzureMonitorProfileMetricsControlPlane(enabled=True)
        )
Fixed in a7b923c — set_up_azure_monitor_profile now calls get_enable_control_plane_metrics() unconditionally so passing --enable-control-plane-metrics without the parent flag raises RequiredArgumentMissingError on create instead of being silently ignored when _setup_azure_monitor_metrics is skipped.
    # Handle enable / disable of control plane metrics independently of the parent metrics flag,
    # so users can toggle control plane metrics on a cluster that already has metrics enabled.
    if self.context.get_enable_control_plane_metrics():
        if mc.azure_monitor_profile is None:
            mc.azure_monitor_profile = self.models.ManagedClusterAzureMonitorProfile()  # pylint: disable=no-member
        if mc.azure_monitor_profile.metrics is None:
            # Should not normally happen (validation requires metrics to be enabled), but guard
            # against partially-populated profiles to avoid AttributeError.
            mc.azure_monitor_profile.metrics = (
                self.models.ManagedClusterAzureMonitorProfileMetrics(enabled=True)  # pylint: disable=no-member
            )
        mc.azure_monitor_profile.metrics.control_plane = (
            self.models.ManagedClusterAzureMonitorProfileMetricsControlPlane(enabled=True)  # pylint: disable=no-member
        )

    if self.context.get_disable_control_plane_metrics():
        if (
            mc.azure_monitor_profile and
            mc.azure_monitor_profile.metrics
        ):
            mc.azure_monitor_profile.metrics.control_plane = (
                self.models.ManagedClusterAzureMonitorProfileMetricsControlPlane(enabled=False)  # pylint: disable=no-member
            )
Fixed in 3f343fe — added 5 update-decorator tests covering: (1) enable-cp without parent flag raises RequiredArgumentMissingError, (2) enable-cp succeeds when AM metrics already on cluster, (3) enable-cp + disable-am-metrics raises MutuallyExclusiveArgumentError, (4) enable-cp + disable-cp together raises MutuallyExclusiveArgumentError, (5) disable-cp writes control_plane.enabled=False.
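For readers following along, the five cases listed in this reply can be sketched without the Azure SDK. This is a minimal illustration, not the extension's code: `Metrics`, `ControlPlane`, and `apply_control_plane_flags` are hypothetical stand-ins for the SDK models and decorator methods, and the two exception classes only mimic the names of the errors in `azure.cli.core.azclierror`.

```python
from dataclasses import dataclass
from typing import Optional


class RequiredArgumentMissingError(Exception):
    """Stand-in for azure.cli.core.azclierror.RequiredArgumentMissingError."""


class MutuallyExclusiveArgumentError(Exception):
    """Stand-in for azure.cli.core.azclierror.MutuallyExclusiveArgumentError."""


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the metrics.controlPlane model."""
    enabled: bool


@dataclass
class Metrics:
    """Hypothetical stand-in for the azureMonitorProfile.metrics model."""
    enabled: bool
    control_plane: Optional[ControlPlane] = None


def apply_control_plane_flags(
    metrics: Optional[Metrics],
    enable_cp: bool = False,
    disable_cp: bool = False,
    enable_am: bool = False,
    disable_am: bool = False,
) -> Optional[Metrics]:
    """Validate the flag combination, then toggle control plane metrics."""
    if enable_cp and disable_cp:
        raise MutuallyExclusiveArgumentError(
            "Cannot enable and disable control plane metrics together."
        )
    if enable_cp and disable_am:
        raise MutuallyExclusiveArgumentError(
            "Cannot enable control plane metrics while disabling Azure Monitor metrics."
        )
    already_on = metrics is not None and metrics.enabled
    if enable_cp and not (already_on or enable_am):
        raise RequiredArgumentMissingError(
            "Control plane metrics require Azure Monitor metrics to be enabled."
        )
    if enable_cp:
        # Guard against a missing profile, as the update path in the PR does.
        if metrics is None:
            metrics = Metrics(enabled=True)
        metrics.control_plane = ControlPlane(enabled=True)
    if disable_cp and metrics is not None:
        metrics.control_plane = ControlPlane(enabled=False)
    return metrics
```

Enabling Azure Monitor metrics itself is out of scope for this sketch; it only models the control plane toggle and the validation order.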
…e-azure-monitor-metrics Combining the two flags in the same command produced an inconsistent payload (azureMonitorProfile.metrics.enabled=False AND metrics.controlPlane.enabled=True). Validator now raises MutuallyExclusiveArgumentError up-front. Addresses Copilot review feedback on PR Azure#9855.
…even without parent flag set_up_azure_monitor_profile now invokes get_enable_control_plane_metrics() unconditionally so passing --enable-control-plane-metrics without --enable-azure-monitor-metrics raises RequiredArgumentMissingError instead of being silently ignored when _setup_azure_monitor_metrics is skipped. Addresses Copilot review feedback on PR Azure#9855.
…e wording Previous wording "Requires --enable-azure-monitor-metrics" was misleading on update for clusters that already have Azure Monitor metrics enabled. Updated to "Requires Azure Monitor metrics to be enabled (already enabled or via --enable-azure-monitor-metrics)" for both 'aks create' and 'aks update'. Addresses Copilot review feedback on PR Azure#9855.
The two unreleased bullets under Pending (cli core minimum bump, --k8s-support-plan/--tier on upgrade) are part of 20.0.0b7. Moved into the 20.0.0b7 section so the changelog reflects the actual shipping version. Addresses Copilot review feedback on PR Azure#9855.
Adds 5 update-decorator tests covering the new validation branches and update path:
- enable-control-plane-metrics without parent flag raises
- enable-control-plane-metrics succeeds when AM metrics already on cluster
- enable-control-plane-metrics + disable-azure-monitor-metrics raises
- enable + disable control-plane-metrics together raises
- disable-control-plane-metrics writes control_plane.enabled=False
Addresses Copilot review feedback on PR Azure#9855.
Linter (option_length_too_long, threshold 22) flagged --enable-control-plane-metrics (31) and --disable-control-plane-metrics (32) as too long. Adds --enable-cp-metrics and --disable-cp-metrics short aliases on both 'aks create' and 'aks update'. Canonical long names retained for backward compatibility.
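The alias arrangement described in this commit can be illustrated with plain `argparse`. This is only a sketch of the idea: the extension registers its arguments through the Azure CLI argument context in `_params.py`, not through a hand-built parser, and the parser name and help strings below are made up for the example.

```python
import argparse

# Hypothetical stand-alone parser; the real flags are registered in _params.py.
parser = argparse.ArgumentParser(prog="aks-update-sketch")
parser.add_argument(
    "--enable-control-plane-metrics",
    "--enable-cp-metrics",  # short alias for the same destination
    dest="enable_control_plane_metrics",
    action="store_true",
    help="Enable control plane metrics (illustrative help text).",
)
parser.add_argument(
    "--disable-control-plane-metrics",
    "--disable-cp-metrics",  # short alias for the same destination
    dest="disable_control_plane_metrics",
    action="store_true",
    help="Disable control plane metrics (illustrative help text).",
)

# Both spellings set the same attribute, so existing scripts keep working.
long_form = parser.parse_args(["--enable-control-plane-metrics"])
short_form = parser.parse_args(["--enable-cp-metrics"])
```

Because each alias shares a `dest` with its long name, downstream code reads a single attribute regardless of which spelling the user typed.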
Linter (missing_parameter_help) requires inline help text on c.argument when options_list is set, since the YAML help block in _help.py is not matched once the canonical option name is one of several. Adds inline help= mirroring the YAML wording on all three argument definitions.
- Wrap inline help= strings in _params.py to satisfy line-too-long (C0301).
- Update _help.py entries to include the alias forms (--enable-cp-metrics, --disable-cp-metrics) so the linter's unrecognized_help_parameter_rule recognizes the canonical option list.
/azp run
Commenter does not have sufficient privileges for PR 9855 in repo Azure/azure-cli-extensions
@yonzhan @zhoxing-ms All Copilot comments are addressed and all gates are passing except for the Azure build, which I don't have permission to trigger. Could I get a review, please? Keen to get this out for testing purposes, thanks!
Azure Pipelines successfully started running 2 pipeline(s).
FumingZhang left a comment
lgtm, could you please add a scenario test to validate the change?
Adds live ScenarioTest coverage for --enable-control-plane-metrics and --disable-control-plane-metrics on `az aks create` and `az aks update`:
Positive cases:
- create with --enable-azure-monitor-metrics + --enable-control-plane-metrics asserts azureMonitorProfile.metrics.controlPlane.enabled == True
- update flow on an AM-metrics-enabled cluster: enable then disable control plane metrics, asserting the controlPlane.enabled toggles
Negative cases (expect_failure=True):
- create --enable-control-plane-metrics without --enable-azure-monitor-metrics
- create --enable-control-plane-metrics together with --disable-azure-monitor-metrics
- create with both --enable-control-plane-metrics and --disable-control-plane-metrics
- update --enable-control-plane-metrics on a cluster lacking AM metrics
Addresses scenario-test request on PR Azure#9855.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
    @AKSCustomResourceGroupPreparer(
        random_name_length=17, name_prefix="clitest", location="westus2"
    )
    def test_aks_create_with_control_plane_metrics(
tests failed, please fix them according to the error msg
all passing now, could you please re-review @FumingZhang
E azure.core.exceptions.ResourceExistsError: (OperationNotAllowed) Operation is not allowed because there's an in progress update managed cluster operation (operation ID: e375d315-218c-4dff-b806-7ab63b0bb63a) on the managed cluster started on UTC 2026-05-22T07:19:23Z. Please wait for it to finish before starting a new operation. You can also use 'az aks operation-abort ...' to abort the ongoing operation.
E Code: OperationNotAllowed
E Message: Operation is not allowed because there's an in progress update managed cluster operation (operation ID: e375d315-218c-4dff-b806-7ab63b0bb63a) on the managed cluster started on UTC 2026-05-22T07:19:23Z. Please wait for it to finish before starting a new operation. You can also use 'az aks operation-abort ...' to abort the ongoing operation.
azEnv/lib/python3.12/site-packages/azure/core/exceptions.py:163: ResourceExistsError
Requeued live test passed: https://msazure.visualstudio.com/CloudNativeCompute/_build/results?buildId=165200323&view=results
Linter failed for unrelated reason:
https://github.com/Azure/azure-cli-extensions/actions/runs/26276562484/job/77342118689?pr=9855
Command: aks nodepool auto-scale wait - Missing help
@FumingZhang is this good to approve now?
Please fill in the pipeline variables as shown above before queuing the pipeline.
https://msazure.visualstudio.com/CloudNativeCompute/_build/results?buildId=165422948&view=results
Regarding the linter failure, you can either revert your change that added the wait command to the nodepool auto-scale command group, or follow the error message’s instructions to add a help message for it.
reverted node-pool change, re-running live test:
https://msazure.visualstudio.com/CloudNativeCompute/_build/results?buildId=166347743&view=results
@FumingZhang can you /azp run again please
…ait to nodepool auto-scale
/azp run
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
Updated command group for AKS nodepool auto-scaling to use managed_clusters_sdk instead of agent_pools_sdk.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
…pdate --disable-azure-monitor-metrics and --disable-control-plane-metrics are not registered for aks create, so passing them caused argparse SystemExit(2) instead of a CLIError, breaking the az_aks_tool test parser. Move both negative scenarios to aks update where the flags are registered.
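The failure mode described in this commit is easy to reproduce with plain `argparse`: an option that was never registered makes the parser exit with status 2 before any command logic (and therefore any validation error) can run. The parser below is a hypothetical stand-in for `aks create`, with only one of the flags registered.

```python
import argparse
import contextlib
import io

# Hypothetical stand-in for `az aks create`: only the enable flag is registered.
create_parser = argparse.ArgumentParser(prog="aks-create-sketch")
create_parser.add_argument("--enable-control-plane-metrics", action="store_true")


def exit_code_for(argv):
    """Return the parser's exit status for argv, or 0 if parsing succeeds."""
    # Silence argparse's usage message so the example stays quiet.
    with contextlib.redirect_stderr(io.StringIO()):
        try:
            create_parser.parse_args(argv)
        except SystemExit as exc:
            return exc.code
    return 0
```

A negative test that expects a validation error therefore has to target a command where the flag is registered, which is why both scenarios were moved to `aks update`.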
…-metrics test aks create with --enable-azure-monitor-metrics returns Succeeded but leaves an in-progress AMW background operation, causing a 409 OperationNotAllowed when aks update fires immediately after. Mirrors the pattern used in test_aks_update_with_azuremonitormetrics.
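The pattern borrowed from test_aks_update_with_azuremonitormetrics is not shown in this conversation. As one generic way to cope with a 409 raised while a background operation is still running, a retry helper might look like the sketch below. Everything in it is an assumption for illustration: `OperationNotAllowedError` is a stand-in for the `ResourceExistsError` seen in the log, and the helper name, attempt count, and delay are made up.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OperationNotAllowedError(Exception):
    """Hypothetical stand-in for the 409 OperationNotAllowed failure."""


def retry_while_operation_in_progress(
    action: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying while the cluster reports an in-progress operation."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OperationNotAllowedError:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error.
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` keeps the helper testable without real delays; a live test would pass the update command as `action`.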
…02-02-preview The controlPlane metrics feature was introduced in 2026-02-02-preview. The live test backend rejects the controlPlane property when called with api-version= 2026-03-02-preview. The SDK default was bumped to 2026-03-02-preview by a main merge, but this PR's features target 2026-02-02-preview.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
@davidkydd don't forget to resolve the merge conflict
Surface the first-class API property azureMonitorProfile.metrics.controlPlane.enabled (API version 2026-02-02-preview, already in the vendored SDK) so users can opt clusters in/out of Azure Monitor managed Prometheus control plane metrics (kube-apiserver, etcd, etc.) without the AFEC-gated preview.
- Add --enable-control-plane-metrics on `az aks create` and `az aks update`, plus --disable-control-plane-metrics on `az aks update`.
- Validate that --enable-control-plane-metrics requires Azure Monitor metrics to be enabled (either already on the cluster or via --enable-azure-monitor-metrics in the same command), and that enable and disable cannot be combined.
- Wire the flags into the create (set_up_azure_monitor_profile) and update (update_azure_monitor_profile) decorator paths.
- Decorator unit tests + live-only command tests (positive & negative).
- Add HISTORY.rst Pending entry.
Mirrors the upstream proposal at Azure#9855.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
* [AKS] aks-preview: add --enable/--disable-control-plane-metrics
Surface the first-class API property azureMonitorProfile.metrics.controlPlane.enabled (API version 2026-02-02-preview, already in the vendored SDK) so users can opt clusters in/out of Azure Monitor managed Prometheus control plane metrics (kube-apiserver, etcd, etc.) without the AFEC-gated preview.
- Add --enable-control-plane-metrics on `az aks create` and `az aks update`, plus --disable-control-plane-metrics on `az aks update`.
- Validate that --enable-control-plane-metrics requires Azure Monitor metrics to be enabled (either already on the cluster or via --enable-azure-monitor-metrics in the same command), and that enable and disable cannot be combined.
- Wire the flags into the create (set_up_azure_monitor_profile) and update (update_azure_monitor_profile) decorator paths.
- Decorator unit tests + live-only command tests (positive & negative).
- Add HISTORY.rst Pending entry.
Mirrors the upstream proposal at Azure#9855.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* [AKS] aks-preview: defer control-plane-metrics flip to post-DCRA addon_put on create
On greenfield `az aks create --enable-azure-monitor-metrics --enable-control-plane-metrics`, setting `azureMonitorProfile.metrics.controlPlane.enabled=true` on the initial cluster PUT causes the AKS RP to schedule the CCP collector pod before the Data Collection Rule Association (DCRA) has been created. The pod then CrashLoopBackOffs until the postprocessing step finishes creating the AMW/DCE/DCR/DCRA and the RP reconciles.
Fix: on the create flow, leave `metrics.controlPlane` unset on the initial PUT. After postprocessing creates the DCRA, the existing fire-and-forget addon_put PUT now also flips `metrics.controlPlane.enabled=true`, so the CCP pod is only scheduled once its DCRA exists.
Changes:
* `_setup_azure_monitor_metrics` no longer mutates `metrics.control_plane` on the create path; it still calls `get_enable_control_plane_metrics()` so the mutually-exclusive flag validation fires early.
* New `_addon_put_with_control_plane` helper in `azuremonitormetrics/azuremonitorprofile.py` that mirrors core `addon_put` and additionally sets `metrics.controlPlane.enabled=true`.
* `link_azure_monitor_profile_artifacts` dispatches to the new helper when `create_flow=True` and `enable_control_plane_metrics=True`.
* Update path is unchanged (single-PUT update on an existing cluster that already has its DCRA does not race).
* Added unit tests covering create-path deferral and the `--enable-control-plane-metrics` without `--enable-azure-monitor-metrics` validation error.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
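The ordering that this commit describes can be sketched as a small simulation. It is only a model of the sequence, not the extension's implementation: `ClusterBody` and `create_cluster` are hypothetical names, the step labels are made up, and no real PUT requests are issued.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClusterBody:
    """Hypothetical, trimmed-down request body."""
    metrics_enabled: bool = False
    control_plane_enabled: Optional[bool] = None  # None means "left unset"


def create_cluster(
    enable_am_metrics: bool, enable_cp_metrics: bool
) -> List[Tuple[str, Optional[bool]]]:
    """Return the ordered steps, each paired with the controlPlane value sent."""
    # Validation still fires up front, before any request is made.
    if enable_cp_metrics and not enable_am_metrics:
        raise ValueError("control plane metrics require Azure Monitor metrics")

    steps: List[Tuple[str, Optional[bool]]] = []
    body = ClusterBody(metrics_enabled=enable_am_metrics)

    # Initial PUT: controlPlane is deliberately left unset on the create path.
    steps.append(("initial_put", body.control_plane_enabled))

    if enable_am_metrics:
        # Postprocessing creates the monitoring artifacts, ending with the DCRA.
        steps.append(("create_dcra", None))
        # Only now is the control plane flag flipped, on the follow-up PUT.
        if enable_cp_metrics:
            body.control_plane_enabled = True
        steps.append(("addon_put", body.control_plane_enabled))
    return steps
```

Reading the returned list top to bottom shows the point of the change: the value `True` never appears before the `create_dcra` step.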