Commit fca83f4

vhvb1989 and Copilot authored
feat(provision): prompt to cancel Azure deployment on Ctrl+C (Bicep) (#7795)
* feat(provision): prompt to cancel Azure deployment on Ctrl+C (Bicep)

When a user presses Ctrl+C during 'azd provision' or 'azd up' while a Bicep deployment is in flight on Azure, azd now pauses and asks whether to leave the Azure deployment running (default) or to cancel it via the ARM Cancel API and wait for a terminal state.

- pkg/input: register-able interrupt handler stack with re-entrant Ctrl+C suppression while a handler is running.
- pkg/azapi + pkg/infra: Cancel methods on DeploymentService / Deployment for both subscription- and resource-group-scoped deployments. Deployment Stacks return 'not supported' (no Cancel API surface today).
- pkg/infra/provisioning: typed sentinel errors for the 4 outcomes (leave running / canceled / cancel timed out / cancel too late) plus telemetry attribute provision.cancellation.
- pkg/infra/provisioning/bicep: interactive prompt + cancel-and-poll flow with 30s cancel-request timeout and 2-min terminal-state wait.
- cmd/middleware + internal/cmd: bypass agent troubleshooting and map sentinels to telemetry codes.
- docs/provision-cancellation.md: user-facing behavior, outcomes, provider scope, telemetry, and non-interactive fallback.

Terraform and Deployment Stacks are out of scope and unchanged.

Closes #2810

* fix: address Copilot review feedback (iteration 1)

- pkg/input: LIFO test now invokes handlers and asserts distinct call counts to prove ordering.
- pkg/infra/provisioning: add ErrDeploymentCancelFailed sentinel so the cancel-request-failure path no longer misclassifies as a timeout; wire it through error middleware skip-list and telemetry mapping.
- pkg/infra: switch new TestScopeCancel subtests to t.Context().

* fix: address Copilot review feedback (iteration 2)

- pkg/azapi: add typed ErrCancelNotSupported sentinel; stack CancelSubscriptionDeployment / CancelResourceGroupDeployment now return it instead of an opaque string.
- pkg/infra/provisioning/bicep: interrupt handler treats ErrCancelNotSupported as the safer 'leave running' outcome (matches documented stacks behavior + telemetry). Cancel-request error path routes through terminalToOutcome when the deployment is already in a terminal state, so the portal URL and consistent messaging are surfaced. Canceled terminal branch now prints the portal URL too.
- pkg/infra/provisioning: ErrDeploymentCancelFailed doc comment now references errors.Is/errors.As (matches the multi-%w joined-error wrapping pattern used here).
- pkg/infra/provisioning/bicep/bicep_provider: tear down the interrupt handler immediately after deployModule returns (sync.OnceFunc) to avoid a small window where a late Ctrl+C could surface the prompt over post-processing output.
- internal/cmd/errors: map ErrCancelNotSupported in classifySentinel.

* fix: close interrupt-vs-natural-completion race (iteration 3)

If Ctrl+C arrives but the ARM deployment happens to finish naturally before the user picks an option in the prompt, the previous design could take the success path and silently drop the interrupt.

- installDeploymentInterruptHandler now exposes a 'started' channel that is closed the instant Ctrl+C is received, before the prompt is shown. deployCtx is also cancelled immediately so PollUntilDone unblocks ASAP.
- BicepProvider.Deploy block-receives the outcome whenever 'started' is closed (instead of a non-blocking drain), so the user's choice is always honored regardless of who wins the race.

* fix: re-entrant Ctrl+C suppression hardening (iteration 4)

- pkg/input/console: watchTerminalInterrupt now reserves the running slot before consulting the handler stack so re-entrant Ctrl+C is suppressed even if the stack is briefly empty (e.g. handler popped but still executing the prompt).
- pkg/infra/provisioning/bicep/bicep_provider: defer cleanup until after the interrupt outcome is received so a second Ctrl+C during the prompt is still suppressed; the no-interrupt path tears down immediately as before.
- pkg/infra/provisioning/cancel: doc reads 'sentinel errors' instead of 'typed errors' to match the implementation.

* fix: address Copilot review feedback (iteration 5)

- pkg/input/interrupt: enforce strict LIFO when popping handlers (only pop when this handler is still top-of-stack), so out-of-order pops never accidentally remove unrelated newer handlers.
- pkg/infra/provisioning/bicep/interrupt: defensive default in terminalToOutcome now stops the spinner and emits a warning with the observed state and portal URL, leaving the UI clean if an unexpected terminal state is ever observed.
- pkg/infra/provisioning/bicep/interrupt: treat DeploymentProvisioningStateDeleted as terminal in the cancel poll so we don't keep polling until the deadline if the deployment is deleted out from under us. Test updated accordingly.

* fix: address Copilot review feedback (iteration 6)

- pkg/infra/provisioning/bicep/interrupt: wrap the interrupt handler closure with sync.OnceValue so close(started), cancelDeploy() and the outcome channel send all run at most once. Combined with the in-flight guard from tryStartInterruptHandler and the strict LIFO pop, additional Ctrl+C signals after the prompt completes can no longer panic or block on the buffered channel.
- pkg/infra/provisioning/bicep/interrupt: print the portal URL on the prompt-failure leave-running path so the user always has a link to follow up when the URL is available.
- docs/provision-cancellation: clarify that the portal URL is printed when available (not 'in every case').

* fix: address Copilot review feedback (iteration 7)

- pkg/input/interrupt: nil out the popped slot before truncating the interrupt stack so the GC can reclaim the popped handler and any state it captured, even before the underlying array is reallocated.

* fix: address Copilot review feedback (iteration 8)

- pkg/input/console: run the registered interrupt handler inline on the signal goroutine instead of in a nested goroutine. This removes the scheduling window where SIGINT was received but the handler had not yet run, which could let a deploy goroutine complete naturally and silently drop the Ctrl+C. Re-entrant signals remain suppressed via tryStartInterruptHandler.
- pkg/infra/provisioning/bicep/interrupt: switch the cancel poll loop to a time.Ticker and move the wait before each Get, so a slow Get cannot produce back-to-back ARM polls (preventing throttling).

* fix: address review findings from wbreza and jongio on PR #7795

Fixes from wbreza's review:

- H1/N1: Race between deploy-success and interrupt handler: replaced select-based check with atomic CAS state machine (deployStateRunning → deployStateInterrupting or deployStateCompleted) so the handler and Deploy goroutine never conflict.
- H2: Panic in interrupt handler: added recover() with stack trace logging in watchTerminalInterrupt so a handler panic doesn't leave the process unkillable.
- H3: Second Ctrl+C force-exit: added forceExitPending counter in interrupt.go; second suppressed Ctrl+C while a handler is running triggers os.Exit(130) matching POSIX convention (kubectl, terraform).
- M13: terminalToOutcome now a BicepProvider method with ctx as first parameter per AGENTS.md convention.
- L7: Spelling consistency: 'Cancelling' → 'Canceling' to match ARM API and codebase convention.
- L8: Removed stray blank line in cmd/middleware/error.go.

Fixes from jongio's review:

- N1: Same race fix as H1 above (CAS state machine).
- N2: Panic recovery (defense-in-depth per Jon's suggestion).
- N3: Test cleanup: added t.Cleanup() for PushInterruptHandler pops and finishInterruptHandler to prevent global state leaks on assertion failure.

Fixes from Copilot bot review:

- Unbounded Get call in cancel-request error path: added context.WithTimeout wrapper (30s).
- DeploymentUrl fetch in prompt: added timeout to prevent indefinite blocking on slow/unreachable ARM.
- Deleted state mismatch: added explicit case in terminalToOutcome for DeploymentProvisioningStateDeleted.
- Test cleanup in interrupt_test.go (same as N3 above).

New tests:

- TestForceExitCounter: validates force-exit on 2nd suppressed Ctrl+C.
- TestForceExitCounter_ResetsOnNewHandler: ensures counter resets between handler lifecycles.

* test(bicep): add unit tests for interrupt prompt/cancel/install flow

Addresses follow-up review feedback from @wbreza on PR #7795:

- Adds targeted unit tests for runInterruptPrompt, cancelAndAwaitTerminal, and installDeploymentInterruptHandler using a programmable fake infra.Deployment and the existing MockConsole. Tests cover the leave-running, cancel, prompt-failure, URL-fetch-failure, cancel-not-supported, cancel-failed-with-fallback-Get, polled-canceled, and poll-timeout paths, plus the markCompleted/interrupt CAS race.
- Promotes cancelRequestTimeout, cancelTerminalTimeout, and cancelPollInterval from const to package-level var so unit tests can shrink them to keep the suite sub-second.
- Logs the post-cancel Get error when the cancel API itself failed and the fallback Get also fails (it was previously silently dropped, hurting production diagnosability).
- Exposes input.SnapshotInterruptStack as a test-only helper so cross-package tests can invoke the registered handler without installing the OS signal pipeline.

Refs: #7933 (follow-up for replacing the process-global interrupt state with an injectable InterruptBroker).

* perf(bicep): issue first cancel-poll Get immediately

Previously cancelAndAwaitTerminal entered a ticker-driven loop that waited cancelPollInterval (5s in production) BEFORE every Get, including the first one issued right after the cancel API succeeded. For deployments that Azure transitions to Canceled quickly (e.g. deployments that just started), the user saw a needless ~5s pause before azd reported the cancellation.

Fix: do an immediate Get right after the cancel request returns; only the subsequent retries are ticker-spaced. This preserves the original 'no back-to-back Gets' guarantee for the slow path while removing the unnecessary delay on the fast path.

Adds TestCancelAndAwaitTerminal_FirstGetIsImmediate which sets cancelPollInterval to a deliberately large value and asserts the call returns in well under a poll interval when the first Get already returns Canceled.

* feat(bicep): wait for nested deployments after cancel + URL formatting

Addresses review comments on PR #7795:

- #5/#6: Wrap portal URLs in output.WithLinkFormat across all user-facing emission sites; extract printLeaveRunningMessage helper to deduplicate the leave-running prompt body.
- #7: After the top-level deployment reaches Canceled, walk the operations tree to discover descendant (nested) deployments, best-effort cancel any that are still non-terminal, and wait for them to reach a terminal state. The whole interrupt flow now lives under a single 5-minute global budget (previously 2-minute terminal timeout). If one or more nested deployments remain non-terminal at budget exhaustion, azd surfaces them by name with portal links and records a new 'cancel_timed_out_nested' telemetry value (still ErrDeploymentCancelTimeout).
- #9: Fix misleading 'second Ctrl+C is treated as a force-exit' comment in console.go: the second additional press arms the force-exit latch (and is suppressed); the next press triggers exit.

* ux(bicep): clearer interrupt copy + per-state telemetry split

Addresses customer-experience review on PR #7795 (kristenwomack):

- R1/R11 Branch terminalToOutcome by state. Succeeded is reframed as a *success* ("completed successfully before cancellation took effect — your resources are deployed") so users who Ctrl+C suspecting a hang aren't told their successful deployment was a 'too-late' failure. Failed/Deleted get distinct copy. Telemetry value split into cancel_raced_succeeded / _failed / _deleted; cancel_too_late kept as the fallback for unexpected terminal states.
- R2 Non-interactive prompt fallback now always emits a breadcrumb (portal URL when available, otherwise 'find it under Subscription → Deployments') plus the az-deployment-cancel hint, so CI runs can't silently leak abandoned Azure deployments.
- R3 Prompt title is now active and names the cause: 'You pressed Ctrl+C. An Azure deployment is still running — what do you want to do?'.
- R4 Prompt help text and leave-running message no longer drop the pointer when the URL fetch fails; they degrade to a portal-search hint that includes the deployment name.
- R9 Replace the leaking Go duration in the cancel-timeout message ('within 5m0s') with prose ('within 5 minutes'). Title updated to 'Azure is still canceling — azd will exit'.
- R10 Cancel-failed copy reframed: 'Couldn't cancel — Azure deployment is still running. The cancel request was rejected by Azure. The deployment will continue.'
- R15 'Deployment was deleted' message now explains it's unusual and suggests checking audit logs.
- R16 Leave-running path always prints the 'az deployment sub|group cancel --name <n>' hint so users have a copy-pasteable next step.

Docs (provision-cancellation.md) and the ProvisionCancellationKey field comment updated to describe the new telemetry values.

R5 (option ordering), R6 (5-min wait UX), R7 (discoverability), R8 (docs site), R12 (skip prompt for Stacks), R13 (a11y), R14 (color/WCAG) are tracked in a follow-up issue.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
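The strict-LIFO pop and the nil-out-before-truncate detail from iterations 5 and 7 can be sketched as follows. This is a minimal illustration, not the code in pkg/input: the `handlerStack` type and its `push`, `pop`, and `top` names are hypothetical and were chosen only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// handlerStack is a hypothetical LIFO stack of interrupt handlers.
type handlerStack struct {
	mu       sync.Mutex
	handlers []*func()
}

// push registers h as the top handler and returns a pop function that
// removes it only while it is still top-of-stack (strict LIFO).
func (s *handlerStack) push(h func()) (pop func() bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	entry := &h // unique pointer per registration, used for identity checks
	s.handlers = append(s.handlers, entry)
	return func() bool {
		s.mu.Lock()
		defer s.mu.Unlock()
		n := len(s.handlers)
		if n == 0 || s.handlers[n-1] != entry {
			return false // not on top: leave newer handlers untouched
		}
		s.handlers[n-1] = nil // drop the reference so the GC can reclaim it
		s.handlers = s.handlers[:n-1]
		return true
	}
}

// top returns the current handler, or nil when the stack is empty.
func (s *handlerStack) top() func() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.handlers) == 0 {
		return nil
	}
	return *s.handlers[len(s.handlers)-1]
}

func main() {
	var s handlerStack
	popA := s.push(func() { fmt.Println("handler A") })
	popB := s.push(func() { fmt.Println("handler B") })

	fmt.Println(popA()) // false: B is newer, so A's pop is a no-op
	s.top()()           // runs handler B
	fmt.Println(popB()) // true
	fmt.Println(popA()) // true: A is now on top
}
```

In this sketch an out-of-order pop is simply refused; whether the real implementation retries or ignores such a pop later is not stated in the commit message.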
1 parent afb62d5 commit fca83f4

16 files changed

Lines changed: 2172 additions & 5 deletions

File tree

cli/azd/cmd/middleware/error.go

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,7 @@ import (
 	"github.com/azure/azure-dev/cli/azd/pkg/environment/azdcontext"
 	"github.com/azure/azure-dev/cli/azd/pkg/errorhandler"
 	"github.com/azure/azure-dev/cli/azd/pkg/extensions"
+	"github.com/azure/azure-dev/cli/azd/pkg/infra/provisioning"
 	"github.com/azure/azure-dev/cli/azd/pkg/infra/provisioning/bicep"
 	"github.com/azure/azure-dev/cli/azd/pkg/input"
 	"github.com/azure/azure-dev/cli/azd/pkg/output"
@@ -86,6 +87,11 @@ func shouldSkipAgentHandling(err error) bool {
 		errors.Is(err, consent.ErrElicitationDenied) ||
 		errors.Is(err, consent.ErrSamplingDenied) ||
 		errors.Is(err, internal.ErrAbortedByUser) ||
+		errors.Is(err, provisioning.ErrDeploymentInterruptedLeaveRunning) ||
+		errors.Is(err, provisioning.ErrDeploymentCanceledByUser) ||
+		errors.Is(err, provisioning.ErrDeploymentCancelTimeout) ||
+		errors.Is(err, provisioning.ErrDeploymentCancelTooLate) ||
+		errors.Is(err, provisioning.ErrDeploymentCancelFailed) ||
 
 		errors.Is(err, environment.ErrNotFound) ||
 		errors.Is(err, environment.ErrNameNotSpecified) ||
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Provision cancellation (Ctrl+C)
+
+When `azd provision` (or `azd up`) submits a Bicep deployment to Azure, the
+deployment runs asynchronously on the Azure side. If the user presses
+<kbd>Ctrl</kbd>+<kbd>C</kbd> while azd is waiting for that deployment to
+finish, azd will pause and ask what to do instead of exiting immediately.
+
+## Behavior
+
+1. azd stops the live progress reporter and presents an interactive prompt
+   that includes the Azure portal URL of the running deployment.
+2. The user picks one of:
+   - **Leave the Azure deployment running and stop azd** (default). azd
+     exits with a non-zero status; the Azure deployment continues to
+     completion. The user can monitor or cancel it from the portal link.
+   - **Cancel the Azure deployment**. azd submits an ARM cancel request
+     against the deployment and waits up to **5 minutes** (a single global
+     budget) for Azure to confirm a terminal state (`Canceled`, `Failed`,
+     or `Succeeded`). Once the top-level deployment reaches `Canceled`,
+     azd best-effort cancels and waits for any descendant (nested)
+     deployments within the same 5-minute budget so leftover children do
+     not keep running on Azure.
+3. Additional <kbd>Ctrl</kbd>+<kbd>C</kbd> presses while the prompt is
+   showing (or while a cancel request is in flight) are ignored so the user
+   can finish reading and choose deliberately.
+
+## Outcomes when "Cancel" is selected
+
+| Outcome | When |
+|---------|------|
+| Cancellation confirmed | Azure transitions the deployment to `Canceled` within the wait budget. azd exits non-zero with a clear message. |
+| Cancel raced succeeded | Azure reached `Succeeded` before cancel took effect. azd surfaces this as a success-toned message — resources *are* deployed. |
+| Cancel raced failed | Azure reached `Failed` before cancel took effect. azd surfaces the failure plus the portal URL. |
+| Cancel raced deleted | The deployment record was deleted before cancel took effect (unusual; suggests an external actor). |
+| Cancel still pending | Azure does not reach a terminal state within the wait budget. azd warns that cancellation may still complete and prints the portal URL. |
+| Cancel request failed | The ARM `Cancel` API itself returned an error. azd surfaces that the deployment is still running and prints the portal URL. |
+
+When the deployment URL is available, azd prints it so the user can follow
+up manually from the browser. The URL is omitted if azd was unable to
+resolve it (for example, when the ARM service is unreachable).
+
+## Provider scope
+
+| Provider | Behavior on Ctrl+C during provision |
+|---------|--------------------------------------|
+| Bicep (subscription scope) | Interactive prompt (described above). |
+| Bicep (resource group scope) | Interactive prompt (described above). |
+| Deployment Stacks | Currently treated as "leave running" — the stacks ARM API does not expose a per-deployment cancel surface today. |
+| Terraform | Unchanged: the Terraform CLI does not expose a safe per-apply cancel; pressing Ctrl+C exits azd and Terraform handles its own teardown. |
+
+## Telemetry
+
+A `provision.cancellation` attribute is recorded on the provisioning span
+with one of:
+
+- `none` — provisioning completed normally without an interrupt.
+- `leave_running` — user chose to let the Azure deployment continue.
+- `canceled` — cancel request succeeded and Azure reached `Canceled`.
+- `cancel_raced_succeeded` — Azure reached `Succeeded` before cancel took
+  effect (resources are deployed; we surface this as a success-toned message
+  rather than a failure).
+- `cancel_raced_failed` — Azure reached `Failed` before cancel took effect.
+- `cancel_raced_deleted` — the deployment record was deleted before cancel
+  could take effect (unusual; suggests an external actor).
+- `cancel_too_late` — fallback for unexpected terminal states (kept for
+  backwards-compat; the three `cancel_raced_*` values cover the documented
+  terminal states).
+- `cancel_timed_out` — top-level deployment did not reach a terminal state
+  within the wait budget (5 minutes).
+- `cancel_timed_out_nested` — top-level reached `Canceled`, but one or
+  more descendant (nested) deployments did not reach a terminal state
+  within the same 5-minute global budget. The user-facing output lists
+  the stuck deployment(s) with portal links so they can be investigated
+  manually.
+- `cancel_failed` — the ARM `Cancel` API call itself returned an error.
+
+## Non-interactive mode
+
+If azd is running without a TTY (e.g. CI), the prompt cannot be displayed.
+In that case azd defaults to **leave running** behavior so that an
+unattended deployment is never silently cancelled by an environment
+signal.

cli/azd/internal/cmd/errors.go

Lines changed: 12 additions & 0 deletions
@@ -295,8 +295,20 @@ func classifySentinel(err error) string {
 		return "internal.not_git_repo"
 	case errors.Is(err, azapi.ErrPreviewNotSupported):
 		return "internal.preview_not_supported"
+	case errors.Is(err, azapi.ErrCancelNotSupported):
+		return "internal.cancel_not_supported"
 	case errors.Is(err, provisioning.ErrBindMountOperationDisabled):
 		return "internal.bind_mount_disabled"
+	case errors.Is(err, provisioning.ErrDeploymentInterruptedLeaveRunning):
+		return "user.canceled.leave_running"
+	case errors.Is(err, provisioning.ErrDeploymentCanceledByUser):
+		return "user.canceled.deployment_canceled"
+	case errors.Is(err, provisioning.ErrDeploymentCancelTimeout):
+		return "user.canceled.cancel_timed_out"
+	case errors.Is(err, provisioning.ErrDeploymentCancelTooLate):
+		return "user.canceled.cancel_too_late"
+	case errors.Is(err, provisioning.ErrDeploymentCancelFailed):
+		return "user.canceled.cancel_failed"
 	case errors.Is(err, update.ErrNeedsElevation):
 		return "update.elevationRequired"
 	case errors.Is(err, pipeline.ErrRemoteHostIsNotAzDo):

cli/azd/internal/tracing/fields/fields.go

Lines changed: 24 additions & 0 deletions
@@ -454,6 +454,30 @@ var (
 	}
 )
 
+// Provision-related fields
+var (
+	// ProvisionCancellationKey records how a Ctrl+C interrupt during
+	// `azd provision` / `azd up` was handled.
+	//
+	// Example: "none" (no interrupt observed), "leave_running" (user chose to
+	// keep the Azure deployment running), "canceled" (Azure confirmed the
+	// deployment reached the Canceled state), "cancel_timed_out" (cancel was
+	// submitted but azd stopped waiting for the top-level terminal state),
+	// "cancel_timed_out_nested" (top-level was canceled, but one or more
+	// descendant deployments did not reach terminal state within the global
+	// budget), "cancel_raced_succeeded" / "cancel_raced_failed" /
+	// "cancel_raced_deleted" (Azure reached the corresponding terminal state
+	// before the cancel took effect — split from the legacy "cancel_too_late"
+	// so dashboards can answer "how often does cancel race a *successful*
+	// deployment?"), "cancel_too_late" (fallback for unexpected terminal
+	// states), "cancel_failed" (the cancel request itself returned an error).
+	ProvisionCancellationKey = AttributeKey{
+		Key:            attribute.Key("provision.cancellation"),
+		Classification: SystemMetadata,
+		Purpose:        FeatureInsight,
+	}
+)
+
 // The value used for ServiceNameKey
 const ServiceNameAzd = "azd"
 

cli/azd/pkg/azapi/deployments.go

Lines changed: 24 additions & 0 deletions
@@ -32,6 +32,11 @@ const (
 
 var ErrPreviewNotSupported = errors.New("preview not supported")
 
+// ErrCancelNotSupported indicates that the deployment provider does not support
+// cancelling an in-flight deployment (e.g. deployment stacks). Callers can use
+// errors.Is to detect this case and fall back to "leave running" behavior.
+var ErrCancelNotSupported = errors.New("cancel not supported for this deployment kind")
+
 const emptySubscriptionArmTemplate = `{
 	"$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
 	"contentVersion": "1.0.0.0",
@@ -226,6 +231,25 @@ type DeploymentService interface {
 		options map[string]any,
 		progress *async.Progress[DeleteDeploymentProgress],
 	) error
+	// CancelSubscriptionDeployment requests Azure to cancel a running
+	// subscription-scoped deployment. The call returns immediately after the
+	// cancel request is accepted; callers should poll the deployment to observe
+	// the terminal state (Canceled, Failed, or Succeeded).
+	CancelSubscriptionDeployment(
+		ctx context.Context,
+		subscriptionId string,
+		deploymentName string,
+	) error
+	// CancelResourceGroupDeployment requests Azure to cancel a running
+	// resource-group-scoped deployment. The call returns immediately after the
+	// cancel request is accepted; callers should poll the deployment to observe
+	// the terminal state (Canceled, Failed, or Succeeded).
+	CancelResourceGroupDeployment(
+		ctx context.Context,
+		subscriptionId string,
+		resourceGroupName string,
+		deploymentName string,
+	) error
 }
 
 type DeleteResourceState string

cli/azd/pkg/azapi/stack_deployments.go

Lines changed: 23 additions & 0 deletions
@@ -677,6 +677,29 @@ func (d *StackDeployments) CalculateTemplateHash(
 	return d.standardDeployments.CalculateTemplateHash(ctx, subscriptionId, template)
 }
 
+// CancelSubscriptionDeployment is not supported for deployment stacks. The
+// deployment stacks ARM API does not expose a per-stack cancel operation;
+// stopping a stack mid-deployment requires deleting the stack itself. Returns
+// ErrCancelNotSupported so callers can distinguish this from a real failure.
+func (d *StackDeployments) CancelSubscriptionDeployment(
+	ctx context.Context,
+	subscriptionId string,
+	deploymentName string,
+) error {
+	return ErrCancelNotSupported
+}
+
+// CancelResourceGroupDeployment is not supported for deployment stacks. See
+// CancelSubscriptionDeployment for details.
+func (d *StackDeployments) CancelResourceGroupDeployment(
+	ctx context.Context,
+	subscriptionId string,
+	resourceGroupName string,
+	deploymentName string,
+) error {
+	return ErrCancelNotSupported
+}
+
 func (d *StackDeployments) createClient(ctx context.Context, subscriptionId string) (*armdeploymentstacks.Client, error) {
 	credential, err := d.credentialProvider.CredentialForSubscription(ctx, subscriptionId)
 	if err != nil {

cli/azd/pkg/azapi/standard_deployments.go

Lines changed: 41 additions & 0 deletions
@@ -580,6 +580,47 @@ func (ds *StandardDeployments) DeleteResourceGroupDeployment(
 	return nil
 }
 
+// CancelSubscriptionDeployment requests Azure to cancel a running
+// subscription-scoped deployment. The ARM Cancel call returns immediately once
+// the request is accepted; callers should poll the deployment to observe the
+// terminal state (Canceled, Failed, or Succeeded).
+func (ds *StandardDeployments) CancelSubscriptionDeployment(
+	ctx context.Context,
+	subscriptionId string,
+	deploymentName string,
+) error {
+	deploymentClient, err := ds.createDeploymentsClient(ctx, subscriptionId)
+	if err != nil {
+		return fmt.Errorf("creating deployments client: %w", err)
+	}
+
+	if _, err := deploymentClient.CancelAtSubscriptionScope(ctx, deploymentName, nil); err != nil {
+		return fmt.Errorf("cancelling subscription deployment: %w", err)
+	}
+	return nil
+}
+
+// CancelResourceGroupDeployment requests Azure to cancel a running
+// resource-group-scoped deployment. The ARM Cancel call returns immediately
+// once the request is accepted; callers should poll the deployment to observe
+// the terminal state (Canceled, Failed, or Succeeded).
+func (ds *StandardDeployments) CancelResourceGroupDeployment(
+	ctx context.Context,
+	subscriptionId string,
+	resourceGroupName string,
+	deploymentName string,
+) error {
+	deploymentClient, err := ds.createDeploymentsClient(ctx, subscriptionId)
+	if err != nil {
+		return fmt.Errorf("creating deployments client: %w", err)
+	}
+
+	if _, err := deploymentClient.Cancel(ctx, resourceGroupName, deploymentName, nil); err != nil {
+		return fmt.Errorf("cancelling resource group deployment: %w", err)
+	}
+	return nil
+}
+
 func (ds *StandardDeployments) WhatIfDeployToSubscription(
 	ctx context.Context,
 	subscriptionId string,

cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go

Lines changed: 29 additions & 1 deletion
@@ -875,18 +875,46 @@ func (p *BicepProvider) Deploy(ctx context.Context) (*provisioning.DeployResult,
 	// Start the deployment
 	p.console.ShowSpinner(ctx, "Creating/Updating resources", input.Step)
 
+	deployCtx, interruptStarted, interruptCh, markDeployCompleted, interruptCleanup :=
+		p.installDeploymentInterruptHandler(ctx, deployment, cancelProgress)
+	cleanupOnce := sync.OnceFunc(interruptCleanup)
+	defer cleanupOnce()
+
 	deployResult, err := p.deployModule(
-		ctx,
+		deployCtx,
 		deployment,
 		planned.RawArmTemplate,
 		planned.Parameters,
 		deploymentTags,
 		optionsMap,
 	)
+
+	// Try to atomically claim the "completed" state. If the interrupt
+	// handler already claimed "interrupting", the CAS fails and we must
+	// wait for the handler's outcome so the user's Ctrl+C is never
+	// silently dropped.
+	if !markDeployCompleted() {
+		// Handler has claimed the interrupt — wait for its outcome.
+		<-interruptStarted
+		outcome := <-interruptCh
+		cleanupOnce()
+		tracing.SetUsageAttributes(
+			fields.ProvisionCancellationKey.String(outcome.telemetryValue))
+		return nil, applyInterruptOutcome(outcome, err)
+	}
+
+	// Deploy completed naturally — tear the handler down before
+	// post-processing to avoid resurfacing the cancel/leave prompt over
+	// subsequent output.
+	cleanupOnce()
+
 	if err != nil {
+		tracing.SetUsageAttributes(fields.ProvisionCancellationKey.String("none"))
 		return nil, err
 	}
 
+	tracing.SetUsageAttributes(fields.ProvisionCancellationKey.String("none"))
+
 	result.Outputs = provisioning.OutputParametersFromArmOutputs(
 		planned.Template.Outputs,
 		azapi.CreateDeploymentOutput(deployResult.Outputs),
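The markDeployCompleted check in the hunk above depends on a compare-and-swap so that exactly one of "deploy finished" and "interrupt received" wins. A minimal sketch of such a state machine is shown below. The three state names come from the commit message; the `deployState` type and its method names are assumptions for illustration, and atomic.Int32 requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// State names follow the commit message; the zero value is "running".
const (
	deployStateRunning int32 = iota
	deployStateInterrupting
	deployStateCompleted
)

// deployState is a hypothetical wrapper around the atomic state word.
type deployState struct {
	v atomic.Int32
}

// markCompleted claims the completed state. It returns false when the
// interrupt handler has already claimed the interrupting state.
func (s *deployState) markCompleted() bool {
	return s.v.CompareAndSwap(deployStateRunning, deployStateCompleted)
}

// markInterrupting claims the interrupting state. It returns false when
// the deployment has already been marked completed.
func (s *deployState) markInterrupting() bool {
	return s.v.CompareAndSwap(deployStateRunning, deployStateInterrupting)
}

func main() {
	// Interrupt arrives first: the later completion must not win.
	var a deployState
	fmt.Println(a.markInterrupting()) // true
	fmt.Println(a.markCompleted())    // false: caller waits for the outcome

	// Deployment finishes first: a late Ctrl+C has nothing to claim.
	var b deployState
	fmt.Println(b.markCompleted())    // true
	fmt.Println(b.markInterrupting()) // false
}
```

Because both transitions start from the running state, whichever goroutine performs its swap first wins and the other sees a failed swap, which is what lets Deploy decide whether to block on the handler's outcome.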
