fix(orchestrator): include phase/step context and original error chain in build cancellation messages#3161
Conversation
There was a problem hiding this comment.
Code Review
Using '%s' with 'sanitizeErrorMessage(err)' in 'fmt.Errorf' discards the original error chain, preventing callers from using 'errors.Is' or 'errors.As' to inspect underlying causes like 'context.Canceled'. To preserve the error chain while keeping the single-line output formatting, wrap the original error using a custom wrapper type that implements 'Unwrap() error' and overrides 'Error()' to sanitize newlines.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if errors.Is(err, context.Canceled) { | ||
| return phases.NewPhaseBuildError(phases.PhaseMeta{}, ErrCanceled) | ||
| return phases.NewPhaseBuildError(phases.PhaseMeta{}, fmt.Errorf("%w: %s", ErrCanceled, sanitizeErrorMessage(err))) | ||
| } | ||
|
|
||
| // If it's a timeout context, wrap it as a user error | ||
| // If it's a timeout context, wrap it as a user error while preserving the | ||
| // original error chain for diagnostics. | ||
| if errors.Is(err, context.DeadlineExceeded) { | ||
| return phases.NewPhaseBuildError(phases.PhaseMeta{}, ErrTimeout) | ||
| return phases.NewPhaseBuildError(phases.PhaseMeta{}, fmt.Errorf("%w: %s", ErrTimeout, sanitizeErrorMessage(err))) | ||
| } | ||
|
|
||
| return err | ||
| } | ||
|
|
||
| // sanitizeErrorMessage replaces newlines in error messages (e.g. from errors.Join) | ||
| // with "; " so the user-facing message stays on a single line. | ||
| func sanitizeErrorMessage(err error) string { | ||
| return strings.ReplaceAll(err.Error(), "\n", "; ") | ||
| } |
There was a problem hiding this comment.
Using %s with sanitizeErrorMessage(err) in fmt.Errorf discards the original error chain, preventing callers from using errors.Is or errors.As to inspect underlying causes like context.Canceled. To preserve the error chain while keeping the single-line output formatting, wrap the original error using a custom wrapper type that implements Unwrap() error and overrides Error() to sanitize newlines.
if errors.Is(err, context.Canceled) {
return phases.NewPhaseBuildError(phases.PhaseMeta{}, fmt.Errorf("%w: %w", ErrCanceled, sanitizedError{err}))
}
// If it's a timeout context, wrap it as a user error while preserving the
// original error chain for diagnostics.
if errors.Is(err, context.DeadlineExceeded) {
return phases.NewPhaseBuildError(phases.PhaseMeta{}, fmt.Errorf("%w: %w", ErrTimeout, sanitizedError{err}))
}
return err
}
type sanitizedError struct {
err error
}
func (e sanitizedError) Error() string {
return strings.ReplaceAll(e.err.Error(), "\n", "; ")
}
func (e sanitizedError) Unwrap() error {
return e.err
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 172189830f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | ||
| return NewPhaseBuildError(meta, err) |
There was a problem hiding this comment.
Preserve timeout/cancel sentinels for phase context errors
When a build timeout or cancellation happens inside a phase, this wraps the raw context.Canceled/context.DeadlineExceeded in a PhaseBuildError; builderrors.WrapContextAsUserError then returns existing phase errors unchanged, so the reason persisted by create_template.go via UnwrapUserError becomes context deadline exceeded (phase: ...)/context canceled (phase: ...) instead of the standardized build timed out/build was cancelled wrapper, and errors.Is(err, ErrTimeout/ErrCanceled) no longer works for these phase failures. This affects the normal build-timeout path because Build joins ctx.Err() before calling the wrapper.
Useful? React with 👍 / 👎.
1721898 to
0e257cc
Compare
|
Addressed both review comments in the latest force-push (
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
Wrapping the original PhaseBuildError directly in sanitizedError causes the phase and step metadata to be appended twice in the final error message, as both the outer and inner PhaseBuildError instances format the metadata. Extracting and wrapping the underlying error pbe.Err instead of the full err prevents this duplication while preserving the complete error chain and context.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| var phase, step string | ||
| if pbe := phases.UnwrapPhaseBuildError(err); pbe != nil { | ||
| phase = pbe.Phase | ||
| step = pbe.Step | ||
| // For non-cancel/timeout user errors, return as-is. | ||
| if !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) { | ||
| return err | ||
| } | ||
| } | ||
|
|
||
| // If it's a canceled context, wrap it as a user error | ||
| // wrapWithPhase creates a PhaseBuildError preserving any phase/step context | ||
| // extracted above, with the given sentinel wrapping the original error chain. | ||
| wrapWithPhase := func(sentinel error) error { | ||
| return &phases.PhaseBuildError{ | ||
| Phase: phase, | ||
| Step: step, | ||
| Err: fmt.Errorf("%w: %w", sentinel, sanitizedError{err}), | ||
| } | ||
| } |
There was a problem hiding this comment.
Wrapping the original PhaseBuildError directly in sanitizedError causes the phase and step metadata to be appended twice in the final error message, because both the outer and inner PhaseBuildError instances will format the metadata. Extracting and wrapping the underlying error pbe.Err instead of the full err prevents this duplication while preserving the complete error chain and context.
var phase, step string
errToWrap := err
if pbe := phases.UnwrapPhaseBuildError(err); pbe != nil {
phase = pbe.Phase
step = pbe.Step
// For non-cancel/timeout user errors, return as-is.
if !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) {
return err
}
errToWrap = pbe.Err
}
// wrapWithPhase creates a PhaseBuildError preserving any phase/step context
extracted above, with the given sentinel wrapping the original error chain.
wrapWithPhase := func(sentinel error) error {
return &phases.PhaseBuildError{
Phase: phase,
Step: step,
Err: fmt.Errorf("%w: %w", sentinel, sanitizedError{errToWrap}),
}
}References
- Focus on error handling correctness and avoiding duplicate metadata formatting in user-facing errors. (link)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0e257cc6b3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return &phases.PhaseBuildError{ | ||
| Phase: phase, | ||
| Step: step, | ||
| Err: fmt.Errorf("%w: %w", sentinel, sanitizedError{err}), |
There was a problem hiding this comment.
Preserve phase metadata without printing it twice
When WrapContextAsUserError receives the PhaseBuildError produced by phases.Run for a canceled/expired phase, err.Error() already includes the phase/step suffix added by PhaseBuildError.Error(). Wrapping that same err inside a new PhaseBuildError with the extracted phase/step makes the status returned by UnwrapUserError print the metadata twice, e.g. build was cancelled: getting hash: context canceled (phase: provision, step: 1) (phase: provision, step: 1), which corrupts the user-facing cancellation message on exactly the path this change is trying to improve.
Useful? React with 👍 / 👎.
0e257cc to
1a50a94
Compare
There was a problem hiding this comment.
Code Review
In packages/orchestrator/pkg/template/build/builderrors/errors.go, setting errToWrap unconditionally to pbe.Err when err wraps a PhaseBuildError discards any outer wrapping context. Comparing err directly to pbe ensures that we only unwrap to pbe.Err when err is exactly the PhaseBuildError itself, preserving any outer error context in the diagnostics.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| errToWrap := err | ||
| if pbe := phases.UnwrapPhaseBuildError(err); pbe != nil { | ||
| phase = pbe.Phase | ||
| step = pbe.Step | ||
| errToWrap = pbe.Err |
There was a problem hiding this comment.
When err wraps a PhaseBuildError but is not identical to it (for example, if an outer function wrapped the error with additional context), setting errToWrap unconditionally to pbe.Err discards the outer wrapping context. Comparing err directly to pbe ensures that we only unwrap to pbe.Err when err is exactly the PhaseBuildError itself, preserving any outer error context in the diagnostics.
errToWrap := err
if pbe := phases.UnwrapPhaseBuildError(err); pbe != nil {
phase = pbe.Phase
step = pbe.Step
if err == pbe {
errToWrap = pbe.Err
}1a50a94 to
ce7d373
Compare
|
To use Codex here, create a Codex account and connect to github. |
There was a problem hiding this comment.
Code Review
Calling WrapContextAsUserError on an error that has already been wrapped with ErrCanceled or ErrTimeout will result in duplicate wrapping and nested sentinel messages because errors.Is(err, context.Canceled) remains true even after wrapping. Adding an early return when the error already wraps ErrCanceled or ErrTimeout prevents this infinite wrapping behavior.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func WrapContextAsUserError(err error) error { | ||
| if err == nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Calling WrapContextAsUserError on an error that has already been wrapped with ErrCanceled or ErrTimeout will result in duplicate wrapping and nested sentinel messages (e.g., "build was cancelled: build was cancelled: context canceled"). This occurs because errors.Is(err, context.Canceled) remains true even after wrapping, causing subsequent calls to re-wrap the error. Adding an early return when the error already wraps ErrCanceled or ErrTimeout prevents this infinite wrapping behavior.
| func WrapContextAsUserError(err error) error { | |
| if err == nil { | |
| return nil | |
| } | |
| func WrapContextAsUserError(err error) error { | |
| if err == nil { | |
| return nil | |
| } | |
| if errors.Is(err, ErrCanceled) || errors.Is(err, ErrTimeout) { | |
| return err | |
| } |
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
When a PhaseBuildError wraps another error that already contains a PhaseBuildError, both the inner and outer errors will append the phase and step metadata to the error string, resulting in duplicate metadata suffixes in the final user-facing message. To prevent this duplication while preserving the metadata fields for unwrapping, the Error() method should check if the underlying error chain already contains a PhaseBuildError and avoid appending the suffix again if so.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (e *PhaseBuildError) Error() string { | ||
| return e.Err.Error() | ||
| if e.Phase == "" && e.Step == "" { | ||
| return e.Err.Error() | ||
| } | ||
|
|
||
| if e.Step == "" { | ||
| return fmt.Sprintf("%s (phase: %s)", e.Err.Error(), e.Phase) | ||
| } | ||
|
|
||
| return fmt.Sprintf("%s (phase: %s, step: %s)", e.Err.Error(), e.Phase, e.Step) | ||
| } |
There was a problem hiding this comment.
When a PhaseBuildError wraps another error that already contains a PhaseBuildError (for example, when WrapContextAsUserError wraps an outer error chain), both the inner and outer errors will append the phase and step metadata to the error string. This results in duplicate metadata suffixes in the final user-facing message. To prevent this duplication while preserving the metadata fields for unwrapping, the Error() method should check if the underlying error chain already contains a PhaseBuildError and avoid appending the suffix again if so.
| func (e *PhaseBuildError) Error() string { | |
| return e.Err.Error() | |
| if e.Phase == "" && e.Step == "" { | |
| return e.Err.Error() | |
| } | |
| if e.Step == "" { | |
| return fmt.Sprintf("%s (phase: %s)", e.Err.Error(), e.Phase) | |
| } | |
| return fmt.Sprintf("%s (phase: %s, step: %s)", e.Err.Error(), e.Phase, e.Step) | |
| } | |
| func (e *PhaseBuildError) Error() string { | |
| if (e.Phase == "" && e.Step == "") || UnwrapPhaseBuildError(e.Err) != nil { | |
| return e.Err.Error() | |
| } | |
| if e.Step == "" { | |
| return fmt.Sprintf("%s (phase: %s)", e.Err.Error(), e.Phase) | |
| } | |
| return fmt.Sprintf("%s (phase: %s, step: %s)", e.Err.Error(), e.Phase, e.Step) | |
| } |
ce7d373 to
1c5d2a3
Compare
|
@gemini-code-assist review 1c5d2a3 |
There was a problem hiding this comment.
Code Review
This pull request enhances error wrapping and context propagation during template builds. However, potential nil pointer dereferences exist in the Error() methods of both PhaseBuildError and sanitizedError if their underlying error fields are nil, which could lead to application panics during error formatting.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (e *PhaseBuildError) Error() string { | ||
| return e.Err.Error() | ||
| // If the inner error chain already contains a PhaseBuildError, skip | ||
| // appending phase/step to avoid duplicate metadata in the message. | ||
| if (e.Phase == "" && e.Step == "") || UnwrapPhaseBuildError(e.Err) != nil { | ||
| return e.Err.Error() | ||
| } | ||
|
|
||
| if e.Step == "" { | ||
| return fmt.Sprintf("%s (phase: %s)", e.Err.Error(), e.Phase) | ||
| } | ||
|
|
||
| return fmt.Sprintf("%s (phase: %s, step: %s)", e.Err.Error(), e.Phase, e.Step) | ||
| } |
There was a problem hiding this comment.
If e.Err is nil, calling e.Err.Error() will cause a nil pointer dereference panic. To prevent application crashes during error formatting, check if e.Err is nil at the beginning of the Error() method and return a safe fallback message.
func (e *PhaseBuildError) Error() string {
if e.Err == nil {
if e.Phase == "" && e.Step == "" {
return "unknown build error"
}
if e.Step == "" {
return fmt.Sprintf("build error (phase: %s)", e.Phase)
}
return fmt.Sprintf("build error (phase: %s, step: %s)", e.Phase, e.Step)
}
// If the inner error chain already contains a PhaseBuildError, skip
// appending phase/step to avoid duplicate metadata in the message.
if (e.Phase == "" && e.Step == "") || UnwrapPhaseBuildError(e.Err) != nil {
return e.Err.Error()
}
if e.Step == "" {
return fmt.Sprintf("%s (phase: %s)", e.Err.Error(), e.Phase)
}
return fmt.Sprintf("%s (phase: %s, step: %s)", e.Err.Error(), e.Phase, e.Step)
}| func (e sanitizedError) Error() string { | ||
| return strings.ReplaceAll(e.err.Error(), "\n", "; ") | ||
| } |
There was a problem hiding this comment.
If e.err is nil, calling e.err.Error() will cause a nil pointer dereference panic. To prevent application crashes during error formatting, check if e.err is nil at the beginning of the Error() method and return an empty string.
func (e sanitizedError) Error() string {
if e.err == nil {
return ""
}
return strings.ReplaceAll(e.err.Error(), "\n", "; ")
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c5d2a3c31
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if errors.Is(err, context.Canceled) { | ||
| return phases.NewPhaseBuildError(phases.PhaseMeta{}, ErrCanceled) | ||
| return wrapWithPhase(ErrCanceled) |
There was a problem hiding this comment.
Preserve existing phase failures when the build context ends
When an existing PhaseBuildError is joined with the top-level context error, this branch now rewraps it as ErrCanceled/ErrTimeout instead of leaving the original user failure intact. Builder.Build joins ctx.Err() into e immediately before calling this helper (builder.go:190), so a real step/configuration failure that races with build cancellation or shutdown is reported and counted as cancellation/timeout rather than the actual phase failure; the previous IsUserError short-circuit avoided that case. Only phase errors whose own chain is the context sentinel should be normalized here.
Useful? React with 👍 / 👎.
1c5d2a3 to
5a5830f
Compare
|
@gemini-code-assist review 5a5830f |
There was a problem hiding this comment.
Code Review
In PhaseBuildError.Error(), unconditionally returning e.Err.Error() when UnwrapPhaseBuildError(e.Err) != nil will silently discard the outer error's phase and step metadata if the nested PhaseBuildError has different phase or step values. To prevent losing this diagnostic context, only skip appending the metadata if the inner error's phase and step exactly match the outer error's phase and step.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (e.Phase == "" && e.Step == "") || UnwrapPhaseBuildError(e.Err) != nil { | ||
| return e.Err.Error() | ||
| } |
There was a problem hiding this comment.
In PhaseBuildError.Error(), unconditionally returning e.Err.Error() when UnwrapPhaseBuildError(e.Err) != nil will silently discard the outer error's phase and step metadata if the nested PhaseBuildError has different phase or step values. To prevent losing this diagnostic context, only skip appending the metadata if the inner error's phase and step exactly match the outer error's phase and step.
if pbe := UnwrapPhaseBuildError(e.Err); pbe != nil && pbe.Phase == e.Phase && pbe.Step == e.Step {
return e.Err.Error()
}
if e.Phase == "" && e.Step == "" {
return e.Err.Error()
}…n in build cancellation messages Previously, when a build was cancelled or timed out, the user-facing error message was always the generic "build was cancelled" or "build timed out", regardless of the underlying cause. This made it very difficult to diagnose why a build actually failed (e.g. envd crash due to GLIBC incompatibility, VM process exit, timeout during a specific phase). Changes: - PhaseBuildError.Error() now includes phase and step context when available, e.g. "context canceled (phase: provision, step: 1)" - phases.Run() annotates context cancellation/timeout errors with the metadata of the phase that was active when the interruption occurred - WrapContextAsUserError() preserves the original error chain instead of replacing it, so diagnostic details like "wait for envd: failed to init new envd" are visible to the user - Added sanitizeErrorMessage() to handle newlines from errors.Join Before: Build failed: build was cancelled After: Build failed: build was cancelled: wait for envd: failed to init new envd: context canceled or: Build failed: context canceled (phase: provision, step: 1)
5a5830f to
6aeea39
Compare
|
@gemini-code-assist review 6aeea39 |
There was a problem hiding this comment.
Code Review
This pull request refactors error handling in the template build process by updating WrapContextAsUserError to preserve phase metadata and the original error chain during context cancellations or timeouts, introducing a sanitizedError helper to format multiline errors, and refactoring PhaseBuildError.Error() to avoid metadata duplication. Additionally, a wrapContextErr helper is added to the build runner to consistently annotate context errors with the active phase and step. No review comments were provided, and there is no additional feedback.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Closing in favor of a cleaner PR with the same fix. |
1 similar comment
|
Closing in favor of a cleaner PR with the same fix. |
Problem
When a build is cancelled or times out,
WrapContextAsUserError()replaces the entire error chain with a bareErrCanceled/ErrTimedOut, so the user only seesbuild was cancelledwith no diagnostic context.Changes
packages/orchestrator/pkg/template/build/phases/errors.goPhaseBuildError.Error()now includes phase and step in the messagepackages/orchestrator/pkg/template/build/phases/phase.gowrapContextErrclosure inRun()that annotatescontext.Canceled/DeadlineExceedederrors with the active phase metadatapackages/orchestrator/pkg/template/build/builderrors/errors.goWrapContextAsUserError()now preserves the original error chain usingfmt.Errorf("%w: %s", ErrCanceled, ...)instead of replacing with bare sentinelsanitizeErrorMessage()to clean up multi-lineerrors.JoinoutputBefore / After
build was cancelledcontext canceled (phase: provision, step: 1)build was cancelledbuild was cancelled: wait for envd: failed to init new envd: context canceledbuild was cancelledbuild was cancelled: context canceledbuild timed outbuild timed out: context deadline exceededBackward Compatibility
Fully compatible —
errors.Is(err, ErrCanceled)anderrors.Is(err, ErrTimedOut)still work because the sentinels are wrapped with%w.