Summary
I would like to clarify the intended semantics of ProcessEvent.EndEvent.error in envd.
Currently, when a process starts successfully but exits with a non-zero status, envd sets both exit_code and error. For example, a normal exit 1 is reported roughly as:
exit_code: 1
exited: true
status: "exit status 1"
error: "exit status 1"
This makes it unclear whether error represents a runtime/infrastructure-level failure, or simply any non-zero process result.
This is not tied to a specific E2B Cloud Sandbox ID or Build ID. The behavior was observed in a downstream integration using envd's Process API, and it also appears to follow from the current process handler code path.
Current behavior
In packages/envd/internal/services/process/handler/handler.go, Wait() calls cmd.Wait() and copies any returned error into EndEvent.error:
err := p.cmd.Wait()
var errMsg *string
if err != nil {
msg := err.Error()
errMsg = &msg
}
endEvent := &rpc.ProcessEvent_EndEvent{
Error: errMsg,
ExitCode: int32(p.cmd.ProcessState.ExitCode()),
Exited: p.cmd.ProcessState.Exited(),
Status: p.cmd.ProcessState.String(),
}
However, in Go, exec.Cmd.Wait() returns *exec.ExitError when the process starts successfully but exits with a non-zero status.
So a normal command failure such as exit 1 becomes both:
exit_code = 1
error = "exit status 1"
This appears to be the relevant envd code path that constructs the process end event for Process.Start.
Question: should normal non-zero exits set EndEvent.error?
The current proto definition is:
message EndEvent {
sint32 exit_code = 1;
bool exited = 2;
string status = 3;
optional string error = 4;
}
The part I would like to clarify is the relationship between exit_code and error.
When a process starts successfully and exits with a non-zero code, should that be represented only by exit_code / exited / status, or should it also populate error?
My expectation is that exit_code should describe the result of the user's process, while error should describe envd/runtime-level failures, such as failing to start, wait for, or observe the process.
If this is the intended contract, it would be helpful to document it in process.proto so clients can handle process failures consistently.
Why this matters
For Process API clients, the current behavior makes it difficult to distinguish between:
- the process ran and returned a non-zero exit code; and
- envd/runtime failed to start, run, wait for, or observe the process.
In a downstream Kubernetes operator integration, this caused failed lifecycle hook scripts to be treated as runtime execution errors. As a result, the client entered an error-handling path and lost the collected stdout/stderr from the failed script.
The user-visible error became only:
hook execution error: process error: exit status 1
instead of showing the actual script output.
Root cause
The underlying issue seems to be the mapping from Go's os/exec API to the envd Process API.
exec.Cmd.Wait() uses an error return for normal non-zero process exits via *exec.ExitError. That is useful at the Go API level, but it may not map cleanly to the RPC-level EndEvent.error field if that field is intended to mean a runtime/infrastructure-level failure.
The current envd implementation passes the cmd.Wait() error through to EndEvent.error without distinguishing *exec.ExitError from other wait/runtime errors.
Possible direction
If the intended semantics are to keep error for runtime/infrastructure-level failures, envd could special-case *exec.ExitError from cmd.Wait() when ProcessState is valid and the process has exited:
err := p.cmd.Wait()
var errMsg *string
if err != nil{
var exitErr *exec.ExitError
if !(errors.As(err, &exitErr) && p.cmd.ProcessState != nil && p.cmd.ProcessState.Exited()) {
msg := err.Error()
errMsg = &msg
}
}
Normal non-zero exits would then be represented by exit_code, exited, and status, while error would remain available for runtime/infrastructure-level failures.
It may also be useful to add comments to process.proto documenting the intended meaning of error.
Compatibility note
This could be a behavior change for existing Process API clients that currently rely on EndEvent.error being set for non-zero exits.
So I would like to confirm the intended semantics before submitting a PR. I am happy to contribute a PR if maintainers agree with this direction.
Summary
I would like to clarify the intended semantics of
ProcessEvent.EndEvent.errorin envd.Currently, when a process starts successfully but exits with a non-zero status, envd sets both
exit_codeanderror. For example, a normalexit 1is reported roughly as:This makes it unclear whether
errorrepresents a runtime/infrastructure-level failure, or simply any non-zero process result.This is not tied to a specific E2B Cloud Sandbox ID or Build ID. The behavior was observed in a downstream integration using envd's Process API, and it also appears to follow from the current process handler code path.
Current behavior
In
packages/envd/internal/services/process/handler/handler.go,Wait()callscmd.Wait()and copies any returned error intoEndEvent.error:However, in Go,
exec.Cmd.Wait()returns*exec.ExitErrorwhen the process starts successfully but exits with a non-zero status.So a normal command failure such as
exit 1becomes both:This appears to be the relevant envd code path that constructs the process end event for
Process.Start.Question: should normal non-zero exits set
EndEvent.error?The current proto definition is:
The part I would like to clarify is the relationship between
exit_codeanderror.When a process starts successfully and exits with a non-zero code, should that be represented only by
exit_code/exited/status, or should it also populateerror?My expectation is that
exit_codeshould describe the result of the user's process, whileerrorshould describe envd/runtime-level failures, such as failing to start, wait for, or observe the process.If this is the intended contract, it would be helpful to document it in
process.protoso clients can handle process failures consistently.Why this matters
For Process API clients, the current behavior makes it difficult to distinguish between:
In a downstream Kubernetes operator integration, this caused failed lifecycle hook scripts to be treated as runtime execution errors. As a result, the client entered an error-handling path and lost the collected stdout/stderr from the failed script.
The user-visible error became only:
instead of showing the actual script output.
Root cause
The underlying issue seems to be the mapping from Go's
os/execAPI to the envd Process API.exec.Cmd.Wait()uses an error return for normal non-zero process exits via*exec.ExitError. That is useful at the Go API level, but it may not map cleanly to the RPC-levelEndEvent.errorfield if that field is intended to mean a runtime/infrastructure-level failure.The current envd implementation passes the
cmd.Wait()error through toEndEvent.errorwithout distinguishing*exec.ExitErrorfrom other wait/runtime errors.Possible direction
If the intended semantics are to keep
errorfor runtime/infrastructure-level failures, envd could special-case*exec.ExitErrorfromcmd.Wait()whenProcessStateis valid and the process has exited:Normal non-zero exits would then be represented by
exit_code,exited, andstatus, whileerrorwould remain available for runtime/infrastructure-level failures.It may also be useful to add comments to
process.protodocumenting the intended meaning oferror.Compatibility note
This could be a behavior change for existing Process API clients that currently rely on
EndEvent.errorbeing set for non-zero exits.So I would like to confirm the intended semantics before submitting a PR. I am happy to contribute a PR if maintainers agree with this direction.