Clarify EndEvent.error semantics for normal non-zero exits in envd's process handler

## Summary

I would like to clarify the intended semantics of `ProcessEvent.EndEvent.error` in envd.

Currently, when a process starts successfully but exits with a non-zero status, envd sets both `exit_code` and `error`. For example, a normal `exit 1` is reported roughly as:

```text
exit_code: 1
exited: true
status: "exit status 1"
error: "exit status 1"
```

This makes it unclear whether `error` represents a runtime/infrastructure-level failure, or simply any non-zero process result.

This is not tied to a specific E2B Cloud Sandbox ID or Build ID. The behavior was observed in a downstream integration using envd's Process API, and it also appears to follow from the current process handler code path.

## Current behavior

In `packages/envd/internal/services/process/handler/handler.go`, `Wait()` calls `cmd.Wait()` and copies any returned error into `EndEvent.error`:

```go
err := p.cmd.Wait()

var errMsg *string
if err != nil {
    msg := err.Error()
    errMsg = &msg
}

endEvent := &rpc.ProcessEvent_EndEvent{
    Error:    errMsg,
    ExitCode: int32(p.cmd.ProcessState.ExitCode()),
    Exited:   p.cmd.ProcessState.Exited(),
    Status:   p.cmd.ProcessState.String(),
}
```

However, in Go, `exec.Cmd.Wait()` returns `*exec.ExitError` when the process starts successfully but exits with a non-zero status.

So a normal command failure such as `exit 1` becomes both:

```text
exit_code = 1
error = "exit status 1"
```

This appears to be the relevant envd code path that constructs the process end event for `Process.Start`.

## Question: should normal non-zero exits set `EndEvent.error`?

The current proto definition is:

```protobuf
message EndEvent {
    sint32 exit_code = 1;
    bool exited = 2;
    string status = 3;
    optional string error = 4;
}
```

The part I would like to clarify is the relationship between `exit_code` and `error`.

When a process starts successfully and exits with a non-zero code, should that be represented only by `exit_code` / `exited` / `status`, or should it also populate `error`?

My expectation is that `exit_code` should describe the result of the user's process, while `error` should describe envd/runtime-level failures, such as failing to start, wait for, or observe the process.

If this is the intended contract, it would be helpful to document it in `process.proto` so clients can handle process failures consistently.

## Why this matters

For Process API clients, the current behavior makes it difficult to distinguish between:

1. the process ran and returned a non-zero exit code; and
2. envd/runtime failed to start, run, wait for, or observe the process.

In a downstream Kubernetes operator integration, this caused failed lifecycle hook scripts to be treated as runtime execution errors. As a result, the client entered an error-handling path and lost the collected stdout/stderr from the failed script.

The user-visible error became only:

```text
hook execution error: process error: exit status 1
```

instead of showing the actual script output.

## Root cause

The underlying issue seems to be the mapping from Go's `os/exec` API to the envd Process API.

`exec.Cmd.Wait()` uses an error return for normal non-zero process exits via `*exec.ExitError`. That is useful at the Go API level, but it may not map cleanly to the RPC-level `EndEvent.error` field if that field is intended to mean a runtime/infrastructure-level failure.

The current envd implementation passes the `cmd.Wait()` error through to `EndEvent.error` without distinguishing `*exec.ExitError` from other wait/runtime errors.

## Possible direction

If the intended semantics are to keep `error` for runtime/infrastructure-level failures, envd could special-case `*exec.ExitError` from `cmd.Wait()` when `ProcessState` is valid and the process has exited:

```go
err := p.cmd.Wait()

var errMsg *string
if err != nil{
    var exitErr *exec.ExitError
    if !(errors.As(err, &exitErr) && p.cmd.ProcessState != nil && p.cmd.ProcessState.Exited()) {
        msg := err.Error()
        errMsg = &msg
    }
}
```

Normal non-zero exits would then be represented by `exit_code`, `exited`, and `status`, while `error` would remain available for runtime/infrastructure-level failures.

It may also be useful to add comments to `process.proto` documenting the intended meaning of `error`.

## Compatibility note

This could be a behavior change for existing Process API clients that currently rely on `EndEvent.error` being set for non-zero exits.

So I would like to confirm the intended semantics before submitting a PR. I am happy to contribute a PR if maintainers agree with this direction.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify EndEvent.error semantics for normal non-zero exits in envd's process handler #2657

Summary

Current behavior

Question: should normal non-zero exits set `EndEvent.error`?

Why this matters

Root cause

Possible direction

Compatibility note

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Clarify EndEvent.error semantics for normal non-zero exits in envd's process handler #2657

Description

Summary

Current behavior

Question: should normal non-zero exits set EndEvent.error?

Why this matters

Root cause

Possible direction

Compatibility note

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Question: should normal non-zero exits set `EndEvent.error`?