Skip to content

Commit a575cba

Browse files
authored
docs: Expand troubleshooting event types for root cause diagnosis (#184)
Add detailed event type documentation to step 2b of the troubleshooting guide, providing explicit guidance on key execution history events: - Execution-level failures (ExecutionFailed, ExecutionTimedOut, ExecutionStopped) - Context and step failures (ContextFailed, StepFailed with RetryDetails) - Callback issues (CallbackStarted, CallbackTimedOut, CallbackFailed) - Chained invocation failures (ChainedInvokeFailed/TimedOut/Stopped) - Other signals (WaitCancelled, InvocationCompleted with Error) This gives agents clearer guidance for interpreting execution history events rather than relying on implicit knowledge. Syncs with awslabs/agent-plugins#160.
1 parent c451dcb commit a575cba

1 file changed

Lines changed: 26 additions & 1 deletion

File tree

aws-lambda-durable-functions-power/steering/troubleshooting-executions.md

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,32 @@ Steps:
3939
4040
2. If the command succeeds, analyze and provide a user-friendly diagnosis:
4141
a. Report the execution status (RUNNING/SUCCEEDED/FAILED/STOPPED/TIMED_OUT)
42-
b. Identify the root cause:
42+
b. Identify the root cause by looking for these key events in the history:
43+
44+
**Execution-level failures:**
45+
- `ExecutionFailed` — entire execution crashed; extract the error and cause fields
46+
- `ExecutionTimedOut` — the execution exceeded its configured timeout
47+
- `ExecutionStopped` — execution was manually stopped via StopDurableExecution
48+
49+
**Context and step failures:**
50+
- `ContextFailed` — a child context threw an unhandled error; check the parent context for what triggered it
51+
- `StepFailed` — an individual step failed; includes RetryDetails (CurrentAttempt, NextAttemptDelaySeconds) showing retry state
52+
53+
**Callback issues:**
54+
- `CallbackStarted` with a Timeout field — confirms a timeout was registered; correlate with any subsequent `CallbackTimedOut`
55+
- `CallbackTimedOut` — a timeout fired but may not have been caught by the function code
56+
- `CallbackFailed` — the callback was resolved with an error
57+
58+
**Chained invocation failures:**
59+
- `ChainedInvokeFailed` — a chained (child) durable execution failed
60+
- `ChainedInvokeTimedOut` — a chained execution exceeded its timeout
61+
- `ChainedInvokeStopped` — a chained execution was stopped
62+
63+
**Other signals:**
64+
- `WaitCancelled` — a scheduled wait was cancelled before completing
65+
- `InvocationCompleted` with an Error field — the Lambda invocation itself errored (e.g., runtime crash)
66+
67+
**Diagnosis patterns:**
4368
- Failed operations: Show the EXACT error message verbatim in a code block
4469
- Stuck in WAIT_FOR_CALLBACK: Extract callback ID, show how long it's been waiting
4570
- Timeout: Show which operation was running when timeout occurred

0 commit comments

Comments
 (0)