| 1 | +# E2E CodeBuild Failure Analysis |
| 2 | + |
| 3 | +**Date**: March 16, 2026 |
| 4 | +**Batches Analyzed**: |
| 5 | +- `AmplifyCLI-E2E-Testing:c26a7126-ab8e-451a-8eee-24c2f4e89973` (March 5, 2026) - **FAILED** (2 of 276 builds) |
| 6 | +- `AmplifyCLI-E2E-Testing:02b55f32-a0f8-46b6-82b3-2c23a156a970` (March 12, 2026) - **FAILED** (12 of 276 builds) |
| 7 | +- `AmplifyCLI-E2E-Testing:1dff647e-6a86-492b-bbd2-112a9f33ae0f` (Feb 27, 2026) - **SUCCEEDED** (reference) |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +14 unique build failures across 2 batches, spanning 8 distinct test files. All failures share the same root error pattern: `Process exited with non zero exit code 1` from `nexpect.ts:442:24`, meaning the Amplify CLI process itself exited with a non-zero code during test execution. |
| 12 | + |
| 13 | +Additionally, a cross-cutting bug was found in the test infrastructure: `TypeError: The "code" argument must be of type number. Received type boolean` (28 occurrences on Windows builds). |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Failure Pattern #1: Windows `process.exit` TypeError (ALREADY FIXED in dev) |
| 18 | + |
| 19 | +**Error**: `TypeError: The "code" argument must be of type number. Received type boolean (false)` |
| 20 | +**Location**: `cli-test-runner.js:21` (source-mapped) |
| 21 | +**Frequency**: 28 occurrences across multiple Windows builds |
| 22 | +**Root Cause**: The previous version of `cli-test-runner.js` called `process.exit(result.numFailingTests !== 0)`, which passed a **boolean** directly to `process.exit()`. Node.js 20+ strictly validates the exit code type and throws on a boolean. |
| 23 | + |
| 24 | +**Status**: Fixed in `1eba0c3f22` (March 13, 2026) - `process.exit(result.numFailingTests !== 0 ? 1 : 0)` |
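
The fix above can be sketched as a small pure function. This is a minimal illustration, not the actual contents of `cli-test-runner.js`; the helper name `toExitCode` is hypothetical.

```typescript
// Hypothetical helper: maps a failing-test count to a numeric exit code.
// `toExitCode` is an illustrative name, not part of cli-test-runner.js.
function toExitCode(numFailingTests: number): number {
  // The comparison alone yields a boolean; the ternary turns it into 0 or 1,
  // which is the numeric form process.exit() expects.
  return numFailingTests !== 0 ? 1 : 0;
}

// Intended use at the end of a test run (left as a comment so the sketch
// has no side effects):
//   process.exit(toExitCode(result.numFailingTests));
```

Keeping the conversion in one place makes it harder to reintroduce a boolean exit code by accident.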
| 25 | + |
| 26 | +**Related Bug Found**: In `nexpect.ts` (line 776), the environment variable `CI` is set to boolean `false` instead of string `'false'`: |
| 27 | +```typescript |
| 28 | +childEnv.CI = false; // BUG: should be 'false' (string) - env vars must be strings |
| 29 | +``` |
| 30 | +→ **Fixed in this PR** |
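
One way to guard against this class of bug more generally is to coerce values to strings before they reach the child process. The sketch below is an assumption about how that could look; `toEnvStrings` and `LooseEnv` are hypothetical names and do not exist in `nexpect.ts`, where the actual fix is simply the string literal `'false'`.

```typescript
// Hypothetical helper: coerces every defined value to a string before it is
// handed to a child process. `toEnvStrings` is not part of nexpect.ts.
type LooseEnv = Record<string, string | number | boolean | undefined>;

function toEnvStrings(env: LooseEnv): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    // Skip unset entries rather than emitting the literal string "undefined".
    if (value !== undefined) {
      result[key] = String(value);
    }
  }
  return result;
}

// Example input with a boolean, a number, and an unset entry (all made up).
const childEnv = toEnvStrings({ CI: false, PATH: "/usr/bin", RETRIES: 1, UNUSED: undefined });
```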
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## Failure Pattern #2: Container API Tests (4 test files) |
| 35 | + |
| 36 | +### Failing Tests: |
| 37 | +| Test File | Test Name | Failures | |
| 38 | +|-----------|-----------|----------| |
| 39 | +| `containers-api-1.test.ts` | init project, enable containers and add multi-container api | 4 | |
| 40 | +| `containers-api-2.test.ts` | init project, enable containers and add multi-container api push, edit and push | 4 | |
| 41 | +| `containers-api-secrets.test.ts` | init project, api container secrets should work | 4 | |
| 42 | +| `custom_policies_container.test.ts` | should init and deploy a api container, attach custom policies to the Fargate task | 4 | |
| 43 | + |
| 44 | +**Error**: `Process exited with non zero exit code 1` during `amplifyPushWithoutCodegen` or `amplifyPushSecretsWithoutCodegen` |
| 45 | +**Root Cause**: Container deployments via ECS/Fargate are failing during CloudFormation stack creation. The CLI exits with code 1 when a push/deployment fails. These tests involve Docker container builds, ECR image pushes, ECS Fargate service creation, ALB, and VPC — all prone to transient failures. |
| 46 | + |
| 47 | +**Classification**: Infrastructure/Flaky — The CLI itself correctly reports failure; the underlying CloudFormation deployment fails. |
| 48 | + |
| 49 | +**Fix Applied**: Enabled `jest.retryTimes(1)` for CodeBuild environments (previously enabled only for CircleCI). |
| 50 | + |
| 51 | +--- |
| 52 | + |
| 53 | +## Failure Pattern #3: Function Secrets Tests (function_7.test.ts) |
| 54 | + |
| 55 | +### Failing Tests (ALL 7 tests in the suite): |
| 56 | +| Test Name | Failures | |
| 57 | +|-----------|----------| |
| 58 | +| configures secret that is accessible in the cloud | 3+ | |
| 59 | +| removes secrets immediately when unpushed function is removed from project | 3+ | |
| 60 | +| removes secrets on push when func is already pushed | 3+ | |
| 61 | +| removes secrets on push when pushed function is removed | 3+ | |
| 62 | +| removes / copies secrets when env removed / added | 3+ | |
| 63 | +| prompts for missing secrets and removes unused secrets on push | 3+ | |
| 64 | +| keeps old secrets when pushing secrets added in another env | 3+ | |
| 65 | + |
| 66 | +**Error**: `Process exited with non zero exit code 1` during various amplify CLI operations |
| 67 | +**Root Cause**: The entire suite fails on both attempts, suggesting a systemic issue. The March 12 batch tested the `sanjrkmr/dev` branch, which includes SSM retry mechanism changes (PR #14659). These SSM changes may have introduced regressions affecting function secret operations. |
| 68 | + |
| 69 | +**Classification**: Likely product code regression from SSM retry changes — not fixable in e2e test code alone. |
| 70 | + |
| 71 | +**Fixes Applied**: |
| 72 | +- `amplifyPushMissingFuncSecret` was missing `noOutputTimeout`, so it used the default 5min timeout instead of the 20min push timeout → Fixed |
| 73 | +- Enabled `jest.retryTimes(1)` for CodeBuild |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +## Failure Pattern #4: Custom Resources Tests (2 test files) |
| 78 | + |
| 79 | +### Failing Tests: |
| 80 | +| Test File | Test Name | Failures | |
| 81 | +|-----------|-----------|----------| |
| 82 | +| `custom_resources.test.ts` | add/update CDK and CFN custom resources | 2 | |
| 83 | +| `custom-resource-with-storage.test.ts` | verify export custom storage types | 2 | |
| 84 | + |
| 85 | +**Error**: `Process exited with non zero exit code 1` during `amplifyPushAuth` or `buildCustomResources` |
| 86 | +**Root Cause**: CDK custom resource compilation and CloudFormation deployment failures. |
| 87 | + |
| 88 | +**Classification**: Infrastructure/Flaky |
| 89 | + |
| 90 | +**Fixes Applied**: |
| 91 | +- `buildCustomResources` no-output timeout increased from 5min to 10min |
| 92 | +- Enabled `jest.retryTimes(1)` for CodeBuild |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## Failure Pattern #5: Notifications SMS Test |
| 97 | + |
| 98 | +### Failing Tests: |
| 99 | +| Test File | Test Name | Failures | |
| 100 | +|-----------|-----------|----------| |
| 101 | +| `notifications-sms.test.ts` | should add and remove the SMS channel correctly when no pinpoint is configured | 4 | |
| 102 | + |
| 103 | +**Error**: `Process exited with non zero exit code 1` during notification channel operations |
| 104 | +**Root Cause**: Notification operations (add/remove) create Pinpoint, Analytics, and Auth resources. The CLI exits with code 1 during one of these operations. |
| 105 | + |
| 106 | +**Classification**: Infrastructure/Flaky — Pinpoint operations are slow and prone to throttling. |
| 107 | + |
| 108 | +**Fixes Applied**: |
| 109 | +- No-output timeout increased from 5min to 10min for all notification operations (`addNotificationChannel`, `removeNotificationChannel`, `removeAllNotificationChannel`, `updateNotificationChannel`) |
| 110 | +- Enabled `jest.retryTimes(1)` for CodeBuild |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## Summary of Fixes Applied |
| 115 | + |
| 116 | +### 1. `nexpect.ts` - Fix boolean environment variable (P0) |
| 117 | +**File**: `packages/amplify-e2e-core/src/utils/nexpect.ts` |
| 118 | +`childEnv.CI = false` → `childEnv.CI = 'false'` |
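| 119 | +Environment variables must be strings, so the boolean is replaced with `'false'`. Note that the `TypeError: The "code" argument` errors on Windows with Node.js 20+ came from the separate `process.exit` bug described in Failure Pattern #1 (fixed in `1eba0c3f22`), not from this assignment. |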
| 120 | + |
| 121 | +### 2. `setup-tests.ts` - Enable jest retries for CodeBuild (P1) |
| 122 | +**File**: `packages/amplify-e2e-tests/src/setup-tests.ts` |
| 123 | +`if (process.env.CIRCLECI)` → `if (process.env.CIRCLECI || process.env.CODEBUILD_BUILD_ID)` |
| 124 | +Previously, `jest.retryTimes(1)` was only enabled for CircleCI. CodeBuild was missing this, meaning flaky tests had no per-test retry. |
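
The widened condition can be expressed as a small predicate. This is a sketch only: in `setup-tests.ts` the check is written inline, and `shouldEnableRetries` and `EnvMap` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper; per the analysis, the real check is inline in setup-tests.ts.
type EnvMap = Record<string, string | undefined>;

function shouldEnableRetries(env: EnvMap): boolean {
  // Either variable being set to a non-empty string marks a CI run
  // in which flaky tests should get one retry.
  return Boolean(env["CIRCLECI"] || env["CODEBUILD_BUILD_ID"]);
}

// Intended use in a Jest setup file (left as a comment so the sketch
// does not depend on Jest globals):
//   if (shouldEnableRetries(process.env)) {
//     jest.retryTimes(1);
//   }
```

Taking the environment as a parameter keeps the predicate easy to check locally without exporting CI variables.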
| 125 | + |
| 126 | +### 3. `amplifyPush.ts` - Add missing noOutputTimeout to push functions (P1) |
| 127 | +**File**: `packages/amplify-e2e-core/src/init/amplifyPush.ts` |
| 128 | +Added `noOutputTimeout: pushTimeoutMS` (20 min) to: |
| 129 | +- `amplifyPushMissingFuncSecret` |
| 130 | +- `amplifyPushIterativeRollback` |
| 131 | +- `amplifyPushMissingEnvVar` |
| 132 | + |
| 133 | +These push functions were using the default 5-minute timeout instead of the standard 20-minute push timeout. |
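
The effect of omitting `noOutputTimeout` can be illustrated with a simplified model of the fallback. The 5-minute and 20-minute values and the name `pushTimeoutMS` come from the analysis above; the `CliSpawnOptions` shape, the `defaultNoOutputTimeoutMS` constant, and `effectiveNoOutputTimeout` are assumptions made for this sketch and are not the actual `nexpect.ts` API.

```typescript
// Values taken from the analysis: 5 min default, 20 min for pushes.
const defaultNoOutputTimeoutMS = 5 * 60 * 1000; // hypothetical constant name
const pushTimeoutMS = 20 * 60 * 1000;

// Hypothetical, simplified shape of the options passed when spawning the CLI.
interface CliSpawnOptions {
  cwd: string;
  noOutputTimeout?: number;
}

// Illustrative stand-in for how an omitted option falls back to the default.
function effectiveNoOutputTimeout(options: CliSpawnOptions): number {
  return options.noOutputTimeout ?? defaultNoOutputTimeoutMS;
}

// Before the fix: option omitted, so the 5-minute default applies.
const withoutOption: CliSpawnOptions = { cwd: "/tmp/project" };
// After the fix: the push timeout is passed explicitly.
const withOption: CliSpawnOptions = { cwd: "/tmp/project", noOutputTimeout: pushTimeoutMS };
```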
| 134 | + |
| 135 | +### 4. `notifications.ts` - Increase notification operation timeouts (P1) |
| 136 | +**File**: `packages/amplify-e2e-core/src/categories/notifications.ts` |
| 137 | +Increased no-output timeout from 5min to 10min for all notification operations. |
| 138 | + |
| 139 | +### 5. `custom.ts` - Increase custom resource build timeout (P1) |
| 140 | +**File**: `packages/amplify-e2e-core/src/categories/custom.ts` |
| 141 | +Increased `buildCustomResources` no-output timeout from 5min to 10min. |