---
name: nightly-test-debugger
description: Analyze and debug failing nightly acceptance tests in this Go CLI repository. Use this skill whenever the user mentions nightly tests failing, CI failures in nightly.yml, acceptance test failures, or wants to investigate why a test failed in the nightly CI run. This includes diagnosing test failures, understanding error logs, identifying root causes, and suggesting fixes.
---

# Nightly Test Debugger

This skill helps you systematically analyze and debug failing nightly acceptance tests in the Scaleway CLI repository.

## Understanding Nightly Tests

The nightly test suite (`.github/workflows/nightly.yml`) runs acceptance tests for all products daily at 00:00 UTC. Each product is tested in isolation, with:

- `CLI_UPDATE_CASSETTES: true` - re-records API interactions (cassettes)
- `CLI_UPDATE_GOLDENS: true` - regenerates the expected output (golden) files
- Real API calls made against Scaleway infrastructure
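
Here is a minimal sketch for reproducing the nightly behaviour for a single test on your machine. It assumes the nightly job boils down to `go test` on each product's package tree with these two variables set (check `nightly.yml` for the exact command and any extra flags) and that your shell already has valid Scaleway credentials configured. The helper name `run_like_nightly` is hypothetical. Because both update flags are enabled, it makes real API calls and rewrites the test's cassette and golden files.

```bash
# Hypothetical helper: re-run tests the way the nightly job is assumed to,
# i.e. with cassette and golden updates enabled (real API calls are made).
# Usage: run_like_nightly <product> <test-name-regex>
run_like_nightly() {
  local product="$1"
  local test_name="$2"
  CLI_UPDATE_CASSETTES=true CLI_UPDATE_GOLDENS=true \
    go test "./internal/namespaces/${product}/..." -run "$test_name" -v
}
```

For example, `run_like_nightly instance Test_CreateServer` (product and test names are placeholders). The `-run` argument is a regular expression, so a parent test name also runs all of its subtests. Review `git diff` afterwards before committing any regenerated testdata.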

Common failure modes:
1. **API changes** - The upstream API changed, so test expectations no longer match
2. **Cassette mismatches** - Recorded HTTP interactions don't match current behavior
3. **Golden file drift** - The output format changed
4. **Infrastructure issues** - Resource provisioning failures, quotas, timeouts
5. **Race conditions** - Async operations do not complete within the expected time
6. **Dependency changes** - Go module updates break compatibility

## Debugging Workflow

### Step 1: Gather Context

When investigating a failing nightly test, first get the ID of the most recent nightly run (shown in the `ID` column of the output):

```bash
gh run list --workflow "Nightly Acceptance Tests" --limit 1
```
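
If you want to reuse the ID in later commands, `gh` can extract it directly through its `--json` and `--jq` output flags, where `databaseId` is the field holding the numeric run ID. The helper name is hypothetical; the workflow name is the one used in the command above.

```bash
# Hypothetical helper: print the numeric ID of the most recent run of the
# nightly workflow, using gh's built-in JSON output and jq filtering.
latest_nightly_run_id() {
  gh run list --workflow "Nightly Acceptance Tests" --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}
```

For example, `RUN_ID=$(latest_nightly_run_id)`. Note that `--limit 1` picks the newest run whatever its status, so if that run is still in progress you may want to list a few more and pick a completed one.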

### Step 2: Read the Failure Logs

Fetch the logs from the GitHub Actions run and filter for failed tests:

```bash
gh run view <RUN_ID> --log | grep FAIL:
```

This lists the failed tests, one matching log line each. To read the full output of the failed steps, drop the `grep` or use `gh run view <RUN_ID> --log-failed`.
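
`grep FAIL:` prints whole log lines, including the job, step and timestamp columns that `gh run view --log` prepends, and the same test can appear more than once. Here is a sketch that reduces this to a sorted, de-duplicated list of names. It assumes the standard `go test -v` failure format `--- FAIL: <TestName> (<duration>)`, indented for subtests. The helper name is hypothetical.

```bash
# Hypothetical helper: read `go test -v` output on stdin and print each
# failing test name once, sorted. Subtests appear as Parent/Child.
failed_tests() {
  grep -oE -e '--- FAIL: [^ ]+' | awk '{ print $3 }' | LC_ALL=C sort -u
}
```

Usage: `gh run view <RUN_ID> --log | failed_tests`. When a subtest fails, `go test` also reports its parent as failed, so both names are listed.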

### Step 3: Locate the Test File

Test files live under:
```
internal/namespaces/<product>/v<version>/<name>_test.go
```

Common patterns:
- `custom_<feature>_test.go` - Feature-specific tests
- `<product>_test.go` - General tests
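
To go from a failing test name to the file that declares it, a plain `grep` for the top-level `func` declaration is often enough. This sketch assumes you run it from the repository root. It drops any subtest suffix (everything after the first `/`), because only the parent test is declared with `func`. The helper name is hypothetical.

```bash
# Hypothetical helper: list the *_test.go files under internal/namespaces
# that declare the top-level test function for a (possibly sub-)test name.
find_test_file() {
  local parent="${1%%/*}"   # e.g. Test_Foo/Simple -> Test_Foo
  grep -rl --include='*_test.go' -e "^func ${parent}(" internal/namespaces
}
```

For example, `find_test_file Test_CreateServer/Simple` (placeholder name) prints each matching path, one per line. Several API versions of a product can declare the same test name.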

Read the failing test function to understand:
- What the test is trying to do
- What `BeforeFunc` setup runs
- What `Check` assertions are made
- What `AfterFunc` cleanup runs
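
To print only the failing test function rather than the whole file, you can use the sketch below. It relies on `gofmt` layout, where a top-level function ends at the first line consisting solely of `}`. The helper name is hypothetical. Pass the test name (a subtest suffix is allowed) and the file path.

```bash
# Hypothetical helper: print the top-level test function named by $1
# (subtest suffix stripped) from file $2, assuming gofmt-formatted source.
show_test_func() {
  local parent="${1%%/*}"
  awk -v name="$parent" '
    index($0, "func " name "(") == 1 { printing = 1 }  # start of the func
    printing { print }
    printing && $0 == "}" { exit }                    # gofmt closing brace
  ' "$2"
}
```

For example, `show_test_func Test_CreateServer/Simple internal/namespaces/<product>/v<version>/<name>_test.go`. Helpers that are referenced from `BeforeFunc` or `AfterFunc` but defined elsewhere in the package are not printed.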

### Step 4: Categorize the Failure

#### Timeout/Infrastructure Failure

**Symptoms:**
```
context deadline exceeded
timeout waiting for resource
```
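
To check a run's logs for these two symptoms without reading them in full, you can count the matching lines. The sketch only looks for the exact strings above, so it misses other timeout messages. The helper name is hypothetical.

```bash
# Hypothetical helper: count log lines on stdin that contain either timeout
# symptom (fixed strings, case-sensitive). Prints 0 when nothing matches.
count_timeout_symptoms() {
  grep -cF -e 'context deadline exceeded' -e 'timeout waiting for resource' || true
}
```

Usage: `gh run view <RUN_ID> --log | count_timeout_symptoms`. A non-zero count suggests starting with the diagnosis steps for this category.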

**Diagnosis:**
1. Check whether the configured timeout is reasonable for the resource being provisioned
2. Look for resource provisioning delays
3. Check for quota issues in the nightly environment

**Fix:**
- Increase the timeout in the test configuration
- Add retry logic in `BeforeFunc`
- Check that the nightly credentials have sufficient quotas

## Resources

- Nightly workflow: `.github/workflows/nightly.yml`
- Test framework: `core/test_*.go`
- Example tests: `internal/namespaces/*/v*/*_test.go`
- Testdata: `internal/namespaces/*/v*/testdata/`

## When to Escalate

Some failures may not be fixable in the test itself:

1. **Upstream API breaking change** - May need an SDK update
2. **Service degradation** - Contact the Scaleway platform team
3. **Credential issues** - Check the nightly secrets in GitHub

In these cases, clearly document:
- What you investigated
- What you ruled out
- What needs external action