
Commit 36221fa

feat: add collect-on-failure to copy files from pods/containers before cleanup (#141)
## Summary

- Adds `cleanup.collect` config section to copy files from inside pods (Kind) or containers (Compose) when e2e tests fail, before the environment is destroyed
- Collection runs independently of `cleanup.on` and has its own `collect.on` condition (always/failure/never)
- Fires on any phase failure (setup/trigger/verify), tolerates unreachable targets with explicit logging
- Adds standalone `e2e collect` command for manual debugging with `e2e setup`
- Also auto-collects `kubectl describe` (Kind) / `docker inspect` (Compose) alongside files

### Config example

```yaml
cleanup:
  on: always
  collect:
    on: failure
    output-dir: /tmp/e2e-collect
    items:
      # Kind
      - namespace: default
        label-selector: app=oap
        paths:
          - /skywalking/logs/
      # Compose
      - service: oap-service
        paths:
          - /skywalking/logs/
```

### Output structure

Files are organized by full source path to avoid collisions:

- Kind: `output-dir/<namespace>/<pod-name>/<source-path>`
- Compose: `output-dir/<service-name>/<source-path>`
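The output layout above can be computed with `filepath.Join`, which also cleans the joined path so the leading `/` of the source path becomes an ordinary separator. A minimal sketch assuming Unix-style paths; `kindDest` and `composeDest` are hypothetical helpers, not functions from this commit:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// kindDest returns the local destination for a path collected from a Kind pod,
// following output-dir/<namespace>/<pod-name>/<source-path>.
// Hypothetical helper for illustration only.
func kindDest(outputDir, namespace, pod, srcPath string) string {
	// filepath.Join cleans the result: the duplicate "/" before srcPath
	// and any trailing "/" are removed.
	return filepath.Join(outputDir, namespace, pod, srcPath)
}

// composeDest does the same for a Compose service:
// output-dir/<service-name>/<source-path>.
func composeDest(outputDir, service, srcPath string) string {
	return filepath.Join(outputDir, service, srcPath)
}

func main() {
	fmt.Println(kindDest("/tmp/e2e-collect", "default", "oap-0", "/skywalking/logs/"))
	fmt.Println(composeDest("/tmp/e2e-collect", "oap-service", "/skywalking/logs/"))
}
```

Keeping the full source path is what prevents collisions: `/a/logs` and `/b/logs` from the same pod land in different local directories instead of both mapping to `logs`.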
1 parent 8c21e43 commit 36221fa

21 files changed

Lines changed: 1147 additions & 19 deletions


.github/workflows/e2e-test.yaml

Lines changed: 6 additions & 1 deletion
```diff
@@ -60,9 +60,14 @@ jobs:
         with:
           e2e-file: ./test/e2e/e2e.yaml
 
-      - uses: engineerd/setup-kind@v0.6.2
+      - name: Install KinD
+        uses: helm/kind-action@ef37e7f390d99f746eb8b610417061a60e82a6cc # v1.14.0
         with:
           version: "v0.27.0"
+          install_only: true
 
       - name: Run KinD E2E Test
         run: make e2e-test-kind
+
+      - name: Run KinD Collect-on-Failure E2E Test
+        run: make e2e-test-kind-collect
```

CLAUDE.md

Lines changed: 155 additions & 0 deletions
```diff
@@ -0,0 +1,155 @@
+# CLAUDE.md - Project Guide for skywalking-infra-e2e
+
+## Project Overview
+
+Apache SkyWalking Infra E2E is a CLI tool for end-to-end testing. It orchestrates test environments
+(Kubernetes/Kind or Docker Compose), generates traffic, verifies results, and cleans up.
+
+## Build & Test Commands
+
+```bash
+make all            # clean + lint + test + build
+make test           # run unit tests with coverage
+make lint           # run golangci-lint (auto-installs if missing)
+make build          # build for windows/linux/darwin
+make darwin         # build for macOS only (use your current OS target)
+make e2e-test       # run e2e test with Docker Compose (test/e2e/e2e.yaml)
+make e2e-test-kind  # run e2e test with Kind (test/e2e/kind/e2e.yaml)
+```
+
+- Go module: `github.com/apache/skywalking-infra-e2e`
+- Entry point: `cmd/e2e/main.go`
+- Binary output: `bin/<os>/e2e`
+- Version injected via ldflags at build time
+
+## Architecture
+
+### CLI Commands (Cobra)
+
+| Command | File | Purpose |
+|---------------|-------------------------------|--------------------------------|
+| `e2e run` | `commands/run/run.go` | Full lifecycle orchestration |
+| `e2e setup` | `commands/setup/setup.go` | Setup env only (debug mode) |
+| `e2e trigger` | `commands/trigger/trigger.go` | Run trigger only |
+| `e2e verify` | `commands/verify/verify.go` | Run verification only |
+| `e2e cleanup` | `commands/cleanup/cleanup.go` | Run cleanup only |
+
+Global flags defined in `commands/root.go`:
+- `-c, --config` (default: `e2e.yaml`)
+- `-v, --verbosity` (debug/info/warn/error)
+- `-w, --work-dir` (default: `~/.skywalking-infra-e2e`)
+- `-l, --log-dir` (default: `~/.skywalking-infra-e2e/logs`)
+
+### Lifecycle (`e2e run`)
+
+```
+setup → trigger → verify → cleanup (deferred)
+```
+
+Cleanup runs via Go `defer` and is controlled by `cleanup.on`:
+- `always` / `success` / `failure` / `never`
+- Default: `success` locally, `always` in CI (`CI=true` env var)
+- Constants in `internal/constant/cleanup.go`
+
+### Environment Modes
+
+Determined by `setup.env` in e2e.yaml (`"kind"` or `"compose"`).
+
+Constants: `constant.Kind` and `constant.Compose` in `internal/constant/`.
+
+**Kind mode** (`internal/components/setup/kind.go`):
+- Creates Kind cluster, loads Docker images, applies K8s manifests
+- Pod log streaming via K8s client-go
+- Port forwarding via SPDY
+- Cleanup: `kind delete cluster` with retry (up to 5x)
+
+**Compose mode** (`internal/components/setup/compose.go`):
+- Uses testcontainers-go for Docker Compose
+- Container log streaming
+- Cleanup: `docker-compose down`
+
+### Configuration
+
+Config struct: `internal/config/e2eConfig.go` → `E2EConfig`
+
+```yaml
+setup:
+  env: kind|compose
+  file: path/to/kind-config.yaml # or docker-compose.yml
+  kubeconfig: path # alternative to file (use existing cluster)
+  timeout: 20m
+  steps:
+    - name: step-name
+      path: manifest.yaml # or command: "shell cmd"
+      wait:
+        - namespace: default
+          resource: pod
+          label-selector: app=foo
+          for: condition=Ready
+  kind:
+    import-images: [image:tag]
+    expose-ports:
+      - namespace: default
+        resource: pod/name
+        port: "8080"
+
+cleanup:
+  on: always|success|failure|never
+
+trigger:
+  action: http
+  interval: 3s
+  times: 5
+  url: http://...
+  method: GET
+
+verify:
+  retry: { count: 10, interval: 10s }
+  fail-fast: true
+  concurrency: false
+  cases:
+    - name: case-name
+      query: "shell command" # or actual: path/to/file
+      expected: path/to/expected.yaml
+```
+
+### Key Packages
+
+| Package | Role |
+|--------------------------------------|-------------------------------------------|
+| `internal/config/` | YAML config parsing, global config state |
+| `internal/components/setup/` | Kind & Compose setup implementations |
+| `internal/components/trigger/` | HTTP trigger action |
+| `internal/components/verifier/` | Test case verification with retry |
+| `internal/components/cleanup/` | Kind & Compose cleanup implementations |
+| `internal/util/` | K8s client, Docker helpers, env/log utils |
+| `internal/constant/` | Constants for both modes and cleanup |
+| `internal/logger/` | Logrus-based logging |
+| `pkg/output/` | Test result formatting (YAML/summary) |
+| `third-party/go/template/` | Extended Go template functions for verify |
+
+### Test Structure
+
+**Unit tests** (6 files):
+- `internal/config/e2eConfig_test.go`
+- `internal/util/config_test.go`, `utils_test.go`
+- `commands/verify/verify_test.go`
+- `internal/components/verifier/verifier_test.go`
+- `third-party/go/template/funcs_test.go`
+
+**E2E tests** (`test/e2e/`):
+- `e2e.yaml` — Compose-based e2e test
+- `kind/e2e.yaml` — Kind-based e2e test
+- Verify scenarios under `concurrency/` and `non-concurrency/` dirs
+
+### Log Collection
+
+- Logs streamed during setup to `LogDir/namespace/podName.log` (Kind) or `LogDir/serviceName/std.log` (Compose)
+- `internal/util/env_log.go` — `ResourceLogFollower` manages log writers
+- GitHub Actions: log dir defaults to `${runner.temp}/skywalking-infra-e2e/logs`
+- **No existing mechanism to copy arbitrary files from containers on failure**
+
+### GitHub Actions Integration
+
+- `action.yaml` at project root defines the composite action
+- Inputs: e2e-file, log-dir, plus matrix vars for log isolation
```
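CLAUDE.md (like Configuration-File.md) describes how `cleanup.on` is defaulted when it is not set. The rule can be sketched as a pure function; `defaultCleanupOn` is hypothetical, and it assumes an exact match on the value `true` for the `CI` variable:

```go
package main

import (
	"fmt"
	"os"
)

// defaultCleanupOn mirrors the documented rule: an explicit cleanup.on wins;
// otherwise "always" when CI=true, else "success" so a failed local run keeps
// its environment for inspection. The lookup parameter (same signature as
// os.LookupEnv) lets the rule be exercised without touching the real
// environment. Hypothetical helper, not the project's code.
func defaultCleanupOn(configured string, lookup func(string) (string, bool)) string {
	if configured != "" {
		return configured
	}
	if v, ok := lookup("CI"); ok && v == "true" {
		return "always"
	}
	return "success"
}

func main() {
	fmt.Println(defaultCleanupOn("", os.LookupEnv))
}
```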

Makefile

Lines changed: 16 additions & 0 deletions
```diff
@@ -118,3 +118,19 @@ e2e-test-kind:
 	$(MAKE) $(GOOS)
 	docker pull busybox:latest
 	./bin/$(GOOS)/$(PROJECT) run -c ./test/e2e/kind/e2e.yaml
+
+.PHONY: e2e-test-kind-collect
+# Run E2E test with KinD to verify collect-on-failure functionality
+# This test intentionally fails verification to trigger file collection.
+# It succeeds if the collected files exist at the expected output directory.
+e2e-test-kind-collect:
+	$(MAKE) $(GOOS)
+	docker pull busybox:latest
+	rm -rf /tmp/e2e-collect-test
+	./bin/$(GOOS)/$(PROJECT) run -c ./test/e2e/kind/collect-on-failure/e2e.yaml || true
+	@echo "Verifying collected files..."
+	@test -d /tmp/e2e-collect-test/default && echo "PASS: output directory created" || (echo "FAIL: output directory not found" && exit 1)
+	@ls /tmp/e2e-collect-test/default/collect-test/describe.txt > /dev/null 2>&1 && echo "PASS: describe.txt collected" || (echo "FAIL: describe.txt not found" && exit 1)
+	@ls /tmp/e2e-collect-test/default/collect-test/test-data/logs > /dev/null 2>&1 && echo "PASS: logs/ collected" || (echo "FAIL: logs/ not found" && exit 1)
+	@ls /tmp/e2e-collect-test/default/collect-test/test-data/debug.txt > /dev/null 2>&1 && echo "PASS: debug.txt collected" || (echo "FAIL: debug.txt not found" && exit 1)
+	@echo "All collect-on-failure checks passed."
```

commands/collect/collect.go

Lines changed: 42 additions & 0 deletions
```diff
@@ -0,0 +1,42 @@
+// Licensed to Apache Software Foundation (ASF) under one or more contributor
+// license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright
+// ownership. Apache Software Foundation (ASF) licenses this file to you under
+// the Apache License, Version 2.0 (the "License"); you may
+// not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package collect
+
+import (
+	"fmt"
+
+	"github.com/spf13/cobra"
+
+	"github.com/apache/skywalking-infra-e2e/internal/components/collector"
+	"github.com/apache/skywalking-infra-e2e/internal/config"
+)
+
+var Collect = &cobra.Command{
+	Use:   "collect",
+	Short: "Collect files from pods/containers for debugging",
+	RunE: func(cmd *cobra.Command, args []string) error {
+		if config.GlobalConfig.Error != nil {
+			return config.GlobalConfig.Error
+		}
+		err := collector.DoCollect(&config.GlobalConfig.E2EConfig)
+		if err != nil {
+			return fmt.Errorf("[Collect] %s", err)
+		}
+		return nil
+	},
+}
```

commands/root.go

Lines changed: 6 additions & 0 deletions
```diff
@@ -24,6 +24,7 @@ import (
 	"github.com/spf13/cobra"
 
 	"github.com/apache/skywalking-infra-e2e/commands/cleanup"
+	"github.com/apache/skywalking-infra-e2e/commands/collect"
 	"github.com/apache/skywalking-infra-e2e/commands/run"
 	"github.com/apache/skywalking-infra-e2e/commands/setup"
 	"github.com/apache/skywalking-infra-e2e/commands/trigger"
@@ -66,6 +67,10 @@ var Root = &cobra.Command{
 			return err
 		}
 
+		// Finalize collect config after LogDir is expanded, so any paths that
+		// depend on the log directory see the fully resolved value.
+		config.GlobalConfig.E2EConfig.Cleanup.Collect.Finalize()
+
 		return nil
 	},
 }
@@ -88,6 +93,7 @@ func Execute() error {
 	Root.AddCommand(trigger.Trigger)
 	Root.AddCommand(verify.Verify)
 	Root.AddCommand(cleanup.Cleanup)
+	Root.AddCommand(collect.Collect)
 
 	Root.PersistentFlags().StringVarP(&verbosity, "verbosity", "v", logrus.InfoLevel.String(), "log level (debug, info, warn, error, fatal, panic")
 	Root.PersistentFlags().StringVarP(&util.WorkDir, "work-dir", "w", "~/.skywalking-infra-e2e", "the working directory for skywalking-infra-e2e")
```

commands/run/run.go

Lines changed: 29 additions & 2 deletions
```diff
@@ -22,6 +22,7 @@ import (
 	"github.com/apache/skywalking-infra-e2e/commands/setup"
 	"github.com/apache/skywalking-infra-e2e/commands/trigger"
 	"github.com/apache/skywalking-infra-e2e/commands/verify"
+	"github.com/apache/skywalking-infra-e2e/internal/components/collector"
 	t "github.com/apache/skywalking-infra-e2e/internal/components/trigger"
 	"github.com/apache/skywalking-infra-e2e/internal/config"
 	"github.com/apache/skywalking-infra-e2e/internal/constant"
@@ -43,7 +44,7 @@ var Run = &cobra.Command{
 	},
 }
 
-func runAccordingE2E() error {
+func runAccordingE2E() (err error) {
 	if config.GlobalConfig.Error != nil {
 		return config.GlobalConfig.Error
 	}
@@ -54,14 +55,24 @@ func runAccordingE2E() error {
 			action.Stop()
 		}
 	}
+
 	// If cleanup.on == Always and there is error in setup step, we should defer cleanup step right now.
 	cleanupOnCondition := config.GlobalConfig.E2EConfig.Cleanup.On
 	if cleanupOnCondition == constant.CleanUpAlways {
 		defer doCleanup(stopAction)
 	}
 
+	// Collection runs independently of cleanup — it has its own collect.on condition.
+	// Registered before setup so it fires even on partial setup failures.
+	// Registered AFTER the Always-cleanup defer so it runs BEFORE cleanup (LIFO),
+	// ensuring files are collected while pods/containers are still alive.
+	// Errors are tolerated — partial setup may leave some pods unreachable.
+	defer func() {
+		doCollect(err)
+	}()
+
 	// setup part
-	err := setup.DoSetupAccordingE2E()
+	err = setup.DoSetupAccordingE2E()
 	if err != nil {
 		return err
 	}
```
```diff
@@ -106,6 +117,22 @@ func runAccordingE2E() error {
 	return nil
 }
 
+// doCollect runs file collection independently of cleanup.
+// It evaluates collect.on against the run error to decide whether to collect.
+func doCollect(runErr error) {
+	collectCfg := config.GlobalConfig.E2EConfig.Cleanup.Collect
+	shouldCollect := (collectCfg.On == constant.CollectAlways) ||
+		(collectCfg.On == constant.CollectOnFailure && runErr != nil)
+	if !shouldCollect || len(collectCfg.Items) == 0 {
+		return
+	}
+	if err := collector.DoCollect(&config.GlobalConfig.E2EConfig); err != nil {
+		logger.Log.Warnf("collect files error: %s", err)
+	} else {
+		logger.Log.Infof("collect files finished successfully")
+	}
+}
+
 func doCleanup(stopAction func()) {
 	if stopAction != nil {
 		stopAction()
```

docs/en/setup/Configuration-File.md

Lines changed: 36 additions & 1 deletion
````diff
@@ -261,13 +261,48 @@ After the E2E finished, how to clean up the environment.
 ```yaml
 cleanup:
   on: always # Clean up strategy
+  collect: # Collect files from pods/containers for debugging
+    on: failure # When to collect: always|failure|never, default: failure
+    output-dir: /tmp/collect # Where to save the collected files (required)
+    items:
+      # For Kind environment
+      - namespace: default # Pod namespace
+        label-selector: app=oap # Label selector to find pods
+        resource: pod/oap-0 # Specific pod resource (optional, instead of label-selector)
+        container: oap # Container name (optional, defaults to first container)
+        paths: # Paths in the container to collect
+          - /skywalking/logs/
+      # For Compose environment
+      - service: ui-service # Compose service name
+        paths:
+          - /var/log/nginx/
 ```
 
 If the `on` option under `cleanup` is not set, it will be automatically set to `always` if there is environment
 variable `CI=true`, which is present on many popular CI services, such as GitHub Actions, CircleCI, etc., otherwise it
 will be set to `success`, so the testing environment can be preserved when tests failed in your local machine.
 
-All available strategies:
+### Collect
+
+The `collect` section allows you to copy files from containers to the local machine before the environment is destroyed. This is useful for debugging failures in CI.
+
+* `on`: When to trigger collection.
+  * `always`: Always collect.
+  * `failure`: Only collect when any step (setup, trigger, verify) fails. Collection also attempts on setup failures — unreachable pods/containers are tolerated and logged.
+  * `never`: Never collect.
+* `output-dir`: **Required.** The local directory to save files. Supports environment variable expansion (e.g. `$SW_INFRA_E2E_LOG_DIR/collect`).
+* `items`: A list of collection tasks.
+  * For **Kind**: Specify `namespace` and either `label-selector` or `resource`. `container` is optional.
+  * For **Compose**: Specify `service`.
+  * `paths`: A list of file or directory paths inside the container.
+
+Collected files are organized by the full source path to avoid collisions:
+* Kind: `output-dir/<namespace>/<pod-name>/<source-path>`
+* Compose: `output-dir/<service-name>/<source-path>`
+
+Additionally, `kubectl describe` (for Kind) or `docker inspect` (for Compose) output is saved automatically alongside collected files.
+
+All available strategies for `cleanup.on`:
 1. `always`: No matter the execution result is success or failure, cleanup will be performed.
 1. `success`: Only when the execution succeeds.
 1. `failure`: Only when the execution failed.
````
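The documented rules for `items` can be read as a small validation function. A sketch under stated assumptions: `collectItem` and `validateItem` are hypothetical; "either `label-selector` or `resource`" is read as "at least one", since the documentation's own example sets both; and a non-empty `paths` list is assumed to be required:

```go
package main

import (
	"errors"
	"fmt"
)

// collectItem is a hypothetical mirror of one cleanup.collect.items entry;
// the YAML key from the documentation is noted beside each field.
type collectItem struct {
	Namespace     string   // namespace
	LabelSelector string   // label-selector
	Resource      string   // resource
	Container     string   // container (optional)
	Service       string   // service
	Paths         []string // paths
}

// validateItem checks an item against the documented contract for the given
// setup.env ("kind" or "compose"). Sketch only, not the commit's validation.
func validateItem(env string, it collectItem) error {
	if len(it.Paths) == 0 {
		return errors.New("paths must not be empty")
	}
	switch env {
	case "kind":
		if it.Namespace == "" {
			return errors.New("kind item needs namespace")
		}
		if it.LabelSelector == "" && it.Resource == "" {
			return errors.New("kind item needs label-selector or resource")
		}
	case "compose":
		if it.Service == "" {
			return errors.New("compose item needs service")
		}
	default:
		return fmt.Errorf("unknown env %q", env)
	}
	return nil
}

func main() {
	oap := collectItem{Namespace: "default", LabelSelector: "app=oap", Paths: []string{"/skywalking/logs/"}}
	ui := collectItem{Service: "ui-service", Paths: []string{"/var/log/nginx/"}}
	fmt.Println(validateItem("kind", oap), validateItem("compose", ui))
}
```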
