Commit f615443

ops(ci): upload full demo cluster log on Jepsen failure (#795)
## Summary

Make the Jepsen workflows upload the full demo cluster log as an artifact (14-day retention) on failure, instead of only the last 500 lines inline.

## Motivation

Scheduled run [26198185540](https://github.com/bootjp/elastickv/actions/runs/26198185540) surfaced a real `:duplicate-elements` anomaly on the Redis Jepsen workload. Server-side investigation was blocked because the "Dump demo cluster log on failure" step's `tail -n 500` captured only the startup section of a 3-minute workload: the actual start_ts / commit_ts / raft-term / write-conflict / lock-resolver events that would identify the offending code path were truncated.

## Change

- **Scheduled workflow** (`jepsen-test-scheduled.yml`): inline dump now prints `head -n 200 + tail -n 1000` so the GH UI still shows enough at-a-glance; full log uploaded as `elastickv-demo-log` artifact.
- **Per-push workflow** (`jepsen-test.yml`): inline dump prints `head -n 200 + tail -n 500` per node (n1/n2/n3); full per-node logs uploaded as `elastickv-demo-logs` artifact (three files).
- Retention: 14 days, matching the existing `jepsen-store-types` artifact retention so logs and `history.txt` can be correlated for the same run.

## Test plan

- [x] yaml syntactically valid (workflow change only)
- [ ] CI green
- [ ] Next failing scheduled run produces a downloadable `elastickv-demo-log` artifact
2 parents fe96c88 + 4986779 commit f615443

2 files changed

Lines changed: 49 additions & 1 deletion

.github/workflows/jepsen-test-scheduled.yml

Lines changed: 22 additions & 1 deletion
@@ -229,7 +229,28 @@ jobs:
             --host 127.0.0.1
       - name: Dump demo cluster log on failure
         if: failure()
-        run: tail -n 500 /tmp/elastickv-demo.log || true
+        # The previous `tail -n 500` truncated a 3-minute workload's
+        # log down to startup-only lines, making it impossible to
+        # correlate a Jepsen anomaly with the server-side state
+        # (start_ts, commit_ts, raft term, write conflicts, lock-
+        # resolver events). Print head + tail inline so the GH UI
+        # still shows the most recent activity at-a-glance, then
+        # upload the full log as an artifact for offline analysis.
+        run: |
+          echo "=== first 200 lines (startup) ==="
+          head -n 200 /tmp/elastickv-demo.log || true
+          echo "=== last 1000 lines (most recent activity) ==="
+          tail -n 1000 /tmp/elastickv-demo.log || true
+          echo "=== full log line count ==="
+          wc -l /tmp/elastickv-demo.log || true
+      - name: Upload demo cluster log on failure
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: elastickv-demo-log
+          path: /tmp/elastickv-demo.log
+          retention-days: 14
+          if-no-files-found: warn
       - name: Stop demo cluster
         if: always()
         run: |

.github/workflows/jepsen-test.yml

Lines changed: 27 additions & 0 deletions
@@ -194,6 +194,33 @@ jobs:
             --partition-count 4 --group-count 6 \
             --drain-time 15 \
             --sqs-ports 63501,63502,63503 --host 127.0.0.1
+      - name: Dump demo cluster logs on failure
+        if: failure()
+        # Same rationale as jepsen-test-scheduled.yml: inline summary
+        # for the GH UI + full per-node logs uploaded as artifact so
+        # any Jepsen anomaly can be traced to the server-side ts/raft
+        # state without re-running the workflow.
+        run: |
+          for node in n1 n2 n3; do
+            log=/tmp/elastickv-demo-${node}.log
+            echo "=== ${node}: first 200 lines (startup) ==="
+            head -n 200 "$log" || true
+            echo "=== ${node}: last 500 lines (most recent activity) ==="
+            tail -n 500 "$log" || true
+            echo "=== ${node}: full log line count ==="
+            wc -l "$log" || true
+          done
+      - name: Upload demo cluster logs on failure
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: elastickv-demo-logs
+          path: |
+            /tmp/elastickv-demo-n1.log
+            /tmp/elastickv-demo-n2.log
+            /tmp/elastickv-demo-n3.log
+          retention-days: 14
+          if-no-files-found: warn
       - name: Stop demo cluster
         if: always()
         run: |
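
One detail of the head + tail dump in both workflows is worth noting: when a log is shorter than the combined head and tail counts, the two commands overlap and the same lines are printed twice. The sketch below shows one possible way to avoid that. It is not part of this commit; the function name `dump_log_summary` and its default counts are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: print the head and tail of a
# log without repeating lines when the file is shorter than head + tail.
dump_log_summary() {
  local log="$1" head_n="${2:-200}" tail_n="${3:-500}"

  # A missing log is reported but never fails the step, mirroring `|| true`.
  if [ ! -f "$log" ]; then
    echo "=== ${log}: missing ==="
    return 0
  fi

  local total
  total=$(wc -l < "$log")
  total=$((total))  # normalise whitespace padding emitted by some wc builds

  echo "=== ${log}: ${total} lines ==="
  if [ "$total" -le $((head_n + tail_n)) ]; then
    # Short log: head and tail would overlap, so print the file once.
    cat "$log"
  else
    head -n "$head_n" "$log"
    echo "=== ... $((total - head_n - tail_n)) lines omitted (see uploaded artifact) ... ==="
    tail -n "$tail_n" "$log"
  fi
}

# Usage mirroring the per-node loop in jepsen-test.yml.
for node in n1 n2 n3; do
  dump_log_summary "/tmp/elastickv-demo-${node}.log" 200 500
done
```

The omitted-line marker makes it explicit in the GH UI that the inline output is a summary and that the middle of the run is only available in the uploaded artifact.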
