snapshot: --probe flag + weekly CI cron #182
Open
barbatos2011 wants to merge 2 commits into
…follow-up)

Adds the feedback loop that would have caught tronprotocol#161 before a user did. The Nile S3 URL going stale was invisible from the codebase side: nothing exercised the published URLs after a hardcoded edit, so the table aged silently until a human tried to download.

Three pieces:

internal/snapshot/probe.go
    Probe(ctx, src, opts) HEAD-checks the actual published tarball URL for a source. For 'date'-strategy mirrors (nile) it walks the generated date list newest-to-oldest; for 'html' mirrors it scrapes the index, same as Download does. The first 200 wins and is classified ok / stale based on its age. We do NOT trust the existing LatestBackup helper here: it returns the topmost candidate without HEAD-checking, which would tell us nothing. ProbeAll probes a []Source concurrently, preserving input order.

cmd/snapshot/sources.go
    The existing 'trond snapshot sources' grows three flags: --probe, --probe-timeout, --stale-after. The probe path returns a non-nil error on any not-OK source so the CLI exits 1; this is cleaner for shell-pipe consumption than calling os.Exit() inside a Cobra RunE. JSON output still prints the full report before failing.

.github/workflows/snapshot-sources-probe.yml
    Weekly cron (Mon 09:00 UTC) + workflow_dispatch. Builds trond, runs the probe in JSON mode, and on failure opens (or comments on the existing) rolling 'snapshot-probe-stale'-labelled issue. Auto-closes when sources recover. The probe artifact is uploaded with 30-day retention so we can diff probe runs week-on-week.
Smoke-tested locally against the live SourceTable on this branch (which still has the broken Nile S3 URL from before tronprotocol#161 lands):

    $ trond snapshot sources --probe
    STATUS       NETWORK  KIND  DOMAIN              LATEST          AGE
    ok           mainnet  lite  34.143.247.77       backup20260522  3d
    ok           mainnet  full  34.143.247.77       backup20260522  3d
    ok           mainnet  full  35.247.128.170      backup20260522  3d
    ok           mainnet  full  34.86.86.229        backup20260523  2d
    ok           mainnet  full  34.48.6.163         backup20260520  5d
    ok           mainnet  full  35.197.17.205       backup20260522  3d
    unreachable  nile     lite  database.nileex.io  -               -

    summary: ok=6 stale=0 unreachable=1 no_backups=0 bad_config=0
    exit code: 1

Useful side finding: all five mainnet IPs from the 2025-Q1 entry are still healthy a year later, so the file-header staleness worry was over-cautious. The Nile failure is real and lines up exactly with tronprotocol#161.

Branch is intentionally off develop so this can land independent of the shadow-fork PR. When tronprotocol#161's Nile-URL fix merges, the probe will go fully green on the next cron tick (or workflow_dispatch manually for an immediate verify).
Two CI failures on PR tronprotocol#182 after first push, both in code I added:

1. gofmt: probe.go's const block + struct field alignment differed from gofmt's column choice (one extra space on a comment). gofmt -w reflowed; no behavioural change.
2. unparam: buildProbeMirror returned (*httptest.Server, Source) but every caller used '_, src := ...'; the server itself is cleaned up via t.Cleanup(srv.Close) inside the helper, so callers never need the handle. Dropped the first return; updated all five callers from '_, src := ...' to 'src := ...'.

Not fixed in this commit (pre-existing on develop, not introduced by this PR):

- Vulnerability scan reports findings on internal/target/ssh.go's calls into golang.org/x/crypto/ssh. The vulnerable paths were committed before this branch was cut.
Summary
- `trond snapshot sources --probe` subcommand: HEAD-checks every upstream mirror, classifies each as ok / stale / unreachable / no_backups / bad_config, exits non-zero on any failure.
- `Probe(ctx, src, opts)` + `ProbeAll` in `internal/snapshot/`. For `date`-strategy mirrors (nile) walks the generated date list newest-to-oldest, stopping at the first HTTP 200; for `html` mirrors scrapes the index. Distinguishes "no recent backup" from "endpoint vanished".
- Weekly CI cron (`.github/workflows/snapshot-sources-probe.yml`): Mon 09:00 UTC + manual dispatch. On failure opens a rolling `snapshot-probe-stale` issue (one issue, comments for subsequent failures), auto-closes when sources recover. Probe artifact uploaded for 30-day retention.

Why
Task #161 (Nile S3 mirror → 403) was the visible symptom of a deeper gap: the `SourceTable` is a hardcoded list with no feedback loop. The structural fix is a cron-driven probe so the next URL rotation surfaces in CI within a week instead of in a user bug report.

Smoke test (live SourceTable, this branch)
(Nile is currently flagged because this branch is off `develop`, predating the #161 URL fix on the shadowfork-phase1 PR.)

Test plan
- `httptest`-based tests cover ok / stale / unreachable / bad_config + ProbeAll order preservation + age parser for both date formats.

🤖 Generated with Claude Code