fix(harness): make pfn_chain_stress robust on cross-FS / flaky-net hosts #720
Open
keanji-x wants to merge 1 commit into
Three small defensive fixes for failures hit while reproducing a mempool
issue with the pfn_chain_stress harness on a host where /tmp is tmpfs and
the repo lives on a separate device, and where GitHub fetches
occasionally hiccup.
1. cluster/deploy.sh: gravity_node binary is hardlinked into each node's
/tmp/gravity-cluster-pfn-*/<node>/bin/. When /tmp and target/ are on
different filesystems the `ln -f` fails with "Invalid cross-device
link" and tears down the whole setup. Fall back to `cp -f` — matches
what the gravity_cli path already does a few lines up.
2. cluster/genesis.sh: every run does `git fetch origin` + `git pull
origin <ref>` on the external contracts repo. A transient network
error there kills the whole stress run. Demote both to warnings —
the local working copy is usually already at the right ref and a
stale local copy is strictly better than a hard failure for this
workflow.
3. regression/pfn_chain_stress/run.sh: the wait-for-chain loop does
`bn=$(curl ... | sed ...)` under `set -euo pipefail`. The first probe
typically lands while node1's RPC is still starting; curl exits 7,
pipefail propagates it through the command substitution, and set -e
kills the script before the chain is even up. Wrap curl in `{ ...
|| true; }` so the wait loop actually waits.
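The failure mode in item 3 can be reproduced without a running node. The following is a minimal sketch, not the code from run.sh: `probe`, `wait_for_block`, `POLL_INTERVAL`, and the sample JSON response are hypothetical stand-ins for the real curl call and loop.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the RPC probe. For the first two attempts it
# fails the way curl does when nothing is listening yet (exit code 7);
# after that it prints a JSON-RPC style response.
probe() {
  local attempt=$1
  if (( attempt < 2 )); then
    return 7
  fi
  echo '{"jsonrpc":"2.0","id":1,"result":"0x2a"}'
}

# Poll until a block number can be parsed, or give up after $1 attempts.
wait_for_block() {
  local attempts=$1 bn="" i
  for (( i = 0; i < attempts; i++ )); do
    # Without the { ... || true; } guard, pipefail would make this pipeline
    # return 7, the assignment would fail, and set -e would end the script.
    bn=$({ probe "$i" || true; } | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p')
    if [[ -n "$bn" ]]; then
      echo "$bn"
      return 0
    fi
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}

bn=$(wait_for_block 5)
echo "chain is up at block $bn"
```

Guarding only the probe keeps `pipefail` active for the rest of the pipeline, so a genuine `sed` failure would still stop the script.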
None of these change runtime behavior of the binary under test — they
only affect the harness's resilience on developer-machine variations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small defensive fixes for failures hit while reproducing a mempool issue with the pfn_chain_stress harness on a host where `/tmp` is tmpfs and the repo lives on a separate device, and where GitHub fetches occasionally hiccup. None of these change runtime behavior of the binary under test; they only affect the harness's resilience on developer-machine variations.

1. `cluster/deploy.sh`: cross-device hardlink

`gravity_node` is hardlinked into each node's `/tmp/gravity-cluster-pfn-*/<node>/bin/`. When `/tmp` and `target/` are on different filesystems, the `ln -f` fails with "Invalid cross-device link" and tears down the whole setup.

Fall back to `cp -f`, which matches what the `gravity_cli` path a few lines up already does (and what the inline comment claims).

2. `cluster/genesis.sh`: flaky-network git fetch

Every run does `git fetch origin` + `git pull origin <ref>` on `external/gravity_chain_core_contracts`. A transient network error there kills the whole stress run.

Demote both to warnings: the local working copy is usually already at the right ref, and a stale local copy is strictly better than a hard failure for this workflow.
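One way to express "demote to warnings" under `set -e` is to attach a warning handler to each git call. This is a sketch under stated assumptions: `warn` and `refresh_repo` are hypothetical helper names, and the actual change in genesis.sh may be structured differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a warning on stderr without failing.
warn() { echo "WARN: $*" >&2; }

# Hypothetical helper: try to refresh a checkout, but never fail the caller.
# A failed fetch or pull leaves the existing working copy in place.
refresh_repo() {
  local dir=$1 ref=$2
  git -C "$dir" fetch origin \
    || warn "git fetch failed in $dir; continuing with the local copy"
  git -C "$dir" pull origin "$ref" \
    || warn "git pull origin $ref failed in $dir; continuing with the local copy"
}

# Example call (the ref variable is an assumption, not taken from genesis.sh):
#   refresh_repo external/gravity_chain_core_contracts "$CONTRACTS_REF"
```

The trade-off is that a stale checkout no longer stops the run, so the warning text is the only signal that the contracts may not match the requested ref.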
3. `regression/pfn_chain_stress/run.sh`: wait-for-chain `set -e` brittleness

The wait-for-chain loop does `bn=$(curl ... | sed ...)` under `set -euo pipefail`. The first probe typically lands while node1's RPC is still starting; curl exits 7, `pipefail` propagates it through the command substitution, and `set -e` kills the script before the chain is even up.

Wrap curl in `{ ... || true; }` so the wait loop actually waits.

Test plan

`./run.sh pfn3 --clean` end-to-end on a host where all three failure modes were hit; harness now completes cleanly through bench + TPS analysis.

🤖 Generated with Claude Code
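For reviewers without a cross-filesystem host, the link-then-copy fallback from the first fix can be exercised in isolation. This is a minimal sketch: `install_binary` is a hypothetical helper name, and the warning message is illustrative rather than copied from deploy.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: place a binary at dst, preferring a hardlink.
# ln fails with "Invalid cross-device link" when src and dst are on
# different filesystems; in that case fall back to a plain copy.
install_binary() {
  local src=$1 dst=$2
  if ! ln -f "$src" "$dst" 2>/dev/null; then
    echo "WARN: hardlink failed for $dst; copying instead" >&2
    cp -f "$src" "$dst"
  fi
}

# Example call (paths are placeholders, not taken from deploy.sh):
#   install_binary target/release/gravity_node "$node_dir/bin/gravity_node"
```

A copy uses more space in `/tmp` than a hardlink, which is why the hardlink stays the first choice and the copy is only the fallback.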