fix(traces): retry Cloudflare blocks instead of aborting identification #14981

Open
0xMars42 wants to merge 1 commit into foundry-rs:master from 0xMars42:fix/decode-internal-cloudflare-retry

0xMars42 commented May 30, 2026

Motivation

Addresses #9880. Under load, cast run --decode-internal (and other tracing that
resolves verified sources) identifies contracts only partially and needs several
runs to produce a complete trace, filling in from the on-disk cache between runs.

Solution

The trace ExternalIdentifier fetches contract metadata from Etherscan / Sourcify
through a rate-limit-aware stream (ExternalFetcher); cast run --decode-internal
then compiles those sources into source maps (get_compiled_contracts), so a
contract whose metadata never arrives is never decoded.

RateLimitExceeded is already handled with backoff and re-queue, but
BlockedByCloudflare was routed into the InvalidApiKey handling: it set the
fetcher's invalid-key flag and returned Poll::Ready(None), under the same
"// mark key as invalid" comment. That classifies a transient block as a
permanent auth failure: it ends the stream and abandons every address still queued
in that pass, and disables the fetcher for the rest of the process. Etherscan sits
behind Cloudflare, which returns this when a request burst trips its protection, so
a single transient block stops identification for every remaining contract. The
on-disk cache is why re-running fills the gaps progressively.

This treats a Cloudflare block as the transient rate limiting it is: back off and
retry the address, bounded by MAX_CLOUDFLARE_RETRIES (5) so a persistent block
can't hang the command in an infinite loop. After the bound is reached the address
is given up individually (yielded as None) instead of aborting the whole stream,
so one blocked contract no longer prevents the rest from being identified.
InvalidApiKey is left unchanged and still aborts, since retrying a bad key is
pointless.
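
The policy described above can be sketched as a small synchronous model. This is not the code in the PR: the real ExternalFetcher is a polled stream with actual backoff delays, while this sketch uses a plain queue and skips the sleeping. Only MAX_CLOUDFLARE_RETRIES and the three error names come from the description; drain_queue, the closure-based fetcher, and the string types are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Bound named in the PR description; everything else here is illustrative.
const MAX_CLOUDFLARE_RETRIES: usize = 5;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FetchError {
    RateLimitExceeded,
    BlockedByCloudflare,
    InvalidApiKey,
}

/// Hypothetical helper: drains the queue and returns one entry per address
/// that was either identified (`Some`) or given up on individually (`None`).
fn drain_queue<F>(addresses: &[&str], mut fetch: F) -> Vec<(String, Option<String>)>
where
    F: FnMut(&str) -> Result<String, FetchError>,
{
    let mut queue: VecDeque<String> = addresses.iter().map(|a| a.to_string()).collect();
    let mut cloudflare_retries: HashMap<String, usize> = HashMap::new();
    let mut out = Vec::new();

    while let Some(addr) = queue.pop_front() {
        match fetch(&addr) {
            Ok(metadata) => out.push((addr, Some(metadata))),
            // Re-queue on rate limiting (the real fetcher backs off first;
            // this sketch does not sleep and does not bound these retries).
            Err(FetchError::RateLimitExceeded) => queue.push_back(addr),
            // Transient block: retry this address, but only up to the bound.
            Err(FetchError::BlockedByCloudflare) => {
                let attempts = cloudflare_retries.entry(addr.clone()).or_insert(0);
                if *attempts < MAX_CLOUDFLARE_RETRIES {
                    *attempts += 1;
                    queue.push_back(addr);
                } else {
                    // Give up on this address only; the rest of the queue continues.
                    out.push((addr, None));
                }
            }
            // A bad key is permanent: abort and abandon whatever is still queued.
            Err(FetchError::InvalidApiKey) => break,
        }
    }
    out
}
```

In this model a persistently blocked address is fetched six times (one initial attempt plus five retries) before it is yielded as None, and the other addresses are still resolved. Whether the PR counts attempts the same way is not stated in the description.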

Additional context

Adds a regression test (cloudflare_block_retries_instead_of_abandoning_queue)
with a fetcher that returns a transient Cloudflare block on first contact with each
address and then succeeds. Before the change the stream aborts on the first block
and yields nothing; after it, every address is yielded in a single run. The test is
deterministic and does not hit the network.
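
A test double of the kind described above might look like the following. It is a minimal sketch, not the fetcher from the PR's test: BlockOnceFetcher, its fields, and the string return type are hypothetical, and the real test drives the stream rather than calling the fetcher directly.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum FetchError {
    BlockedByCloudflare,
}

/// Hypothetical in-memory fetcher: no network, so its behaviour is deterministic.
#[derive(Default)]
struct BlockOnceFetcher {
    /// Addresses that have already been contacted at least once.
    contacted: HashSet<String>,
    /// Total number of fetch calls, useful for asserting on retry counts.
    calls: usize,
}

impl BlockOnceFetcher {
    fn fetch(&mut self, address: &str) -> Result<String, FetchError> {
        self.calls += 1;
        // `insert` returns true only the first time an address is seen,
        // so the first contact is blocked and every later one succeeds.
        if self.contacted.insert(address.to_string()) {
            Err(FetchError::BlockedByCloudflare)
        } else {
            Ok(format!("metadata for {address}"))
        }
    }
}
```

With a fetcher like this, a stream that aborts on the first block yields nothing, while one that retries needs exactly two calls per address to yield everything.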

The trace ExternalIdentifier fetches contract metadata through a rate-limit
aware stream. RateLimitExceeded is handled with backoff and re-queue, but
BlockedByCloudflare was treated like an invalid API key: it set the invalid-key
flag and returned Poll::Ready(None), ending the stream and abandoning every
address still queued. Those contracts never get sources or source maps, so
cast run --decode-internal decodes selectors only partially and needs several
runs to fill in from the on-disk cache.

A Cloudflare block is transient rate limiting, not a permanent failure. Treat it
like RateLimitExceeded (back off and retry), bounded by MAX_CLOUDFLARE_RETRIES so
a persistent block can't loop forever; after the bound the address is given up
individually instead of aborting the whole stream. InvalidApiKey still aborts.

Adds a regression test that is red without the fix.

Fixes foundry-rs#9880.