
Expose per-upstream client timeouts and retries in ClientConfig#203

Merged
xlc merged 3 commits into AcalaNetwork:master from rockbmb:expose-client-timeouts
May 20, 2026

@rockbmb rockbmb commented May 20, 2026

Motivation

Client::new already accepts request_timeout, connection_timeout, and retries arguments, but from_config hardcodes all three to None because ClientConfig only exposes endpoints and shuffle_endpoints. As a result, the only way to override the 30s per-upstream request timeout (or the 30s connection timeout, or the default retry count) is to construct Client directly in Rust. That path is not available to downstream Subway deployments, which are driven by YAML config.

The motivating case is heavy storage queries against slow public RPCs. When chopsticks issues a chain_getBlockHash (or similar) to Subway, the upstream chain doesn't always respond inside 30s. Subway cycles to the next endpoint, waits another 30s, and so on, never serving a response to chopsticks; the call eventually times out higher up the stack. Increasing chopsticks' own rpc-timeout doesn't help because Subway never gets past the 30s/endpoint cycle. This is what surfaced in polkadot-fellows/runtimes#1180 / open-web3-stack/polkadot-ecosystem-tests#621 (Acala under sharded-CI load).
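
The failure mode described above can be sketched as a small timing model. This is an illustration only, not Subway's code: the function name, the `Outcome` type, the single pass over the endpoints, and the 45s upstream latency are all assumptions chosen to make the arithmetic visible.

```rust
/// Result of one client call routed through a proxy that tries each
/// upstream endpoint in turn with a fixed per-endpoint timeout.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// An upstream answered; the payload is the total seconds elapsed.
    Served(u64),
    /// Every endpoint timed out; the payload is the total seconds spent.
    Exhausted(u64),
}

/// Hypothetical model: `upstream_latency_secs[i]` is how long endpoint `i`
/// needs to answer, and every attempt restarts the query from scratch.
fn cycle_endpoints(upstream_latency_secs: &[u64], request_timeout_secs: u64) -> Outcome {
    let mut elapsed = 0;
    for &latency in upstream_latency_secs {
        if latency <= request_timeout_secs {
            // This attempt finishes inside the per-endpoint timeout.
            return Outcome::Served(elapsed + latency);
        }
        // The attempt is abandoned at the timeout; move to the next endpoint.
        elapsed += request_timeout_secs;
    }
    Outcome::Exhausted(elapsed)
}
```

Under these assumptions, three endpoints that each need 45s never answer with a 30s timeout (`cycle_endpoints(&[45, 45, 45], 30)` spends 90s and serves nothing), while a 90s timeout lets the first attempt finish after 45s. Raising the caller's own timeout does not change the first result, which is why the per-upstream value has to be configurable.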

Change

Adds three optional fields to ClientConfig:

  • request_timeout_seconds
  • connection_timeout_seconds
  • retries

from_config plumbs them into Client::new. None of the existing defaults change when the fields are omitted (30s request timeout, 30s connection timeout, default retry count). The three test-internal ClientConfig literals are updated to set the new fields to None for parity with the previous behaviour.
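
A minimal sketch of how the optional fields could resolve to effective settings follows. It is a simplified stand-in, not the actual diff: the field types, the `EffectiveSettings` type, the `resolve` function, and especially `ASSUMED_DEFAULT_RETRIES` are assumptions (the description only says "default retry count" without giving a number), and the real struct is presumably deserialized from YAML rather than built by hand.

```rust
use std::time::Duration;

// Defaults stated in the PR description: 30s for both timeouts.
const DEFAULT_REQUEST_TIMEOUT: Duration = Duration::from_secs(30);
const DEFAULT_CONNECTION_TIMEOUT: Duration = Duration::from_secs(30);
// ASSUMPTION: placeholder value; the PR does not state the default retry count.
const ASSUMED_DEFAULT_RETRIES: u32 = 3;

/// Simplified stand-in for the config struct (field types are assumptions).
#[allow(dead_code)] // `endpoints` and `shuffle_endpoints` are not read in this sketch.
#[derive(Debug, Clone, Default)]
struct ClientConfig {
    endpoints: Vec<String>,
    shuffle_endpoints: bool,
    // The three new optional fields; `None` means "keep the default".
    request_timeout_seconds: Option<u64>,
    connection_timeout_seconds: Option<u64>,
    retries: Option<u32>,
}

/// Hypothetical view of the settings a client ends up running with.
#[derive(Debug, PartialEq)]
struct EffectiveSettings {
    request_timeout: Duration,
    connection_timeout: Duration,
    retries: u32,
}

/// Converts the optional config values, falling back to the defaults.
fn resolve(config: &ClientConfig) -> EffectiveSettings {
    EffectiveSettings {
        request_timeout: config
            .request_timeout_seconds
            .map(Duration::from_secs)
            .unwrap_or(DEFAULT_REQUEST_TIMEOUT),
        connection_timeout: config
            .connection_timeout_seconds
            .map(Duration::from_secs)
            .unwrap_or(DEFAULT_CONNECTION_TIMEOUT),
        retries: config.retries.unwrap_or(ASSUMED_DEFAULT_RETRIES),
    }
}
```

A config literal that leaves the new fields as `None`, as the updated test literals do, resolves to the same 30s/30s settings as before, and setting only `request_timeout_seconds: Some(90)` changes that one value and nothing else.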

Trivial diff, four files, no behavioural change in the absence of new YAML entries.

rockbmb and others added 3 commits May 20, 2026 02:06
`Client::new` already accepts `request_timeout`, `connection_timeout`,
and `retries` arguments, but `from_config` hardcodes all three to
`None` because `ClientConfig` only exposes `endpoints` and
`shuffle_endpoints`. As a result the only way to override the 30s
per-upstream request timeout (and the 30s connection timeout, and the
default retry count) is to construct `Client` directly in Rust, which
isn't reachable from the YAML-driven config.

Adds three optional fields to `ClientConfig`:

  - `request_timeout_seconds`
  - `connection_timeout_seconds`
  - `retries`

`from_config` plumbs them into `Client::new`. None of the existing
defaults change when the fields are omitted.

The motivating case is heavy storage queries against slow public RPCs
(Acala under load is the case that surfaced this in
`polkadot-fellows/runtimes#1180` /
`open-web3-stack/polkadot-ecosystem-tests#621`) where 30s per upstream
is not enough and Subway exhausts its endpoint cycle without serving
a response.
@xlc xlc merged commit 0154bec into AcalaNetwork:master May 20, 2026
2 checks passed
@rockbmb rockbmb deleted the expose-client-timeouts branch May 20, 2026 11:27
rockbmb added a commit to open-web3-stack/polkadot-ecosystem-tests that referenced this pull request May 20, 2026
Subway's default per-upstream request timeout is 30s. With three Acala
public RPC endpoints, heavy storage queries that take longer than 30s
cause Subway to cycle through all three endpoints (~90s) before any
single upstream has a chance to respond, and the test-side waiting
client times out.

`request_timeout_seconds` was added to `ClientConfig` in
AcalaNetwork/subway#203 (Subway v0.1.1+). Setting it to 90 lets a
single upstream attempt run long enough to complete those queries
instead of being preempted by Subway's own per-endpoint clock.

The companion exclusion of Acala tests in `vitest.config.mts` is
intentionally left in place; this commit only restores Subway's
ability to wait long enough. Lifting the exclusion is a separate
verification step.
rockbmb added a commit to open-web3-stack/polkadot-ecosystem-tests that referenced this pull request May 20, 2026
…pstream timeout (#622)

* Install Subway from upstream `v0.1.0` musl release in `ci.yml`

Switches `cargo install --git` to a `curl | tar -xz` of the
released static binary
(https://github.com/AcalaNetwork/subway/releases/tag/v0.1.0,
published by AcalaNetwork/subway#202). Removes the Rust
toolchain install, Subway-HEAD commit-hash lookup, and
Swatinem cache layer that existed only to amortise the
`cargo install` cost — none of them have any other
consumer in this workflow.

* Install Subway from upstream `v0.1.0` musl release in `update-known-good.yml`

Same swap as the previous commit, applied to the periodic block-number
update workflow.

* Install Subway from upstream `v0.1.0` musl release in `update-snapshot.yml`

Same swap as the previous two commits, applied to the snapshot-update
workflow.

* Fail Subway download fast on HTTP errors (`curl -f`)

Without `-f`, an HTTP 4xx/5xx response (e.g. release deleted, GitHub
degraded) leaves `curl` exiting zero with the error body on stdout,
and the downstream `tar -xz` fails with a confusing "not in gzip
format" message instead. Per review on PR #622.

* Install Subway by extracting binary from `acala/subway:v0.1.1` Docker image

The `v0.1.1` GitHub Release at AcalaNetwork/subway is missing its
`x86_64-unknown-linux-musl.tar.gz` asset: the release workflow's
`Build release binary` step failed because `cargo build --locked`
rejected a `Cargo.lock` that was stale relative to the bumped
`Cargo.toml` version, so the upload was skipped. The upstream tag
still produces a working Docker image because `docker.yml` doesn't
use `--locked`, so `acala/subway:v0.1.1` is the only working
consumption path for v0.1.1.

The image's binary lives at `/usr/local/bin/subway` (per Subway's
Dockerfile); copying it out with `docker create` + `docker cp` lands
in roughly the same wall time as the curl-and-untar path and unblocks
consumption of PR #203's `request_timeout_seconds` config field.

* Set Subway per-upstream `request_timeout_seconds` to 90s

Subway's default per-upstream request timeout is 30s. With three Acala
public RPC endpoints, heavy storage queries that take longer than 30s
cause Subway to cycle through all three endpoints (~90s) before any
single upstream has a chance to respond, and the test-side waiting
client times out.

`request_timeout_seconds` was added to `ClientConfig` in
AcalaNetwork/subway#203 (Subway v0.1.1+). Setting it to 90 lets a
single upstream attempt run long enough to complete those queries
instead of being preempted by Subway's own per-endpoint clock.

The companion exclusion of Acala tests in `vitest.config.mts` is
intentionally left in place; this commit only restores Subway's
ability to wait long enough. Lifting the exclusion is a separate
verification step.

* Re-enable Acala test suites

`request_timeout_seconds: 90` on Subway's upstream client (added to
`subway-template.yml` in the previous commit) gives each upstream
attempt enough time for Acala storage queries to land, where the 30s
default forced Subway to cycle endpoints first. The exclusion added in
PR #621 is no longer needed and is removed; the exclusion comment is
narrowed to bifrostKusama, which still lacks a workable endpoint set.
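
The `curl -f` commit in the list above is about where a failed download gets reported. The sketch below models that in Rust rather than shell; the `download` and `extract` functions are hypothetical stand-ins for the `curl` and `tar -xz` steps, and only the gzip magic bytes (`0x1f 0x8b`) and the "HTTP status 400 or above" threshold are taken as given.

```rust
/// Gzip streams start with the two magic bytes 0x1f 0x8b.
fn looks_like_gzip(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x1f, 0x8b])
}

/// Hypothetical model of the download step. With `fail_on_http_error` set
/// (the role `curl -f` plays), a status of 400 or above is an error and
/// the response body is not passed on.
fn download(status: u16, body: &[u8], fail_on_http_error: bool) -> Result<Vec<u8>, String> {
    if fail_on_http_error && status >= 400 {
        return Err(format!("download failed: HTTP {}", status));
    }
    // Without the flag, whatever the server sent is handed to the next step.
    Ok(body.to_vec())
}

/// Stand-in for the extraction step: it only checks the gzip header.
fn extract(archive: &[u8]) -> Result<(), String> {
    if looks_like_gzip(archive) {
        Ok(())
    } else {
        Err("not in gzip format".to_string())
    }
}
```

In this model a 404 error page without the flag only fails later, in `extract`, with the misleading "not in gzip format" message, whereas with the flag the pipeline stops at `download` and names the HTTP status.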
rockbmb added a commit to rockbmb/runtimes that referenced this pull request May 20, 2026
… image

The `v0.1.1` GitHub Release artifact is missing because the release
workflow's `Build release binary` step failed against a stale
`Cargo.lock`; the upload step was skipped. The Docker image build at
the same tag succeeded (it doesn't use `--locked`), so
`acala/subway:v0.1.1` is the only working consumption path for the
release that includes AcalaNetwork/subway#203's new
`request_timeout_seconds` field, which the next commit relies on.

Mirrors the equivalent change in
open-web3-stack/polkadot-ecosystem-tests#622.
rockbmb added a commit to rockbmb/runtimes that referenced this pull request May 20, 2026
Subway's default per-upstream request timeout is 30s. With three
Acala public RPC endpoints, heavy storage queries that take longer
than 30s cause Subway to cycle through all three (~90s) before any
single upstream has time to respond, and the chopsticks-side client
times out.

`request_timeout_seconds` is the new field added in
AcalaNetwork/subway#203 (Subway v0.1.1+, installed in the previous
commit). Setting it to 90 lets a single upstream attempt run long
enough to complete those queries instead of being preempted by
Subway's per-endpoint clock.

Mirrors the equivalent change in
open-web3-stack/polkadot-ecosystem-tests#622.