Skip to content

Migrate query-service availability endpoints#4270

Open
ss-es wants to merge 52 commits into
mainfrom
ss/axum-4
Open

Migrate query-service availability endpoints#4270
ss-es wants to merge 52 commits into
mainfrom
ss/axum-4

Conversation

@ss-es
Copy link
Copy Markdown
Contributor

@ss-es ss-es commented May 7, 2026

Migrates remaining query-service availability endpoints to axum.

Notably, this PR does not include new v2 endpoints for any of these. I will add those in a separate PR.

There are tests in crates/espresso/node/src/api.rs that should ensure parity between the axum and tide APIs for all endpoints.

axum_msgs.len(),
);
Ok(())
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit hard to test streaming websocket endpoints for parity conceptually, since there may be subtle timing differences between them (neither returns deterministic data)

to make this test a bit more resilient, we wait for 10 messages from each and just check that the streams had at least 2 in common

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Unable to build without Cargo.lock file.

This means that after this change 3rd party projects may have
difficulties using crates in this repo as a dependency. If this
isn't easy to fix please open an issue so we can fix it later.

For the first failing build see: https://github.com/EspressoSystems/espresso-network/actions/runs/25563643402

To reproduce locally run

cargo generate-lockfile
cargo check --all-targets

This PR can still be merged.

Comment on lines +7714 to +7723
std::thread::Builder::new()
.stack_size(32 * 1024 * 1024)
.spawn(move || {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async move {
const EPOCH_HEIGHT: u64 = 10;
const NUM_NODES: usize = 5;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took a lot of experimentation to get this test to not hit a stack overflow. to me this seems innocuous so I didn't try to dig much further to figure out exactly where the line was

@ss-es ss-es marked this pull request as ready for review May 11, 2026 05:49
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026

Claude finished @ss-es's task in 3m 42s —— View job


Code Review: Migrate query-service availability endpoints

  • Read repository guidelines (CLAUDE.md)
  • Read existing review comments
  • Analyze PR diff and changed files
  • Review code quality, security, performance, and testing
  • Post detailed review feedback

Summary

This PR migrates the remaining query-service availability endpoints from tide-disco to Axum, adding a new HotShotAvailabilityApi trait with ~30 endpoints (leaves, headers, blocks, payloads, VID common, transactions, block summaries, limits, cert2) plus 8 WebSocket streaming endpoints. The trait design, handler wiring, and parity tests are solid overall.

Key Finding

Missing range limit enforcement on HotShotAvailabilityApi range endpoints — The original hotshot-query-service enforces enforce_range_limit on all range queries (small_object_range_limit=500 for leaves/vid_common, large_object_range_limit=100 for blocks/headers/payloads/summaries). The new Axum implementation passes from..until directly to the data source without any validation. A client can request an arbitrarily large range (e.g. /v1/availability/leaf/0/1000000) and the server will try to load and serialize the entire range into a Vec, risking OOM or excessive DB load. See inline comment.

Minor Notes

  • Inconsistent height type in stream_namespace_proofs handler (u64 vs usize used by all other stream handlers). See inline comment.
  • Hardcoded /limits response reports limits that aren't enforced server-side. See inline comment.

What looks good

  • Clean trait separation: AvailabilityApi for espresso-specific endpoints vs HotShotAvailabilityApi for generic hotshot mirroring.
  • drive_ws_stream is a clean, reusable helper for WebSocket streaming.
  • block_id_to_hs / payload_id_to_hs conversion helpers are well-structured.
  • Thorough parity tests comparing tide-disco and axum responses across all new endpoints, including WebSocket streams.
  • The RawAvidMShare custom Debug impl avoids log spam from large payloads — nice cleanup.
  • The 32 MiB stack thread workaround for the test is well-documented with the reasoning.

Comment thread crates/espresso/node/src/api/state.rs
Comment thread crates/espresso/node/src/api/state.rs
Comment thread crates/espresso/api/src/axum.rs Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Nextest failures (1) in this run

Test Attempts Time (s) Main history
hotshot-testing::test_epoch_upgrade::test_epoch_upgrade::testtypes_::memoryimpl_::test_epoch_upgrade 1 15.38 passing

See the step summary for flaky tests and slowest tests.

parking_lot = { workspace = true }
rand = "0.10.0"

# optional:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo?

from: usize,
namespace: u32,
) -> anyhow::Result<futures::stream::BoxStream<'static, Self::NamespaceProofQueryData>> {
use espresso_types::{NamespaceId, NamespaceProofQueryData, NsProof};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move imports to top?

type BlockSummary =
hotshot_query_service::availability::BlockSummaryQueryData<espresso_types::SeqTypes>;
type Limits = hotshot_query_service::availability::Limits;
type Cert2 = hotshot_new_protocol::message::Certificate2<espresso_types::SeqTypes>;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import some of these types for readability?

Comment thread light-client/Cargo.toml
async-trait = { workspace = true }

# Testing dependencies
bitvec = { workspace = true, optional = true }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental cargo sort without -g?

@sveitser
Copy link
Copy Markdown
Collaborator

@ss-es I think my main concern is that we should do some equivalence checks for common errors where they are informative, like range limits / not found. To ensure we are returning reasonable errors from Axum for clients. At least for errors that they probably care about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants