feat(spartan): client-observed inclusion latency for bench-10tps #23474

Open
spypsy wants to merge 1 commit into next from spy/bench-10tps-client-latency


spypsy commented May 21, 2026

Summary

Replaces the Prometheus aztec_mempool_tx_mined_delay_milliseconds histogram (which mixes every mempool observer — RPC + validators + full-nodes — and both high- and low-value txs) with a client-observed measurement scoped to the high-value lane.

What changes

  • n_tps.test.ts: block-watcher RunningPromise polls getBlockNumber() every 1s and stamps wall-clock minedAtMs the first time each sent tx's block becomes visible to the test client. Records dumped (high-value group only) into /tmp/n_tps_timing_data.json alongside the existing startedAt/endedAt.
  • tx_metrics.ts: rename sentAt/minedAt/attestedAt → *Ms (ms wall-clock); add observeBlockForMinedTxs (idempotent, first observer wins) and getInclusionRecords. recordMinedTx kept as fallback (slot-timestamp × 1000) for any tx the watcher misses.
  • bench_scrape.ts: load records via new --inclusion-records flag; compute summary.inclusionLatencyP{50,95,99}Ms and the txMinedDelayP{50,95,99} time series (60s bins by sentAtMs) directly from the per-tx data. The other Prom-based histograms (build duration, public-processor) are untouched.
  • bench_output.schema.json: widens timeSeries.source enum to include "client_observed", declares the three txMinedDelay* slugs.
  • bootstrap.sh: HIGH_VALUE_TPS=1, LOW_VALUE_TPS=9 (was 10/0) so the high-value scoping is non-trivial; passes --inclusion-records "$metadata" to the scraper.
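
The "idempotent, first observer wins" behaviour described for `tx_metrics.ts` can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `InclusionTracker`, the record shape, and the `source` field are hypothetical, and only the method roles (stamp on first observation, slot-timestamp fallback, export of records) come from the description above.

```typescript
// Hypothetical record shape; the real fields in tx_metrics.ts may differ.
interface InclusionRecord {
  txHash: string;
  sentAtMs: number;
  minedAtMs?: number;
  source?: 'watcher' | 'slot_fallback';
}

class InclusionTracker {
  private records: { [txHash: string]: InclusionRecord } = {};

  recordSentTx(txHash: string, sentAtMs: number): void {
    this.records[txHash] = { txHash, sentAtMs };
  }

  // Stamp every tracked tx found in a newly visible block.
  // Idempotent: a tx that already has minedAtMs is left untouched.
  // Returns how many records were stamped by this call.
  observeBlock(txHashes: string[], observedAtMs: number): number {
    let stamped = 0;
    for (const hash of txHashes) {
      const rec = this.records[hash];
      if (rec && rec.minedAtMs === undefined) {
        rec.minedAtMs = observedAtMs;
        rec.source = 'watcher';
        stamped++;
      }
    }
    return stamped;
  }

  // Fallback for txs the watcher missed: slot timestamp (seconds) x 1000.
  recordMinedFallback(txHash: string, slotTimestampSec: number): void {
    const rec = this.records[txHash];
    if (rec && rec.minedAtMs === undefined) {
      rec.minedAtMs = slotTimestampSec * 1000;
      rec.source = 'slot_fallback';
    }
  }

  // Only records that have been stamped are exported.
  getInclusionRecords(): InclusionRecord[] {
    return Object.keys(this.records)
      .map((hash) => this.records[hash])
      .filter((rec) => rec.minedAtMs !== undefined);
  }
}
```

Tagging each record with where its timestamp came from is an extra added here for illustration; it would make it easy to see how often the fallback path fires.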

Field names on the run JSON are unchanged, so the dashboard panel in AztecProtocol/explorations keeps working — descriptions updated to reflect the new source (PR spy/inclusion-latency-client-source).
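
For reference, the per-bin quantile series that feeds those run JSON fields could be computed along these lines. This is a sketch under assumptions: the PR does not say which percentile method `bench_scrape.ts` uses (nearest-rank is assumed here) or how bins are aligned (multiples of the bin width are assumed), and `percentile`, `binLatencies`, and the `BinPoint` shape are illustrative names.

```typescript
interface MinedRecord {
  sentAtMs: number;
  minedAtMs: number;
}

interface BinPoint {
  binStartMs: number;
  count: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile (assumed method). Returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    return NaN;
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Group latencies (minedAtMs - sentAtMs) into fixed-width bins keyed by sentAtMs.
function binLatencies(records: MinedRecord[], binMs: number = 60000): BinPoint[] {
  const bins: { [binStartMs: number]: number[] } = {};
  for (const r of records) {
    const start = Math.floor(r.sentAtMs / binMs) * binMs;
    if (!bins[start]) {
      bins[start] = [];
    }
    bins[start].push(r.minedAtMs - r.sentAtMs);
  }
  return Object.keys(bins)
    .map(Number)
    .sort((a, b) => a - b)
    .map((start) => {
      const latencies = bins[start];
      return {
        binStartMs: start,
        count: latencies.length,
        p50: percentile(latencies, 50),
        p95: percentile(latencies, 95),
        p99: percentile(latencies, 99),
      };
    });
}
```

Binning by send time rather than mined time means a slow tx is attributed to the minute in which the client submitted it.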

Why

  • Prom histogram measures mempool-add → mempool-saw-block per node. Aggregating across RPC + validators dilutes the headline with the validator's gossip-arrival-based view.
  • It also doesn't distinguish high-value from low-value lanes. In the new 1/9 mix, low-value txs run at network min priority and are explicitly allowed to fail fee checks, which would dominate any aggregated quantile.
  • Client-observed wall-clock is what a paying client actually experiences end-to-end.
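
To make the dilution argument concrete, here is a toy calculation with made-up latencies (the 12 s and 60 s figures are purely illustrative, not measurements). With a 1/9 mix, any aggregate median lands inside the low-value population. The helper names are hypothetical and the nearest-rank percentile is an assumed method.

```typescript
// Nearest-rank percentile (assumed method, for illustration only).
function nearestRank(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Build one second's worth of latencies for a given lane mix.
function mixLatencies(highMs: number, lowMs: number, highCount: number, lowCount: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < highCount; i++) {
    out.push(highMs);
  }
  for (let i = 0; i < lowCount; i++) {
    out.push(lowMs);
  }
  return out;
}

// Illustrative values only: 1 high-value tx at 12 s, 9 low-value txs at 60 s.
const aggregateP50 = nearestRank(mixLatencies(12000, 60000, 1, 9), 50);
const highLaneP50 = nearestRank(mixLatencies(12000, 60000, 1, 0), 50);
console.log({ aggregateP50, highLaneP50 });
```

In this toy case the aggregate p50 equals the low-value latency, while the high-value lane's own p50 is the 12 s figure, which is why the measurement is scoped to the high-value group.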

Caveats

  • minedAtMs has ~1s jitter from the watcher's poll interval. p50 is typically multiple slots, so this is small relative to the signal.
  • bench_output.schema.json diff is noisy: the pre-commit Prettier hook expanded inline { "$ref": "..." } entries throughout. The semantic change is ~7 lines at the top of the file.
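
The poll-interval jitter in the first caveat can be reasoned about with a simple model: if the watcher polls on a fixed grid, a block that becomes visible at time T is stamped at the first poll tick at or after T, so the overshoot lies in [0, interval). The sketch below assumes an idealised fixed-interval poller and ignores RPC round-trip time; `observedAtMs` is a hypothetical helper, not part of the PR.

```typescript
// Time at which an idealised poller first sees a block that became visible at
// visibleAtMs, given polls at pollStartMs, pollStartMs + interval, and so on.
function observedAtMs(visibleAtMs: number, pollStartMs: number, pollIntervalMs: number = 1000): number {
  const elapsed = Math.max(0, visibleAtMs - pollStartMs);
  const ticks = Math.ceil(elapsed / pollIntervalMs);
  return pollStartMs + ticks * pollIntervalMs;
}

// Overshoot added to minedAtMs by polling; always in [0, pollIntervalMs)
// for blocks that become visible at or after the first poll.
function pollOvershootMs(visibleAtMs: number, pollStartMs: number, pollIntervalMs: number = 1000): number {
  return observedAtMs(visibleAtMs, pollStartMs, pollIntervalMs) - visibleAtMs;
}
```

For example, with polling started at 10000 ms, a block visible at 10250 ms is stamped at 11000 ms, an overshoot of 750 ms.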

Test plan

  • Run bench-10tps end-to-end against a real namespace.
  • Verify /tmp/n_tps_timing_data.json contains inclusionRecords with non-trivial minedAtMs - sentAtMs deltas.
  • Verify run JSON has timeSeries.txMinedDelayP* with source: "client_observed" and summary.inclusionLatencyP*Ms populated.
  • Verify the dashboard panel renders the line chart on the new run.
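
The second test-plan item could be scripted with a check like the one below. It is only a sketch: the assumption is that the timing file parses to an object with an `inclusionRecords` array whose entries carry `sentAtMs` and `minedAtMs`; any other layout would need adjusting. `findInclusionProblems` is a hypothetical helper.

```typescript
// Assumed shape of the parsed /tmp/n_tps_timing_data.json (only the fields checked here).
interface TimingData {
  inclusionRecords?: Array<{ sentAtMs?: number; minedAtMs?: number }>;
}

// Returns a list of human-readable problems; an empty list means the check passed.
function findInclusionProblems(data: TimingData): string[] {
  const records = data.inclusionRecords;
  if (!records || records.length === 0) {
    return ['inclusionRecords is missing or empty'];
  }
  const problems: string[] = [];
  records.forEach((r, i) => {
    if (typeof r.sentAtMs !== 'number' || typeof r.minedAtMs !== 'number') {
      problems.push('record ' + i + ': missing sentAtMs or minedAtMs');
    } else if (r.minedAtMs - r.sentAtMs <= 0) {
      problems.push('record ' + i + ': non-positive delta ' + (r.minedAtMs - r.sentAtMs));
    }
  });
  return problems;
}
```

In practice the input would come from `JSON.parse` on the contents of `/tmp/n_tps_timing_data.json`, and a non-empty result would fail the check.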

Supersedes #23465.

@AztecBot AztecBot force-pushed the spy/bench-10tps-client-latency branch 2 times, most recently from 971a74d to b45c342 Compare May 21, 2026 14:45
@AztecBot AztecBot enabled auto-merge May 21, 2026 14:45
@AztecBot AztecBot force-pushed the spy/bench-10tps-client-latency branch from b45c342 to c5ad602 Compare May 21, 2026 14:48
@AztecBot AztecBot added this pull request to the merge queue May 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 21, 2026