Skip to content

feat(core): implement attester_duties, sync_committee_duties, attestation_data handlers#451

Merged
varex83 merged 13 commits into
mainfrom
bohdan/validatorapi-5
Jun 2, 2026
Merged

feat(core): implement attester_duties, sync_committee_duties, attestation_data handlers#451
varex83 merged 13 commits into
mainfrom
bohdan/validatorapi-5

Conversation

@varex83
Copy link
Copy Markdown
Collaborator

@varex83 varex83 commented May 28, 2026

Summary

Implements three validator-API handlers, stacked over PR #449.

  • attester_duties (POST /eth/v1/validator/duties/attester/{epoch}) — uses auto-gen GetAttesterDuties* envelope, accepts both numeric and string-encoded validator index arrays via a new ValIndexes dual-format deserializer, performs pubshare swap on the upstream response (unknown pubkey rejected).
  • sync_committee_duties (POST /eth/v1/validator/duties/sync/{epoch}) — same shape as above, reuses ValIndexes.
  • attestation_data (GET /eth/v1/validator/attestation_data) — query-string slot/committee_index, served from the local DutyDB (MemDB::await_attestation) rather than upstream. Component now holds an Arc<MemDB>.

All three return the auto-generated response envelope directly; attestation_data uses a small hand-rolled wrapper around phase0::AttestationData to avoid pointless typed-↔-string conversion.

Test plan

  • cargo +nightly fmt --all --check
  • cargo clippy -p pluto-core --all-targets --all-features -- -D warnings
  • cargo test -p pluto-core validatorapi:: — 14/14 passing (5 new tests across router + component)

varex83 added 8 commits May 28, 2026 14:08
Threads the Handler through Axum state via AppState<H> + with_state,
wires the node_version route to the real handler, and adds a TestHandler
mock that future PRs will extend per-endpoint.
Re-uses the auto-generated pluto_eth2api envelopes
(GetProposerDutiesResponseResponse, GetVersionResponseResponse) as the
on-the-wire shape rather than hand-rolling parallel types. node_version
is migrated to the same pattern; the body.rs hand-rolled wrapper module
is removed.
Drops the per-handler generic parameter and routes through
Arc<dyn Handler> via AppState. The Handler trait is object-safe
(Send + Sync + 'static + async_trait-generated methods), so this
is a pure type change with no surface impact.
Adds the Handler impl that the router has been calling through.
node_version returns the obolnetwork/pluto/{version}-{commit}/{arch}-{os}
identity string; proposer_duties calls the upstream beacon node and
rewrites known DV root public keys to this node's public share so the
validator client sees keys matching its keystore. The remaining 17
trait methods are unimplemented!() stubs that land per-PR as their
router handlers are ported.
Wires POST /eth/v1/validator/duties/attester/{epoch}: dual-format
(numeric or string-encoded) validator index body, upstream call,
pubshare swap.
Wires POST /eth/v1/validator/duties/sync/{epoch}, reusing the
ValIndexes dual-format body extractor.
Wires GET /eth/v1/validator/attestation_data. The Component now
holds an Arc<MemDB> and awaits unsigned attestation data from the
local DutyDB rather than hitting upstream.
@varex83
Copy link
Copy Markdown
Collaborator Author

varex83 commented May 28, 2026

@varex83agent /review-pr

Copy link
Copy Markdown
Collaborator

@varex83agent varex83agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implements three validator-API handlers (attester_duties, sync_committee_duties, attestation_data) with reasonable parity to Charon and unit tests for the swap helpers, the dual-format ValIndexes deserializer, and the DutyDB-backed attestation path. Code style is idiomatic. However, two bug-level issues need fixing before merge, plus several major hardening gaps on this attacker-reachable surface.

Bugs (must fix):

  1. attestation_data awaits MemDB with no timeout, no request-cancel propagation, and no eviction signal — a request for a slot that never arrives becomes a permanent zombie task, contending on every legitimate store via notify_waiters. Charon plumbs the request ctx through awaitAttFunc; the Rust port loses this entirely. Wrap the await in a tokio::time::timeout (matching the slot deadline) and either wake on delete_duty or plumb a per-request CancellationToken.
  2. format!("unexpected upstream … response: {other:?}") interpolates the upstream beacon node's Debug output — including BlindedBlock400Response.message and any stacktraces field — directly into the client-visible ApiError.message. This forwards upstream internals (and possibly file paths) to anyone reachable on the validator API. Move {other:?} into with_source(...) and use a generic message.

Majors:

  • Component::new_insecure (which disables submit-side signature verification) is plain pub fn, not #[cfg(test)] — once submit endpoints land, this is a footgun.
  • new_router installs no DefaultBodyLimit, no TimeoutLayer, and no concurrency cap. Combined with the unbounded await above, slowloris-style requests are trivially possible.
  • All non-Ok upstream variants (BadRequest / InternalServerError / ServiceUnavailable / Unknown) are collapsed under 502. A syncing upstream (503) should propagate as 503 so the VC retries with backoff; a 400 from upstream means our forwarded request was bad.
  • Test coverage is missing for: cancellation/timeout on attestation_data, the upstream non-Ok variants, malformed slot/committee_index query params, and empty indices body.

Minors and a nit inline. Verdict: REQUEST_CHANGES — please address the two bug-level findings before merge; majors can land in a follow-up or this PR at your discretion.

Comment thread crates/core/src/validatorapi/component.rs Outdated
Comment thread crates/core/src/validatorapi/component.rs Outdated
Comment thread crates/core/src/validatorapi/component.rs
Comment thread crates/core/src/validatorapi/router.rs
Comment thread crates/core/src/validatorapi/component.rs Outdated
Comment thread crates/core/src/validatorapi/component.rs Outdated
Comment thread crates/core/src/validatorapi/component.rs Outdated
Comment thread crates/core/src/validatorapi/router.rs
Comment thread crates/core/src/validatorapi/component.rs
Comment thread crates/core/src/validatorapi/component.rs Outdated
varex83agent added a commit that referenced this pull request May 29, 2026
Bug fixes (must-fix per review):

- attestation_data: wrap MemDB::await_attestation in tokio::time::timeout
  (24s) so a request for a slot that never produces consensus output
  cannot hold a handler task indefinitely. delete_duty now records
  evicted keys per duty type and notifies waiters, so await_data returns
  Error::AwaitDutyExpired immediately when the awaited duty is gone
  instead of spinning until the timeout fires. Maps to 408 on the wire.
- Stop leaking upstream BlindedBlock400Response Debug output (incl.
  stacktraces) into the client-visible ApiError.message. The variant
  payload is now attached as `source` for debug logs; the message stays
  generic.

Hardening:

- new_insecure is gated behind #[cfg(test)] so the insecure_test flag
  cannot reach production builds.
- new_router applies DefaultBodyLimit::max(64 KiB) on the two
  POST /duties/{attester,sync}/{epoch} routes — defends against the
  Vec<u64> parse amplification on the ValIndexes deserializer.
- All upstream eth2_cl calls are wrapped in tokio::time::timeout(12s)
  so a hanging beacon node cannot stall handler tasks.
- proposer_duties / attester_duties / sync_committee_duties propagate
  upstream BadRequest as 400 and ServiceUnavailable as 503 instead of
  collapsing every non-Ok variant to 502 — the VC can now back off on
  upstream syncing instead of treating it as a gateway failure.
- swap_attester_pubshares / swap_sync_committee_pubshares now return
  500 (cluster misconfig) instead of 502 when a pubshare is missing —
  the upstream returned well-formed data, the failure is local.

ValIndexes:

- Replace #[serde(untagged)] with a streaming Visitor that validates
  each element via SeqAccess::next_element. Avoids the speculative
  Vec<u64> parse and the serde Content cache. Now accepts mixed
  numeric/string elements and rejects negative integers.
- Hard cap at 8192 indices per request.

ApiError:

- with_boxed_source for sources that aren't std::error::Error (e.g.
  anyhow::Error from auto-gen request builders).

Router:

- attestation_data uses Result<Query<...>, QueryRejection> so 4xx
  responses from missing/malformed query params share the same
  { code, message } envelope as the rest of the router.

Tests (+13):

- attestation_data: timeout when data never arrives; 408 when duty is
  evicted while a waiter is parked; cancellation cleanup when the
  handler future is dropped; negative lookup on wrong committee_index.
- Status-mapping helpers: confirm upstream Debug output is never
  serialized into the message.
- Router: ApiError envelope on bad query; oversized body rejection;
  ValIndexes empty/mixed/oversized/negative cases.

Co-Authored-By: Bohdan Ohorodnii <35969035+varex83@users.noreply.github.com>
Bug fixes (must-fix per review):

- attestation_data: wrap MemDB::await_attestation in tokio::time::timeout
  (24s) so a request for a slot that never produces consensus output
  cannot hold a handler task indefinitely. delete_duty now records
  evicted keys per duty type and notifies waiters, so await_data returns
  Error::AwaitDutyExpired immediately when the awaited duty is gone
  instead of spinning until the timeout fires. Maps to 408 on the wire.
- Stop leaking upstream BlindedBlock400Response Debug output (incl.
  stacktraces) into the client-visible ApiError.message. The variant
  payload is now attached as `source` for debug logs; the message stays
  generic.

Hardening:

- new_insecure is gated behind #[cfg(test)] so the insecure_test flag
  cannot reach production builds.
- new_router applies DefaultBodyLimit::max(64 KiB) on the two
  POST /duties/{attester,sync}/{epoch} routes — defends against the
  Vec<u64> parse amplification on the ValIndexes deserializer.
- All upstream eth2_cl calls are wrapped in tokio::time::timeout(12s)
  so a hanging beacon node cannot stall handler tasks.
- proposer_duties / attester_duties / sync_committee_duties propagate
  upstream BadRequest as 400 and ServiceUnavailable as 503 instead of
  collapsing every non-Ok variant to 502 — the VC can now back off on
  upstream syncing instead of treating it as a gateway failure.
- swap_attester_pubshares / swap_sync_committee_pubshares now return
  500 (cluster misconfig) instead of 502 when a pubshare is missing —
  the upstream returned well-formed data, the failure is local.

ValIndexes:

- Replace #[serde(untagged)] with a streaming Visitor that validates
  each element via SeqAccess::next_element. Avoids the speculative
  Vec<u64> parse and the serde Content cache. Now accepts mixed
  numeric/string elements and rejects negative integers.
- Hard cap at 8192 indices per request.

ApiError:

- with_boxed_source for sources that aren't std::error::Error (e.g.
  anyhow::Error from auto-gen request builders).

Router:

- attestation_data uses Result<Query<...>, QueryRejection> so 4xx
  responses from missing/malformed query params share the same
  { code, message } envelope as the rest of the router.

Tests (+13):

- attestation_data: timeout when data never arrives; 408 when duty is
  evicted while a waiter is parked; cancellation cleanup when the
  handler future is dropped; negative lookup on wrong committee_index.
- Status-mapping helpers: confirm upstream Debug output is never
  serialized into the message.
- Router: ApiError envelope on bad query; oversized body rejection;
  ValIndexes empty/mixed/oversized/negative cases.

Co-Authored-By: Bohdan Ohorodnii <35969035+varex83@users.noreply.github.com>
@varex83agent varex83agent force-pushed the bohdan/validatorapi-5 branch from 33221ad to 5da9093 Compare May 29, 2026 11:48
@varex83
Copy link
Copy Markdown
Collaborator Author

varex83 commented Jun 1, 2026

@varex83agent /review-pr

Copy link
Copy Markdown
Collaborator

@varex83agent varex83agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR wires up three validator-API handlers (attester_duties, sync_committee_duties, attestation_data) and is generally high quality — the hardening from the prior review round is solid (request timeouts, 64 KiB body cap, streaming ValIndexes visitor with an 8192 cap, no info leak into client-visible error messages, new_insecure gated to #[cfg(test)]), the await_data refactor is free of the classic lost-wakeup race (notified() is created before the lock-check), and the pubshare-swap parity with Charon (reject-unknown for attester/sync, pass-through for proposer) is correct. Test coverage is good.

Two must-fix issues before merge:

  1. Unbounded memory growth in the evicted_* HashSets (dutydb/memory.rs). The four eviction-tracking sets introduced in the fix commit are only ever inserted into and never pruned, so every evicted slot/key is retained for the entire process lifetime — an unbounded leak in a long-running daemon and a divergence from Charon, which keeps no such set. This optimization needs a bound (e.g. a min_live_slot watermark, or pruning below a retention horizon).
  2. Missing committee_index=0 dual-write breaks post-Electra VCs (dutydb/memory.rs). Charon's storeAttestationUnsafe deliberately stores each attestation twice — under the real committee index and under index 0 — so post-Electra VCs that request attestation_data?committee_index=0 resolve. Pluto's store_attestation writes only the real index; this PR makes the attestation_data endpoint live, so a conformant Electra VC asking for index 0 will now time out (408) and fail to attest. (Root cause is in pre-existing store logic, but it becomes reachable here.)

The remaining findings are minor/nit: an error-envelope inconsistency on the {epoch} path param, a ValIndexes parity superset (mixed numeric/string accepted where Charon is all-or-nothing), the std::io::Error::other debug-string carrier smell, and the fact that the source attached "for debug logging" is never actually logged. Requesting changes for the two majors; everything else is author discretion.

Comment thread crates/core/src/dutydb/memory.rs Outdated
Comment thread crates/core/src/dutydb/memory.rs Outdated
Comment thread crates/core/src/dutydb/memory.rs
Comment thread crates/core/src/validatorapi/router.rs
Comment thread crates/core/src/validatorapi/types.rs
Comment thread crates/core/src/validatorapi/component.rs
Comment thread crates/core/src/validatorapi/component.rs
Comment thread crates/core/src/validatorapi/component.rs
Comment thread crates/core/src/validatorapi/types.rs
Comment thread crates/core/src/validatorapi/types.rs
Base automatically changed from bohdan/validatorapi-2 to main June 1, 2026 14:57
varex83agent and others added 2 commits June 1, 2026 17:27
The evicted_* HashSets in MemDB were insert-only: delete_duty added an
entry per evicted slot/aggregation/contrib key and nothing ever removed
them, so on a long-running node they grew without bound (one entry per
12s slot, forever). Charon's Go dutydb keeps no eviction record at all
and relies on the per-request context timeout.

Because the deadliner expires duties in non-decreasing slot order and
store() refuses already-expired duties, the highest evicted slot is
sufficient: any awaited slot <= the mark that is not currently stored
will never be stored. Replace the three slot-keyed sets with O(1)
Option<u64> high-water marks (attester/proposer/sync-contribution),
preserving the immediate-AwaitDutyExpired fast path.

Aggregated attestations are awaited by root only (no slot at await
time), so they cannot consult a slot mark; drop their eviction tracking
and let an evicted root rely on the request timeout, matching Go.

Add a regression test asserting an evicted slot and any older slot fail
fast with AwaitDutyExpired without parking.

Co-Authored-By: varex83 <ivan@nethermind.io>
attester_duties and sync_committee_duties extracted the body as bare
`Json<ValIndexes>`, so a malformed body produced axum's default
plain-text rejection — 400 for a JSON syntax error but 422 for a wrong
type — instead of the router's standard {code,message} envelope. Charon's
unmarshal returns a uniform 400 with its apiError body for every request
body parse failure.

Extract as `Result<Json<ValIndexes>, JsonRejection>` and map through a new
json_rejection_to_api_error, mirroring the existing query-rejection path.
Genuine parse failures normalise to 400; the DefaultBodyLimit size cap
still surfaces as 413 (Pluto's DoS defense), preserved explicitly. Add a
router test asserting the 400 envelope for a wrong-shape body.

Co-Authored-By: varex83 <ivan@nethermind.io>
@varex83agent
Copy link
Copy Markdown
Collaborator

/loop-review-pr summary

Ran 3 review-and-fix iterations (plus 2 verification passes) against this PR, each driving the same four-agent pipeline as /review-pr (functional-equivalence, security, rust-style, code-quality). No inline comments were posted during the loop — this is the single summary.

Terminated by: completion (PR reached ideal state) — internal review produced zero bug/major findings for two consecutive passes and all gates are green.

Quality gates (final, full workspace, --all-features)

  • cargo +nightly fmt --all --checkpass
  • cargo clippy --workspace --all-targets --all-features -- -D warningspass
  • cargo test --workspace --all-featurespass (40/40 test groups, 0 failures)

Resolved during the loop

Bugs (0)

  • none

Major (1)

  • Unbounded MemDB eviction state — the four evicted_* tombstone HashSets were insert-only (one entry per evicted slot, forever), an unbounded memory leak on a long-running node. Replaced the three slot-keyed sets with O(1) Option<u64> high-water marks (attester/proposer/sync-contribution); dropped aggregation eviction tracking since it is awaited by root only and relies on the request timeout, matching Charon's Go dutydb (which keeps no eviction record at all). Added a regression test. — crates/core/src/dutydb/memory.rs — fix in 6a8a94c

Minor (1)

  • Malformed duties request body bypassed the {code,message} envelope and returned axum's default 422 for type errors. Now extracts Result<Json<ValIndexes>, JsonRejection> and maps via json_rejection_to_api_error to a uniform 400 (mirroring Charon's unmarshal and the router's own query-rejection path); the 413 body-size DoS defense is preserved explicitly. Added a router test. — crates/core/src/validatorapi/router.rs — fix in b5eb0ca

Nits (1)

  • Regression-test arithmetic style (SLOT - 1SLOT.saturating_sub(1)) for consistency — folded into 6a8a94c

Outstanding (minor/nit — do not block "ideal"; left for author judgement)

  • [minor] A duties POST with a missing Content-Type header returns 400, whereas Charon's wrap treats an absent/empty Content-Type as application/json and parses the body (→ 200). Genuine but narrow Go-parity edge case; real validator clients always send the header. The fix is invasive (a content-type-injecting middleware, or swapping Json for Bytes + manual serde_json), so it was not applied — flagging for your call.
  • [minor] ValIndexes accepts mixed numeric+string arrays (e.g. [1, "2"]) that Charon rejects. This appears intentional — the PR has a test (val_indexes_accepts_mixed_elements) asserting it — so it was left as-is.
  • [nit] Accepted/intentional deviations: VAL_INDEXES_MAX_LEN (8192) + 64 KiB body cap (DoS hardening Charon lacks); node_version arch token uses Rust naming (x86_64) vs Go's amd64 (cosmetic, no client impact); per-handler timeouts (12s/24s) vs Charon's uniform 10s (all resolve to 408).
  • [nit] wake() takes DutyType by value (the clone is actually required — DutyType isn't Copy and delete_duty consumes the Duty); parse_bls_pubkey(&str) vs impl AsRef<str>; json_rejection_to_api_error could use an @-binding instead of re-constructing PAYLOAD_TOO_LARGE; store() holds the write lock across the async deadliner.add() (pre-existing, not worsened by this PR); the high-water-mark soundness invariant (deadliner expires in non-decreasing slot order) could carry a debug_assert.

Verdict

PR is ideal — all bug/major findings resolved, the dutydb concurrency fix was independently verified sound (monotonic-slot invariant, no TOCTOU, no lost wakeup), and fmt/clippy/test are green across the full workspace. Remaining items are minor/nit and either intentional or low-value.

Resolve conflicts in crates/core/src/validatorapi/{component,handler,router,testutils,types}.rs.

Main re-landed the node_version + proposer_duties scaffold via #449 with
EthResponse<Vec<...>> placeholder return types and unimplemented stubs
for attester_duties / sync_committee_duties / attestation_data. This
branch had already implemented all three handlers on top of the
attester_duties / sync_committee_duties / attestation_data work plus
typed *Response wrappers, timeout-bounded upstream calls, status-mapping
helpers, ValIndexes streaming deserializer, and DutyDB integration.

Taking ours for all five files preserves the implemented handlers; main's
side was a strict subset (the same scaffold without the new handlers).
error.rs auto-merged with both with_source and with_boxed_source intact.

Verified: cargo check --workspace passes; pluto-core clippy passes; the
28 validatorapi tests pass.

Co-Authored-By: varex83 <ivan@nethermind.io>

/// Hard deadline for upstream beacon-node calls. Bounds the worst-case
/// handler latency when the upstream hangs or stalls. Roughly one slot.
const UPSTREAM_REQUEST_TIMEOUT: Duration = Duration::from_secs(12);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 12? Should be 10 seconds right?
defaultRequestTimeout = 10 * time.Second

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 12e2263 (timeout 12s → 10s, doc comment now points to core/validatorapi/router.go:61).

Comment thread crates/core/src/validatorapi/router.rs Outdated
/// instead of axum's default plain-text response.
///
/// Every genuine parse failure — malformed JSON (`400`), wrong element type
/// (`422`), missing `content-type` (`415`) — is normalised to a uniform
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing content-type is accepted in charon

if contentHeader == "" || strings.Contains(contentHeader, string(contentTypeJSON)) {
			typ = contentTypeJSON
		} else if strings.Contains(contentHeader, string(contentTypeSSZ)) {
			typ = contentTypeSSZ
		} else {
			writeError(ctx, w, endpoint, apiError{
				StatusCode: http.StatusUnsupportedMediaType,
				Message:    "unsupported media type " + contentHeader,
			})

			return
		}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 12e2263 — added an enforce_json_content_type middleware on the duties POST routes that mirrors Charon's policy (missing or JSON Content-Type passes; anything else → 415 with the offending value in the message). SSZ is not supported yet, so for now non-JSON is rejected; the middleware is the right seam to add SSZ later. Two new router tests cover the missing-CT and non-JSON cases.

Addresses inline review comments from @iamquang95 on PR #451.

UPSTREAM_REQUEST_TIMEOUT was 12s — set to 10s to match Charon's
defaultRequestTimeout (core/validatorapi/router.go:61).

POST /eth/v1/validator/duties/{attester,sync}/{epoch} now matches
Charon's content-type policy via a new enforce_json_content_type
middleware layered onto duties_post:

- missing Content-Type → treated as application/json (Charon parity)
- application/json → passes through to the existing Json extractor
- anything else → 415 Unsupported Media Type with the offending value

Previously axum's Json extractor rejected a missing header with
MissingJsonContentType, which json_rejection_to_api_error normalised
to 400 — diverging from Charon, which lets VCs that don't set the
header through. Non-JSON Content-Type was also collapsed to 400; it
is now the 415 Charon returns.

Updated json_rejection_to_api_error doc comment: it no longer sees
content-type rejections (the middleware intercepts them upstream).

Tests (+2):

- attester_duties_accepts_missing_content_type asserts 200 when
  Content-Type is absent.
- attester_duties_rejects_non_json_content_type asserts 415 with the
  offending media type echoed in the message.

Co-Authored-By: varex83 <ivan@nethermind.io>
@varex83 varex83 merged commit e75a114 into main Jun 2, 2026
15 checks passed
@varex83 varex83 deleted the bohdan/validatorapi-5 branch June 2, 2026 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants