Skip to content

feat(starfish-core): score integration feature branch#11533

Closed
piotrm50 wants to merge 51 commits into
developfrom
protocol-research/feat/score-integration-starfish
Closed

feat(starfish-core): score integration feature branch#11533
piotrm50 wants to merge 51 commits into
developfrom
protocol-research/feat/score-integration-starfish

Conversation

@piotrm50
Copy link
Copy Markdown
Contributor

@piotrm50 piotrm50 commented May 14, 2026

Summary

Umbrella PR for the Starfish score-integration work. Aggregates the stacked PRs below into a single feature branch so the full pipeline can be reviewed end-to-end and merged into `develop` once each piece is approved.

This PR is intentionally empty until the first child lands on the feature branch.

Stacked PRs (in merge order)

  1. feat(starfish-core): introduce ScoringMetricsStore with storage persistence #10088 — Introduce `MisbehaviorStore` with storage persistence
  2. feat(starfish-core): implement misbehavior tracking pipeline in DagState #11531 — Implement misbehavior tracking pipeline in `DagState`, including the eviction flush and faulty-block recording at every peer receive site

Tracking

Part of https://github.com/iotaledger/iota-private/issues/277

Notes for reviewers

  • The child PRs are the authoritative review surface. This PR only exists to consolidate the chain before landing on `develop`.
  • Stays in draft and gets rebased as each child merges; flipped to ready and merged into `develop` once everything below is green.

Test plan

  • All stacked PRs have their own test plans and pass `cargo test -p starfish-core`
  • Full repo CI green on this branch before merging to `develop`

oliviasaa and others added 30 commits May 5, 2026 14:52
- Drop spurious increment_invalid_reports_count when scoring is locally
  disabled. The sender did nothing wrong; only the local node has
  scoring off, so charging them as having sent an invalid report is
  semantically wrong.
- Replace the verify-vs-process TODO with a one-line invariant comment.
  The reconfig guard at the top of the process arm already covers the
  end-of-epoch lock-out, and validation has fully moved into
  verify_consensus_transaction.
- Make Misbehaviors::from infallible by replacing the inherent method
  with the From trait. Every variant maps to Some, so the Option was
  misleading.
- Replace bogus expect() template strings (whose {placeholders} never
  interpolate) with direct indexing in MisbehaviorCounts::get_value
  and get_metric. Rust's built-in panic prints the offending index.
- Add Eq/Copy/Debug derives to Misbehaviors and Eq to Version.
- Delete VersionedMisbehaviorReport::is_valid_version, which has no
  callers after the refactor (MisbehaviorConfig::accepts_report
  replaces it).
- Mark ConsensusOutputMisbehaviors as #[allow(dead_code)] until the
  Starfish wiring lands; today misbehavior_counts() returns vec![].
Both directions of the MisbehaviorCounts <-> wire-format conversion now
match on the Misbehaviors enum instead of indexing self.0[0..3]. Reordering
ReportedMisbehaviors can no longer silently swap categories — adding or
removing a variant becomes a compile error in the V1 arm.

- to_report now takes &ReportedMisbehaviors and assigns LegacyReportPayload
  fields by matching on each row's variant.
- The From<&VersionedMisbehaviorReport> impl is replaced by from_report,
  which builds rows in the order dictated by ReportedMisbehaviors.
- Callers in MisbehaviorMonitor and ReportAggregator pass the schema
  through from MisbehaviorConfig.
The previous load -> merge_max -> store sequence on ArcSwapOption was
not atomic: two concurrent process_report calls for the same authority
could lose an update (the smaller merged value overwriting a larger
one). The single-threaded consensus handler currently shields the
aggregator from this race, but the type now enforces the invariant
itself rather than relying on caller serialization.

rcu retries the closure under contention, so concurrent merges
serialize correctly. Reads remain lock-free.
Replace Vec<AtomicU64> with ArcSwap<Vec<u64>> for current_scores. The
previous implementation stored each score with a per-cell Relaxed
write inside update_scores_v1, so a checkpoint reader could observe a
torn mix of old and new values across authorities.

A single pointer swap now publishes the whole score vector at once;
readers see a consistent snapshot. update_scores_v1 builds the new
Vec<u64> locally and stores it in one shot, and current_scores() loads
the snapshot and clones it.
Two leftovers from the scorer-refactoring rebase:

- consensus_validator.rs imported and constructed MisbehaviorsV1, which
  was renamed to LegacyReportPayload in 972a704. Update the import
  and construction site.
- consensus_handler.rs passed 8 args to starfish_core::CommittedSubDag::new
  which only takes 7; the extra vec![] was added in 47e51d4 during
  the consensus-output API wiring and never trimmed when the constructor
  shape settled. Drop the trailing argument.
piotrm50 added 19 commits May 5, 2026 14:52
Callers must call MisbehaviorSchemaVersion::accepts_report before
constructing MisbehaviorCounts from a report; today the only path is
ReportAggregator::process_report, which sits behind validate_report in
verify_consensus_transaction. The assertion makes that contract
explicit so future call sites or new wire-format versions can't
silently produce wrongly-shaped counts.
The comment referenced the previous Vec<AtomicU64> storage; scores are
now published as a single ArcSwap<Vec<u64>> snapshot.
…om_consensus_output

The previous `.unwrap_or_default()` produced an empty `Vec<u64>` when a
locally-tracked `Misbehaviors` variant was missing from the consensus
output, violating the documented matrix invariant that each row's length
equals `committee_size`. Downstream `get_value`/`get_metric` would then
panic and `to_report` would build wire payloads peers reject.

Take `committee_size` as a parameter and pre-allocate zero rows so missing
categories project to zeros instead of empty rows. Flip the iteration to
input -> schema, dropping the `swap_remove` (move-out is now free via the
consuming `for` loop) and incidentally dropping silent duplicate-handling.

Also document the single-writer assumption on the rate-limit triple in
`MisbehaviorMonitor` to flag the implicit invariant for future readers.
The file no longer contains a MisbehaviorConfig struct; it holds the shared
data model for the misbehavior subsystem (schema version, categories,
counts, conversions). This commit moves the file with no content changes
so the rename is detected by git review tools.
… Misbehavior(s)/ReportPayload

Replace the in-memory `MisbehaviorCounts(Vec<Vec<u64>>)` with a versioned
enum (`MisbehaviorCounts::V1(MisbehaviorCountsV1)`) whose V1 variant is a
struct with named per-metric fields. Every operation that touches all
metrics — `merge_max`, `to_report`, `from_report`, the median computation
in `Scorer::calculate_median_report` — is now exhaustive at compile time
via either a named struct literal or a `match` on `Misbehavior`. Adding a
metric to the V1 struct produces missing-field errors at the call sites
that need updating, replacing the previous documented "append-only" enum
order convention.

Renames bundled in for clarity:
- `LegacyReportPayload` -> `ReportPayloadV1` (the active V1 wire format,
  not a deprecated one).
- `Misbehaviors` -> `Misbehavior`, `ConsensusOutputMisbehaviors` ->
  `ConsensusOutputMisbehavior` (Rust idiom for tagged-union enums where
  each variant is a single instance).
- Drop the unused `ReportPayload(Vec<Vec<u64>>)` placeholder. Future V2
  will add `ReportPayloadV2` as a dedicated named-field struct.
- Drop the duplicate `verify_legacy_payload` free function in iota-core;
  use the existing `ReportPayloadV1::verify` method on the type itself.

`Scorer` now stores `schema_version` so it can resolve `Parameters` indices
back to `Misbehavior` variants when looking up rows. `Parameters` itself
remains positional and is flagged as a follow-up.

BCS encoding of `DBReceivedReportsStatePerAuthority.received_metrics`
changes from a length-prefixed nested array to an enum tag plus four
named `Vec<u64>` fields. The loading path is dead-code-gated and the
schema is unreleased; no migration needed.
… out of variants

Reshape the type from an enum-with-OnceCell-per-variant into a struct that
wraps a versioned `ReportPayload` enum and a single `#[serde(skip)]` digest
cache. The digest is logically per-report, not per-payload-version, so it
belongs in one place.

Wire format is unchanged: BCS still serializes only `payload` (enum tag +
the V1 named-field struct). Adding V2 now means appending one variant to
`ReportPayload`; no per-variant cache to remember.

Update call sites: pattern-match on `report.payload` instead of the report
itself; constructors go through `VersionedMisbehaviorReport::new_v1`.
- scorer.rs Parameters: replace removed ReportedMisbehaviors reference
  with MisbehaviorSchemaVersion::reported_misbehaviors().
- messages_consensus.rs VersionedMisbehaviorReport: spell out the
  serde(skip)-only invariant rather than asserting layout equivalence
  to a bare ReportPayload.
- messages_consensus.rs ReportPayload: drop the stale
  // V2(ReportPayloadV2), placeholder; the doc on the enum covers the
  extension pattern.
- messages_consensus.rs ReportPayloadV1: convert to /// doc comment and
  note that field declaration order is part of the wire format.
- misbehavior_monitor.rs rate-limit fields: clarify that the two
  AtomicU64s form a logical tuple while the AtomicBool is independent.
- misbehavior.rs MisbehaviorCounts: clarify wire-format struct field
  order is load-bearing, in-memory mirror is not.
- consensus_output_api.rs ConsensusOutputMisbehavior: explain why the
  enum is kept while Starfish wiring is pending.
- report_aggregator.rs invalid_reports_count: point at validate_report
  as the bump site for the determinism rationale.
…ches

Two related boundary cleanups around MisbehaviorCounts:

- Drop the redundant version parameter from to_report and from_report.
  Each MisbehaviorCounts variant maps 1:1 to a ReportPayload variant, so
  the version was always derivable from self / report.payload.
  Removing it makes wire/schema mismatch unrepresentable, which lets the
  debug_assert! disappear with no loss of safety. Misbehavior gains Hash
  (used below) and the doc comment is updated to drop the misleading
  append-only claim — the enum is not serialized; what's load-bearing
  is the ordering of the schema's reported_misbehaviors() slice.

- Make from_consensus_output's projection visible. It previously
  silently zero-filled missing categories and silently dropped extras.
  Now it warn-logs three drift conditions: row length not matching
  committee_size (skipped), duplicate category in the same output
  (overwritten), and category tracked locally but absent from consensus
  (zero-filled). No new metrics — operator log surface only.

Plus a TODO at scorer.rs's MisbehaviorCounts::V1 destructure clarifying
that V2 should branch the whole computation rather than be patched
with an if let.
Two distinct failure modes (wrong wire-format variant vs. malformed
payload) used to collapse into a bare bool, leaving the caller's warn
log to say only "Received invalid misbehavior report from X".
Operators investigating drift had no signal to distinguish a peer
running a different schema version from a peer producing
wrongly-shaped vectors.

Introduce ReportValidationError { WrongSchemaVersion, PayloadShape }
with Display, return Result<(), ReportValidationError> from
validate_report, and propagate the reason in the warn at
verify_consensus_transaction.
…rsV1

The Scorer had two structural problems that compounded each other:

- Layout mismatch: Scorer's positional Parameters (six Vecs indexed by
  misbehavior position) didn't mirror the named-field shape of
  MisbehaviorCountsV1 / ReportPayloadV1. The only reason the indices
  worked was MisbehaviorSchemaVersion::reported_misbehaviors(), a slice
  that existed solely to bridge them.

- Delegation seam in the wrong type: Scorer had two impl blocks. One
  delegated via match self.version, the other implemented V1 math
  directly on Scorer with _v1-suffixed methods. The version enum
  carried only data; the implementation lived elsewhere.

This commit moves the per-version implementation onto the version enum:

- ScorerVersion is renamed to VersionedScorer. Variants are full
  per-version structs (today: ScorerV1) that own their parameters AND
  their update_scores method.
- ScorerV1 has named MetricParams fields mirroring MisbehaviorCountsV1.
  metric_to_score moves to MetricParams. The score loop iterates
  (row, params) pairs explicitly via metric_pairs(), no more
  metric_index: usize, no precomputed minor_indices/minor_weights/
  major_indices, no MisbehaviorSchemaVersion::reported_misbehaviors()
  lookup.
- Scorer collapses to a thin shell with one impl block: it owns the
  ArcSwap-published score vector and the voting-power table, and
  delegates to self.version.update_scores(...).

Net: MisbehaviorSchemaVersion drops reported_misbehaviors() and
num_metrics(). Its remaining API is from_protocol + accepts_report —
it now versions only the wire format. The from_consensus_output
warning loop uses a local const-array of expected V1 categories.

Test numerical expectations are unchanged; the refactor is structural.
After commit D, the type's responsibilities shrank to from_protocol +
accepts_report — it versions the wire-format report, not a schema.
The schema lives in MisbehaviorCountsV1 / ReportPayloadV1 named-field
types. "Schema" is misleading; "report version" is what it actually
selects.

Renames:
- MisbehaviorSchemaVersion -> MisbehaviorReportVersion (the type)
- schema_version -> report_version (fields and locals)
- ReportValidationError::WrongSchemaVersion -> WrongReportVersion
- mock_schema_version -> mock_report_version (test helpers)
The cfg(test) calculate_scores_v1 / calculate_median_report bridges on
Scorer existed only to keep pre-VersionedScorer tests working without
edits. They masked the version split with extra glue: a destructure
out of MisbehaviorCounts followed by a match on VersionedScorer that
re-destructured into the V1 variant.

V1-specific tests now address ScorerV1 directly via
ScorerV1::v1_parameters() and call median_report / score_from_median
on it. They build MisbehaviorCountsV1 inline rather than wrapping in
MisbehaviorCounts::V1(...). When V2 lands its tests live alongside
ScorerV2 without touching V1's.

Tests of Scorer (initialization, update_scores end-to-end) stay on
the outer struct since they exercise the publish path / construction.
Mirror the AuthorityCapabilitiesV1 shape: the report carries its own
authority + generation instead of being passed as a 3-tuple alongside
ConsensusTransactionKind::MisbehaviorReport. Callers thread one value
through, the consensus key reads provenance off the report, and the
'authority matches consensus sender' check stays at the boundary.

Wire format is preserved by ordering the new struct fields as
authority || payload || generation — BCS encodes a struct as the
concatenation of its fields, so the bytes match the legacy 3-tuple.
Two new tests in iota-types pin this:
- misbehavior_report_wire_format_unchanged: struct-level bytes
- misbehavior_report_consensus_kind_wire_format_unchanged: full
  ConsensusTransactionKind including the variant tag

Required because misbehavior reports are already on testnet — any
wire drift here would halt a running cluster.
The counter is incremented in two sites in verify_consensus_transaction
but nothing reads it: no Prometheus metric, no DBMap persistence (the
DBReceivedReportsStatePerAuthority naming and to_serializable machinery
notwithstanding), no Scorer input, no outgoing report field. Documenting
this in place so the next reader doesn't assume the snapshot
infrastructure is load-bearing.
The previous TODO claimed the DBReceivedReportsStatePerAuthority /
to_serializable scaffolding was unwired. It is — on this branch — but
that's intentional: Olivia's score-integration-persist-scorer-state
follow-up branch already contains the DBMap, the snapshot path through
ConsensusCommitOutput, and the restore_from_tables() logic. Refactor
the comment to record that context so the next reader doesn't try to
delete the snapshot machinery thinking it's orphaned.
The in-memory MisbehaviorCounts/MisbehaviorCountsV1 mirrored the wire
ReportPayload/ReportPayloadV1 1:1 — same fields, same types, From impls
that were pure clones. The wrapper carried no behavior beyond merge_max,
zeros, and from_consensus_output, and the metric() accessor was unused.

Drop the mirror, rename ReportPayload[V1] to MisbehaviorObservations[V1]
in iota-types so the wire type also names the in-memory role, and move
the three remaining helpers to free functions in iota-core's misbehavior
module. Wire format is unchanged — the BCS-pinning tests in
iota-types::messages_consensus::tests still pass byte-for-byte.

Also fix a stale comment referring to the "consensus crate" that should
read "starfish crate".
A round of post-refactor cleanups identified during review:

- Rename outer `Scorer` to `Scoreboard` — it owns the published score
  vector and voting-power table, the actual math lives in
  `VersionedScorer`/`ScorerV1`. The thing called "scorer" now actually
  scores.
- Move the `MisbehaviorObservations::V1(_)` unwrap from
  `ScorerV1::median_report` up into `VersionedScorer::update_scores`.
  `ScorerV1` now consumes typed V1 reporters directly; when V2 lands
  the dispatcher's match becomes non-exhaustive and forces a
  deliberate decision instead of silently dropping V2 reports.
- Make `MisbehaviorMonitor::update_from_consensus_output` fold new
  counts in via `merge_max` under RCU, so the local view is monotonic
  even if upstream produces a transient dip. Closure is pure (commutative
  + associative + idempotent), so RCU's may-run-multiple-times semantics
  are safe.
- Drop the `ReceivedReportsState` newtype in `report_aggregator`. Its
  hand-rolled `Index`/`FromIterator` impls were pure delegation to the
  inner `Vec`; removing them removes ~14 lines of ceremony.
- Simplify `reporters_with_voting_power` by switching to `load_full()`
  and a plain `for`/`if let`. Drops the explicit `Guard` + `as_ref` +
  `Arc::clone` dance.
- Remove the `misbehavior_monitor_version` protocol-config field
  (introduced earlier on this branch, not yet shipped) and have
  `MisbehaviorReportVersion::from_protocol` read `scorer_version`
  instead. Scorer impl and report wire format now bump together — one
  knob, simpler model. Decoupling can be reintroduced in a future PR
  if a concrete upgrade requires it.
Replaces the position-tagged `Vec<(ConsensusOutputMisbehavior, Vec<u64>)>`
return shape with a named-field bridge struct
`ConsensusOutputMisbehaviorCounts`. The trait impl is now the single
bridge between Starfish's internal observation state and IOTA's wire
schema; the mapper projects the bridge struct onto whichever
`MisbehaviorObservationsVN` is active. Adding a metric is now a
deliberate API change, not a position-list extension that could lose
duplicates / missing categories silently.

Side effects of the new shape:
- `Misbehavior` category enum, `ConsensusOutputMisbehavior` enum,
  `V1_EXPECTED` array, `HashSet<Misbehavior>` dedup tracking, and the
  duplicate-row / missing-category warn loops are all gone — the
  conditions they defended against are unrepresentable in the new
  contract.
- The trait method gets a default impl returning the all-empty struct,
  so unwired Starfish gets correct behavior for free.
- A non-empty wrong-length row in
  `observations_from_consensus_output` now `debug_assert_eq!`s in
  dev/CI and `error!`s with structured fields in production. Wrong
  lengths mean consensus and iota-core disagree on committee size —
  programmer error, not adversarial input — but harmless because the
  zero-fill projection makes `merge_max(current, zeros)` a no-op.

Plus a small batch of follow-up cleanups identified during review:
- Drop the unused `_report_version` parameter on `Scoreboard::new`.
- `ReportAggregator::validate_report` derives committee size from
  internal state, removing the "could caller and aggregator disagree"
  question.
- Fix the `MisbehaviorReportVersion::from_protocol` panic message to
  say "scorer version" — it reads `scorer_version_as_option()`.
- Rename `current_local_counts` → `current_local_observations` to
  match the renamed type.
@piotrm50 piotrm50 requested review from a team as code owners May 14, 2026 13:09
@iota-ci iota-ci added consensus Issues related to the Core Consensus team core-protocol labels May 14, 2026
@piotrm50 piotrm50 self-assigned this May 14, 2026
@piotrm50 piotrm50 marked this pull request as draft May 14, 2026 13:14
@piotrm50
Copy link
Copy Markdown
Contributor Author

Replaced by a fresh umbrella PR off develop — the old base branch carried stale scorer-refactoring commits that already landed in develop. See replacement coming next.

@piotrm50 piotrm50 closed this May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

consensus Issues related to the Core Consensus team core-protocol

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants