feat(starfish-core): score integration feature branch by piotrm50 · Pull Request #11533 · iotaledger/iota

piotrm50 · 2026-05-14T13:09:41Z

Summary

Umbrella PR for the Starfish score-integration work. Aggregates the stacked PRs below into a single feature branch so the full pipeline can be reviewed end-to-end and merged into `develop` once each piece is approved.

This PR is intentionally empty until the first child lands on the feature branch.

Stacked PRs (in merge order)

feat(starfish-core): introduce ScoringMetricsStore with storage persistence #10088 — Introduce `MisbehaviorStore` with storage persistence
feat(starfish-core): implement misbehavior tracking pipeline in DagState #11531 — Implement misbehavior tracking pipeline in `DagState`, including the eviction flush and faulty-block recording at every peer receive site

Tracking

Part of https://github.com/iotaledger/iota-private/issues/277

Notes for reviewers

The child PRs are the authoritative review surface. This PR only exists to consolidate the chain before landing on `develop`.
Stays in draft and gets rebased as each child merges; flipped to ready and merged into `develop` once everything below is green.

Test plan

All stacked PRs have their own test plans and pass `cargo test -p starfish-core`
Full repo CI green on this branch before merging to `develop`

- Drop spurious increment_invalid_reports_count when scoring is locally disabled. The sender did nothing wrong; only the local node has scoring off, so charging them as having sent an invalid report is semantically wrong. - Replace the verify-vs-process TODO with a one-line invariant comment. The reconfig guard at the top of the process arm already covers the end-of-epoch lock-out, and validation has fully moved into verify_consensus_transaction.

- Make Misbehaviors::from infallible by replacing the inherent method with the From trait. Every variant maps to Some, so the Option was misleading. - Replace bogus expect() template strings (whose {placeholders} never interpolate) with direct indexing in MisbehaviorCounts::get_value and get_metric. Rust's built-in panic prints the offending index. - Add Eq/Copy/Debug derives to Misbehaviors and Eq to Version. - Delete VersionedMisbehaviorReport::is_valid_version, which has no callers after the refactor (MisbehaviorConfig::accepts_report replaces it). - Mark ConsensusOutputMisbehaviors as #[allow(dead_code)] until the Starfish wiring lands; today misbehavior_counts() returns vec![].

Both directions of the MisbehaviorCounts <-> wire-format conversion now match on the Misbehaviors enum instead of indexing self.0[0..3]. Reordering ReportedMisbehaviors can no longer silently swap categories — adding or removing a variant becomes a compile error in the V1 arm. - to_report now takes &ReportedMisbehaviors and assigns LegacyReportPayload fields by matching on each row's variant. - The From<&VersionedMisbehaviorReport> impl is replaced by from_report, which builds rows in the order dictated by ReportedMisbehaviors. - Callers in MisbehaviorMonitor and ReportAggregator pass the schema through from MisbehaviorConfig.

The previous load -> merge_max -> store sequence on ArcSwapOption was not atomic: two concurrent process_report calls for the same authority could lose an update (the smaller merged value overwriting a larger one). The single-threaded consensus handler currently shields the aggregator from this race, but the type now enforces the invariant itself rather than relying on caller serialization. rcu retries the closure under contention, so concurrent merges serialize correctly. Reads remain lock-free.

Replace Vec<AtomicU64> with ArcSwap<Vec<u64>> for current_scores. The previous implementation stored each score with a per-cell Relaxed write inside update_scores_v1, so a checkpoint reader could observe a torn mix of old and new values across authorities. A single pointer swap now publishes the whole score vector at once; readers see a consistent snapshot. update_scores_v1 builds the new Vec<u64> locally and stores it in one shot, and current_scores() loads the snapshot and clones it.

Two leftovers from the scorer-refactoring rebase: - consensus_validator.rs imported and constructed MisbehaviorsV1, which was renamed to LegacyReportPayload in 972a704. Update the import and construction site. - consensus_handler.rs passed 8 args to starfish_core::CommittedSubDag::new which only takes 7; the extra vec![] was added in 47e51d4 during the consensus-output API wiring and never trimmed when the constructor shape settled. Drop the trailing argument.

Callers must call MisbehaviorSchemaVersion::accepts_report before constructing MisbehaviorCounts from a report; today the only path is ReportAggregator::process_report, which sits behind validate_report in verify_consensus_transaction. The assertion makes that contract explicit so future call sites or new wire-format versions can't silently produce wrongly-shaped counts.

The comment referenced the previous Vec<AtomicU64> storage; scores are now published as a single ArcSwap<Vec<u64>> snapshot.

…om_consensus_output The previous `.unwrap_or_default()` produced an empty `Vec<u64>` when a locally-tracked `Misbehaviors` variant was missing from the consensus output, violating the documented matrix invariant that each row's length equals `committee_size`. Downstream `get_value`/`get_metric` would then panic and `to_report` would build wire payloads peers reject. Take `committee_size` as a parameter and pre-allocate zero rows so missing categories project to zeros instead of empty rows. Flip the iteration to input -> schema, dropping the `swap_remove` (move-out is now free via the consuming `for` loop) and incidentally dropping silent duplicate-handling. Also document the single-writer assumption on the rate-limit triple in `MisbehaviorMonitor` to flag the implicit invariant for future readers.

The file no longer contains a MisbehaviorConfig struct; it holds the shared data model for the misbehavior subsystem (schema version, categories, counts, conversions). This commit moves the file with no content changes so the rename is detected by git review tools.

… Misbehavior(s)/ReportPayload Replace the in-memory `MisbehaviorCounts(Vec<Vec<u64>>)` with a versioned enum (`MisbehaviorCounts::V1(MisbehaviorCountsV1)`) whose V1 variant is a struct with named per-metric fields. Every operation that touches all metrics — `merge_max`, `to_report`, `from_report`, the median computation in `Scorer::calculate_median_report` — is now exhaustive at compile time via either a named struct literal or a `match` on `Misbehavior`. Adding a metric to the V1 struct produces missing-field errors at the call sites that need updating, replacing the previous documented "append-only" enum order convention. Renames bundled in for clarity: - `LegacyReportPayload` -> `ReportPayloadV1` (the active V1 wire format, not a deprecated one). - `Misbehaviors` -> `Misbehavior`, `ConsensusOutputMisbehaviors` -> `ConsensusOutputMisbehavior` (Rust idiom for tagged-union enums where each variant is a single instance). - Drop the unused `ReportPayload(Vec<Vec<u64>>)` placeholder. Future V2 will add `ReportPayloadV2` as a dedicated named-field struct. - Drop the duplicate `verify_legacy_payload` free function in iota-core; use the existing `ReportPayloadV1::verify` method on the type itself. `Scorer` now stores `schema_version` so it can resolve `Parameters` indices back to `Misbehavior` variants when looking up rows. `Parameters` itself remains positional and is flagged as a follow-up. BCS encoding of `DBReceivedReportsStatePerAuthority.received_metrics` changes from a length-prefixed nested array to an enum tag plus four named `Vec<u64>` fields. The loading path is dead-code-gated and the schema is unreleased; no migration needed.

… out of variants Reshape the type from an enum-with-OnceCell-per-variant into a struct that wraps a versioned `ReportPayload` enum and a single `#[serde(skip)]` digest cache. The digest is logically per-report, not per-payload-version, so it belongs in one place. Wire format is unchanged: BCS still serializes only `payload` (enum tag + the V1 named-field struct). Adding V2 now means appending one variant to `ReportPayload`; no per-variant cache to remember. Update call sites: pattern-match on `report.payload` instead of the report itself; constructors go through `VersionedMisbehaviorReport::new_v1`.

- scorer.rs Parameters: replace removed ReportedMisbehaviors reference with MisbehaviorSchemaVersion::reported_misbehaviors(). - messages_consensus.rs VersionedMisbehaviorReport: spell out the serde(skip)-only invariant rather than asserting layout equivalence to a bare ReportPayload. - messages_consensus.rs ReportPayload: drop the stale // V2(ReportPayloadV2), placeholder; the doc on the enum covers the extension pattern. - messages_consensus.rs ReportPayloadV1: convert to /// doc comment and note that field declaration order is part of the wire format. - misbehavior_monitor.rs rate-limit fields: clarify that the two AtomicU64s form a logical tuple while the AtomicBool is independent. - misbehavior.rs MisbehaviorCounts: clarify wire-format struct field order is load-bearing, in-memory mirror is not. - consensus_output_api.rs ConsensusOutputMisbehavior: explain why the enum is kept while Starfish wiring is pending. - report_aggregator.rs invalid_reports_count: point at validate_report as the bump site for the determinism rationale.

…ches Two related boundary cleanups around MisbehaviorCounts: - Drop the redundant version parameter from to_report and from_report. Each MisbehaviorCounts variant maps 1:1 to a ReportPayload variant, so the version was always derivable from self / report.payload. Removing it makes wire/schema mismatch unrepresentable, which lets the debug_assert! disappear with no loss of safety. Misbehavior gains Hash (used below) and the doc comment is updated to drop the misleading append-only claim — the enum is not serialized; what's load-bearing is the ordering of the schema's reported_misbehaviors() slice. - Make from_consensus_output's projection visible. It previously silently zero-filled missing categories and silently dropped extras. Now it warn-logs three drift conditions: row length not matching committee_size (skipped), duplicate category in the same output (overwritten), and category tracked locally but absent from consensus (zero-filled). No new metrics — operator log surface only. Plus a TODO at scorer.rs's MisbehaviorCounts::V1 destructure clarifying that V2 should branch the whole computation rather than be patched with an if let.

Two distinct failure modes (wrong wire-format variant vs. malformed payload) used to collapse into a bare bool, leaving the caller's warn log to say only "Received invalid misbehavior report from X". Operators investigating drift had no signal to distinguish a peer running a different schema version from a peer producing wrongly-shaped vectors. Introduce ReportValidationError { WrongSchemaVersion, PayloadShape } with Display, return Result<(), ReportValidationError> from validate_report, and propagate the reason in the warn at verify_consensus_transaction.

…rsV1 The Scorer had two structural problems that compounded each other: - Layout mismatch: Scorer's positional Parameters (six Vecs indexed by misbehavior position) didn't mirror the named-field shape of MisbehaviorCountsV1 / ReportPayloadV1. The only reason the indices worked was MisbehaviorSchemaVersion::reported_misbehaviors(), a slice that existed solely to bridge them. - Delegation seam in the wrong type: Scorer had two impl blocks. One delegated via match self.version, the other implemented V1 math directly on Scorer with _v1-suffixed methods. The version enum carried only data; the implementation lived elsewhere. This commit moves the per-version implementation onto the version enum: - ScorerVersion is renamed to VersionedScorer. Variants are full per-version structs (today: ScorerV1) that own their parameters AND their update_scores method. - ScorerV1 has named MetricParams fields mirroring MisbehaviorCountsV1. metric_to_score moves to MetricParams. The score loop iterates (row, params) pairs explicitly via metric_pairs(), no more metric_index: usize, no precomputed minor_indices/minor_weights/ major_indices, no MisbehaviorSchemaVersion::reported_misbehaviors() lookup. - Scorer collapses to a thin shell with one impl block: it owns the ArcSwap-published score vector and the voting-power table, and delegates to self.version.update_scores(...). Net: MisbehaviorSchemaVersion drops reported_misbehaviors() and num_metrics(). Its remaining API is from_protocol + accepts_report — it now versions only the wire format. The from_consensus_output warning loop uses a local const-array of expected V1 categories. Test numerical expectations are unchanged; the refactor is structural.

After commit D, the type's responsibilities shrank to from_protocol + accepts_report — it versions the wire-format report, not a schema. The schema lives in MisbehaviorCountsV1 / ReportPayloadV1 named-field types. "Schema" is misleading; "report version" is what it actually selects. Renames: - MisbehaviorSchemaVersion -> MisbehaviorReportVersion (the type) - schema_version -> report_version (fields and locals) - ReportValidationError::WrongSchemaVersion -> WrongReportVersion - mock_schema_version -> mock_report_version (test helpers)

The cfg(test) calculate_scores_v1 / calculate_median_report bridges on Scorer existed only to keep pre-VersionedScorer tests working without edits. They masked the version split with extra glue: a destructure out of MisbehaviorCounts followed by a match on VersionedScorer that re-destructured into the V1 variant. V1-specific tests now address ScorerV1 directly via ScorerV1::v1_parameters() and call median_report / score_from_median on it. They build MisbehaviorCountsV1 inline rather than wrapping in MisbehaviorCounts::V1(...). When V2 lands its tests live alongside ScorerV2 without touching V1's. Tests of Scorer (initialization, update_scores end-to-end) stay on the outer struct since they exercise the publish path / construction.

Mirror the AuthorityCapabilitiesV1 shape: the report carries its own authority + generation instead of being passed as a 3-tuple alongside ConsensusTransactionKind::MisbehaviorReport. Callers thread one value through, the consensus key reads provenance off the report, and the 'authority matches consensus sender' check stays at the boundary. Wire format is preserved by ordering the new struct fields as authority || payload || generation — BCS encodes a struct as the concatenation of its fields, so the bytes match the legacy 3-tuple. Two new tests in iota-types pin this: - misbehavior_report_wire_format_unchanged: struct-level bytes - misbehavior_report_consensus_kind_wire_format_unchanged: full ConsensusTransactionKind including the variant tag Required because misbehavior reports are already on testnet — any wire drift here would halt a running cluster.

The counter is incremented in two sites in verify_consensus_transaction but nothing reads it: no Prometheus metric, no DBMap persistence (the DBReceivedReportsStatePerAuthority naming and to_serializable machinery notwithstanding), no Scorer input, no outgoing report field. Documenting this in place so the next reader doesn't assume the snapshot infrastructure is load-bearing.

The previous TODO claimed the DBReceivedReportsStatePerAuthority / to_serializable scaffolding was unwired. It is — on this branch — but that's intentional: Olivia's score-integration-persist-scorer-state follow-up branch already contains the DBMap, the snapshot path through ConsensusCommitOutput, and the restore_from_tables() logic. Refactor the comment to record that context so the next reader doesn't try to delete the snapshot machinery thinking it's orphaned.

The in-memory MisbehaviorCounts/MisbehaviorCountsV1 mirrored the wire ReportPayload/ReportPayloadV1 1:1 — same fields, same types, From impls that were pure clones. The wrapper carried no behavior beyond merge_max, zeros, and from_consensus_output, and the metric() accessor was unused. Drop the mirror, rename ReportPayload[V1] to MisbehaviorObservations[V1] in iota-types so the wire type also names the in-memory role, and move the three remaining helpers to free functions in iota-core's misbehavior module. Wire format is unchanged — the BCS-pinning tests in iota-types::messages_consensus::tests still pass byte-for-byte. Also fix a stale comment referring to the "consensus crate" that should read "starfish crate".

A round of post-refactor cleanups identified during review: - Rename outer `Scorer` to `Scoreboard` — it owns the published score vector and voting-power table, the actual math lives in `VersionedScorer`/`ScorerV1`. The thing called "scorer" now actually scores. - Move the `MisbehaviorObservations::V1(_)` unwrap from `ScorerV1::median_report` up into `VersionedScorer::update_scores`. `ScorerV1` now consumes typed V1 reporters directly; when V2 lands the dispatcher's match becomes non-exhaustive and forces a deliberate decision instead of silently dropping V2 reports. - Make `MisbehaviorMonitor::update_from_consensus_output` fold new counts in via `merge_max` under RCU, so the local view is monotonic even if upstream produces a transient dip. Closure is pure (commutative + associative + idempotent), so RCU's may-run-multiple-times semantics are safe. - Drop the `ReceivedReportsState` newtype in `report_aggregator`. Its hand-rolled `Index`/`FromIterator` impls were pure delegation to the inner `Vec`; removing them removes ~14 lines of ceremony. - Simplify `reporters_with_voting_power` by switching to `load_full()` and a plain `for`/`if let`. Drops the explicit `Guard` + `as_ref` + `Arc::clone` dance. - Remove the `misbehavior_monitor_version` protocol-config field (introduced earlier on this branch, not yet shipped) and have `MisbehaviorReportVersion::from_protocol` read `scorer_version` instead. Scorer impl and report wire format now bump together — one knob, simpler model. Decoupling can be reintroduced in a future PR if a concrete upgrade requires it.

Replaces the position-tagged `Vec<(ConsensusOutputMisbehavior, Vec<u64>)>` return shape with a named-field bridge struct `ConsensusOutputMisbehaviorCounts`. The trait impl is now the single bridge between Starfish's internal observation state and IOTA's wire schema; the mapper projects the bridge struct onto whichever `MisbehaviorObservationsVN` is active. Adding a metric is now a deliberate API change, not a position-list extension that could lose duplicates / missing categories silently. Side effects of the new shape: - `Misbehavior` category enum, `ConsensusOutputMisbehavior` enum, `V1_EXPECTED` array, `HashSet<Misbehavior>` dedup tracking, and the duplicate-row / missing-category warn loops are all gone — the conditions they defended against are unrepresentable in the new contract. - The trait method gets a default impl returning the all-empty struct, so unwired Starfish gets correct behavior for free. - A non-empty wrong-length row in `observations_from_consensus_output` now `debug_assert_eq!`s in dev/CI and `error!`s with structured fields in production. Wrong lengths mean consensus and iota-core disagree on committee size — programmer error, not adversarial input — but harmless because the zero-fill projection makes `merge_max(current, zeros)` a no-op. Plus a small batch of follow-up cleanups identified during review: - Drop the unused `_report_version` parameter on `Scoreboard::new`. - `ReportAggregator::validate_report` derives committee size from internal state, removing the "could caller and aggregator disagree" question. - Fix the `MisbehaviorReportVersion::from_protocol` panic message to say "scorer version" — it reads `scorer_version_as_option()`. - Rename `current_local_counts` → `current_local_observations` to match the renamed type.

piotrm50 · 2026-05-14T13:17:26Z

Replaced by a fresh umbrella PR off develop — the old base branch carried stale scorer-refactoring commits that already landed in develop. See replacement coming next.

oliviasaa and others added 30 commits May 5, 2026 14:52

refactor: break Scorer into 2 components

a4f36b1

feat: decouple current_local_metrics_count from node and consensus

acce489

feat: create node specific VersionedScoringMetrics

2bf30b2

feat: create node specific MisbehaviorsV1

10c19d9

refactor: delete VersionedScoringMetrics from iota-types

cbb828a

feat: delete MisbehaviorsV1 and create ReportPayload

c2f586e

feat: pass MisbehaviorMonitor to Scorer initialization

19cce22

refactor: remove atomic types from scorer

cdd4610

feat: introduce MisbehaviorMonitorVersion and ReportedMisbehaviors

7377794

feat: wire misbehavior monitor to report creation

ade2ec1

feat: wire consensus output API to misbehavior monitor

804f285

feat: document misbehavior monitor module

d35d353

feat: introduce ReceivedReportsState

97cad11

refactor: Scorer initialization and Parameters

a7bc04e

refactor: score calculation simplification

960e539

refactor: improve documentation of Scorer module

abfe166

feat: introduce DB storage type for reports

688fc36

feat: update scorer tests

539fbb6

refactor: add MisbehaviorConfig

43726d2

refactor: add ReportAggregator

a268c1d

refactor: Scorer cleanup

5dc9ecc

refactor: misbehavior_monitor changes

e53dcc9

feat: add tests for ReportAggregator

681378f

fix: remove redundant clone

e12f211

piotrm50 added 19 commits May 5, 2026 14:52

fix: update stale comment after Scorer ArcSwap migration

f870a68

The comment referenced the previous Vec<AtomicU64> storage; scores are now published as a single ArcSwap<Vec<u64>> snapshot.

piotrm50 requested review from a team as code owners May 14, 2026 13:09

iota-ci added consensus Issues related to the Core Consensus team core-protocol labels May 14, 2026

piotrm50 self-assigned this May 14, 2026

piotrm50 marked this pull request as draft May 14, 2026 13:14

piotrm50 closed this May 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(starfish-core): score integration feature branch#11533

feat(starfish-core): score integration feature branch#11533
piotrm50 wants to merge 51 commits into
developfrom
protocol-research/feat/score-integration-starfish

piotrm50 commented May 14, 2026 •

edited

Loading

Uh oh!

piotrm50 commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

piotrm50 commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Stacked PRs (in merge order)

Tracking

Notes for reviewers

Test plan

Uh oh!

piotrm50 commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

piotrm50 commented May 14, 2026 •

edited

Loading