
refactor: break circular dependency over net_processing and dkgsessionhandler #7314

Draft
knst wants to merge 12 commits into dashpay:develop from knst:refactor-peermanager-handlers-dkg

Conversation

@knst
Collaborator

@knst knst commented May 8, 2026

Issue being fixed or feature implemented

This PR is a continuation of #7247.
This PR is not a direct dependency of the kernel project.

This PR aims to resolve the following issues:

  1. The constructor of PeerManager takes references to unique_ptr for multiple objects that will only be initialized later, such as:

    const std::unique_ptr<ActiveContext>& active_ctx,
    const std::unique_ptr<CDeterministicMNManager>& dmnman,
    const std::unique_ptr<CJWalletManager>& cj_walletman,
    const std::unique_ptr<LLMQContext>& llmq_ctx,
    const std::unique_ptr<llmq::ObserverContext>& observer_ctx,
    

That is a fragile design that makes multiple assumptions about which members are already initialized and about their lifetimes.

  2. The implementation of the state machine for the DKG mechanism and the p2p implementation are tightly coupled.

What was done?

  • CDKGSessionManager is reduced to a pure state class: it owns the DB and provides 2 new helpers, ForEachHandler / DoForHandler
  • CDKGSessionHandler and ActiveDKGSessionHandler lose their threading and ProcessMessage members
  • MessageProcessingResult usages are dropped from llmq/ consensus code
  • PeerManager no longer knows about Observer/ActiveContext and loses 2 unique_ptr& parameters from its constructor
  • a new NetHandler, NetDKG, is introduced which takes responsibility for p2p communication for DKG and for running the worker threads

How Has This Been Tested?

  • Run unit, functional tests
  • Run test/lint/lint-circular-dependencies.py linter

Removed the circular dependency between dkgsessionhandler <-> net_processing

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@knst knst added this to the 24 milestone May 8, 2026
@github-actions

github-actions Bot commented May 8, 2026

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@thepastaclaw

thepastaclaw commented May 8, 2026

Review Gate

Commit: f4b6aae8

  • Debounce: 2626m ago (need 30m)

  • CI checks: build failure: linux64_multiprocess-build / Build source, mac-build / Build source, linux64_fuzz-build / Build source

  • CodeRabbit review: comment found

  • Off-peak hours: peak window (5am-11am PT) — currently 06:46 AM PT Tuesday

  • Run review now (check to override)

@coderabbitai

coderabbitai Bot commented May 8, 2026

Review Change Stack

Walkthrough

This PR refactors LLMQ DKG (Distributed Key Generation) handling in Dash by separating network management concerns from context objects. The main changes migrate DKG phase operations from enqueueing messages into CDKGPendingMessages to returning std::optional<Message> types; introduce a new NetDKG handler class that owns thread management and network routing; simplify context constructors by removing BLS worker, masternode metadata, and quorum block processor dependencies; and decouple PeerManager from ActiveContext/ObserverContext by using a raw CActiveMasternodeManager* pointer for masternode detection. Related infrastructure changes include adding debug manager tracking methods, a spork-based DKG enablement check, and build configuration updates to include the new network handler sources.

Sequence Diagram(s)

The conditions for generating sequence diagrams are met. This PR introduces significant control flow changes with multi-component interactions across network message handling, phase execution, and context initialization.

sequenceDiagram
  participant Node as DKG Node
  participant NetDKG as NetDKG Handler
  participant SessionMgr as CDKGSessionManager
  participant SessionHdlr as CDKGSessionHandler
  participant ActiveDKG as ActiveDKGSession
  
  Node->>NetDKG: ProcessMessage(MSG_QUORUM_CONTRIB)
  NetDKG->>SessionMgr: route message (llmqType, quorumIndex)
  SessionMgr->>SessionHdlr: PushPendingMessage(serialized, hash)
  Note over SessionHdlr: Batches message in queue
  
  NetDKG->>NetDKG: HandleDKGRound()
  loop per_phase
    NetDKG->>SessionHdlr: ProcessPendingMessageBatch()
    SessionHdlr->>ActiveDKG: Contribute()/VerifyAndCommit()
    ActiveDKG-->>SessionHdlr: std::optional<Message>
    SessionHdlr->>NetDKG: RelayInvToParticipants()
  end
sequenceDiagram
  participant Init as Initialization
  participant ActiveCtx as ActiveContext
  participant PeerMgr as PeerManager
  participant NetDKG as NetDKG Handler
  participant Spork as CSporkManager
  
  Init->>ActiveCtx: construct(dmnman, qman, qsnapman, sigman)
  Init->>PeerMgr: make(nodeman=active_ctx ? active_ctx->nodeman.get() : nullptr)
  Note over PeerMgr: m_nodeman set
  Init->>NetDKG: construct(sporkman, dkgsman, qman)
  Init->>Spork: IsQuorumDKGEnabled()
  Note over NetDKG: Check spork for DKG enabled
  Init->>ActiveCtx: Start()
  Note over ActiveCtx: Start() takes no connman/peerman args

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 12.87%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The pull request title clearly and concisely describes the main change: refactoring to break a circular dependency between net_processing and dkgsessionhandler, which is the central focus of the changeset.
  • Description check: ✅ Passed. The pull request description is comprehensive and directly related to the changeset, explaining the issues being fixed, what was done, and how it was tested.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.





@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/active/dkgsession.cpp (1)

106-111: ⚡ Quick win

Move the sent* debug updates to the actual enqueue/broadcast path.

These methods now only build and return a message. Setting sentContributions, sentComplaint, sentJustification, and sentPrematureCommitment here records a successful send before NetDKG has actually serialized and queued/broadcast the payload.

Also applies to: 292-297, 382-387, 539-544

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/active/dkgsession.cpp` around lines 106 - 111, The
dkgDebugManager.UpdateLocalSessionStatus calls inside the message-builder
functions (e.g., setting CDKGDebugSessionStatus::statusBits.sentContributions,
sentComplaint, sentJustification, sentPrematureCommitment) must be removed from
those builders (the functions that build and return qc/messages) and moved into
the actual send path inside NetDKG — i.e., the code that serializes and
enqueues/broadcasts the payload. Locate the UpdateLocalSessionStatus calls in
the builders and delete them there, then add equivalent UpdateLocalSessionStatus
updates immediately after NetDKG performs the serialization/queuing/broadcast so
the debug flags reflect a real successful send.
src/llmq/debug.cpp (1)

213-228: 💤 Low value

Optional: make MarkAborted idempotent w.r.t. nTime.

MarkAborted's lambda always returns true, so each call bumps localStatus.nTime even when the session was already marked aborted. MarkPhaseAdvanced already does the right thing (returns changed). For consistency and to avoid spurious timestamp updates if the helper is invoked more than once on the same aborted session, consider returning a real changed flag.

♻️ Proposed change
 void CDKGDebugManager::MarkAborted(Consensus::LLMQType llmqType, int quorumIndex)
 {
     UpdateLocalSessionStatus(llmqType, quorumIndex, [&](CDKGDebugSessionStatus& status) {
+        if (status.statusBits.aborted) return false;
         status.statusBits.aborted = true;
         return true;
     });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/debug.cpp` around lines 213 - 228, MarkAborted currently always
returns true from its UpdateLocalSessionStatus lambda which forces
localStatus.nTime to update every call; change the lambda in
CDKGDebugManager::MarkAborted to compute a changed flag by comparing
status.statusBits.aborted with the new value, set status.statusBits.aborted =
true, and return that changed flag (i.e., return status.statusBits.aborted was
previously false). This makes MarkAborted idempotent like MarkPhaseAdvanced and
avoids spurious nTime updates.
src/llmq/net_dkg.cpp (1)

449-482: 💤 Low value

Inconsistent dynamic_cast usage between Start() and Interrupt(); consider tightening shutdown.

Start() uses the throwing reference form (dynamic_cast<ActiveDKGSessionHandler&>) while Interrupt() uses the safe pointer form. Both iterate the same handler set and both early-return on m_active == nullptr, so the invariant is identical and the two should agree.

The reference form also has a small resilience gap: if the cast were ever to throw mid-iteration, the threads already pushed into m_phase_threads would never be joined, because ~NetDKG() only calls DisconnectManagers() (line 254), not Stop(). Either use the pointer form here as well, or have the destructor call Stop() defensively so a partially-initialized state still cleans up.

♻️ Proposed alignment with `Interrupt()`
     m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-        auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-        m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-            util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-        });
+        auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+        if (!Assume(handler != nullptr)) return;
+        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type), handler->QuorumIndex());
+        m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+            util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+        });
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 449 - 482, Start() uses
dynamic_cast<ActiveDKGSessionHandler&> which can throw partway through filling
m_phase_threads and leave threads unjoined; make Start() mirror Interrupt() by
using the non-throwing pointer form (dynamic_cast<ActiveDKGSessionHandler*>)
when iterating m_qdkgsman.ForEachHandler so you only create threads for valid
handlers and avoid exceptions during the loop, ensuring m_phase_threads remains
consistent for later Stop() join; update the lambda in NetDKG::Start to check
the pointer, capture it safely, and call PhaseHandlerThread(handler) with the
pointer/ref as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 757eb414-ab77-46e9-b643-a3f32d98e788

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd84aa and 53be42b.

📒 Files selected for processing (25)
  • src/Makefile.am
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/active/dkgsession.h
  • src/active/dkgsessionhandler.cpp
  • src/active/dkgsessionhandler.h
  • src/init.cpp
  • src/llmq/debug.cpp
  • src/llmq/debug.h
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
  • src/net_processing.cpp
  • src/net_processing.h
  • src/test/util/setup_common.cpp
  • test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
  • test/lint/lint-circular-dependencies.py

@knst knst force-pushed the refactor-peermanager-handlers-dkg branch from 53be42b to f4b6aae Compare May 10, 2026 18:07

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (7)
src/llmq/net_dkg.cpp (6)

100-110: 💤 Low value

Defensive null-check in fallback verification loop.

In the per-message fallback verification loop (lines 100-110), member is dereferenced at line 106 (member->dmn->pdmnState->pubKeyOperator.Get()) without a null check. The reasoning is that any nodeId whose GetMember() returned nullptr in the first loop was already inserted into ret (line 54) and is filtered by ret.count(nodeId) (line 101). That holds today, but an in-flight membership change between the two GetMember calls (or future refactors) would silently NPE here.

A cheap guard would future-proof this:

🛡️ Proposed defensive check
         auto member = session.GetMember(msg->proTxHash);
+        if (!member) {
+            ret.emplace(nodeId);
+            continue;
+        }
         bool valid = msg->sig.VerifyInsecure(member->dmn->pdmnState->pubKeyOperator.Get(), msg->GetSignHash());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 100 - 110, The fallback verification loop
can dereference a null member returned by session.GetMember; update the loop
that iterates over messages to defensively check the result of
session.GetMember(msg->proTxHash) before using
member->dmn->pdmnState->pubKeyOperator.Get(): if member is null, insert nodeId
into ret and continue, otherwise perform msg->sig.VerifyInsecure(...,
msg->GetSignHash()) as before; this prevents a potential NPE if membership
changes between the two GetMember calls.

466-473: 💤 Low value

Stop() is not idempotent w.r.t. unjoinable threads after early failure.

Stop() calls Interrupt() then joins all threads. If any thread in m_phase_threads is in a non-joinable state (e.g., already joined or std::thread default-constructed because creation failed), t.join() is skipped via the joinable() guard — good. But if a thread is running and throws an unhandled exception before stop is requested, PhaseHandlerThread only catches AbortPhaseException (line 492); any other exception will terminate the thread per std::terminate. The catch-all loop in PhaseHandlerThread does not include a generic catch (...).

Consider adding a final catch (const std::exception&) to log and break the loop cleanly, so Stop() always sees joinable but exited threads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 466 - 473, Stop() can hang on threads that
terminated due to unexpected exceptions because PhaseHandlerThread only catches
AbortPhaseException; add a broad exception handler inside PhaseHandlerThread's
main loop (after the AbortPhaseException catch) that catches const
std::exception& (and optionally catch(...) as a fallback), logs the error with
context, and breaks/returns cleanly so the std::thread object exits normally and
becomes joinable for Stop() to join; reference PhaseHandlerThread,
AbortPhaseException, m_phase_threads and Interrupt() when locating where to add
the catch and logging.

244-244: 💤 Low value

Redundant temporary in m_active construction.

std::make_unique<ActiveDKG>(ActiveDKG{...}) constructs a temporary ActiveDKG and then move/copy-constructs another inside make_unique. You can forward the args directly:

♻️ Proposed simplification
-    m_active{std::make_unique<ActiveDKG>(ActiveDKG{dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman})}
+    m_active{std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman)}

This requires ActiveDKG to have an aggregate-equivalent constructor or a matching one; if it currently relies on aggregate brace-init, add an explicit constructor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` at line 244, The m_active member is being initialized
with a redundant temporary via std::make_unique<ActiveDKG>(ActiveDKG{...});
change the call to forward the constructor args directly (e.g.
std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman,
connman)) so no extra temporary is created, and if ActiveDKG does not currently
have a matching constructor add an explicit constructor on ActiveDKG that
accepts (dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman) to allow
direct in-place construction.

216-255: ⚖️ Poor tradeoff

Two near-identical constructors invite drift.

The observer (lines 216-231) and active (lines 233-253) constructors share initialization of m_qdkgsman, m_qman, m_sporkman, m_chainman, m_quorums_watch, the m_qdkgsman.InitializeHandlers(...) call, and the closing m_qman.ConnectManagers(...) call. The only differences are:

  • m_active (nullptr vs constructed)
  • the lambda type returned by InitializeHandlers (CDKGSessionHandler vs ActiveDKGSessionHandler)

Consider delegating the active constructor to the observer one, or factor common init into a private helper, to keep the two paths (and the ConnectManagers/DisconnectManagers lifecycle) in sync going forward.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 216 - 255, The two NetDKG constructors
duplicate common initialization (m_qdkgsman, m_qman, m_sporkman, m_chainman,
m_quorums_watch, the InitializeHandlers/ConnectManagers lifecycle) and should be
unified: extract the shared setup into a private helper (e.g., InitCommon or
InitializeManagers) that takes a
std::function<std::unique_ptr<CDKGSessionHandler>(const Consensus::LLMQParams&,
int)> or a template/lambda wrapper, or else implement constructor delegation
from the active constructor to the observer constructor and only set m_active
afterward; update the observer constructor to call the helper (or be the
delegated target) and change the active constructor to only construct m_active
and supply the ActiveDKGSessionHandler-producing lambda to the shared
InitializeHandlers call, ensuring m_qman.ConnectManagers and
m_qdkgsman.InitializeHandlers remain in one place.

451-464: ⚡ Quick win

dynamic_cast<ActiveDKGSessionHandler&> at line 458 will throw on type mismatch.

Start() is guarded by m_active == nullptr (line 452), and the active-mode constructor (lines 246-251) installs ActiveDKGSessionHandler instances, so in practice the cast succeeds. However, m_qdkgsman.InitializeHandlers(...) is the only invariant that ties m_active != nullptr to "all handlers are ActiveDKGSessionHandler", and it lives outside this class. If that invariant is ever violated (a future code path that mixes handler types, or a partially initialized state), dynamic_cast on a reference will throw std::bad_cast from inside this lambda chain and may abort the node.

Interrupt() (line 479) already uses the safer pointer form. Consider mirroring that here:

🛡️ Safer cast in `Start()`
     m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-        auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-        m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-            util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-        });
+        auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+        if (!handler) return;
+        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type),
+                                            handler->QuorumIndex());
+        m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+            util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+        });
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 451 - 464, The dynamic_cast in Start()
currently uses dynamic_cast<ActiveDKGSessionHandler&> inside the
m_qdkgsman.ForEachHandler lambda which will throw std::bad_cast on mismatch;
change it to use the pointer form dynamic_cast<ActiveDKGSessionHandler*>
(matching Interrupt()) and check for nullptr before using handler: if the cast
returns nullptr, log or skip creating the phase thread for that handler instead
of dereferencing/throwing, otherwise capture the pointer and pass it to
PhaseHandlerThread; update references in the lambda to use the pointer (e.g.,
handler->QuorumIndex(), PhaseHandlerThread(*handler)) so Start() no longer can
throw from a bad_cast.

257-294: 💤 Low value

vRecv is moved into pm later — make the rewind/peek pattern explicit.

The flow at lines 291-294 reads llmqType (1 byte) and quorumHash (32 bytes) and then rewinds both, with the goal of leaving vRecv untouched so the entire payload (including header bytes) can be hashed and stored at lines 349-352. This works, but:

  • The rewind order (uint256 then uint8_t) is the reverse of read order; that's intentional for Rewind but easy to break in the future.
  • A short comment would make the intent (peek-only) obvious to maintainers.
♻️ Suggested clarifying comment
+    // Peek llmqType + quorumHash without consuming so the full payload can be re-serialized
+    // and hashed below at line 349.
     Consensus::LLMQType llmqType;
     uint256 quorumHash;
     vRecv >> llmqType;
     vRecv >> quorumHash;
     vRecv.Rewind(sizeof(uint256));
     vRecv.Rewind(sizeof(uint8_t));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 257 - 294, The code in
NetDKG::ProcessMessage reads llmqType and quorumHash from vRecv and then calls
vRecv.Rewind in the reverse order to restore the stream for later hashing; make
this intent explicit and less fragile by adding a short comment above the
reads/Rewind calls describing that we are only peeking at the header bytes
(uint8_t then uint256) and that the subsequent Rewind calls intentionally undo
the reads so the full payload remains unchanged for hashing (used later when
moving vRecv into pm). Optionally, reorder the Rewind calls to mirror the read
order (Rewind(sizeof(uint8_t)) then Rewind(sizeof(uint256))) to reduce future
mistakes, while keeping the explanatory comment.
src/llmq/dkgsessionhandler.h (1)

112-121: 💤 Low value

Duplicate public: access specifiers.

Lines 112 and 121 both declare public: with no intervening private:/protected: block, making the second specifier redundant. Consider consolidating into a single section.

♻️ Proposed cleanup
 public:
     const Consensus::LLMQParams& params;

     // Do not guard these, they protect their internals themselves
     CDKGPendingMessages pendingContributions;
     CDKGPendingMessages pendingComplaints;
     CDKGPendingMessages pendingJustifications;
     CDKGPendingMessages pendingPrematureCommitments;

-public:
     explicit CDKGSessionHandler(const Consensus::LLMQParams& _params);
     virtual ~CDKGSessionHandler();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/dkgsessionhandler.h` around lines 112 - 121, The class header
contains two consecutive "public:" access specifiers (surrounding members like
params and CDKGPendingMessages
pendingContributions/pendingComplaints/pendingJustifications/pendingPrematureCommitments),
making the second one redundant; remove the duplicate "public:" and consolidate
these members under a single public section so there is only one "public:"
before params and the pending* members.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmq/dkgsession.cpp`:
- Around line 627-629: The empty stubs for CDKGSession::FinalizeCommitments()
and CDKGSession::FinalizeSingleCommitment() silently disable producing mineable
commitments in HandleDKGRound; either restore the original finalize logic into
these two methods (port the pre-refactor bodies into
CDKGSession::FinalizeCommitments and CDKGSession::FinalizeSingleCommitment so
they return the actual CFinalCommitment(s) used by
qblockman.AddMineableCommitment(...)), or if this work is intentionally
deferred, replace the stubs with an explicit TODO and a runtime-safe guard: log
an error (process/ulogs) and assert or throw so the failure is visible (do not
silently return {}), referencing the methods by name (FinalizeCommitments and
FinalizeSingleCommitment) and ensuring HandleDKGRound’s expectations are met.

In `@src/llmq/net_dkg.cpp`:
- Around line 384-411: AlreadyHave reports DKG invs seen by observer nodes
(m_quorums_watch) but ProcessGetData currently returns early when m_active ==
nullptr causing watchers to claim possession but refuse getdata; remove the
early "if (m_active == nullptr) return false;" check from NetDKG::ProcessGetData
so that m_qdkgsman.GetContribution/GetComplaint/etc. are dynamically dispatched
(base virtuals will return false in non-serving handlers), and keep or update
the explanatory comment to note that observer handlers will be consulted via
m_qdkgsman even when m_active is null.
- Around line 375-379: The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.

helper (or be the delegated target) and change the active constructor to only
construct m_active and supply the ActiveDKGSessionHandler-producing lambda to
the shared InitializeHandlers call, ensuring m_qman.ConnectManagers and
m_qdkgsman.InitializeHandlers remain in one place.
- Around line 451-464: The dynamic_cast in Start() currently uses
dynamic_cast<ActiveDKGSessionHandler&> inside the m_qdkgsman.ForEachHandler
lambda which will throw std::bad_cast on mismatch; change it to use the pointer
form dynamic_cast<ActiveDKGSessionHandler*> (matching Interrupt()) and check for
nullptr before using handler: if the cast returns nullptr, log or skip creating
the phase thread for that handler instead of dereferencing/throwing, otherwise
capture the pointer and pass it to PhaseHandlerThread; update references in the
lambda to use the pointer (e.g., handler->QuorumIndex(),
PhaseHandlerThread(*handler)) so Start() no longer can throw from a bad_cast.
- Around line 257-294: The code in NetDKG::ProcessMessage reads llmqType and
quorumHash from vRecv and then calls vRecv.Rewind in the reverse order to
restore the stream for later hashing; make this intent explicit and less fragile
by adding a short comment above the reads/Rewind calls describing that we are
only peeking at the header bytes (uint8_t then uint256) and that the subsequent
Rewind calls intentionally undo the reads so the full payload remains unchanged
for hashing (used later when moving vRecv into pm). Optionally, reorder the
Rewind calls to mirror the read order (Rewind(sizeof(uint8_t)) then
Rewind(sizeof(uint256))) to reduce future mistakes, while keeping the
explanatory comment.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d8950d10-63a8-42c9-9f3f-6963acd754d2

📥 Commits

Reviewing files that changed from the base of the PR and between 53be42b and f4b6aae.

📒 Files selected for processing (16)
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/init.cpp
  • src/llmq/dkgsession.cpp
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
🚧 Files skipped from review as they are similar to previous changes (12)
  • src/llmq/options.cpp
  • src/llmq/observer.cpp
  • src/llmq/options.h
  • src/llmq/net_dkg.h
  • src/llmq/dkgsessionmgr.cpp
  • src/active/context.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/observer.h
  • src/llmq/dkgsessionmgr.h
  • src/llmq/dkgsession.h
  • src/active/dkgsession.cpp
  • src/init.cpp

Comment thread src/llmq/dkgsession.cpp
Comment on lines +627 to +629
std::vector<CFinalCommitment> CDKGSession::FinalizeCommitments() { return {}; }

CFinalCommitment CDKGSession::FinalizeSingleCommitment() { return {}; }

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Empty FinalizeCommitments() and FinalizeSingleCommitment() will silently break DKG output.

Both implementations return an empty vector and a default-constructed (null) CFinalCommitment respectively. These are the only paths in HandleDKGRound (see src/llmq/net_dkg.cpp lines 522-527 and 594-599) that produce mineable commitments via qblockman.AddMineableCommitment(...). With these stubs:

  • FinalizeSingleCommitment() returns {} which IsNull() reports true → branch is skipped, no single-member commitment is ever produced.
  • FinalizeCommitments() returns an empty vector → the for loop never executes, no multi-member commitment is ever relayed/mined.

If this is intentional (work-in-progress to be filled in by a follow-up commit/PR), please add a TODO/FIXME and a clear log/assert so the regression is not invisible at runtime; otherwise DKG silently never completes a quorum on this branch.

Could you confirm whether the original (pre-refactor) finalize logic is supposed to land in this PR or in a follow-up? I can help port the previous body if you'd like.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/dkgsession.cpp` around lines 627 - 629, The empty stubs for
CDKGSession::FinalizeCommitments() and CDKGSession::FinalizeSingleCommitment()
silently disable producing mineable commitments in HandleDKGRound; either
restore the original finalize logic into these two methods (port the
pre-refactor bodies into CDKGSession::FinalizeCommitments and
CDKGSession::FinalizeSingleCommitment so they return the actual
CFinalCommitment(s) used by qblockman.AddMineableCommitment(...)), or if this
work is intentionally deferred, replace the stubs with an explicit TODO and a
runtime-safe guard: log an error and assert or throw so the
failure is visible (do not silently return {}), referencing the methods by name
(FinalizeCommitments and FinalizeSingleCommitment) and ensuring HandleDKGRound’s
expectations are met.

Comment thread src/llmq/net_dkg.cpp
Comment on lines +375 to +379
if (!dispatched) {
    LogPrintf("NetDKG -- no session handlers for quorumIndex [%d]\n", quorumIndex);
    m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100);
    return;
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

PeerMisbehaving(100) for missing handler may be too punitive.

If DoForHandler returns false because no session handler exists for {llmqType, quorumIndex} (e.g., handlers haven't been initialized yet for this combination, or this node hasn't discovered that quorum is mineable), we instantly ban the peer with score 100. Earlier paths use 10 for "we might be lagging behind" cases (e.g., line 319). If a remote peer is slightly ahead, we'd ban legitimate participants.

Consider lowering to 10, or returning silently with a LogPrint, since the peer's message itself was structurally valid up to this point.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 375 - 379, The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.

Comment thread src/llmq/net_dkg.cpp
Comment on lines +384 to +411
bool NetDKG::AlreadyHave(const CInv& inv)
{
    switch (inv.type) {
    case MSG_QUORUM_CONTRIB:
    case MSG_QUORUM_COMPLAINT:
    case MSG_QUORUM_JUSTIFICATION:
    case MSG_QUORUM_PREMATURE_COMMITMENT: {
        if (!IsQuorumDKGEnabled(m_sporkman)) return false;
        bool seen = false;
        m_qdkgsman.ForEachHandler([&](CDKGSessionHandler& h) {
            if (seen) return;
            if (h.pendingContributions.HasSeen(inv.hash) || h.pendingComplaints.HasSeen(inv.hash) ||
                h.pendingJustifications.HasSeen(inv.hash) || h.pendingPrematureCommitments.HasSeen(inv.hash)) {
                seen = true;
            }
        });
        return seen;
    }
    }
    return false;
}

bool NetDKG::ProcessGetData(CNode& pfrom, const CInv& inv, CConnman& connman, const CNetMsgMaker& msgMaker)
{
    // Default implementations of GetContribution and the other virtual methods
    // return false in observer mode; m_active is only an early exit and does
    // not affect logic.
    if (m_active == nullptr) return false;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Asymmetry between AlreadyHave and ProcessGetData for observer-mode (qwatch) nodes.

AlreadyHave (line 391) only gates on IsQuorumDKGEnabled(m_sporkman) and walks every handler's pending sets — observer/m_quorums_watch nodes will report true for DKG inv hashes they've seen.

ProcessGetData (line 411), however, short-circuits to false whenever m_active == nullptr, even though m_quorums_watch == true nodes do receive and forward DKG messages. The result: a watching node will tell peers "I already have this" via AlreadyHave, but then refuse to serve it via getdata. Peers can be left waiting/timing out instead of fetching from another source.

The inline comment ("default implementations of GetContribution... return false in observer mode") is true for the base CDKGSessionHandler, but the dispatch goes through m_qdkgsman.GetContribution(...) which dynamic-dispatches to the active handler in active mode. For pure m_quorums_watch=true (no m_active) the base virtuals do return false anyway — so the early return is redundant for that case and harmful for the truly-observer case.

Consider either:

  1. Removing the m_active == nullptr early return and relying on the virtual dispatch to return false naturally, or
  2. Symmetrically gating AlreadyHave so it returns false when m_active == nullptr.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 384 - 411, AlreadyHave reports DKG invs
seen by observer nodes (m_quorums_watch) but ProcessGetData currently returns
early when m_active == nullptr causing watchers to claim possession but refuse
getdata; remove the early "if (m_active == nullptr) return false;" check from
NetDKG::ProcessGetData so that m_qdkgsman.GetContribution/GetComplaint/etc. are
dynamically dispatched (base virtuals will return false in non-serving
handlers), and keep or update the explanatory comment to note that observer
handlers will be consulted via m_qdkgsman even when m_active is null.

@PastaPastaPasta PastaPastaPasta marked this pull request as draft May 12, 2026 13:53
@github-actions

This pull request has conflicts, please rebase.

knst added 9 commits May 12, 2026 21:14
It shows the hidden circular dependency and tidies up the list of includes
- removed method CDKGPendingMessages::Misbehaving(NodeId, int, PeerManager&); ProcessPendingMessageBatch calls peerman.Misbehaving(...) directly
- renamed PushPendingMessage<Message>(NodeId, Message&, PeerManager&) to PushOwnPendingMessage for clear distinction of path with node=-1 (self made)
…from PeerManager

Re-ordered initialization of PeerManager and ActiveContext / ObserverContext; PeerManager::make now takes a raw nodeman pointer (or nullptr).

It resolves several circular dependencies over net_processing and removes several unique_ptr<T&> work-arounds from PeerManager
It helps drop the dependency of llmq/dkgsessionhandler on network code
@knst knst force-pushed the refactor-peermanager-handlers-dkg branch from f4b6aae to 248ccaf Compare May 12, 2026 14:18
knst added 3 commits May 12, 2026 21:22
 - moved implementation of ProcessMessage and AlreadyHave to NetDKG
 - drop usages of MessageProcessingResult in CDKGSessionManager
 - introduced a new helper DoForHandler
@knst knst force-pushed the refactor-peermanager-handlers-dkg branch from 248ccaf to c1c4e2a Compare May 12, 2026 14:22
@github-actions

This pull request has conflicts, please rebase.
