refactor: break circular dependency over net_processing and dkgsessionhandler #7314
knst wants to merge 12 commits into
Conversation
✅ No Merge Conflicts Detected: This PR currently has no conflicts with other open PRs.
Walkthrough

This PR refactors LLMQ DKG (Distributed Key Generation) handling in Dash by separating network management concerns from context objects. The main changes migrate DKG phase operations out of the session objects: messages are enqueued into per-session pending queues and the phase flow is driven by the new NetDKG handler.

Sequence Diagram(s)

The conditions for generating sequence diagrams are met. This PR introduces significant control flow changes with multi-component interactions across network message handling, phase execution, and context initialization.

sequenceDiagram
participant Node as DKG Node
participant NetDKG as NetDKG Handler
participant SessionMgr as CDKGSessionManager
participant SessionHdlr as CDKGSessionHandler
participant ActiveDKG as ActiveDKGSession
Node->>NetDKG: ProcessMessage(MSG_QUORUM_CONTRIB)
NetDKG->>SessionMgr: route message (llmqType, quorumIndex)
SessionMgr->>SessionHdlr: PushPendingMessage(serialized, hash)
Note over SessionHdlr: Batches message in queue
NetDKG->>NetDKG: HandleDKGRound()
loop per_phase
NetDKG->>SessionHdlr: ProcessPendingMessageBatch()
SessionHdlr->>ActiveDKG: Contribute()/VerifyAndCommit()
ActiveDKG-->>SessionHdlr: std::optional<Message>
SessionHdlr->>NetDKG: RelayInvToParticipants()
end
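The batching step shown in the first diagram (PushPendingMessage followed by a later ProcessPendingMessageBatch) can be sketched as a small thread-safe queue. This is an illustrative stand-in with hypothetical names, not the actual CDKGPendingMessages implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: ProcessMessage pushes serialized payloads into a
// per-session queue; the phase thread later drains them as one batch.
class PendingMessages {
    std::mutex m;
    std::deque<std::string> q;
public:
    void Push(std::string payload) {
        std::lock_guard<std::mutex> lk(m);
        q.push_back(std::move(payload));
    }
    std::vector<std::string> PopBatch(std::size_t max) {
        std::lock_guard<std::mutex> lk(m);
        std::vector<std::string> batch;
        while (!q.empty() && batch.size() < max) {
            batch.push_back(std::move(q.front()));
            q.pop_front();
        }
        return batch;
    }
};
```

Decoupling the network thread (Push) from the phase thread (PopBatch) is what lets message receipt and phase execution proceed independently, as the diagram's loop suggests.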
sequenceDiagram
participant Init as Initialization
participant ActiveCtx as ActiveContext
participant PeerMgr as PeerManager
participant NetDKG as NetDKG Handler
participant Spork as CSporkManager
Init->>ActiveCtx: construct(dmnman, qman, qsnapman, sigman)
Init->>PeerMgr: make(nodeman=active_ctx ? active_ctx->nodeman.get() : nullptr)
Note over PeerMgr: m_nodeman set
Init->>NetDKG: construct(sporkman, dkgsman, qman)
Init->>Spork: IsQuorumDKGEnabled()
Note over NetDKG: Check spork for DKG enabled
Init->>ActiveCtx: Start()
Note over ActiveCtx: Start() takes no connman/peerman args
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (3)
src/active/dkgsession.cpp (1)
106-111: ⚡ Quick win: Move the sent* debug updates to the actual enqueue/broadcast path.

These methods now only build and return a message. Setting sentContributions, sentComplaint, sentJustification, and sentPrematureCommitment here records a successful send before NetDKG has actually serialized and queued/broadcast the payload.

Also applies to: 292-297, 382-387, 539-544
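The suggested ordering (build first, flip the debug flag only after the payload is actually queued) can be sketched with hypothetical stand-in types; none of these names are the real Dash classes:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical miniature of the suggested ordering: the builder only
// constructs the message; the "sent" debug flag is flipped in the send path,
// after the payload has actually been queued.
struct DebugStatus {
    bool sentContributions = false;
};

struct Session {
    DebugStatus debug;
    std::optional<std::string> BuildContribution() {
        // Build and return the message; debug flags are NOT touched here.
        return std::string{"contribution-payload"};
    }
};

struct NetSender {
    std::vector<std::string> queue;
    bool SendContribution(Session& s) {
        auto msg = s.BuildContribution();
        if (!msg) return false;
        queue.push_back(*msg);            // serialize/enqueue first...
        s.debug.sentContributions = true; // ...then record the successful send
        return true;
    }
};
```

With this split, a failed build (empty optional) never leaves a stale "sent" flag behind.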
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/active/dkgsession.cpp` around lines 106-111: the dkgDebugManager.UpdateLocalSessionStatus calls inside the message-builder functions (e.g., setting CDKGDebugSessionStatus::statusBits.sentContributions, sentComplaint, sentJustification, sentPrematureCommitment) must be removed from those builders (the functions that build and return qc/messages) and moved into the actual send path inside NetDKG, i.e., the code that serializes and enqueues/broadcasts the payload. Locate the UpdateLocalSessionStatus calls in the builders and delete them there, then add equivalent UpdateLocalSessionStatus updates immediately after NetDKG performs the serialization/queuing/broadcast so the debug flags reflect a real successful send.

src/llmq/debug.cpp (1)
213-228: 💤 Low value: Optional: make MarkAborted idempotent w.r.t. nTime.

MarkAborted's lambda always returns true, so each call bumps localStatus.nTime even when the session was already marked aborted. MarkPhaseAdvanced already does the right thing (returns changed). For consistency, and to avoid spurious timestamp updates if the helper is invoked more than once on the same aborted session, consider returning a real changed flag.

♻️ Proposed change
```diff
 void CDKGDebugManager::MarkAborted(Consensus::LLMQType llmqType, int quorumIndex)
 {
     UpdateLocalSessionStatus(llmqType, quorumIndex, [&](CDKGDebugSessionStatus& status) {
+        if (status.statusBits.aborted) return false;
         status.statusBits.aborted = true;
         return true;
     });
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/debug.cpp` around lines 213-228: MarkAborted currently always returns true from its UpdateLocalSessionStatus lambda, which forces localStatus.nTime to update on every call; change the lambda in CDKGDebugManager::MarkAborted to compute a changed flag by comparing status.statusBits.aborted with the new value, set status.statusBits.aborted = true, and return that changed flag (i.e., return true only if status.statusBits.aborted was previously false). This makes MarkAborted idempotent like MarkPhaseAdvanced and avoids spurious nTime updates.

src/llmq/net_dkg.cpp (1)
449-482: 💤 Low value: Inconsistent dynamic_cast usage between Start() and Interrupt(); consider tightening shutdown.

Start() uses the throwing reference form (dynamic_cast<ActiveDKGSessionHandler&>) while Interrupt() uses the safe pointer form. Both iterate the same handler set and both early-return on m_active == nullptr, so the invariant is identical and the two should agree.

The reference form also has a small resilience gap: if the cast were ever to throw mid-iteration, the threads already pushed into m_phase_threads would never be joined, because ~NetDKG() only calls DisconnectManagers() (line 254), not Stop(). Either use the pointer form here as well, or have the destructor call Stop() defensively so a partially-initialized state still cleans up.

♻️ Proposed alignment with Interrupt()
```diff
 m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-    auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-    m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-        util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-    });
+    auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+    if (!Assume(handler != nullptr)) return;
+    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type), handler->QuorumIndex());
+    m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+        util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+    });
 });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 449 - 482, Start() uses dynamic_cast<ActiveDKGSessionHandler&> which can throw partway through filling m_phase_threads and leave threads unjoined; make Start() mirror Interrupt() by using the non-throwing pointer form (dynamic_cast<ActiveDKGSessionHandler*>) when iterating m_qdkgsman.ForEachHandler so you only create threads for valid handlers and avoid exceptions during the loop, ensuring m_phase_threads remains consistent for later Stop() join; update the lambda in NetDKG::Start to check the pointer, capture it safely, and call PhaseHandlerThread(handler) with the pointer/ref as appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 757eb414-ab77-46e9-b643-a3f32d98e788
📒 Files selected for processing (25)
- src/Makefile.am
- src/active/context.cpp
- src/active/context.h
- src/active/dkgsession.cpp
- src/active/dkgsession.h
- src/active/dkgsessionhandler.cpp
- src/active/dkgsessionhandler.h
- src/init.cpp
- src/llmq/debug.cpp
- src/llmq/debug.h
- src/llmq/dkgsession.h
- src/llmq/dkgsessionhandler.cpp
- src/llmq/dkgsessionhandler.h
- src/llmq/dkgsessionmgr.cpp
- src/llmq/dkgsessionmgr.h
- src/llmq/net_dkg.cpp
- src/llmq/net_dkg.h
- src/llmq/observer.cpp
- src/llmq/observer.h
- src/llmq/options.cpp
- src/llmq/options.h
- src/net_processing.cpp
- src/net_processing.h
- src/test/util/setup_common.cpp
- test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
- test/lint/lint-circular-dependencies.py
Force-pushed from 53be42b to f4b6aae
Actionable comments posted: 3
🧹 Nitpick comments (7)
src/llmq/net_dkg.cpp (6)
100-110: 💤 Low value: Defensive null-check in the fallback verification loop.

In the per-message fallback verification loop (lines 100-110), member is dereferenced at line 106 (member->dmn->pdmnState->pubKeyOperator.Get()) without a null check. The reasoning is that any nodeId whose GetMember() returned nullptr in the first loop was already inserted into ret (line 54) and is filtered by ret.count(nodeId) (line 101). That holds today, but an in-flight membership change between the two GetMember calls (or future refactors) would silently dereference a null pointer here.

A cheap guard would future-proof this:
🛡️ Proposed defensive check
```diff
 auto member = session.GetMember(msg->proTxHash);
+if (!member) {
+    ret.emplace(nodeId);
+    continue;
+}
 bool valid = msg->sig.VerifyInsecure(member->dmn->pdmnState->pubKeyOperator.Get(), msg->GetSignHash());
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 100 - 110, The fallback verification loop can dereference a null member returned by session.GetMember; update the loop that iterates over messages to defensively check the result of session.GetMember(msg->proTxHash) before using member->dmn->pdmnState->pubKeyOperator.Get(): if member is null, insert nodeId into ret and continue, otherwise perform msg->sig.VerifyInsecure(..., msg->GetSignHash()) as before; this prevents a potential NPE if membership changes between the two GetMember calls.
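The guard pattern above, looking up a member and treating a null result as a verification failure rather than dereferencing it, can be sketched generically (hypothetical types, not the real session API):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical miniature of the defensive lookup: a missing member marks the
// sender as bad instead of dereferencing a null pointer.
struct Member { std::string pubkey; };

using MemberMap = std::map<std::string, Member>;

bool VerifyOrFlag(const MemberMap& members, const std::string& proTxHash,
                  int nodeId, std::set<int>& bad)
{
    auto it = members.find(proTxHash);
    if (it == members.end()) {
        bad.insert(nodeId); // member vanished between lookups: flag, don't crash
        return false;
    }
    return !it->second.pubkey.empty(); // stand-in for signature verification
}
```

The design choice matches the proposed diff: a vanished member is treated the same as a failed signature, which keeps the caller's control flow unchanged.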
466-473: 💤 Low value: Stop() is not idempotent w.r.t. unjoinable threads after early failure.

Stop() calls Interrupt() then joins all threads. If any thread in m_phase_threads is in a non-joinable state (e.g., already joined, or a default-constructed std::thread because creation failed), t.join() is skipped via the joinable() guard, which is good. But if a thread is running and throws an unhandled exception before stop is requested, PhaseHandlerThread only catches AbortPhaseException (line 492); any other exception escaping a thread invokes std::terminate. The catch handling in PhaseHandlerThread does not include a generic catch clause.

Consider adding a final catch (const std::exception&) to log and break the loop cleanly, so Stop() always sees joinable but exited threads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 466 - 473, Stop() can hang on threads that terminated due to unexpected exceptions because PhaseHandlerThread only catches AbortPhaseException; add a broad exception handler inside PhaseHandlerThread's main loop (after the AbortPhaseException catch) that catches const std::exception& (and optionally catch(...) as a fallback), logs the error with context, and breaks/returns cleanly so the std::thread object exits normally and becomes joinable for Stop() to join; reference PhaseHandlerThread, AbortPhaseException, m_phase_threads and Interrupt() when locating where to add the catch and logging.
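The joinability concern can be demonstrated with a minimal worker loop: catching std::exception (in addition to any specific abort exception) lets the thread return normally, so a later join() succeeds instead of the process dying in std::terminate. Names here are illustrative, not the real PhaseHandlerThread:

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

std::atomic<bool> g_exited{false};

// Worker loop that survives unexpected exceptions by breaking out cleanly
// (a real implementation would log first), so the std::thread stays joinable
// for a later Stop().
void WorkerLoop() {
    while (true) {
        try {
            throw std::runtime_error("unexpected failure"); // simulated
        } catch (const std::exception&) {
            break; // exit cleanly instead of letting std::terminate fire
        }
    }
    g_exited = true;
}
```

Without the broad catch, the simulated exception would escape the thread function and abort the whole process before Stop() ever ran.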
244-244: 💤 Low value: Redundant temporary in m_active construction.

std::make_unique<ActiveDKG>(ActiveDKG{...}) constructs a temporary ActiveDKG and then move/copy-constructs another inside make_unique. You can forward the args directly:

♻️ Proposed simplification
```diff
-    m_active{std::make_unique<ActiveDKG>(ActiveDKG{dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman})}
+    m_active{std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman)}
```

This requires ActiveDKG to have an aggregate-equivalent constructor or a matching one; if it currently relies on aggregate brace-init, add an explicit constructor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` at line 244, The m_active member is being initialized with a redundant temporary via std::make_unique<ActiveDKG>(ActiveDKG{...}); change the call to forward the constructor args directly (e.g. std::make_unique<ActiveDKG>(dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman)) so no extra temporary is created, and if ActiveDKG does not currently have a matching constructor add an explicit constructor on ActiveDKG that accepts (dmnman, mn_metaman, dkgdbgman, qblockman, qsnapman, connman) to allow direct in-place construction.
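The difference is easy to observe with a small type that counts move-constructions: brace-initializing a temporary inside make_unique incurs an extra move that direct argument forwarding avoids. The Ctx type here is a hypothetical stand-in for ActiveDKG:

```cpp
#include <cassert>
#include <memory>

// Counts move-constructions so the extra temporary is observable.
struct Ctx {
    static int moves;
    int id;
    explicit Ctx(int i) : id(i) {}
    Ctx(Ctx&& other) : id(other.id) { ++moves; }
};
int Ctx::moves = 0;

// make_unique<Ctx>(Ctx{42}) materializes a temporary and moves it into the
// heap object; make_unique<Ctx>(42) constructs in place with zero moves.
std::unique_ptr<Ctx> MakeViaTemporary() { return std::make_unique<Ctx>(Ctx{42}); }
std::unique_ptr<Ctx> MakeForwarded()    { return std::make_unique<Ctx>(42); }
```

For a lightweight type the move is negligible; the cleanup is mainly about readability and about not requiring the type to be movable at all.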
216-255: ⚖️ Poor tradeoff: Two near-identical constructors invite drift.

The observer (lines 216-231) and active (lines 233-253) constructors share initialization of m_qdkgsman, m_qman, m_sporkman, m_chainman, m_quorums_watch, the m_qdkgsman.InitializeHandlers(...) call, and the closing m_qman.ConnectManagers(...) call. The only differences are:

- m_active (nullptr vs constructed)
- the lambda type returned by InitializeHandlers (CDKGSessionHandler vs ActiveDKGSessionHandler)

Consider delegating the active constructor to the observer one, or factor common init into a private helper, to keep the two paths (and the ConnectManagers/DisconnectManagers lifecycle) in sync going forward.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 216 - 255, The two NetDKG constructors duplicate common initialization (m_qdkgsman, m_qman, m_sporkman, m_chainman, m_quorums_watch, the InitializeHandlers/ConnectManagers lifecycle) and should be unified: extract the shared setup into a private helper (e.g., InitCommon or InitializeManagers) that takes a std::function<std::unique_ptr<CDKGSessionHandler>(const Consensus::LLMQParams&, int)> or a template/lambda wrapper, or else implement constructor delegation from the active constructor to the observer constructor and only set m_active afterward; update the observer constructor to call the helper (or be the delegated target) and change the active constructor to only construct m_active and supply the ActiveDKGSessionHandler-producing lambda to the shared InitializeHandlers call, ensuring m_qman.ConnectManagers and m_qdkgsman.InitializeHandlers remain in one place.
451-464: ⚡ Quick win: dynamic_cast<ActiveDKGSessionHandler&> at line 458 will throw on type mismatch.

Start() is guarded by m_active == nullptr (line 452), and the active-mode constructor (lines 246-251) installs ActiveDKGSessionHandler instances, so in practice the cast succeeds. However, m_qdkgsman.InitializeHandlers(...) is the only invariant that ties m_active != nullptr to "all handlers are ActiveDKGSessionHandler", and it lives outside this class. If that invariant is ever violated (a future code path that mixes handler types, or a partially initialized state), dynamic_cast on a reference will throw std::bad_cast from inside this lambda chain and may abort the node.

Interrupt() (line 479) already uses the safer pointer form. Consider mirroring that here:

🛡️ Safer cast in Start()
```diff
 m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-    auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-    m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-        util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-    });
+    auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+    if (!handler) return;
+    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type),
+                                        handler->QuorumIndex());
+    m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+        util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+    });
 });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 451 - 464, The dynamic_cast in Start() currently uses dynamic_cast<ActiveDKGSessionHandler&> inside the m_qdkgsman.ForEachHandler lambda which will throw std::bad_cast on mismatch; change it to use the pointer form dynamic_cast<ActiveDKGSessionHandler*> (matching Interrupt()) and check for nullptr before using handler: if the cast returns nullptr, log or skip creating the phase thread for that handler instead of dereferencing/throwing, otherwise capture the pointer and pass it to PhaseHandlerThread; update references in the lambda to use the pointer (e.g., handler->QuorumIndex(), PhaseHandlerThread(*handler)) so Start() no longer can throw from a bad_cast.
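The behavioral difference the comment relies on is standard C++: the reference form of dynamic_cast throws std::bad_cast on mismatch, while the pointer form returns nullptr. A minimal demonstration with stand-in types:

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct ActiveHandler : Base {};   // stand-in for ActiveDKGSessionHandler
struct PassiveHandler : Base {};  // stand-in for a plain CDKGSessionHandler

// Pointer form: mismatch yields nullptr, so the caller can skip gracefully.
bool IsActive(Base& b) { return dynamic_cast<ActiveHandler*>(&b) != nullptr; }

// Reference form: mismatch throws std::bad_cast.
bool ReferenceCastThrows(Base& b) {
    try {
        (void)dynamic_cast<ActiveHandler&>(b);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```

This is why the pointer form is preferred inside iteration lambdas: a mismatch becomes a skippable condition rather than an exception unwinding through the loop.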
257-294: 💤 Low value: vRecv is moved into pm later; make the rewind/peek pattern explicit.

The flow at lines 291-294 reads llmqType (1 byte) and quorumHash (32 bytes) and then rewinds both, with the goal of leaving vRecv untouched so the entire payload (including header bytes) can be hashed and stored at lines 349-352. This works, but:

- The rewind order (uint256 then uint8_t) is the reverse of the read order; that's intentional for Rewind but easy to break in the future.
- A short comment would make the intent (peek-only) obvious to maintainers.

♻️ Suggested clarifying comment
```diff
+// Peek llmqType + quorumHash without consuming so the full payload can be
+// re-serialized and hashed below at line 349.
 Consensus::LLMQType llmqType;
 uint256 quorumHash;
 vRecv >> llmqType;
 vRecv >> quorumHash;
 vRecv.Rewind(sizeof(uint256));
 vRecv.Rewind(sizeof(uint8_t));
```
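The peek-then-rewind idea generalizes to any cursor-style stream: un-read the fields so the position returns to the start while the caller still learns the header values. This sketch uses a hypothetical Cursor type rather than the real CDataStream:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cursor-style stream: read advances the position, rewind backs
// it up, mirroring the peek done in NetDKG::ProcessMessage.
struct Cursor {
    std::vector<uint8_t> data;
    std::size_t pos = 0;
    void read(void* out, std::size_t n) {
        std::memcpy(out, data.data() + pos, n);
        pos += n;
    }
    void rewind(std::size_t n) { pos -= n; }
};

// Peek a 1-byte type and a 4-byte hash without consuming the stream.
void PeekHeader(Cursor& c, uint8_t& type, uint32_t& hash) {
    c.read(&type, sizeof(type));
    c.read(&hash, sizeof(hash));
    c.rewind(sizeof(hash)); // undo the reads so the full payload
    c.rewind(sizeof(type)); // remains available for hashing
}
```

After PeekHeader returns, the cursor is back at position 0, so downstream code can hash or move the entire untouched payload.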
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 257-294: the code in NetDKG::ProcessMessage reads llmqType and quorumHash from vRecv and then calls vRecv.Rewind in the reverse order to restore the stream for later hashing; make this intent explicit and less fragile by adding a short comment above the reads/Rewind calls describing that we are only peeking at the header bytes (uint8_t then uint256) and that the subsequent Rewind calls intentionally undo the reads so the full payload remains unchanged for hashing (used later when moving vRecv into pm). Optionally, reorder the Rewind calls to mirror the read order (Rewind(sizeof(uint8_t)) then Rewind(sizeof(uint256))) to reduce future mistakes, while keeping the explanatory comment.

src/llmq/dkgsessionhandler.h (1)
112-121: 💤 Low value: Duplicate public: access specifiers.

Lines 112 and 121 both declare public: with no intervening private:/protected: block, making the second specifier redundant. Consider consolidating into a single section.

♻️ Proposed cleanup
```diff
 public:
     const Consensus::LLMQParams& params;

     // Do not guard these, they protect their internals themselves
     CDKGPendingMessages pendingContributions;
     CDKGPendingMessages pendingComplaints;
     CDKGPendingMessages pendingJustifications;
     CDKGPendingMessages pendingPrematureCommitments;

-public:
     explicit CDKGSessionHandler(const Consensus::LLMQParams& _params);
     virtual ~CDKGSessionHandler();
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/dkgsessionhandler.h` around lines 112 - 121, The class header contains two consecutive "public:" access specifiers (surrounding members like params and CDKGPendingMessages pendingContributions/pendingComplaints/pendingJustifications/pendingPrematureCommitments), making the second one redundant; remove the duplicate "public:" and consolidate these members under a single public section so there is only one "public:" before params and the pending* members.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llmq/dkgsession.cpp`:
- Around line 627-629: The empty stubs for CDKGSession::FinalizeCommitments()
and CDKGSession::FinalizeSingleCommitment() silently disable producing mineable
commitments in HandleDKGRound; either restore the original finalize logic into
these two methods (port the pre-refactor bodies into
CDKGSession::FinalizeCommitments and CDKGSession::FinalizeSingleCommitment so
they return the actual CFinalCommitment(s) used by
qblockman.AddMineableCommitment(...)), or if this work is intentionally
deferred, replace the stubs with an explicit TODO and a runtime-safe guard: log
an error (process/ulogs) and assert or throw so the failure is visible (do not
silently return {}), referencing the methods by name (FinalizeCommitments and
FinalizeSingleCommitment) and ensuring HandleDKGRound’s expectations are met.
In `@src/llmq/net_dkg.cpp`:
- Around line 384-411: AlreadyHave reports DKG invs seen by observer nodes
(m_quorums_watch) but ProcessGetData currently returns early when m_active ==
nullptr causing watchers to claim possession but refuse getdata; remove the
early "if (m_active == nullptr) return false;" check from NetDKG::ProcessGetData
so that m_qdkgsman.GetContribution/GetComplaint/etc. are dynamically dispatched
(base virtuals will return false in non-serving handlers), and keep or update
the explanatory comment to note that observer handlers will be consulted via
m_qdkgsman even when m_active is null.
- Around line 375-379: The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d8950d10-63a8-42c9-9f3f-6963acd754d2
📒 Files selected for processing (16)
- src/active/context.cpp
- src/active/context.h
- src/active/dkgsession.cpp
- src/init.cpp
- src/llmq/dkgsession.cpp
- src/llmq/dkgsession.h
- src/llmq/dkgsessionhandler.cpp
- src/llmq/dkgsessionhandler.h
- src/llmq/dkgsessionmgr.cpp
- src/llmq/dkgsessionmgr.h
- src/llmq/net_dkg.cpp
- src/llmq/net_dkg.h
- src/llmq/observer.cpp
- src/llmq/observer.h
- src/llmq/options.cpp
- src/llmq/options.h
🚧 Files skipped from review as they are similar to previous changes (12)
- src/llmq/options.cpp
- src/llmq/observer.cpp
- src/llmq/options.h
- src/llmq/net_dkg.h
- src/llmq/dkgsessionmgr.cpp
- src/active/context.h
- src/llmq/dkgsessionhandler.cpp
- src/llmq/observer.h
- src/llmq/dkgsessionmgr.h
- src/llmq/dkgsession.h
- src/active/dkgsession.cpp
- src/init.cpp
```cpp
std::vector<CFinalCommitment> CDKGSession::FinalizeCommitments() { return {}; }

CFinalCommitment CDKGSession::FinalizeSingleCommitment() { return {}; }
```
Empty FinalizeCommitments() and FinalizeSingleCommitment() will silently break DKG output.
Both implementations return an empty vector and a default-constructed (null) CFinalCommitment respectively. These are the only paths in HandleDKGRound (see src/llmq/net_dkg.cpp lines 522-527 and 594-599) that produce mineable commitments via qblockman.AddMineableCommitment(...). With these stubs:
- FinalizeSingleCommitment() returns {}, which IsNull() reports as true → the branch is skipped and no single-member commitment is ever produced.
- FinalizeCommitments() returns an empty vector → the for loop never executes and no multi-member commitment is ever relayed/mined.
If this is intentional (work-in-progress to be filled in by a follow-up commit/PR), please add a TODO/FIXME and a clear log/assert so the regression is not invisible at runtime; otherwise DKG silently never completes a quorum on this branch.
Could you confirm whether the original (pre-refactor) finalize logic is supposed to land in this PR or in a follow-up? I can help port the previous body if you'd like.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llmq/dkgsession.cpp` around lines 627 - 629, The empty stubs for
CDKGSession::FinalizeCommitments() and CDKGSession::FinalizeSingleCommitment()
silently disable producing mineable commitments in HandleDKGRound; either
restore the original finalize logic into these two methods (port the
pre-refactor bodies into CDKGSession::FinalizeCommitments and
CDKGSession::FinalizeSingleCommitment so they return the actual
CFinalCommitment(s) used by qblockman.AddMineableCommitment(...)), or if this
work is intentionally deferred, replace the stubs with an explicit TODO and a
runtime-safe guard: log an error (process/ulogs) and assert or throw so the
failure is visible (do not silently return {}), referencing the methods by name
(FinalizeCommitments and FinalizeSingleCommitment) and ensuring HandleDKGRound’s
expectations are met.
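As a minimal, self-contained demonstration of the reviewer's claim (using simplified stand-in types, not the real Dash classes), the committed stubs cause both `HandleDKGRound` branches to be dead code — no commitment ever reaches `AddMineableCommitment(...)`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: a default-constructed commitment reports IsNull().
struct CFinalCommitment {
    bool IsNull() const { return true; }
};

// Stub behaviour as committed on this branch: nothing is ever produced.
inline std::vector<CFinalCommitment> FinalizeCommitments() { return {}; }
inline CFinalCommitment FinalizeSingleCommitment() { return {}; }

// Mirrors the two HandleDKGRound branches the review points at and counts
// how many commitments would reach AddMineableCommitment().
inline int CountMineableCommitments()
{
    int mined = 0;
    if (const auto fqc = FinalizeSingleCommitment(); !fqc.IsNull()) ++mined; // branch skipped
    for (const auto& fqc : FinalizeCommitments()) { (void)fqc; ++mined; }    // loop never runs
    return mined;
}
```

Under these stubs the count is always zero, which is why the review asks for either the ported bodies or a loud TODO-plus-guard.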
    if (!dispatched) {
        LogPrintf("NetDKG -- no session handlers for quorumIndex [%d]\n", quorumIndex);
        m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100);
        return;
    }
PeerMisbehaving(100) for missing handler may be too punitive.
If DoForHandler returns false because no session handler exists for {llmqType, quorumIndex} (e.g., handlers haven't been initialized yet for this combination, or this node hasn't discovered that quorum is mineable), we instantly ban the peer with score 100. Earlier paths use 10 for "we might be lagging behind" cases (e.g., line 319). If a remote peer is slightly ahead, we'd ban legitimate participants.
Consider lowering to 10, or returning silently with a LogPrint, since the peer's message itself was structurally valid up to this point.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llmq/net_dkg.cpp` around lines 375 - 379, The current code calls
m_peer_manager->PeerMisbehaving(pfrom.GetId(), 100) when DoForHandler indicates
no session handler (dispatched == false), which is too punitive; change the
penalty to 10 (i.e., m_peer_manager->PeerMisbehaving(pfrom.GetId(), 10)) or
alternatively remove the ban and replace it with a LogPrint/LogPrintf so
legitimate peers slightly ahead aren't banned—update the branch in NetDKG where
dispatched is checked (the block that logs "NetDKG -- no session handlers for
quorumIndex" and calls PeerMisbehaving) to apply the lower score or silent log
as discussed.
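A sketch of the suggested softer policy, with hypothetical simplified types standing in for the real `PeerManager` API: a missing handler is treated as "the peer may be slightly ahead" rather than as an instantly bannable offence.

```cpp
#include <cstdio>

// Hypothetical stand-in for the peer scoring machinery; in Dash the score
// threshold 100 leads to a discouragement/ban.
struct PeerScore {
    int score{0};
    void Misbehaving(int howmuch) { score += howmuch; }
    bool Banned() const { return score >= 100; }
};

inline void OnNoSessionHandler(PeerScore& peer, int quorumIndex)
{
    std::printf("NetDKG -- no session handlers for quorumIndex [%d]\n", quorumIndex);
    peer.Misbehaving(10); // was 100: a single miss no longer bans a legitimate peer
}
```

With score 10, only a sustained stream of unmatched messages accumulates toward a ban, matching the earlier "we might be lagging behind" paths.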
    bool NetDKG::AlreadyHave(const CInv& inv)
    {
        switch (inv.type) {
        case MSG_QUORUM_CONTRIB:
        case MSG_QUORUM_COMPLAINT:
        case MSG_QUORUM_JUSTIFICATION:
        case MSG_QUORUM_PREMATURE_COMMITMENT: {
            if (!IsQuorumDKGEnabled(m_sporkman)) return false;
            bool seen = false;
            m_qdkgsman.ForEachHandler([&](CDKGSessionHandler& h) {
                if (seen) return;
                if (h.pendingContributions.HasSeen(inv.hash) || h.pendingComplaints.HasSeen(inv.hash) ||
                    h.pendingJustifications.HasSeen(inv.hash) || h.pendingPrematureCommitments.HasSeen(inv.hash)) {
                    seen = true;
                }
            });
            return seen;
        }
        }
        return false;
    }

    bool NetDKG::ProcessGetData(CNode& pfrom, const CInv& inv, CConnman& connman, const CNetMsgMaker& msgMaker)
    {
        // Default implementations of GetContribution and the other virtual methods
        // return false in observer mode; m_active is only an early exit and does
        // not affect logic.
        if (m_active == nullptr) return false;
Asymmetry between AlreadyHave and ProcessGetData for observer-mode (qwatch) nodes.
AlreadyHave (line 391) only gates on IsQuorumDKGEnabled(m_sporkman) and walks every handler's pending sets — observer/m_quorums_watch nodes will report true for DKG inv hashes they've seen.
ProcessGetData (line 411), however, short-circuits to false whenever m_active == nullptr, even though m_quorums_watch == true nodes do receive and forward DKG messages. The result: a watching node will tell peers "I already have this" via AlreadyHave, but then refuse to serve it via getdata. Peers can be left waiting/timing out instead of fetching from another source.
The inline comment ("default implementations of GetContribution... return false in observer mode") is true for the base CDKGSessionHandler, but the dispatch goes through m_qdkgsman.GetContribution(...) which dynamic-dispatches to the active handler in active mode. For pure m_quorums_watch=true (no m_active) the base virtuals do return false anyway — so the early return is redundant for that case and harmful for the truly-observer case.
Consider either:
- Removing the `m_active == nullptr` early return and relying on the virtual dispatch to return `false` naturally, or
- Symmetrically gating `AlreadyHave` so it returns `false` when `m_active == nullptr`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llmq/net_dkg.cpp` around lines 384 - 411, AlreadyHave reports DKG invs
seen by observer nodes (m_quorums_watch) but ProcessGetData currently returns
early when m_active == nullptr causing watchers to claim possession but refuse
getdata; remove the early "if (m_active == nullptr) return false;" check from
NetDKG::ProcessGetData so that m_qdkgsman.GetContribution/GetComplaint/etc. are
dynamically dispatched (base virtuals will return false in non-serving
handlers), and keep or update the explanatory comment to note that observer
handlers will be consulted via m_qdkgsman even when m_active is null.
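The first option can be sketched with hypothetical simplified types: drop the early exit and let the base-class virtuals decline naturally, so `AlreadyHave` and `ProcessGetData` stay symmetric by construction.

```cpp
#include <cassert>

// Hypothetical stand-ins: the base handler (observer/qwatch path) declines to
// serve DKG data, while the active-mode override serves it.
struct Handler {
    virtual ~Handler() = default;
    virtual bool GetContribution() const { return false; } // base: cannot serve
};

struct ActiveHandler : Handler {
    bool GetContribution() const override { return true; } // active: can serve
};

// No "if (m_active == nullptr) return false;" here — virtual dispatch decides
// whether the request can be served.
inline bool ProcessGetData(const Handler& h)
{
    return h.GetContribution();
}
```

An observer node then simply returns `false` via the base virtual instead of being short-circuited, and active nodes serve data through the override.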
This pull request has conflicts, please rebase.
It exposes the hidden circular dependency and tidies up the list of includes.
- removed the method `CDKGPendingMessages::Misbehaving(NodeId, int, PeerManager&)`; `ProcessPendingMessageBatch` now calls `peerman.Misbehaving(...)` directly
- renamed `PushPendingMessage<Message>(NodeId, Message&, PeerManager&)` to `PushOwnPendingMessage` for a clear distinction of the path with `node=-1` (self-made messages)
…from PeerManager. Re-ordered initialization of `PeerManager` and `ActiveContext` / `ObserverContext`; `PeerManager::make` now takes a `nodeman` raw pointer (or `nullptr`). This resolves several circular dependencies over net_processing and removes several reference-to-`unique_ptr` work-arounds from `PeerManager`.
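The re-ordered initialization can be sketched as follows (hypothetical simplified types; the real classes take many more dependencies): the context is constructed first, so `PeerManager::make` receives a plain raw pointer instead of a reference to a not-yet-initialized `unique_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the real types.
struct NodeManager {};

struct ActiveContext {
    std::unique_ptr<NodeManager> nodeman = std::make_unique<NodeManager>();
};

struct PeerManager {
    NodeManager* m_nodeman; // raw, non-owning; may be nullptr on non-masternodes
    static std::unique_ptr<PeerManager> make(NodeManager* nodeman)
    {
        return std::unique_ptr<PeerManager>(new PeerManager{nodeman});
    }
};

inline std::unique_ptr<PeerManager> InitPeerManager(const std::unique_ptr<ActiveContext>& active_ctx)
{
    // Construction order is now explicit: the context (if any) exists before
    // PeerManager is made, so there is no dangling reference to fill in later.
    return PeerManager::make(active_ctx ? active_ctx->nodeman.get() : nullptr);
}
```

The raw pointer documents that `PeerManager` does not own `nodeman`, and `nullptr` cleanly models nodes that run without an active context.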
It helps drop the dependency of llmq/dkgsessionhandler on network code.
Force-pushed from f4b6aae to 248ccaf
- moved implementation of `ProcessMessage` and `AlreadyHave` to `NetDKG`
- dropped usages of `MessageProcessingResult` in `CDKGSessionManager`
- introduced a new helper `DoForHandler`
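The `ForEachHandler` / `DoForHandler` helpers might look roughly like this (a sketch with hypothetical simplified types, not the actual Dash code): the manager stays a pure state class, and callers pass callbacks instead of the manager depending on net_processing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for the per-quorum session handler.
struct CDKGSessionHandler {
    int processed{0};
};

class CDKGSessionManager
{
    // Handlers keyed by {llmqType, quorumIndex}.
    std::map<std::pair<int, int>, CDKGSessionHandler> handlers;

public:
    CDKGSessionManager() { handlers[{1, 0}] = {}; }

    // Visit every handler; callers decide what to do with each one.
    void ForEachHandler(const std::function<void(CDKGSessionHandler&)>& f)
    {
        for (auto& [key, h] : handlers) f(h);
    }

    // Returns false when no handler matches, so the caller (e.g. NetDKG)
    // decides the policy for unknown {llmqType, quorumIndex} pairs.
    bool DoForHandler(int llmqType, int quorumIndex,
                      const std::function<void(CDKGSessionHandler&)>& f)
    {
        auto it = handlers.find({llmqType, quorumIndex});
        if (it == handlers.end()) return false;
        f(it->second);
        return true;
    }
};
```

This inversion is what lets `NetDKG` own the p2p policy (misbehaviour scoring, relaying) while `CDKGSessionManager` only owns state.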
Force-pushed from 248ccaf to c1c4e2a
This pull request has conflicts, please rebase.
Issue being fixed or feature implemented
This PR is a continuation of #7247
This PR is not a direct dependency of the kernel project.
This PR aims to resolve the following issues:
The constructor of PeerManager uses references to `unique_ptr`s of multiple objects that will be initialized later, such as:
That's a fragile design that makes multiple assumptions about already-initialized members and their lifetimes.
What was done?
- `CDKGSessionManager` is reduced to a pure state class; it owns the DB and provides 2 new helpers: `ForEachHandler` / `DoForHandler`
- `CDKGSessionHandler` and `ActiveDKGSessionHandler` lose their threading and `ProcessMessage` members
- `MessageProcessingResult` usages are dropped from llmq/ consensus code
- `NetDKG` is introduced, which takes responsibility for p2p communications for DKG work and for running threads

How Has This Been Tested?
Removed circular dependency over `dkgsessionhandler <-> net_processing`

Breaking Changes
N/A
Checklist: