NodeServer: per-peer error counters, inbound metrics, circuit-breaker on bad messages #566

@brittleboye

Problem

Inbound error handling is log-and-forget: failures are logged and then dropped, with no further consequence.

  • mergeIncoming (convex-peer/src/main/java/convex/node/NodeServer.java:528) logs at WARN and returns on merge failure.
  • handleIncomingMessage (convex-peer/src/main/java/convex/node/NodeServer.java:309) logs on decode / dispatch failures.
  • There is no per-peer error counter, no error-rate tracking, and no eviction of misbehaving peers.

A peer can send malformed or unacceptable values indefinitely at no cost to itself. Combined with the lack of inbound backpressure (separate issue), this is a clear DoS surface.

Observability is also thin on the positive path:

  • Outbound broadcast / rootSync counters exist.
  • No inbound message counter, merge-latency histogram, or rejected-message counter.

Proposal

  • Per-peer counters: messagesReceived, mergesAccepted, mergesRejected, decodeErrors, lastErrorTimestamp.
  • Optional circuit breaker: after N consecutive rejects or M rejects in a rolling window, close the connection and optionally mark the peer as undesired for a cooldown period.
  • Expose aggregate metrics (JMX + SLF4J MDC) so operators can see inbound health.
  • Merge-latency histogram (nanos) for the top-level merge; a long tail is the signal that a peer is sending expensive values.
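
A rough sketch of what the per-peer counters and the optional circuit breaker could look like. This is illustrative only, not existing NodeServer code: the class name PeerInboundStats, its method names, and the threshold parameters are all hypothetical, and it assumes that decode errors count as rejects for breaker purposes. The counter names follow the list above.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-peer inbound stats plus a simple circuit breaker (sketch only). */
class PeerInboundStats {
    // Counters named as in the proposal
    final AtomicLong messagesReceived = new AtomicLong();
    final AtomicLong mergesAccepted = new AtomicLong();
    final AtomicLong mergesRejected = new AtomicLong();
    final AtomicLong decodeErrors = new AtomicLong();
    volatile long lastErrorTimestamp = 0L;

    private final int maxConsecutive;   // "N" consecutive rejects
    private final int maxInWindow;      // "M" rejects in the rolling window
    private final long windowMillis;    // rolling window length
    private final long cooldownMillis;  // how long the peer stays undesired

    private int consecutiveRejects = 0;
    private final ArrayDeque<Long> recentRejects = new ArrayDeque<>();
    private long blockedUntil = 0L;

    PeerInboundStats(int maxConsecutive, int maxInWindow, long windowMillis, long cooldownMillis) {
        this.maxConsecutive = maxConsecutive;
        this.maxInWindow = maxInWindow;
        this.windowMillis = windowMillis;
        this.cooldownMillis = cooldownMillis;
    }

    void onReceived() {
        messagesReceived.incrementAndGet();
    }

    synchronized void onAccepted() {
        mergesAccepted.incrementAndGet();
        consecutiveRejects = 0; // a good merge breaks the consecutive streak
    }

    /**
     * Records a decode error or a rejected merge at time `now` (millis).
     * Returns true if the breaker tripped, i.e. the caller should close the connection.
     */
    synchronized boolean onRejected(boolean decodeError, long now) {
        if (decodeError) decodeErrors.incrementAndGet();
        else mergesRejected.incrementAndGet();
        lastErrorTimestamp = now;

        consecutiveRejects++;
        recentRejects.addLast(now);
        // Drop rejects that have fallen out of the rolling window
        while (!recentRejects.isEmpty() && now - recentRejects.peekFirst() > windowMillis) {
            recentRejects.removeFirst();
        }

        boolean tripped = consecutiveRejects >= maxConsecutive
                || recentRejects.size() >= maxInWindow;
        if (tripped) {
            blockedUntil = now + cooldownMillis;
            consecutiveRejects = 0;
            recentRejects.clear();
        }
        return tripped;
    }

    /** True while the peer is still inside its cooldown period. */
    synchronized boolean isBlocked(long now) {
        return now < blockedUntil;
    }
}
```

In this sketch the caller would invoke onReceived() for every inbound message, onAccepted() / onRejected(...) from the decode and merge paths, and check isBlocked(...) before accepting a reconnect from the same peer. Timestamps are passed in rather than read from the clock so the thresholds can be tested deterministically.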

Scope

This is separate from the general backpressure / size-limit issue. The two compose, but can be landed independently.
