
Add relay candidate ranking #257

Draft
tuhalf wants to merge 10 commits into master from tuhalf/latency-based-node-selection


@tuhalf
Collaborator

tuhalf commented Mar 11, 2026

This pull request adds support for a new relay JSON-RPC method, rpc, and improves relay candidate management and connection logic. It unifies how JSON-RPC responses are parsed, adds relay candidate ranking and caching, and makes client connections more reliable. New tests cover the relay RPC path and the related parsing logic.

Relay RPC support and unified response parsing:

  • Added a new RelayRPC method to the Client struct that forwards JSON-RPC calls to a relay. Responses to both sapphire:rpc and the new rpc method go through the same parsing logic, and error messages are now more descriptive and consistent.
  • Refactored JSON-RPC response parsing into a shared function, reducing code duplication and improving maintainability.
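Both bullets describe one parser shared by sapphire:rpc and rpc. The PR text does not show the envelope format, so the following is only a minimal Go sketch assuming a standard JSON-RPC 2.0 response carrying either a result or an error member; the RLP framing around it is left out, and rpcEnvelope and parseEnvelope are hypothetical names, not the PR's parseJSONRPCResponse.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rpcError mirrors the JSON-RPC 2.0 error object (assumed wire format).
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// rpcEnvelope is a hypothetical response shape shared by both methods.
type rpcEnvelope struct {
	JSONRPC string          `json:"jsonrpc"`
	Result  json.RawMessage `json:"result"`
	Error   *rpcError       `json:"error"`
}

// parseEnvelope decodes a raw JSON-RPC response. The method name is used only
// to label errors, which is what lets sapphire:rpc and rpc share the function.
func parseEnvelope(method string, raw []byte) (json.RawMessage, error) {
	var env rpcEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("%s: invalid JSON-RPC response: %w", method, err)
	}
	if env.Error != nil {
		return nil, fmt.Errorf("%s: rpc error %d: %s", method, env.Error.Code, env.Error.Message)
	}
	if len(env.Result) == 0 {
		return nil, errors.New(method + ": response has neither result nor error")
	}
	return env.Result, nil
}

func main() {
	res, err := parseEnvelope("rpc", []byte(`{"jsonrpc":"2.0","id":1,"result":"0x10"}`))
	fmt.Println(string(res), err)
	_, err = parseEnvelope("sapphire:rpc", []byte(`{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"method not found"}}`))
	fmt.Println(err)
}
```

A per-method wrapper can then decode the returned raw result into whatever type that method expects.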

Relay candidate and connection management improvements:

  • Introduced relay candidate caching, synchronization, and authoritative candidate tracking to improve relay selection and reconnection reliability. This adds new fields to ClientManager, loads and flushes a candidate cache, and integrates more tightly with address normalization.
  • Connection routines now record dial outcomes and connected identities, for better monitoring and management.

Testing and validation:

  • Added tests for the new relay RPC response parsing and client integration, covering both valid and invalid responses.

Other improvements:

  • Improved normalization and synchronization of relay addresses throughout the client manager, so configured and runtime addresses stay consistent.
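The normalization rules themselves are not listed in the PR. As a sketch under assumptions, normalizeHostPort (a hypothetical helper) trims whitespace, lowercases the host, fills in a default port, and rejoins with net.JoinHostPort, so that different spellings of the same relay map to one cache key.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// normalizeHostPort is a hypothetical helper: it trims whitespace, lowercases
// the host, fills in defaultPort when no port is given, and rejoins the parts
// with net.JoinHostPort (which brackets hosts containing a colon, i.e. IPv6).
func normalizeHostPort(addr string, defaultPort int) (string, error) {
	addr = strings.TrimSpace(addr)
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		// Treat a split failure as "no port given" and fall back to the default.
		host, port = strings.Trim(addr, "[]"), strconv.Itoa(defaultPort)
	}
	host = strings.ToLower(host)
	if host == "" {
		return "", errors.New("empty host in " + strconv.Quote(addr))
	}
	// A colon left in the host is only acceptable for an IPv6 literal.
	if strings.Contains(host, ":") && net.ParseIP(host) == nil {
		return "", fmt.Errorf("invalid host %q", host)
	}
	n, err := strconv.Atoi(port)
	if err != nil || n < 1 || n > 65535 {
		return "", fmt.Errorf("invalid port %q", port)
	}
	return net.JoinHostPort(host, strconv.Itoa(n)), nil
}

func main() {
	for _, a := range []string{"Relay-A.Example.com", "relay-a.example.com:443", "::1", "bad:host:name"} {
		n, err := normalizeHostPort(a, 443)
		fmt.Printf("%q -> %q %v\n", a, n, err)
	}
}
```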

These changes collectively improve the system's flexibility in handling new RPC methods, its robustness in managing relay candidates and connections, and its test coverage for new and existing functionality.

@tuhalf linked an issue on Mar 11, 2026 that may be closed by this pull request
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request upgrades how the client interacts with and manages network relays. It adds a unified JSON-RPC method for relays and ranks and caches relay candidates, so the client can pick the best available relay based on measured latency, freshness, and candidate source, and can reconnect more reliably after connection failures.

Highlights

  • Unified JSON-RPC Response Parsing: Refactored JSON-RPC response parsing into a shared function, parseJSONRPCResponse, to reduce code duplication and improve consistency across different RPC methods, including the new rpc method.
  • New Relay RPC Method: Introduced a new RelayRPC method in the client, allowing direct forwarding of JSON-RPC calls to a connected relay's network RPC path, enhancing flexibility for relay interactions.
  • Advanced Relay Candidate Management: Implemented a comprehensive system for managing relay candidates, including caching, synchronization with configured addresses, authoritative tracking, and persistence to a database. This improves relay selection and reconnection reliability.
  • Connection Outcome Recording: Enhanced client connection routines to record dial outcomes (latency, success/failure) and connected server identities, providing better data for relay candidate ranking and management.
  • Robust RLP Message Reading: Improved the SSL layer's message reading to handle partial RLP messages more robustly by predicting the full RLP value length and reading remaining bytes, preventing potential truncation issues.
  • Comprehensive Testing: Added new tests to validate the RelayRPC method, its response parsing, and the core logic of relay candidate management and caching.
Changelog
  • edge/protocol.go
    • Refactored parseSapphireRPCResponse into a generic parseJSONRPCResponse function to handle common RLP response envelope parsing.
    • Added parseRelayRPCResponse to specifically handle responses for the new rpc method, leveraging the unified parsing logic.
    • Updated NewMessage to recognize and route rpc method calls to the new parseRelayRPCResponse.
  • edge/protocol_sapphire_test.go
    • Added TestParseRelayRPCResponse to verify correct parsing of relay RPC responses.
    • Added TestNewMessageSupportsRelayRPC to ensure the message creation logic correctly handles the new rpc method.
  • rpc/client.go
    • Added a new RelayRPC method to the Client struct for forwarding JSON-RPC calls to relays.
    • Refactored parseSapphireRPCResult into a more generic parseRPCResultEnvelope to centralize error and result decoding for RPC responses.
    • Introduced parseRelayRPCResult to use the new parseRPCResultEnvelope for relay RPC responses.
    • Integrated RecordDialOutcome into the doDial method to log connection latencies and errors with the client manager.
    • Integrated RecordConnectedIdentity into the initialize method to inform the client manager about successfully connected relay identities.
  • rpc/client_manager.go
    • Removed the math/rand import as its functionality is now handled by the new candidate ranking logic.
    • Added new fields (candidates, discoveryStop, discoveryClose, cacheFlushTimer) to ClientManager for relay candidate management.
    • Initialized relay candidate management components in NewClientManager and Start methods, including loading the cache and syncing configured candidates.
    • Modified ResolveRelayForDevice to first check known relay addresses and then add authoritative candidates upon successful resolution.
    • Updated AddNewAddresses to normalize host ports and synchronize configured and perimeter candidates with the new candidate management system.
    • Removed the old doSelectNextHost function, as its responsibilities have been moved to the new relay_candidates.go file.
    • Added logic to the Stop method to gracefully close the discovery loop and persist the relay candidate cache.
    • Replaced direct client map updates with a call to registerConnectedClientLocked to centralize client registration and candidate updates.
    • Modified GetClientOrConnect to utilize cached relay hosts for connection attempts before resorting to discovery.
  • rpc/client_sapphire_test.go
    • Added TestParseRelayRPCResultSuccess to verify successful parsing of relay RPC results.
    • Added TestParseRelayRPCResultInvalidJSON to test error handling for malformed JSON in relay RPC responses.
  • rpc/relay_cache.go
    • Added a new file to define relayCandidateCacheRecord for serializing relay candidate data.
    • Implemented loadRelayCandidateCache to retrieve and deserialize cached relay candidates from the database.
    • Implemented persistRelayCandidateCache to serialize and store current relay candidates to the database, including logic for pruning invalid or expired entries.
    • Added helper functions relayCandidateFromCacheRecord, toCacheRecord, shouldPersist, lastTouchedAt, unixTime, and timeToUnix for cache record conversion and management.
  • rpc/relay_cache_test.go
    • Added a new file containing tests for the relay candidate caching mechanism.
    • Included TestRelayCandidateCacheRoundTripPrunesInvalidAndExpired to verify cache persistence and pruning logic.
    • Added TestLoadRelayCandidateCacheMergesConfiguredAndChoosesFastest to test cache loading and integration with configured candidates.
    • Implemented TestKnownRelayHostsUsesCachedDiscoveredCandidatesBeforeRefresh to check how discovered candidates are prioritized.
    • Added TestRelayCandidateCachePersistsFailureAndIdentityMismatch to ensure connection failures and identity mismatches are correctly persisted.
  • rpc/relay_candidates.go
    • Added a new file to define relayCandidate and rankedRelayCandidate structs for detailed relay tracking and scoring.
    • Implemented syncConfiguredCandidatesLocked and syncPerimeterCandidatesLocked to manage the state of configured and perimeter relay candidates.
    • Added ensureCandidateLocked to safely retrieve or create relay candidate entries.
    • Implemented RecordDialOutcome, RecordConnectedIdentity, and RecordIdentityMismatch methods to update candidate metrics based on connection events.
    • Added addAuthoritativeCandidate to mark candidates discovered through authoritative sources.
    • Implemented loadRelayCandidateCacheLocked, scheduleRelayCandidateCacheFlushLocked, and persistRelayCandidateCacheLocked for internal cache management.
    • Added registerConnectedClientLocked to update candidate status upon successful client connection.
    • Implemented rankedPoolCandidatesLocked and rankedNodeCandidatesLocked to generate ordered lists of relay candidates based on various criteria (latency, freshness, source).
    • Moved and reimplemented doSelectNextHost to use the new ranking logic for selecting the next best host.
    • Added knownRelayAddr and knownRelayHosts to retrieve ranked relay addresses for a given node ID.
    • Defined allowedForPool, allowedForNode, isFresh, and sourceRank methods for candidate evaluation.
    • Implemented rankCandidate and sortRankedCandidates for scoring and ordering relay candidates.
    • Added runDiscoveryLoop and refreshDiscoveryCandidates to periodically discover new relay candidates.
    • Implemented fetchRelayCandidatesFromClient, parseRelayCandidates, parseNetworkEntry, parseDiscoveryPort, and isRoutableDiscoveryHost for processing discovered relay information.
    • Added formatRelayAddr for consistent relay address formatting.
  • rpc/relay_candidates_test.go
    • Added a new file containing tests for the relay candidate ranking and discovery logic.
    • Included TestParseNetworkEntry and TestParseNetworkEntryRejectsPrivateIP to validate network entry parsing.
    • Added TestRankCandidateOrdering to verify the correctness of candidate ranking based on various factors.
    • Implemented TestDoSelectNextHostUsesLowestLatencyCandidate to ensure the best candidate is selected for new connections.
    • Added TestKnownRelayHostsPreferValidatedLowLatency to test the prioritization of known relay hosts.
    • Included TestIsRoutableDiscoveryHost to validate host routability checks.
  • rpc/ssl.go
    • Added expectedRLPValueLength and decodeRLPSize helper functions to accurately determine the full length of an RLP-encoded value.
    • Modified readMessage to use expectedRLPValueLength to ensure complete RLP messages are read, even if initially fragmented across network packets.
  • rpc/ssl_test.go
    • Added a new file containing TestExpectedRLPValueLengthForLargeList to verify the RLP length prediction for large encoded data.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces significant improvements to relay candidate management, including ranking, caching, and discovery, alongside a new rpc method. The changes are well-structured, with good refactoring to reduce code duplication and extensive new tests. I've identified a critical bug in the candidate penalization logic and a high-severity issue with unsafe type assertions that could lead to a panic. Addressing these will further enhance the robustness of the new system.

Comment thread on rpc/relay_candidates.go (outdated)
Comment thread on rpc/relay_candidates.go (outdated)
@dominicletz
Member

dominicletz commented Mar 12, 2026

@tuhalf why is there no change in socks.go: func (socksServer *Server) doConnectDevice(requestId int64, deviceName string, port int, protocol int, mode string, retry int) (conn *ConnectedPort, err error), which is the primary server candidate selection path for port connects?

Also please fix the CI errors:

  1. New data races in the cache
  2. RLP decoding errors:
     ERROR Invalid RLP frame received, closing connection: unexpected EOF server=localhost:0
     ERROR Invalid RLP frame received, closing connection: rlp: input contains more than one value server=localhost:0

@tuhalf
Collaborator Author

tuhalf commented Mar 12, 2026

> @tuhalf why is there no change in socks.go:func (socksServer *Server) doConnectDevice(requestId int64, deviceName string, port int, protocol int, mode string, retry int) (conn *ConnectedPort, err error)
>
> Which is the primary server candidate selection for port connects?

doConnectDevice is structurally the same, but the candidate list it loops over is no longer in raw append order. The host ranking lives in the candidate helpers, mainly in rpc/relay_candidates.go, rather than in socks.go.



Development

Successfully merging this pull request may close these issues.

Node selection logic needs to be latency based

2 participants