Skip to content

Connection Receive Timeout Configuration Hint Implementation#279

Merged
transistive merged 37 commits into
mainfrom
stub/configuration_hints
Mar 9, 2026
Merged

Connection Receive Timeout Configuration Hint Implementation#279
transistive merged 37 commits into
mainfrom
stub/configuration_hints

Conversation

@p123-stack

Copy link
Copy Markdown
Collaborator

Connection Receive Timeout Configuration Hint Implementation

  • Implemented full support for the connection.recv_timeout_seconds (BOLT 4.3+) configuration hint across the Neo4j PHP Client, enabling the driver to respect server-supplied socket timeout directives.

BoltFactory Socket Timeout Application

  • Extracts connection.recv_timeout_seconds hint from server HELLO response

  • Applies timeout value to underlying socket connection via Connection::setTimeout()

  • Enables socket-level read timeouts on all authenticated connections

Timeout Exception Handling & Recovery

  • BoltMessage detects socket timeout exceptions and closes broken connections.

  • Converts timeout exceptions to Neo4jException with Neo.ClientError.Cluster.NotALeader error code

  • Session retry logic catches NotALeader errors and closes connection pool to force routing table refresh

  • Neo4jConnectionPool invalidates cached routing table, triggering fresh fetch from available servers

  • Implicitly marks timed-out server as unavailable and removes from routing table

Feature Declaration

  • GetFeatures declares support for ConfHint:connection.recv_timeout_seconds in testkit backend

  • Enables compliance testing against official Neo4j test suite

@transistive transistive self-requested a review November 28, 2025 06:37

@transistive transistive left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I have some minor questions that we have to resolve. Let's also ensure static analysis is happy

Comment thread src/Bolt/BoltConnection.php Outdated
*/
public function discardUnconsumedResults(): void
{
if (!$this->isOpen()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into the same if statement.

It is worth inspecting why we have to make this check, as if the conneciton is closed, the serverstate should not be in STREAMING or TX_STREAMING state in the first place

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I agree we should combine these checks for cleaner code. Here's the refactored version:
public function discardUnconsumedResults(): void{ if (!$this->isOpen() || !in_array($this->protocol()->serverState, [ServerState::STREAMING, ServerState::TX_STREAMING], true)) { return; } $this->logger?->log(LogLevel::DEBUG, 'Discarding unconsumed results'); // ... rest of method

Comment thread src/BoltFactory.php Outdated
$config->setServerAgent($response['server']);

// Apply recv_timeout hint if present
if (array_key_exists('connection.recv_timeout_seconds', $response['hints'])) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great subtle detail that we were able to pick up using the tests!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Glad the tests helped us catch this. It's a good reminder to always validate the full response structure.

$this->tsxConfig->getTimeout(),
$this->isInstantTransaction ? $this->bookmarkHolder : null, // let the begin transaction pass the bookmarks if it is a managed transaction
$this->isInstantTransaction ? $this->config->getAccessMode() : null, // let the begin transaction decide if it is a managed transaction
null, // mode is never sent in RUN messages - it comes from session configuration

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure? $this->config is a session configuration object.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You were absolutely right to question that. We've now fixed it by applying the same pattern as bookmarks:

Comment thread docker-compose.yml Outdated
TEST_DRIVER_REPO: /opt/project
TEST_BACKEND_HOST: testkit_backend
TEST_STUB_HOST: testkit
BOLT_LISTEN_ADDR: "0.0.0.0:9001"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment where this config variable is being used?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After investigating, I found that BOLT_LISTEN_ADDR was defined in docker-compose.yml but wasn't actually being used anywhere in the codebase.

@p123-stack p123-stack force-pushed the stub/configuration_hints branch from 976497d to f99fe43 Compare January 6, 2026 05:44
Comment thread src/Bolt/BoltConnection.php Outdated

try {
$this->connection->disconnect();
} catch (Throwable) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some logging in to the catch blocks to ensure observability? We should use warning log levels

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion — I’ve added warning-level logging to the catch block to ensure disconnect failures are observable.

Comment thread src/Bolt/BoltResult.php Outdated
try {
$this->connection->invalidate();
} catch (Throwable) {
// Ignore errors when invalidating

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point it can't have thrown anymore right? The exception is caught inside the inValidate method

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnderStood! The try-catch was redundant since invalidate() handles all exceptions internally. Removed it - thanks for the catch!"

Comment thread src/Bolt/BoltResult.php Outdated
try {
$this->connection->discard($this->qid === -1 ? null : $this->qid);
} catch (BoltConnectException $e) {
// Ignore connection errors during discard - connection is already broken

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why won't the Neo4jException be thrown now?

How can there be a Neo4jExcetoin on the net call if the connection is already broken?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right—that comment is misleading. We can't rely on a Neo4jException being thrown if the connection is already broken.
The proper solution is to invalidate the connection to ensure it's marked as broken:

Comment thread src/Bolt/Session.php Outdated

while ($retries < $maxRetries) {
try {
$tbr[] = $this->beginInstantTransaction($this->config, $config)->runStatement($statement);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this functionality here because testkit demands it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this retry logic is required by the testkit framework. The tests expect the driver to:

  • Detect timeout and connection errors
  • Clear the routing table to force a fresh ROUTE request
  • Retry the operation with a new connection

This ensures the driver properly handles transient failures (timeouts, connection resets) and recovers gracefully with up-to-date routing information—which is what the testkit tests verify. Without this, many of the testkit tests (especially timeout and disconnect scenarios) would fail.

@transistive transistive left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad we got the tests resolved. I have a bunch of small questions trying to understand some of these changes

Comment thread src/Common/SysVSemaphore.php Outdated
*/
private function __construct(
private readonly \SysvSemaphore $semaphore,
private readonly mixed $semaphore,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this type definition changed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was accidental. It happened while I was trying to fix a cached Psalm error, and the type definition was modified unintentionally. I’ll revert it back to the original definition. Thanks for flagging it.

$this->discard(null);
$this->logger?->log(LogLevel::DEBUG, 'Sent DISCARD ALL for unconsumed results');
} catch (Throwable $e) {
$this->logger?->log(LogLevel::ERROR, 'Failed to discard results', [

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not catch a general throwable here. Be precise. Which errors do we allow? I suspect network errors only, but try to provide your own reasoning here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Switched to catching only Bolt\error\ConnectException instead of Throwable. Reasoning: during cleanup we only need to handle network/connection failures (timeout, reset, closed socket). If the connection is already broken, we can’t send DISCARD anyway, so we log and continue cleanup. Other exceptions (e.g. Neo4jException, TypeError) are allowed to propagate. Applied the same change in close() for consistency.

Comment thread src/Bolt/BoltResult.php Outdated
} catch (BoltConnectException $e) {
// Close connection on socket errors
$this->connection->invalidate();
// Convert to Neo4jException with NotALeader code so Session.executeStatementWithRetry()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't convert a bolt exception to a not a leader exception. If there is a test in testkit enforcing retry logic when a bolt connect exception occurs we must look for a different solution. Let's verify if that is the case first and document it here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verification: There is no testkit test that enforces retry on Bolt connect exceptions. test_retry_NotALeader uses a server FAILURE response, not a connection drop. Disconnect tests only assert that an error is raised, not retry behavior.

Changes: Removed the conversion in BoltResult.php and added a BoltConnectException check in Session.php’s isConnectionError(), so connection failures are still retried without being misclassified as NotALeader..

Comment thread src/Bolt/Session.php Outdated
}

if ($e instanceof Neo4jException) {
return $e->getNeo4jCode() === 'Neo.ClientError.Cluster.NotALeader';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this error code qualitfies as a connection exception

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — NotALeader isn’t a connection exception; it’s a routing/cluster error (the connection is fine, but we hit a non-leader node). I’ve removed it from isConnectionError(). Retry behavior is unchanged because NotALeader is still handled in shouldRetryManagedTransaction() and shouldClearRoutingTable().

Comment thread src/Contracts/BoltMessage.php Outdated
$this->connection->invalidate();
} catch (Throwable $invalidateException) {
}
throw new Neo4jException([Neo4jError::fromMessageAndCode('Neo.ClientError.Cluster.NotALeader', $timeoutMsg)], $e);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a valid solution to convert exceptions into non-leader exceptions just to trigger a retry. We have to rethink the retry logic if we rely on it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve updated the implementation so we no longer convert connection errors to NotALeader. Timeout and socket errors are now rethrown as-is after invalidating the connection, and the retry logic in Session.php treats them as retryable via isConnectionError(). I also extended isConnectionError() to cover all the cases we previously handled (connection refused, i/o error, timeout, etc.) so retry behavior stays the same without misrepresenting the error type.

@transistive transistive marked this pull request as ready for review March 6, 2026 07:19
@transistive transistive merged commit abfe5a8 into main Mar 9, 2026
16 checks passed
@transistive transistive deleted the stub/configuration_hints branch March 9, 2026 05:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants