Skip to content

fix(pegboard-envoy): subscribe before registering envoy#4802

Open
NathanFlurry wants to merge 1 commit into04-27-fix_envoy-client_clear_processed_command_idx_after_ack_sendfrom
engine-stabilize/envoy-subscribe-before-register
Open

fix(pegboard-envoy): subscribe before registering envoy#4802
NathanFlurry wants to merge 1 commit into04-27-fix_envoy-client_clear_processed_command_idx_after_ack_sendfrom
engine-stabilize/envoy-subscribe-before-register

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 27, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 27, 2026

PR Review: fix(pegboard-envoy): subscribe before registering envoy

Summary

This PR fixes a race condition in pegboard-envoy: subscriptions were previously created inside conn::init_conn, which also writes the envoy into the load-balancer index. This meant there was a window where a NATS start command could be missed before subscriptions were established. The fix subscribes before calling init_conn, eliminating the window. The logic is correct and the fix is well-motivated.


Issues

Duplicate namespace resolution (correctness + performance)

The most notable issue introduced is that the namespace is now resolved twice per connection:

  • Once in lib.rs (new code, lines ~88-96) to build the subscription topics.
  • Again inside conn::init_conn in conn.rs (lines ~53-59), which retains its own resolve_for_name_global call.

Two concerns:

  1. Extra latency on every envoy handshake — an unnecessary database round-trip during an already-expensive operation.
  2. TOCTOU inconsistency — the two calls are in separate transactions. If the namespace were deleted or renamed between them, the second call would fail, but there is no consistency guarantee between the two resolutions.

The cleaner fix would be to pass the already-resolved namespace (or its ID) into init_conn as a parameter and remove the duplicate resolution from conn.rs. This eliminates the redundant round-trip and ensures both the subscription and the registration use the same resolved value.

No test coverage

The PR description's test checklist is unchecked. The race condition is a concurrent interleaving that is hard to test deterministically, but no regression test is added or referenced. Given that engine/packages/pegboard-envoy/tests/ already exists, a test verifying that the subscription is established before init_conn completes would strengthen confidence in the fix.


Positive Observations

  • The fix correctly addresses the race: subscribe -> init_conn -> publish eviction is the right ordering, and the existing eviction comment confirms the intent.
  • All CLAUDE.md conventions are followed: no println!/eprintln!, no std::sync::Mutex, no _ => fall-through arms, conventional commit message format.
  • with_context / ok_or_else error handling is consistent with surrounding code.

Recommendation

Safe to merge as-is. A follow-up to pass the resolved namespace into init_conn (removing the duplicate resolve_for_name_global call in conn.rs) would be a clean improvement to avoid the extra round-trip.

@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-lifecycle branch from cf29c73 to 2cdf843 Compare April 27, 2026 07:31
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 66a6724 to 090e234 Compare April 27, 2026 07:31
@NathanFlurry NathanFlurry marked this pull request as ready for review April 27, 2026 08:08
@NathanFlurry NathanFlurry changed the base branch from engine-stabilize/envoy-client-lifecycle to graphite-base/4802 April 27, 2026 08:31
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 090e234 to 7ca2768 Compare April 27, 2026 08:31
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4802 to engine-stabilize/envoy-client-command-dedup April 27, 2026 08:32
@NathanFlurry NathanFlurry changed the base branch from engine-stabilize/envoy-client-command-dedup to graphite-base/4802 April 27, 2026 09:56
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 7ca2768 to bdd460a Compare April 27, 2026 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant