Skip to content

fix(pegboard): refresh runner config after envoy connect#4778

Merged
NathanFlurry merged 1 commit intomainfrom
driver-fixes/refresh-runner-config-after-envoy-connect
Apr 27, 2026
Merged

fix(pegboard): refresh runner config after envoy connect#4778
NathanFlurry merged 1 commit intomainfrom
driver-fixes/refresh-runner-config-after-envoy-connect

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 26, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

PR Review: fix(pegboard): refresh runner config after envoy connect

What the change does

On every envoy connection, the old code unconditionally wrote the pool's protocol version to UDB but never purged the runner-config cache afterward. This meant consumers (via namespace.runner_config.get / runner.list_runner_config_enabled_dcs) could keep serving stale pre-mk2 data indefinitely after the envoy reconnected with a new protocol version.

The PR fixes this by:

  1. Reading the existing stored protocol version before writing (read-before-write).
  2. Only writing when the value has actually changed.
  3. Purging runner-config caches after the transaction commits, matching the same pattern used in upsert.rs and refresh_metadata.rs.

As a bonus, the protocol version read is parallelized with the missed-commands range scan using tokio::try_join!, reducing serial FDB round trips.


Correctness

Cache purge placement is correct. Purging after tokio::try_join! returns means the cache is only invalidated once the UDB transaction has committed. A purge inside the closure would create a TOCTOU window. The brief staleness window between commit and purge is unavoidable here and is consistent with the codebase pattern.

First-connection case is handled. existing_runner_config_protocol_version != Some(protocol_version) evaluates to true when read_opt returns None (first connection for this pool), so the version is written and the cache is purged on the first connect.

Same-version reconnect is handled. When the envoy reconnects without changing its protocol version, the read returns Some(protocol_version), the comparison is false, no write is issued, and the cache purge is skipped - avoids unnecessary cache churn on every reconnect.

Parallel reads within the same FDB transaction. tokio::try_join! on ns_tx.read_opt and tx.get_ranges_keyvalues inside the same udb.run closure is safe. FDB supports concurrent reads from the same transaction, and the conditional write only executes after both reads complete. This pattern is already established earlier in the same function for the create_ts/last_ping_ts/version reads.

Type annotation on map closure. The added -> anyhow::Result<protocol::CommandWrapper> annotation is required for type inference inside tokio::try_join! and is correct.


Minor observations

  • Exhaustive match: match command enumerates CommandStartActor and CommandStopActor with no _ => arm - correct per CLAUDE.md conventions for Rust enums.
  • Tuple nesting: let (_, (mut missed_commands, runner_config_protocol_changed)) is slightly dense but readable.
  • Stale cache on same-version reconnect after crash: If a crash left the cache stale before the version was committed, a same-version reconnect would not recover it. Unlikely in practice given upsert.rs/refresh_metadata.rs purge on their writes, but worth noting.

No concerns on

  • Error propagation: ? at the end of tokio::try_join! correctly threads anyhow::Error from either branch.
  • The comment on the cache purge explains why it exists, which is sufficient.
  • Trust boundary: protocol version arrives from the envoy connection URL, validated upstream in UrlData.

Summary

The fix is correct, minimal, and follows existing codebase patterns for cache invalidation. No blocking issues. LGTM once the draft is ready and the PR description is filled in with the root cause (cache was never purged, leaving readers with stale pre-mk2 runner config data after a reconnect).

Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 26, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4778

All packages published as 0.0.0-pr.4778.1bb070b with tag pr-4778.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-1bb070b
docker pull rivetdev/engine:full-1bb070b
Individual packages
npm install rivetkit@pr-4778
npm install @rivetkit/react@pr-4778
npm install @rivetkit/rivetkit-napi@pr-4778
npm install @rivetkit/workflow-engine@pr-4778

@NathanFlurry NathanFlurry changed the base branch from driver-fixes/gasoline-skip-corrupt-active-worker to graphite-base/4778 April 26, 2026 20:54
@NathanFlurry NathanFlurry force-pushed the driver-fixes/refresh-runner-config-after-envoy-connect branch from 36392d2 to 9b12a58 Compare April 26, 2026 20:54
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4778 to 04-24-chore_kitchen-sink_configurable_endpoint April 26, 2026 20:54
This was referenced Apr 26, 2026
Base automatically changed from 04-24-chore_kitchen-sink_configurable_endpoint to main April 27, 2026 07:13
@NathanFlurry NathanFlurry merged commit 3e2c4f5 into main Apr 27, 2026
19 of 23 checks passed
@NathanFlurry NathanFlurry deleted the driver-fixes/refresh-runner-config-after-envoy-connect branch April 27, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant