feat(grpc): enforce a maximum number of reconnection attempts#902
feat(grpc): enforce a maximum number of reconnection attempts#902Molter73 wants to merge 2 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThe PR threads a configurable gRPC retry limit through YAML, CLI, and tests, changes gRPC backoff to stop after the retry budget is exhausted, and updates output and run control flow to propagate task and worker errors through shutdown. ChangesFinite gRPC Retry Budget
Sequence Diagram(s)sequenceDiagram
participant runLoop as run
participant outputStart as output::start
participant clientStart as Client::start
participant clientRun as Client::run
participant Backoff
runLoop->>outputStart: start output task
outputStart->>clientStart: spawn gRPC task
clientStart->>clientRun: run()
clientRun->>Backoff: next()
Backoff-->>clientRun: None
clientRun-->>outputStart: Err("reconnection attempts exhausted")
outputStart-->>runLoop: task resolves Err
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #902 +/- ##
==========================================
+ Coverage 32.72% 34.35% +1.62%
==========================================
Files 21 21
Lines 2695 2771 +76
Branches 2695 2771 +76
==========================================
+ Hits 882 952 +70
- Misses 1810 1816 +6
Partials 3 3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
If fact fails to connect to the gRPC server in the specified number of attemtps it will crash, which can be used as a clear signal in k8s that something is not working as expected. If the number of retries is set to 0, there will be no limit to the amount of times fact attempts to reconnect. In order to allow the reconnection failure to trigger an application wide crash, the main output task monitors the result of the grpc task's handle propagating the error further up. This method should allow for other output components to be added in the future and follow this same pattern without having to change anything outside the output module. The stdout component is not included in this logic because it has no condition that could merit an application wide crash. We also now propagate the error of worker tasks all the way up to the termination of the application.
a6f8c0e to
669f5be
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
fact/src/output/mod.rs (1)
47-74: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy liftAdd focused coverage for gRPC supervision failures.
This loop is the new contract that converts
Ok(Err(_))and join failures into output-level errors, but the PR coverage report shows this file has no patch coverage. A small async test or extracted helper around thematch respath would protect the shutdown behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fact/src/output/mod.rs` around lines 47 - 74, Add focused test coverage for the gRPC supervision path in the output loop so shutdown/error handling is exercised. The `tokio::spawn` block in `output::mod` currently maps `grpc_handle.borrow_mut()` results into output-level failures, so add a small async test or extract a helper around the `match res` branch to verify both `Ok(Err(_))` and join/task errors are handled as expected.fact/src/config/tests.rs (1)
567-582: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd the negative retries parse-error case.
Once the parser rejects negative retry counts, cover
retries: -1here so YAML validation matches the new retry-budget contract.Proposed test case
( r#" grpc: backoff: retries: true "#, "invalid grpc.backoff.retries: Boolean(true)", ), + ( + r#" + grpc: + backoff: + retries: -1 + "#, + "invalid grpc.backoff.retries: Integer(-1)", + ),As per coding guidelines,
fact/src/config/mod.rs: "Add unit tests in fact/src/config/tests.rs for configuration schema changes in fact/src/config/mod.rs."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fact/src/config/tests.rs` around lines 567 - 582, The config parsing tests in tests.rs are missing coverage for the new negative retry validation in grpc.backoff.retries. Add a unit test case alongside the existing retries parse-error assertions in the config tests to verify that retries: -1 is rejected with the appropriate invalid grpc.backoff.retries error, using the same pattern as the current retries validation cases.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@fact/src/config/mod.rs`:
- Around line 416-420: Reject negative YAML retry counts in the retries parsing
branch of the config loader, since converting a signed value to u64 can wrap and
bypass the intended retry limit. Update the logic in the backoff/retries
handling inside mod.rs to accept 0 and positive values only, and explicitly bail
out when v.as_i64() returns a value below zero before assigning to
backoff.retries_max. Also add unit coverage in fact/src/config/tests.rs for the
retries configuration path to verify negative values fail and zero remains
valid.
In `@fact/src/output/grpc.rs`:
- Around line 217-220: The reconnect exhaustion path in gRPC output handling
drops the last connection failure, so update the backoff termination branch in
the connection retry loop to preserve the final error details when reconnection
attempts are exhausted. In the logic around the backoff handling in the gRPC
connection code, make the terminal bail/error include the original connection
error value from the retry loop so callers can see the actual
DNS/TLS/refused-connection cause instead of only “attempts exhausted”.
---
Nitpick comments:
In `@fact/src/config/tests.rs`:
- Around line 567-582: The config parsing tests in tests.rs are missing coverage
for the new negative retry validation in grpc.backoff.retries. Add a unit test
case alongside the existing retries parse-error assertions in the config tests
to verify that retries: -1 is rejected with the appropriate invalid
grpc.backoff.retries error, using the same pattern as the current retries
validation cases.
In `@fact/src/output/mod.rs`:
- Around line 47-74: Add focused test coverage for the gRPC supervision path in
the output loop so shutdown/error handling is exercised. The `tokio::spawn`
block in `output::mod` currently maps `grpc_handle.borrow_mut()` results into
output-level failures, so add a small async test or extract a helper around the
`match res` branch to verify both `Ok(Err(_))` and join/task errors are handled
as expected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: 81043491-08dc-4da7-a651-5e8569ee9b26
📒 Files selected for processing (5)
fact/src/config/mod.rsfact/src/config/tests.rsfact/src/lib.rsfact/src/output/grpc.rsfact/src/output/mod.rs
| }, | ||
| ..Default::default() | ||
| }, | ||
| ), |
There was a problem hiding this comment.
Add some tests with negative retries.
Some minor log improvements are also thrown in there.
Description
If fact fails to connect to the gRPC server in the specified number of attemtps it will crash, which can be used as a clear signal in k8s that something is not working as expected. If the number of retries is set to 0, there will be no limit to the amount of times fact attempts to reconnect.
In order to allow the reconnection failure to trigger an application wide crash, the main output task monitors the result of the grpc task's handle propagating the error further up. This method should allow for other output components to be added in the future and follow this same pattern without having to change anything outside the output module. The stdout component is not included in this logic because it has no condition that could merit an application wide crash.
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Pointed fact to connect to a non-existant gRPC server and got it to crash as expected.
fact output on crash
Summary by CodeRabbit
New Features
Bug Fixes