Conversation
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Pull request overview
Addresses Android agent failures where DNS resolution can temporarily fail after waking from Doze mode by retrying requests that hit UnknownHostException.
Changes:
- Add bounded retry + delay when
UnknownHostExceptionoccurs during API calls. - Add an Android changelog entry describing the fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| android/changes/43462-android-cert-dns-retry | Adds release-note entry for DNS retry behavior. |
| android/app/src/main/java/com/fleetdm/agent/ApiClient.kt | Retries API requests on DNS resolution failures (UnknownHostException) with delay and logging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughApiClient.makeRequest validates the configured baseUrl (requires http/https), returns distinct failures for missing/invalid base URL, and adds a DNS-resolution retry loop that retries on UnknownHostException up to DNS_MAX_RETRIES with exponential backoff, recreating and disconnecting the HttpURLConnection each attempt and rethrowing CancellationException. CertificateOrchestrator.enrollCertificates now accumulates results incrementally and aborts the remaining certificate enrollments when a DNS-level UnknownHostException is encountered, returning only results processed so far. A unit test was added to verify batch abort on DNS failure and MockScepClient gained a configurable network-exception cause for tests. Release notes updated. Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #43464 +/- ##
==========================================
- Coverage 66.91% 66.91% -0.01%
==========================================
Files 2596 2596
Lines 208103 208124 +21
Branches 9321 9326 +5
==========================================
+ Hits 139248 139258 +10
- Misses 56199 56208 +9
- Partials 12656 12658 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@android/changes/43462-android-cert-dns-retry`:
- Line 1: Update the changelog entry "Fixed Android agent to retry DNS
resolution failures when waking from Doze mode." to also document the
certificate enrollment batch-deferral behavior: when a terminal DNS failure
occurs during enrollment the code now stops processing the remaining batch and
defers those certificates to the next run; mention that remaining certs are
deferred rather than retried immediately. Locate the changelog entry referencing
the 43462-android-cert-dns-retry change and append a short sentence describing
the new enroll/batch behavior and its user-visible effect.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a338e70b-e0d8-4789-a019-959168e7210b
📒 Files selected for processing (4)
android/app/src/main/java/com/fleetdm/agent/ApiClient.ktandroid/app/src/main/java/com/fleetdm/agent/CertificateOrchestrator.ktandroid/app/src/test/java/com/fleetdm/agent/CertificateOrchestratorTest.ktandroid/changes/43462-android-cert-dns-retry
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
| cert.id to enrollCertificate(context, cert.id, cert.uuid, certificateInstaller) | ||
| val results = mutableMapOf<Int, CertificateEnrollmentHandler.EnrollmentResult>() | ||
| for (cert in hostCertificates) { | ||
| val result = enrollCertificate(context, cert.id, cert.uuid, certificateInstaller) |
There was a problem hiding this comment.
I am a little confused why we have MAX_CERT_INSTALL_RETRIES=3 and MAX_RETRY_ATTEMPTS=5. When we call enrollCertificate and the result comes back as Failure, we call recordEnrollmentAttemptFailure. This increments retry counter. If we fail DNS 3 times the cert is permanently failed and this is reported to the server. But according to MAX_RETRY_ATTEMPTS we technically still have 2 more retries. I believe both counters are incremented on failure? Just trying to understand why we have 2 counters
There was a problem hiding this comment.
@ksykulev This is arguably more complex than it should be. Maybe an eng-init issue to simplify it? DNS failures do not cause the cert to fail permanently.
Three retry levels handle failures at different time scales during certificate enrollment.
Level 1: DNS retry in makeRequest (DNS_MAX_RETRIES = 7, ~127s)
Handles the Android Doze wake issue where the DNS resolver needs seconds to become ready after the device wakes. Retries the same HTTP call with exponential backoff (1s, 2s, 4s, 8s, 16s, 32s, 64s). Without this, a brief DNS blip would fail the HTTP call, abort the entire batch, and trigger a full worker restart.
If DNS stays down past 127s, the batch aborts and Level 3 takes over.
Level 2: Per-cert SCEP retry (MAX_CERT_INSTALL_RETRIES = 3)
Handles cert-specific SCEP enrollment failures (bad challenge, server rejection). Persists in DataStore per certificate ID across worker runs. After 3 failures for the same cert, marks it permanently failed and reports to the server. Prevents a single bad cert from retrying forever.
Not involved in DNS failures (the early return in enrollCertificate bypasses recordEnrollmentAttemptFailure).
Level 3: Worker burst retry (MAX_RETRY_ATTEMPTS = 5)
Caps the rapid WorkManager retry burst and resets to the 15-minute periodic schedule. Without this, WorkManager's exponential backoff would escalate indefinitely (10s, 20s, 40s, 80s, 160s, ... up to 5 hours), and the periodic schedule would be paused while the retry chain is active.
After 5 rapid retries, the worker returns Result.success() to reset the retry state and hand control back to the periodic schedule, keeping the retry cadence at a predictable 15 minutes.
Related issue: Resolves #43462
During review, Hide whitespace.
Fixed Android agent to retry DNS resolution failures when waking from Doze mode, and to defer remaining certificates in a batch to the next enrollment cycle when a DNS failure persists.
The fix does not eliminates DNS errors from the logs, it just handles them better.
Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/,orbit/changes/oree/fleetd-chrome/changes.See Changes files for more information.
Testing
Summary by CodeRabbit
Bug Fixes
Tests