Android agent: retry DNS #43464

Merged
getvictor merged 5 commits into main from victor/43462-android
Apr 15, 2026

Conversation

@getvictor
Member

@getvictor getvictor commented Apr 13, 2026

Related issue: Resolves #43462

During review, enable "Hide whitespace".

Fixed Android agent to retry DNS resolution failures when waking from Doze mode, and to defer remaining certificates in a batch to the next enrollment cycle when a DNS failure persists.

The fix does not eliminate DNS errors from the logs; it only handles them better.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.

Testing

  • Added/updated automated tests
  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • Bug Fixes

    • Improved DNS resilience: automatic retries with backoff for DNS resolution failures (e.g., after device sleep), upfront validation of the configured server URL, and clearer failure reporting when retries are exhausted.
    • Certificate enrollment aborts a batch on terminal DNS failures and defers remaining certificates until connectivity is restored.
  • Tests

    • Added a unit test validating batch abort behavior on DNS resolution failure.
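
The upfront validation of the configured server URL mentioned above could look roughly like the sketch below. It is an illustration only: `BaseUrlCheck` and `checkBaseUrl` are hypothetical names, and the real `ApiClient` failure types are not shown on this page. The only rule taken from the PR is that the scheme must be http or https and that missing and invalid URLs are reported as distinct failures.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical result type for illustration; not the real ApiClient API.
sealed class BaseUrlCheck {
    data class Valid(val uri: URI) : BaseUrlCheck()
    object Missing : BaseUrlCheck()
    data class Invalid(val reason: String) : BaseUrlCheck()
}

// Distinguishes a missing base URL from an invalid one, and accepts
// only http/https schemes, before any network request is attempted.
fun checkBaseUrl(baseUrl: String?): BaseUrlCheck {
    if (baseUrl.isNullOrBlank()) return BaseUrlCheck.Missing
    val uri = try {
        URI(baseUrl.trim())
    } catch (e: URISyntaxException) {
        return BaseUrlCheck.Invalid("not a valid URI: ${e.message}")
    }
    val scheme = uri.scheme?.lowercase()
    if (scheme != "http" && scheme != "https") {
        return BaseUrlCheck.Invalid("unsupported scheme: $scheme")
    }
    return BaseUrlCheck.Valid(uri)
}
```

Failing fast here keeps a misconfigured URL from being mistaken for a transient DNS problem and retried.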

@getvictor
Member Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

✅ Actions performed

Full review triggered.

Contributor

Copilot AI left a comment

Pull request overview

Addresses Android agent failures where DNS resolution can temporarily fail after waking from Doze mode by retrying requests that hit UnknownHostException.

Changes:

  • Add bounded retry + delay when UnknownHostException occurs during API calls.
  • Add an Android changelog entry describing the fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
android/changes/43462-android-cert-dns-retry | Adds release-note entry for DNS retry behavior.
android/app/src/main/java/com/fleetdm/agent/ApiClient.kt | Retries API requests on DNS resolution failures (UnknownHostException) with delay and logging.

Comment thread android/app/src/main/java/com/fleetdm/agent/ApiClient.kt
Comment thread android/app/src/main/java/com/fleetdm/agent/ApiClient.kt
Comment thread android/app/src/main/java/com/fleetdm/agent/ApiClient.kt
@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1d2b65a-a83b-4a90-9e91-b8aca332399b

📥 Commits

Reviewing files that changed from the base of the PR and between 87d4741 and 464f902.

📒 Files selected for processing (4)
  • android/app/src/main/java/com/fleetdm/agent/CertificateOrchestrator.kt
  • android/app/src/test/java/com/fleetdm/agent/CertificateOrchestratorTest.kt
  • android/app/src/test/java/com/fleetdm/agent/scep/MockScepClient.kt
  • android/changes/43462-android-cert-dns-retry
✅ Files skipped from review due to trivial changes (1)
  • android/changes/43462-android-cert-dns-retry
🚧 Files skipped from review as they are similar to previous changes (1)
  • android/app/src/test/java/com/fleetdm/agent/CertificateOrchestratorTest.kt

Walkthrough

ApiClient.makeRequest validates the configured baseUrl (requires http/https), returns distinct failures for missing/invalid base URL, and adds a DNS-resolution retry loop that retries on UnknownHostException up to DNS_MAX_RETRIES with exponential backoff, recreating and disconnecting the HttpURLConnection each attempt and rethrowing CancellationException. CertificateOrchestrator.enrollCertificates now accumulates results incrementally and aborts the remaining certificate enrollments when a DNS-level UnknownHostException is encountered, returning only results processed so far. A unit test was added to verify batch abort on DNS failure and MockScepClient gained a configurable network-exception cause for tests. Release notes updated.
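
As a rough illustration of the batch-abort behavior described in the walkthrough, the sketch below accumulates results one certificate at a time and stops at the first DNS-level failure. `HostCertificate`, `EnrollmentResult`, `isDnsFailure`, and `enrollBatch` are hypothetical stand-ins, not the real `CertificateOrchestrator` types. Whether the real code records the failing certificate's result before aborting is an assumption here.

```kotlin
import java.net.UnknownHostException

// Hypothetical stand-ins for the real types in CertificateOrchestrator.
data class HostCertificate(val id: Int, val uuid: String)

sealed class EnrollmentResult {
    object Success : EnrollmentResult()
    data class Failure(val cause: Throwable?) : EnrollmentResult()
}

// True when the failure, or anything in its cause chain, is a DNS
// resolution failure. take(10) guards against cyclic cause chains.
fun isDnsFailure(t: Throwable?): Boolean =
    generateSequence(t) { it.cause }.take(10).any { it is UnknownHostException }

// Enrolls certificates sequentially and returns only the results
// processed so far if a DNS failure is hit; the rest are left for
// the next enrollment cycle.
fun enrollBatch(
    certs: List<HostCertificate>,
    enroll: (HostCertificate) -> EnrollmentResult,
): Map<Int, EnrollmentResult> {
    val results = mutableMapOf<Int, EnrollmentResult>()
    for (cert in certs) {
        val result = enroll(cert)
        results[cert.id] = result // assumption: the failing cert is recorded too
        if (result is EnrollmentResult.Failure && isDnsFailure(result.cause)) {
            break // DNS is down: no point attempting the remaining certs
        }
    }
    return results
}
```

Compared with building the whole result map in a single expression, the explicit loop is what makes an early exit possible.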

Possibly related PRs

  • fleetdm/fleet PR 42625: Modifies ApiClient.makeRequest headers/Content-Type behavior for GET requests — closely related to API request construction and template fetch logic.
  • fleetdm/fleet PR 38690: Changes ApiClient.makeRequest response/auth handling and unauthorized/reenroll flows — related to request/response control in makeRequest.
  • fleetdm/fleet PR 43402: Alters CertificateOrchestrator.enrollCertificates to process certificates sequentially — related to enrollment control flow and batch processing.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'Android agent: retry DNS' clearly summarizes the main change, which is implementing DNS retry logic in the Android agent.
Description check | ✅ Passed | The description includes the related issue (#43462), explains the fix clearly, checks the changes file requirement, and indicates testing was performed, meeting template requirements.
Linked Issues check | ✅ Passed | The PR implements DNS retry logic in ApiClient and batch deferral in CertificateOrchestrator to handle DNS failures during Doze mode wake, directly addressing issue #43462's objective to retry DNS and defer remaining certs.
Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing DNS retry and batch deferral logic for Android certificate enrollment; no unrelated modifications detected.

@codecov

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 53.62319% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.91%. Comparing base (98e08ad) to head (464f902).
⚠️ Report is 66 commits behind head on main.

Files with missing lines | Patch % | Lines
...d/app/src/main/java/com/fleetdm/agent/ApiClient.kt | 44.82% | 20 Missing and 12 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #43464      +/-   ##
==========================================
- Coverage   66.91%   66.91%   -0.01%     
==========================================
  Files        2596     2596              
  Lines      208103   208124      +21     
  Branches     9321     9326       +5     
==========================================
+ Hits       139248   139258      +10     
- Misses      56199    56208       +9     
- Partials    12656    12658       +2     
Flag | Coverage Δ
android | 46.60% <53.62%> (+0.01%) ⬆️

@getvictor
Member Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@android/changes/43462-android-cert-dns-retry`:
- Line 1: Update the changelog entry "Fixed Android agent to retry DNS
resolution failures when waking from Doze mode." to also document the
certificate enrollment batch-deferral behavior: when a terminal DNS failure
occurs during enrollment the code now stops processing the remaining batch and
defers those certificates to the next run; mention that remaining certs are
deferred rather than retried immediately. Locate the changelog entry referencing
the 43462-android-cert-dns-retry change and append a short sentence describing
the new enroll/batch behavior and its user-visible effect.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a338e70b-e0d8-4789-a019-959168e7210b

📥 Commits

Reviewing files that changed from the base of the PR and between 98e08ad and 87d4741.

📒 Files selected for processing (4)
  • android/app/src/main/java/com/fleetdm/agent/ApiClient.kt
  • android/app/src/main/java/com/fleetdm/agent/CertificateOrchestrator.kt
  • android/app/src/test/java/com/fleetdm/agent/CertificateOrchestratorTest.kt
  • android/changes/43462-android-cert-dns-retry

Comment thread android/changes/43462-android-cert-dns-retry Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


Comment thread android/app/src/main/java/com/fleetdm/agent/CertificateOrchestrator.kt Outdated
Comment thread android/app/src/main/java/com/fleetdm/agent/ApiClient.kt
@getvictor getvictor marked this pull request as ready for review April 13, 2026 23:08

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

cert.id to enrollCertificate(context, cert.id, cert.uuid, certificateInstaller)
val results = mutableMapOf<Int, CertificateEnrollmentHandler.EnrollmentResult>()
for (cert in hostCertificates) {
val result = enrollCertificate(context, cert.id, cert.uuid, certificateInstaller)
Contributor

I am a little confused about why we have both MAX_CERT_INSTALL_RETRIES=3 and MAX_RETRY_ATTEMPTS=5. When we call enrollCertificate and the result comes back as Failure, we call recordEnrollmentAttemptFailure, which increments the retry counter. If DNS fails 3 times, the cert is permanently failed and this is reported to the server, but according to MAX_RETRY_ATTEMPTS we technically still have 2 more retries. I believe both counters are incremented on failure? Just trying to understand why we have 2 counters.

Member Author

@ksykulev This is arguably more complex than it should be. Maybe an eng-init issue to simplify it? DNS failures do not cause the cert to fail permanently.

Three retry levels handle failures at different time scales during certificate enrollment.

Level 1: DNS retry in makeRequest (DNS_MAX_RETRIES = 7, ~127s)

Handles the Android Doze wake issue where the DNS resolver needs seconds to become ready after the device wakes. Retries the same HTTP call with exponential backoff (1s, 2s, 4s, 8s, 16s, 32s, 64s). Without this, a brief DNS blip would fail the HTTP call, abort the entire batch, and trigger a full worker restart.

If DNS stays down past 127s, the batch aborts and Level 3 takes over.
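
To make the Level 1 arithmetic concrete, here is a minimal blocking sketch of the backoff loop. It is not the actual `makeRequest` code: `withDnsRetry`, `backoffDelaysMs`, and `INITIAL_DELAY_MS` are hypothetical names, the 1s starting delay is inferred from the schedule above, and the sketch assumes one initial attempt followed by up to 7 retries. The real implementation also recreates the HttpURLConnection on each attempt and rethrows CancellationException, which is omitted here.

```kotlin
import java.net.UnknownHostException

const val DNS_MAX_RETRIES = 7        // value quoted in the comment above
const val INITIAL_DELAY_MS = 1_000L  // assumed from the 1s, 2s, 4s, ... schedule

// Delays before each retry: 1s, 2s, 4s, 8s, 16s, 32s, 64s (127s in total).
fun backoffDelaysMs(): List<Long> = List(DNS_MAX_RETRIES) { INITIAL_DELAY_MS shl it }

// Runs `request`, retrying only on UnknownHostException. `sleep` is
// injectable so the loop can be exercised without real waiting.
fun <T> withDnsRetry(
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    request: () -> T,
): T {
    for (delayMs in backoffDelaysMs()) {
        try {
            return request()
        } catch (e: UnknownHostException) {
            sleep(delayMs) // DNS not ready yet: wait, then try again
        }
    }
    // Final attempt after the last delay; a DNS failure here propagates
    // to the caller, which is where the batch abort takes over.
    return request()
}
```

Only DNS resolution failures are retried; any other exception leaves the loop immediately.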

Level 2: Per-cert SCEP retry (MAX_CERT_INSTALL_RETRIES = 3)

Handles cert-specific SCEP enrollment failures (bad challenge, server rejection). Persists in DataStore per certificate ID across worker runs. After 3 failures for the same cert, marks it permanently failed and reports to the server. Prevents a single bad cert from retrying forever.

Not involved in DNS failures (the early return in enrollCertificate bypasses recordEnrollmentAttemptFailure).
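
A simplified picture of Level 2, assuming an in-memory map in place of the DataStore-backed counter. `RetryTracker`, `CertState`, and `handleFailure` are hypothetical names for illustration; only `MAX_CERT_INSTALL_RETRIES` and `recordEnrollmentAttemptFailure` come from the discussion above, and the latter's real signature may differ.

```kotlin
const val MAX_CERT_INSTALL_RETRIES = 3 // value quoted in the comment above

enum class CertState { PENDING, PERMANENTLY_FAILED }

// In-memory stand-in for the per-certificate counter that the real
// code persists in DataStore across worker runs.
class RetryTracker {
    private val failures = mutableMapOf<Int, Int>()

    fun recordEnrollmentAttemptFailure(certId: Int): CertState {
        val count = (failures[certId] ?: 0) + 1
        failures[certId] = count
        return if (count >= MAX_CERT_INSTALL_RETRIES) {
            CertState.PERMANENTLY_FAILED
        } else {
            CertState.PENDING
        }
    }
}

// DNS failures return early and never touch the per-cert counter,
// so they cannot push a certificate into the permanently failed state.
fun handleFailure(tracker: RetryTracker, certId: Int, dnsFailure: Boolean): CertState {
    if (dnsFailure) return CertState.PENDING
    return tracker.recordEnrollmentAttemptFailure(certId)
}
```

With this shape, any number of DNS failures leaves the certificate pending, while the third SCEP-level failure for the same certificate marks it permanently failed.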

Level 3: Worker burst retry (MAX_RETRY_ATTEMPTS = 5)

Caps the rapid WorkManager retry burst and resets to the 15-minute periodic schedule. Without this, WorkManager's exponential backoff would escalate indefinitely (10s, 20s, 40s, 80s, 160s, ... up to 5 hours), and the periodic schedule would be paused while the retry chain is active.

After 5 rapid retries, the worker returns Result.success() to reset the retry state and hand control back to the periodic schedule, keeping the retry cadence at a predictable 15 minutes.
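
The Level 3 cap can be sketched without the WorkManager dependency as a pure decision function. `WorkerOutcome`, `decideOutcome`, and `backoffSeconds` are hypothetical names. The sketch assumes the attempt counter is zero-based, as WorkManager's `runAttemptCount` is, and that the cap is checked with `>=`; the real worker may differ on both points.

```kotlin
const val MAX_RETRY_ATTEMPTS = 5 // value quoted in the comment above

enum class WorkerOutcome { SUCCESS, RETRY }

// Decides what the worker reports back. Reporting SUCCESS after the
// cap is reached resets the retry chain so the 15-minute periodic
// schedule takes over again.
fun decideOutcome(runAttemptCount: Int, workSucceeded: Boolean): WorkerOutcome = when {
    workSucceeded -> WorkerOutcome.SUCCESS
    runAttemptCount >= MAX_RETRY_ATTEMPTS -> WorkerOutcome.SUCCESS
    else -> WorkerOutcome.RETRY
}

// The escalation described above if retries were never capped:
// 10s, 20s, 40s, 80s, 160s, ... up to 5 hours.
fun backoffSeconds(attempt: Int): Long =
    minOf(10L shl attempt.coerceIn(0, 20), 5L * 60 * 60)
```

For example, `decideOutcome(4, false)` still asks for a retry, while `decideOutcome(5, false)` reports success and hands control back to the periodic schedule.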

Contributor

🫠

@getvictor getvictor requested a review from ksykulev April 14, 2026 20:00
@getvictor getvictor merged commit bc6e731 into main Apr 15, 2026
18 checks passed
@getvictor getvictor deleted the victor/43462-android branch April 15, 2026 12:36

Development

Successfully merging this pull request may close these issues.

Android: Running gitops while android certs are being applied may result in Unable to resolve host "...": No address associated with hostname error

3 participants