Implement fallback to WireServer when deploying SSH keys by narrieta · Pull Request #3603 · Azure/WALinuxAgent

narrieta · 2026-04-29T20:21:17Z

As part of #3484, the waagent provision handler no longer pulls the entire goal state. As a side effect, when SSH keys are not available in ovfenv.xml, the fallback to the WireServer's Certificates package is no longer available. This PR adds that fallback.

Copilot

Pull request overview

Adds back the ability for the default (waagent) provisioning flow to retrieve SSH key material from the WireServer Certificates package when ovf-env.xml does not contain the full public key value, aligning behavior with pre-#3484 provisioning.

Changes:

Invoke a new _download_ssh_keys() step during user account configuration to fetch the Certificates goal state property when needed for SSH key deployment.
Add unit tests intended to verify when the Certificates package is (and isn’t) fetched.
Update existing ovf-env test fixtures to remove <KeyPairs> so unrelated provisioning tests don’t implicitly require certificate downloads.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`azurelinuxagent/pa/provision/default.py`	Adds conditional WireServer certificates download prior to deploying SSH keys.
`tests/pa/test_provision.py`	Adds tests for the new certificate-download decision logic using `mock_wire_protocol`.
`tests/data/ovf-env.xml`	Removes `<KeyPairs>` from baseline fixture to avoid triggering certificate download during unrelated tests.
`tests/data/ovf-env-2.xml`	Same as above for the “pga_true” fixture.
`tests/data/ovf-env-4.xml`	Same as above for the “pga_bad” fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

narrieta · 2026-04-29T20:30:15Z

        self.deploy_ssh_keypairs(ovfenv)

+    def _download_ssh_keys(self, ovfenv):
+        #


For the reason of this check, see the implementation of self.deploy_ssh_pubkeys() and self.deploy_ssh_keypairs(). In summary, the former will look for /var/lib/waagent/*.crt when ovfenv.xml doesn't have a value for the key, and the latter will look for /var/lib/waagent/*.prv when ovfenv.xml contains any key pairs.

For the format of ovfenv.xml, see the unit tests.

narrieta · 2026-04-29T20:35:52Z

+        if not download_certificates:
+            return
+
+        #


Originally, the PA would retrieve the goal state as part of protocol detection, which retries 360 times, with a 10 second period. If that fails, the process exits and is restarted by the system, which goes again into the 360 retries, etc.

I feel just a few tries are enough, let me know if you think otherwise; thanks.

I checked that cloud-init also retries for a long period of time, so I decided to use the same strategy as protocol detection

narrieta · 2026-04-29T20:38:00Z

              <Value>ssh-rsa AAAANOTAREALKEY== foo@bar.local</Value>
            </PublicKey>
          </PublicKeys>
-          <KeyPairs>


The key pairs in the unit tests are not used, and with the new code they would trigger the Certificates download. To fix this, quite a few tests need to be rewritten to mock that download. Since the keys are not used by the tests, I decided to remove them instead.

codecov · 2026-04-29T21:00:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (3aebcdd) to head (bbc1df9).
⚠️ Report is 561 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3603      +/-   ##
===========================================
+ Coverage    71.97%   73.46%   +1.48%     
===========================================
  Files          103      122      +19     
  Lines        15692    18608    +2916     
  Branches      2486     2488       +2     
===========================================
+ Hits         11295    13670    +2375     
- Misses        3881     4341     +460     
- Partials       516      597      +81

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

nagworld9 · 2026-05-04T20:15:05Z

        self.deploy_ssh_pubkeys(ovfenv)
        self.deploy_ssh_keypairs(ovfenv)

+    def _download_ssh_keys(self, ovfenv):


since its not download always, can we rename to _download_ssh_keys_if_needed or something like that?

maddieford · 2026-05-05T14:58:20Z

+        <SSH>
+          <PublicKeys>
+            <PublicKey>
+              <Fingerprint>EB0C0AB4B2D5FC35F2F0658D19F44C8283E2DD62</Fingerprint>


Is it expected that the customer should only provide one of fingerprint and value, or is it acceptable to provide both?

The original sample file from which I copied this one includes both. In practice, I believe only 1 would be populated.

Now, the logic in OSUtil.deploy_ssh_pubkey() prioritizes the value over the thumbprint, so in terms of the unit test that uses this new file, having both is a good test case.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

narrieta · 2026-06-08T23:07:16Z

+        with TestProvision._create_provision_handler_with_mock_protocol() as (handler, protocol):
+            protocol.set_http_handlers(http_get_handler=mock_http_get)
+            with patch('azurelinuxagent.pa.provision.default.PROBE_INTERVAL', 0):  # set the delay between retries to 0
+                handler._download_ssh_keys_if_needed(ovfenv)
+        self.assertEqual(2 * MAX_RETRY, mock_http_get.call_count, "Expected maximum number of retries ({0}) to have been attempted".format(MAX_RETRY))  # times 2 since two cyphers are attempted for FIPS support
+        ssh_key_path = os.path.join(conf.get_lib_dir(), '8979F1AC8C4215827BF3B5A403E6137B504D02A4.crt')


won't fix, test takes .3 secs to execute and want to avoid excessive use of mocks

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

narrieta · 2026-06-08T23:50:50Z

            # A failure to download the certificates won't necessarily produce an error. Certificates do not change often and they may have
-            # already been saved to disk on a previous goal state. We simply report the error and continue processing the goal_state; later on,
+            # already been saved to disk on a previous goal state. Re-raise the exception only if explicitly requested via the ignore_download_errors
+            # parameter, otherwise simply report the error and continue processing the goal_state; later on,
            # before extensions are processed, the Agent checks whether the required certificates are already on disk and refreshes the goal
-            # state if they are not.
+            # state if they are not


won't fix; use of a boolean should be easy to understand by future maintaners

maddieford · 2026-06-11T14:26:55Z


            xml_doc = parse_doc(xml_text)
            data = findtext(xml_doc, "Data")
            if data is None:


Why are download failures treated differently than missing data element?

Right now we continue in the case of missing data, but raise in the case of download/decryption errors (when ignore_download_error is False).

They would be different: a download error vs a successful request with no data. The way the code looks, it seems that it is (or it was) possible to get a response without any data. I've never seen such a response.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

narrieta@microsoft added 2 commits April 29, 2026 11:55

Implement fallback to WireServer when deploying SSH keys

a101540

comment

54d581f

Copilot AI review requested due to automatic review settings April 29, 2026 20:21

narrieta requested review from maddieford and nagworld9 as code owners April 29, 2026 20:21

Copilot started reviewing on behalf of narrieta April 29, 2026 20:22 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/pa/test_provision.py

Comment thread tests/pa/test_provision.py Outdated

Comment thread azurelinuxagent/pa/provision/default.py

Comment thread azurelinuxagent/pa/provision/default.py Outdated

narrieta commented Apr 29, 2026

View reviewed changes

review

d1dd817

Merge branch 'develop' into certificates

42f6b6f

nagworld9 reviewed May 4, 2026

View reviewed changes

Comment thread tests/pa/test_provision.py

nagworld9 reviewed May 4, 2026

View reviewed changes

maddieford reviewed May 5, 2026

View reviewed changes

improve retry logic

c1e150d

Copilot AI review requested due to automatic review settings June 8, 2026 19:09

Copilot started reviewing on behalf of narrieta June 8, 2026 19:09 View session

Copilot AI reviewed Jun 8, 2026

View reviewed changes

narrieta@microsoft and others added 2 commits June 8, 2026 16:12

code review

3556e21

Merge branch 'develop' into certificates

031fea4

Copilot AI review requested due to automatic review settings June 8, 2026 23:43

Copilot started reviewing on behalf of narrieta June 8, 2026 23:43 View session

Copilot AI reviewed Jun 8, 2026

View reviewed changes

code review

bbc1df9

maddieford reviewed Jun 11, 2026

View reviewed changes

maddieford approved these changes Jun 11, 2026

View reviewed changes

nagworld9 approved these changes Jun 11, 2026

View reviewed changes

Merge branch 'develop' into certificates

b2c7416

Copilot AI review requested due to automatic review settings June 12, 2026 14:34

Copilot started reviewing on behalf of narrieta June 12, 2026 14:35 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

narrieta merged commit c1988cf into Azure:develop Jun 12, 2026
14 checks passed

narrieta deleted the certificates branch June 12, 2026 15:28

Uh oh!

Conversation

narrieta commented Apr 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

narrieta Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

narrieta Apr 29, 2026 •

edited

Loading

codecov Bot commented Apr 29, 2026 •

edited

Loading