Skip to content

Implement fallback to WireServer when deploying SSH keys#3603

Open
narrieta wants to merge 4 commits into
Azure:developfrom
narrieta:certificates
Open

Implement fallback to WireServer when deploying SSH keys#3603
narrieta wants to merge 4 commits into
Azure:developfrom
narrieta:certificates

Conversation

@narrieta
Copy link
Copy Markdown
Member

As part of #3484, the waagent provision handler no longer pulls the entire goal state. As a side effect, when SSH keys are not available in ovfenv.xml, the fallback to the WireServer's Certificates package is no longer available. This PR adds that fallback.

Copilot AI review requested due to automatic review settings April 29, 2026 20:21
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds back the ability for the default (waagent) provisioning flow to retrieve SSH key material from the WireServer Certificates package when ovf-env.xml does not contain the full public key value, aligning behavior with pre-#3484 provisioning.

Changes:

  • Invoke a new _download_ssh_keys() step during user account configuration to fetch the Certificates goal state property when needed for SSH key deployment.
  • Add unit tests intended to verify when the Certificates package is (and isn’t) fetched.
  • Update existing ovf-env test fixtures to remove <KeyPairs> so unrelated provisioning tests don’t implicitly require certificate downloads.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
azurelinuxagent/pa/provision/default.py Adds conditional WireServer certificates download prior to deploying SSH keys.
tests/pa/test_provision.py Adds tests for the new certificate-download decision logic using mock_wire_protocol.
tests/data/ovf-env.xml Removes <KeyPairs> from baseline fixture to avoid triggering certificate download during unrelated tests.
tests/data/ovf-env-2.xml Same as above for the “pga_true” fixture.
tests/data/ovf-env-4.xml Same as above for the “pga_bad” fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/pa/test_provision.py
Comment thread tests/pa/test_provision.py Outdated
Comment thread azurelinuxagent/pa/provision/default.py
Comment thread azurelinuxagent/pa/provision/default.py
self.deploy_ssh_keypairs(ovfenv)

def _download_ssh_keys(self, ovfenv):
#
Copy link
Copy Markdown
Member Author

@narrieta narrieta Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the reason of this check, see the implementation of self.deploy_ssh_pubkeys() and self.deploy_ssh_keypairs(). In summary, the former will look for /var/lib/waagent/*.crt when ovfenv.xml doesn't have a value for the key, and the latter will look for /var/lib/waagent/*.prv when ovfenv.xml contains any key pairs.

For the format of ovfenv.xml, see the unit tests.

if not download_certificates:
return

#
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, the PA would retrieve the goal state as part of protocol detection, which retries 360 times, with a 10 second period. If that fails, the process exits and is restarted by the system, which goes again into the 360 retries, etc.

I feel just a few tries are enough, let me know if you think otherwise; thanks.

Comment thread tests/data/ovf-env-2.xml
<Value>ssh-rsa AAAANOTAREALKEY== foo@bar.local</Value>
</PublicKey>
</PublicKeys>
<KeyPairs>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key pairs in the unit tests are not used, and with the new code they would trigger the Certificates download. To fix this, quite a few tests need to be rewritten to mock that download. Since the keys are not used by the tests, I decided to remove them instead.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.99%. Comparing base (3aebcdd) to head (d1dd817).
⚠️ Report is 553 commits behind head on develop.

Files with missing lines Patch % Lines
azurelinuxagent/pa/provision/default.py 64.70% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3603      +/-   ##
===========================================
+ Coverage    71.97%   72.99%   +1.01%     
===========================================
  Files          103      120      +17     
  Lines        15692    18273    +2581     
  Branches      2486     2462      -24     
===========================================
+ Hits         11295    13339    +2044     
- Misses        3881     4340     +459     
- Partials       516      594      +78     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

with TestProvision._create_provision_handler_with_mock_protocol() as (handler, protocol):
handler._download_ssh_keys(ovfenv)
self.assertEqual(1, protocol.mock_wire_data.call_counts['certificates'], "The Certificates package should have been retrieved")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cam we add test around exception when downloading certs?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are covered in the unit tests for the goal state.

Are you thinking of a specific scenario?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you mean the retry logic in _download_ssh_keys?

Copy link
Copy Markdown
Contributor

@nagworld9 nagworld9 May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry logic in download_ssh_keys. We expect next operation would raise error if we failed to download with all retries.

can we test that we don't re-raise error here? It would be good to test if next operation rising that provision error if not already present in the unit tests

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i can add a test for the retry logic

self.deploy_ssh_pubkeys(ovfenv)
self.deploy_ssh_keypairs(ovfenv)

def _download_ssh_keys(self, ovfenv):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since its not download always, can we rename to _download_ssh_keys_if_needed or something like that?

<SSH>
<PublicKeys>
<PublicKey>
<Fingerprint>EB0C0AB4B2D5FC35F2F0658D19F44C8283E2DD62</Fingerprint>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that the customer should only provide one of fingerprint and value, or is it acceptable to provide both?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original sample file from which I copied this one includes both. In practice, I believe only 1 would be populated.

Now, the logic in OSUtil.deploy_ssh_pubkey() prioritizes the value over the thumbprint, so in terms of the unit test that uses this new file, having both is a good test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants