Implement fallback to WireServer when deploying SSH keys#3603
Conversation
There was a problem hiding this comment.
Pull request overview
Adds back the ability for the default (waagent) provisioning flow to retrieve SSH key material from the WireServer Certificates package when ovf-env.xml does not contain the full public key value, aligning behavior with pre-#3484 provisioning.
Changes:
- Invoke a new
_download_ssh_keys()step during user account configuration to fetch the Certificates goal state property when needed for SSH key deployment. - Add unit tests intended to verify when the Certificates package is (and isn’t) fetched.
- Update existing ovf-env test fixtures to remove
<KeyPairs>so unrelated provisioning tests don’t implicitly require certificate downloads.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
azurelinuxagent/pa/provision/default.py |
Adds conditional WireServer certificates download prior to deploying SSH keys. |
tests/pa/test_provision.py |
Adds tests for the new certificate-download decision logic using mock_wire_protocol. |
tests/data/ovf-env.xml |
Removes <KeyPairs> from baseline fixture to avoid triggering certificate download during unrelated tests. |
tests/data/ovf-env-2.xml |
Same as above for the “pga_true” fixture. |
tests/data/ovf-env-4.xml |
Same as above for the “pga_bad” fixture. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.deploy_ssh_keypairs(ovfenv) | ||
|
|
||
| def _download_ssh_keys(self, ovfenv): | ||
| # |
There was a problem hiding this comment.
For the reason of this check, see the implementation of self.deploy_ssh_pubkeys() and self.deploy_ssh_keypairs(). In summary, the former will look for /var/lib/waagent/*.crt when ovfenv.xml doesn't have a value for the key, and the latter will look for /var/lib/waagent/*.prv when ovfenv.xml contains any key pairs.
For the format of ovfenv.xml, see the unit tests.
| if not download_certificates: | ||
| return | ||
|
|
||
| # |
There was a problem hiding this comment.
Originally, the PA would retrieve the goal state as part of protocol detection, which retries 360 times, with a 10 second period. If that fails, the process exits and is restarted by the system, which goes again into the 360 retries, etc.
I feel just a few tries are enough, let me know if you think otherwise; thanks.
| <Value>ssh-rsa AAAANOTAREALKEY== foo@bar.local</Value> | ||
| </PublicKey> | ||
| </PublicKeys> | ||
| <KeyPairs> |
There was a problem hiding this comment.
The key pairs in the unit tests are not used, and with the new code they would trigger the Certificates download. To fix this, quite a few tests need to be rewritten to mock that download. Since the keys are not used by the tests, I decided to remove them instead.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3603 +/- ##
===========================================
+ Coverage 71.97% 72.99% +1.01%
===========================================
Files 103 120 +17
Lines 15692 18273 +2581
Branches 2486 2462 -24
===========================================
+ Hits 11295 13339 +2044
- Misses 3881 4340 +459
- Partials 516 594 +78 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| with TestProvision._create_provision_handler_with_mock_protocol() as (handler, protocol): | ||
| handler._download_ssh_keys(ovfenv) | ||
| self.assertEqual(1, protocol.mock_wire_data.call_counts['certificates'], "The Certificates package should have been retrieved") | ||
|
|
There was a problem hiding this comment.
cam we add test around exception when downloading certs?
There was a problem hiding this comment.
Those are covered in the unit tests for the goal state.
Are you thinking of a specific scenario?
There was a problem hiding this comment.
Or do you mean the retry logic in _download_ssh_keys?
There was a problem hiding this comment.
Retry logic in download_ssh_keys. We expect next operation would raise error if we failed to download with all retries.
can we test that we don't re-raise error here? It would be good to test if next operation rising that provision error if not already present in the unit tests
There was a problem hiding this comment.
yes, i can add a test for the retry logic
| self.deploy_ssh_pubkeys(ovfenv) | ||
| self.deploy_ssh_keypairs(ovfenv) | ||
|
|
||
| def _download_ssh_keys(self, ovfenv): |
There was a problem hiding this comment.
since its not download always, can we rename to _download_ssh_keys_if_needed or something like that?
| <SSH> | ||
| <PublicKeys> | ||
| <PublicKey> | ||
| <Fingerprint>EB0C0AB4B2D5FC35F2F0658D19F44C8283E2DD62</Fingerprint> |
There was a problem hiding this comment.
Is it expected that the customer should only provide one of fingerprint and value, or is it acceptable to provide both?
There was a problem hiding this comment.
The original sample file from which I copied this one includes both. In practice, I believe only 1 would be populated.
Now, the logic in OSUtil.deploy_ssh_pubkey() prioritizes the value over the thumbprint, so in terms of the unit test that uses this new file, having both is a good test case.
As part of #3484, the waagent provision handler no longer pulls the entire goal state. As a side effect, when SSH keys are not available in ovfenv.xml, the fallback to the WireServer's Certificates package is no longer available. This PR adds that fallback.