You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: .github/copilot-instructions.md
+100-2Lines changed: 100 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -136,6 +136,90 @@ Analyze PRs for these compatibility scenarios:
136
136
- Package manager assumptions (apt vs dnf/tdnf)
137
137
- Systemd differences between distributions
138
138
139
+
**5. Package/Dependency Update PRs (Renovate)**
140
+
-**Context**: Renovate bot automatically creates PRs to update component versions in `parts/common/components.json`. These components are cached on VHDs during build and directly affect node stability, GPU workloads, networking, and security. Updated packages are downloaded from `packages.aks.azure.com` or upstream registries during VHD build.
141
+
-**What to check**: Every version bump—even patch versions—can introduce regressions that affect production nodes.
142
+
-**Analysis steps for every package update PR**:
143
+
1.**Identify the component and version change**: Parse the diff in `parts/common/components.json` to extract exact old → new versions for each OS/release entry.
144
+
2.**Determine the update type**: Classify as major, minor, or patch using semver. Major and minor updates carry higher risk than patch updates.
145
+
3.**Research upstream changelog**: Look up the project's release notes, changelog, or GitHub releases to understand what changed between the old and new versions. Summarize:
146
+
- New features introduced
147
+
- Bug fixes included
148
+
- Breaking changes or deprecations
149
+
- Security fixes (CVEs patched)
150
+
4.**Assess OS coverage**: Check if the update covers all OS variants where the component is used (Ubuntu 22.04, 24.04, Azure Linux 3.0, etc.). Flag if some OS entries are updated but others are not — partial updates can cause inconsistency across node pools.
151
+
5.**Evaluate VHD size impact**: For components downloaded as binaries or packages, consider whether the new version significantly increases VHD size. Large size increases can affect VHD build time and storage costs.
152
+
6.**Check for configuration or API changes**: If the component exposes configuration files, CLI flags, systemd units, or APIs consumed by CSE scripts, verify that the update doesn't change defaults or remove options that provisioning scripts depend on.
153
+
7.**Verify download URL validity**: Confirm that the `downloadLocation` and `downloadURIs` structure in components.json remains valid for the new version. New versions sometimes change the artifact naming convention or repository layout.
154
+
155
+
-**Risk assessment for package updates**:
156
+
- 🔴 **High Risk**: Major version bumps, components critical to node boot (kubelet, containerd, runc), GPU drivers (nvidia-driver, dcgm-exporter), or networking (azure-cni, cilium). Also high risk if upstream changelog mentions breaking changes or behavioral changes.
157
+
- 🟡 **Medium Risk**: Minor version bumps of non-critical components, updates that only affect specific OS variants, or updates where upstream changelog shows feature additions that could subtly change behavior.
158
+
- 🟢 **Low Risk**: Patch version bumps with only bug fixes or security patches, no breaking changes in upstream changelog, and full OS coverage.
159
+
160
+
-**Review output for package update PRs must include a detailed version diff analysis**:
**OS variants NOT updated**: <list any missing, or "None — full coverage">
168
+
```
169
+
170
+
**Detailed changelog between versions:**
171
+
Use web search, GitHub releases, or upstream project documentation to find the exact differences between the old and new version. Present each change as a line item with its own risk tag:
172
+
173
+
```
174
+
### Changes between X.Y.Z and A.B.C
175
+
176
+
| Change | Description | Risk |
177
+
|--------|-------------|------|
178
+
| Feature | <brief description of new feature> | 🟢 Low / 🟡 Medium / 🔴 High |
179
+
| Bug fix | <brief description of bug fixed> | 🟢 Low / 🟡 Medium / 🔴 High |
180
+
| Breaking | <description of breaking change> | 🔴 High |
181
+
| Security | CVE-YYYY-XXXXX: <description> | 🟢 Low / 🟡 Medium / 🔴 High |
182
+
| Deprecation | <what was deprecated and migration path> | 🟡 Medium / 🔴 High |
183
+
| Config change | <default value changed or option removed> | 🟡 Medium / 🔴 High |
184
+
| Performance | <perf improvement or regression> | 🟢 Low / 🟡 Medium |
185
+
```
186
+
187
+
For each individual change, assess risk by considering:
188
+
- Does it alter runtime behavior on AKS nodes?
189
+
- Does it change CLI flags, config file formats, or systemd unit behavior that CSE scripts depend on?
190
+
- Does it affect GPU workloads, networking, container runtime, or kubelet interaction?
191
+
- Could it increase binary size significantly (VHD bloat)?
192
+
- Does it introduce new system dependencies or kernel requirements?
193
+
194
+
**If upstream changelog is unavailable**, explicitly state: _"Upstream changelog not found for this version range. Manual testing recommended before merge."_
195
+
196
+
**Overall risk assessment:**
197
+
```
198
+
### Overall Risk: 🟢 Low / 🟡 Medium / 🔴 High
199
+
**Justification**: <1-2 sentence summary of why this risk level was chosen>
200
+
**Recommendation**: Approve / Request more info / Flag for manual testing
201
+
```
202
+
203
+
**Example** (for a PR like dcgm-exporter 4.7.1 → 4.8.0):
| Deprecation | Removed legacy CSV export format | 🟡 Medium |
217
+
218
+
### Overall Risk: 🟡 Medium
219
+
**Justification**: Minor version bump of GPU monitoring component. No breaking changes to core metrics pipeline, but Azure Linux 3.0 is not updated which creates version skew across OS variants.
220
+
**Recommendation**: Approve, but file follow-up issue for Azure Linux 3.0 alignment.
221
+
```
222
+
139
223
### Analysis Approach
140
224
141
225
**Dynamic Dependency Tracing**:
@@ -169,8 +253,22 @@ Provide targeted inline comments on specific lines where you detect issues:
169
253
- Include actionable next steps (e.g., "Verify this function is not used by checking references in `vhdbuilder/packer/`")
170
254
171
255
**Risk indicators to include:**
172
-
- Severity: 🔴 High Risk | 🟡 Medium Risk | 🟢 Low Risk
173
-
- Category: Script Logic | Cross-OS | External Dependency | Test Coverage | etc.
256
+
257
+
-**Severity** (pick one):
258
+
- 🔴 **High Risk** — Could break production VM provisioning, cause node failures, or introduce security vulnerabilities
259
+
- 🟡 **Medium Risk** — Could cause issues in specific configurations, edge cases, or degrade performance
260
+
- 🟢 **Low Risk** — Unlikely to cause issues but worth noting for awareness
Copy file name to clipboardExpand all lines: AGENTS.md
+100-2Lines changed: 100 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -136,6 +136,90 @@ Analyze PRs for these compatibility scenarios:
136
136
- Package manager assumptions (apt vs dnf/tdnf)
137
137
- Systemd differences between distributions
138
138
139
+
**5. Package/Dependency Update PRs (Renovate)**
140
+
-**Context**: Renovate bot automatically creates PRs to update component versions in `parts/common/components.json`. These components are cached on VHDs during build and directly affect node stability, GPU workloads, networking, and security. Updated packages are downloaded from `packages.aks.azure.com` or upstream registries during VHD build.
141
+
-**What to check**: Every version bump—even patch versions—can introduce regressions that affect production nodes.
142
+
-**Analysis steps for every package update PR**:
143
+
1.**Identify the component and version change**: Parse the diff in `parts/common/components.json` to extract exact old → new versions for each OS/release entry.
144
+
2.**Determine the update type**: Classify as major, minor, or patch using semver. Major and minor updates carry higher risk than patch updates.
145
+
3.**Research upstream changelog**: Look up the project's release notes, changelog, or GitHub releases to understand what changed between the old and new versions. Summarize:
146
+
- New features introduced
147
+
- Bug fixes included
148
+
- Breaking changes or deprecations
149
+
- Security fixes (CVEs patched)
150
+
4.**Assess OS coverage**: Check if the update covers all OS variants where the component is used (Ubuntu 22.04, 24.04, Azure Linux 3.0, etc.). Flag if some OS entries are updated but others are not — partial updates can cause inconsistency across node pools.
151
+
5.**Evaluate VHD size impact**: For components downloaded as binaries or packages, consider whether the new version significantly increases VHD size. Large size increases can affect VHD build time and storage costs.
152
+
6.**Check for configuration or API changes**: If the component exposes configuration files, CLI flags, systemd units, or APIs consumed by CSE scripts, verify that the update doesn't change defaults or remove options that provisioning scripts depend on.
153
+
7.**Verify download URL validity**: Confirm that the `downloadLocation` and `downloadURIs` structure in components.json remains valid for the new version. New versions sometimes change the artifact naming convention or repository layout.
154
+
155
+
-**Risk assessment for package updates**:
156
+
- 🔴 **High Risk**: Major version bumps, components critical to node boot (kubelet, containerd, runc), GPU drivers (nvidia-driver, dcgm-exporter), or networking (azure-cni, cilium). Also high risk if upstream changelog mentions breaking changes or behavioral changes.
157
+
- 🟡 **Medium Risk**: Minor version bumps of non-critical components, updates that only affect specific OS variants, or updates where upstream changelog shows feature additions that could subtly change behavior.
158
+
- 🟢 **Low Risk**: Patch version bumps with only bug fixes or security patches, no breaking changes in upstream changelog, and full OS coverage.
159
+
160
+
-**Review output for package update PRs must include a detailed version diff analysis**:
**OS variants NOT updated**: <list any missing, or "None — full coverage">
168
+
```
169
+
170
+
**Detailed changelog between versions:**
171
+
Use web search, GitHub releases, or upstream project documentation to find the exact differences between the old and new version. Present each change as a line item with its own risk tag:
172
+
173
+
```
174
+
### Changes between X.Y.Z and A.B.C
175
+
176
+
| Change | Description | Risk |
177
+
|--------|-------------|------|
178
+
| Feature | <brief description of new feature> | 🟢 Low / 🟡 Medium / 🔴 High |
179
+
| Bug fix | <brief description of bug fixed> | 🟢 Low / 🟡 Medium / 🔴 High |
180
+
| Breaking | <description of breaking change> | 🔴 High |
181
+
| Security | CVE-YYYY-XXXXX: <description> | 🟢 Low / 🟡 Medium / 🔴 High |
182
+
| Deprecation | <what was deprecated and migration path> | 🟡 Medium / 🔴 High |
183
+
| Config change | <default value changed or option removed> | 🟡 Medium / 🔴 High |
184
+
| Performance | <perf improvement or regression> | 🟢 Low / 🟡 Medium |
185
+
```
186
+
187
+
For each individual change, assess risk by considering:
188
+
- Does it alter runtime behavior on AKS nodes?
189
+
- Does it change CLI flags, config file formats, or systemd unit behavior that CSE scripts depend on?
190
+
- Does it affect GPU workloads, networking, container runtime, or kubelet interaction?
191
+
- Could it increase binary size significantly (VHD bloat)?
192
+
- Does it introduce new system dependencies or kernel requirements?
193
+
194
+
**If upstream changelog is unavailable**, explicitly state: _"Upstream changelog not found for this version range. Manual testing recommended before merge."_
195
+
196
+
**Overall risk assessment:**
197
+
```
198
+
### Overall Risk: 🟢 Low / 🟡 Medium / 🔴 High
199
+
**Justification**: <1-2 sentence summary of why this risk level was chosen>
200
+
**Recommendation**: Approve / Request more info / Flag for manual testing
201
+
```
202
+
203
+
**Example** (for a PR like dcgm-exporter 4.7.1 → 4.8.0):
| Deprecation | Removed legacy CSV export format | 🟡 Medium |
217
+
218
+
### Overall Risk: 🟡 Medium
219
+
**Justification**: Minor version bump of GPU monitoring component. No breaking changes to core metrics pipeline, but Azure Linux 3.0 is not updated which creates version skew across OS variants.
220
+
**Recommendation**: Approve, but file follow-up issue for Azure Linux 3.0 alignment.
221
+
```
222
+
139
223
### Analysis Approach
140
224
141
225
**Dynamic Dependency Tracing**:
@@ -169,8 +253,22 @@ Provide targeted inline comments on specific lines where you detect issues:
169
253
- Include actionable next steps (e.g., "Verify this function is not used by checking references in `vhdbuilder/packer/`")
170
254
171
255
**Risk indicators to include:**
172
-
- Severity: 🔴 High Risk | 🟡 Medium Risk | 🟢 Low Risk
173
-
- Category: Script Logic | Cross-OS | External Dependency | Test Coverage | etc.
256
+
257
+
-**Severity** (pick one):
258
+
- 🔴 **High Risk** — Could break production VM provisioning, cause node failures, or introduce security vulnerabilities
259
+
- 🟡 **Medium Risk** — Could cause issues in specific configurations, edge cases, or degrade performance
260
+
- 🟢 **Low Risk** — Unlikely to cause issues but worth noting for awareness
0 commit comments