Commit ddf01f9

fix: replace copilot-instructions.md symlink with actual file content (#7801)
1 parent 8aa8301 commit ddf01f9

1 file changed

Lines changed: 185 additions & 1 deletion

File tree

.github/copilot-instructions.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

.github/copilot-instructions.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
# Overview
2+
3+
The AgentBaker repo has two main services, discussed below:
4+
5+
- VHD Builder
6+
- AgentBaker Service
7+
8+
## VHD Builder
9+
10+
The VHD Builder builds VHDs using Packer for each base OS: Windows, Azure Linux/Mariner, and Ubuntu. Each OS has multiple supported versions (Windows 2019 and 2022, Ubuntu 20.04 and 22.04, etc.). The VHDs are the base images for nodes in an AKS cluster.
11+
12+
VHDs are built using [Packer](https://developer.hashicorp.com/packer/docs) in [vhdbuilder](./vhdbuilder/).
13+
14+
The Windows VHD is configured through [windows-vhd-configuration.ps1](./vhdbuilder/packer/windows/windows-vhd-configuration.ps1).
15+
16+
## AgentBaker Service
17+
18+
[apiserver](./apiserver/) is a Go-based web server. It receives requests from external clients and generates the CSE and CustomData used on the VHD when a new node is created or provisioned.
19+
20+
Windows generates its CSE package using [kuberneteswindowssetup.ps1](./parts/windows/kuberneteswindowssetup.ps1).
21+
22+
The webserver is also used to determine the latest version of Linux VHDs available for provisioning within AKS clusters.
23+
24+
## Code Structure
25+
26+
[parts](./parts/) serves both the AgentBaker Service and the VHD build, and the two are coupled because of this shared component. When building a VHD, Packer maps and renames scripts from [parts](./parts/) depending on the OS and version. The mappings can be found in [packer](./vhdbuilder/packer/).
27+
28+
> **IMPORTANT**: When making changes to files in the `parts` or `pkg` directories, you must run `make generate` afterward to regenerate the snapshot test data. This ensures consistency between the code and tests and prevents regressions.
29+
30+
Windows uses a different folder, [cse](./staging/cse/windows/), for almost the same purpose. There are subtle differences: Windows CSEs can be downloaded as a zip file at provisioning time because of file size restrictions on Windows systems, while on Linux-based systems the CSE and custom data are dropped in at provisioning time.
31+
32+
## Deployment and Release
33+
34+
The VHD build is triggered by Azure DevOps [pipelines](.pipelines/). For releases, the pipelines follow the same templates for different OS versions:
35+
36+
- [linux/ubuntu](./.pipelines/templates/.builder-release-template.yaml)
37+
- [windows](./.pipelines/templates/.builder-release-template-windows.yaml)
38+
39+
You can work out the release process by following the steps defined in the pipeline.
40+
41+
Tags of AgentBaker and the corresponding Linux VHDs are released every week. Linux VHDs are built with an image version in the `YYYYMM.DD.PATCH` format. Every Linux VHD version corresponds to a particular tag of the AgentBaker Go module, and those tags follow the format `v0.YYYYMMDD.PATCH`. The mapping between AgentBaker tags and Linux VHD versions is defined in [linux_sig_version.json](./pkg/agent/datamodel/linux_sig_version.json).
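
As a rough illustration of the two version formats described above, the following sketch checks only the shape of a string. The function names and the sample values in the usage note are hypothetical and not taken from the repo; the real mapping between the two lives in `linux_sig_version.json`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not from the repo): shape checks only.
# They do not consult linux_sig_version.json.

# Succeeds if "$1" looks like a Linux VHD version: YYYYMM.DD.PATCH
is_vhd_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{6}\.[0-9]{2}\.[0-9]+$'
}

# Succeeds if "$1" looks like an AgentBaker Go module tag: v0.YYYYMMDD.PATCH
is_agentbaker_tag() {
    printf '%s\n' "$1" | grep -Eq '^v0\.[0-9]{8}\.[0-9]+$'
}
```

For example, `is_vhd_version "202501.12.0"` and `is_agentbaker_tag "v0.20250112.0"` both succeed, while swapping the arguments makes both fail. These values are made up for illustration.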
42+
43+
Windows VHDs are released separately, following the Windows Patch Tuesday schedule.
44+
45+
## Guidelines
46+
47+
### SRE Guidelines
48+
49+
The operational goals of this project are:
50+
51+
- achieve consistency across different OS as much as possible
52+
- avoid functional regression when introducing new features (component updates, new drivers, new binaries), ensure that all supported OS / versions are tested
53+
- avoid VHD build performance regressions when making any changes
54+
- avoid node provisioning performance regression when making any changes
55+
56+
When making changes, reason about whether the file is used in the VHD build stage, the provisioning stage, or both, and make sure the changes are valid for that stage. For example, [windows-vhd-configuration.ps1](./vhdbuilder/packer/windows/windows-vhd-configuration.ps1) defines container images to be cached in the VHD, while [configure-windows-vhd.ps1](./vhdbuilder/packer/windows/configure-windows-vhd.ps1) executes commands at provision time.
57+
58+
One way to debug or explore the project is to run the [e2e](./e2e/) tests. To run them locally, follow the README in that folder.
59+
60+
The SRE guidelines ground other coding guidelines and practices.
61+
62+
### Golang Guidelines
63+
64+
- Follow Go best practices
65+
- Use the vanilla `go test` framework
66+
67+
### PowerShell Guidelines
68+
69+
- follow PowerShell best practices
70+
71+
### ShellScripts Guidelines
72+
73+
- use shellcheck for sanity checking
74+
- use ShellSpec for testing
75+
- the shell scripts are used on both Azure Linux/Mariner and Ubuntu, so cross-platform portability is critical.
76+
- when using functions defined in other files, ensure those files are sourced properly.
77+
- use local variables rather than constants when their scoping allows for it.
78+
- avoid using variables declared inside another function, even if they are visible. Such code is hard to reason about and might introduce subtle bugs.
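
The scoping and sourcing points above can be seen in a small sketch. All names here are hypothetical. `require_function` is an illustrative guard built on `command -v`, which also succeeds for builtins and external commands, so it is only a rough check that a helper file was sourced.

```bash
#!/usr/bin/env bash
# Shell variables assigned inside a function without `local` are global,
# so they remain visible (and modified) after the function returns.

leaky_counter() {
    count=5                      # no `local`: overwrites the caller's variable
}

scoped_counter() {
    local count=10               # confined to this function call
    echo "inside scoped_counter: $count"
}

# Hypothetical guard: fail early if a name that should come from a sourced
# file does not resolve to anything callable.
require_function() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "$1 is not defined; was its file sourced?" >&2
        return 1
    }
}

count=0
leaky_counter
echo "after leaky_counter: $count"     # prints "after leaky_counter: 5"
scoped_counter                         # prints "inside scoped_counter: 10"
echo "after scoped_counter: $count"    # prints "after scoped_counter: 5"
require_function scoped_counter
```

The second `echo` shows why `local` matters: `scoped_counter` leaves the caller's `count` untouched, while `leaky_counter` silently changed it.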
79+
80+
## Pull Request Review Guidelines
81+
82+
When reviewing pull requests, perform breaking change analysis to prevent regressions. VHDs remain in production for 6 months, so backward compatibility is critical.
83+
84+
**Review Approach**: Focus on high-level architecture, security vulnerabilities, and logic bugs. Apply deep reasoning similar to advanced models (e.g., Claude Opus) - don't just pattern match, but truly understand the code's intent, dependencies, and potential failure modes.
85+
86+
### Breaking Change Detection
87+
88+
Analyze PRs for these compatibility scenarios:
89+
90+
**1. Linux Provisioning Script Changes**
91+
- **Context**: Scripts in `parts/linux/cloud-init/artifacts/` run during critical VM bootstrap and are used in both:
92+
  - VHD build (uploaded via packer configs in `vhdbuilder/packer/*.json`)
93+
  - VM provisioning (CSE - embedded in Go service via `pkg/agent/const.go`)
94+
  - Versions synchronized via `pkg/agent/datamodel/linux_sig_version.json`
95+
- **What to check**: Changes that could break VM provisioning in production
96+
- **Breaking signals**:
97+
  - **Script logic errors**: Syntax errors, wrong commands, incorrect flags, broken pipes
98+
  - **Dependency issues**:
99+
    - Calling functions before they're sourced
100+
    - Using variables declared in other functions
101+
    - Removing `source` statements that break dependency chains
102+
  - **Cross-distro compatibility**:
103+
    - Commands that don't work on both Ubuntu and Azure Linux/Mariner (check distro-specific variants: `ubuntu/`, `mariner/`)
104+
    - Package manager assumptions (apt vs dnf/tdnf)
105+
    - Missing OS-specific conditional logic
106+
  - **External dependency violations**:
107+
    - NEW: Downloading from internet URLs not in `parts/common/components.json` or allowed sources (packages.aks.azure.com)
108+
    - All external dependencies MUST be referenced in `parts/common/components.json` for Renovate updates
109+
    - Only allowed runtime downloads: packages.aks.azure.com or other explicitly allowed sources in CSE
110+
  - **Function signature changes**: Parameters, return values, exit codes that break callers
111+
  - **Missing test coverage**: Changes to provisioning logic without corresponding e2e tests
112+
113+
**2. Windows Bidirectional Compatibility**
114+
- **Context**: Windows VHD and CSE scripts release on different cadences with no guaranteed order
115+
- **What to check**: Changes to `staging/cse/windows/` (CSE scripts) or `vhdbuilder/packer/windows/` (VHD scripts)
116+
- **Breaking signals**:
117+
  - New CSE scripts assuming capabilities that old VHDs don't have
118+
  - New VHD scripts expecting features that old CSE versions don't provide
119+
  - Changes to shared state (registry keys, files, environment variables) that break coordination
120+
  - Removing PowerShell functions or cmdlets that the other component might call
121+
122+
**3. aks-node-controller Migration (Dual-Mode Support)**
123+
- **Context**: Transitioning from uploading scripts during both VHD build and CSE to only uploading aks-node-controller during VHD build
124+
- **What to check**: Any changes must work in BOTH deployment modes
125+
- **Breaking signals**:
126+
  - Assumptions that scripts are always uploaded during CSE (new mode won't do this)
127+
  - Assumptions that aks-node-controller is always present (old VHDs won't have it)
128+
  - Missing feature detection to determine which mode is running
129+
  - Hardcoded paths that differ between deployment modes
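
A minimal sketch of the kind of feature detection a reviewer might look for in dual-mode code. The function name, the mode labels, and the idea of keying on an executable file are all assumptions made for illustration. The real install location of aks-node-controller is not stated here, so the path is passed in as a parameter.

```bash
#!/usr/bin/env bash
# Hypothetical feature detection: pick a provisioning mode based on whether an
# executable exists at the given path, instead of assuming either mode.
detect_provisioning_mode() {
    local controller_path="$1"   # assumed location, supplied by the caller
    if [ -f "$controller_path" ] && [ -x "$controller_path" ]; then
        echo "aks-node-controller"
    else
        echo "legacy-scripts"    # illustrative label for older VHDs
    fi
}
```

Code that branches on a check like this keeps working on old VHDs that lack the binary, and avoids hardcoding a path that differs between modes.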
130+
131+
**4. Cross-OS Compatibility**
132+
- **What to check**: Changes work on Ubuntu, Azure Linux/Mariner, and Windows
133+
- **Breaking signals**:
134+
  - Linux commands that don't work on both Ubuntu and Azure Linux/Mariner
135+
  - Missing conditional logic for OS-specific behaviors
136+
  - Package manager assumptions (apt vs dnf/tdnf)
137+
  - Systemd differences between distributions
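
To make the package manager point concrete, here is a hedged sketch of OS-specific conditional logic. The function name is hypothetical, and the `ID` values `mariner` and `azurelinux` as well as the choice of `tdnf` over `dnf` are assumptions. The repo's own scripts may detect the distro differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read ID from an os-release style file and print the
# package manager to use. Defaults to /etc/os-release when no file is given.
pick_package_manager() {
    local os_release_file="${1:-/etc/os-release}"
    local os_id
    # Source the file in a subshell so its variables do not leak into ours.
    os_id=$(. "$os_release_file" && printf '%s' "$ID")
    case "$os_id" in
        ubuntu)             echo "apt-get" ;;
        mariner|azurelinux) echo "tdnf" ;;      # assumed ID values
        *)
            echo "unsupported OS: $os_id" >&2
            return 1
            ;;
    esac
}
```

A change that calls `apt-get` unconditionally would be flagged under this section. Routing through a single detection function makes the missing branch obvious.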
138+
139+
### Analysis Approach
140+
141+
**Dynamic Dependency Tracing**:
142+
1. For each changed file, identify what depends on it
143+
2. Follow `source` statements in bash scripts to trace dependency chains
144+
3. Check for function calls, variable references across files
145+
4. Look for hardcoded paths in VHD build scripts (`vhdbuilder/packer/`) that reference changed files
146+
5. Trace through as many levels as needed within the codebase
147+
6. **Check external dependencies**:
148+
   - Search for new URLs being downloaded (curl, wget, etc.)
149+
   - Verify all external dependencies are in `parts/common/components.json` for Renovate updates
150+
   - Flag downloads from unauthorized sources (only packages.aks.azure.com and sources in components.json allowed)
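
The external dependency check in step 6 could be approximated with a small script like the one below. This is a sketch under stated assumptions: the function name is hypothetical, it only inspects lines that mention `curl` or `wget`, and it only allows the `packages.aks.azure.com` host. Cross-checking the remaining URLs against `parts/common/components.json` is left to the reviewer.

```bash
#!/usr/bin/env bash
# Hypothetical review aid: print every http(s) URL that appears on a curl/wget
# line of the given file and whose host is not packages.aks.azure.com.
find_unapproved_downloads() {
    local file="$1"
    grep -E '(curl|wget)' "$file" \
        | grep -Eo "https?://[^[:space:]\"']+" \
        | grep -Ev '^https?://packages\.aks\.azure\.com(/|$)' \
        || true   # no matches is not an error
}
```

Run against a changed script, any output is a candidate for a review comment: either the URL should be listed in `components.json`, or the download should move to an allowed source.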
151+
152+
**Historical Context**:
153+
- Look for related changes that previously caused issues
154+
- Identify patterns of fragile areas that break frequently
155+
156+
**Test Coverage Assessment**:
157+
- Note if changed code has e2e test coverage
158+
- Flag changes to untested areas as higher risk
159+
- Mention if new behavior lacks corresponding test additions
160+
161+
### Review Output Format
162+
163+
Provide targeted inline comments on specific lines where you detect issues:
164+
165+
**For each breaking change or risk:**
166+
- Comment directly on the problematic line or code block
167+
- Explain why this is risky (e.g., "This removes function X which may be called by VHDs built in the last 6 months")
168+
- Suggest specific mitigations or alternatives
169+
- Include actionable next steps (e.g., "Verify this function is not used by checking references in `vhdbuilder/packer/`")
170+
171+
**Risk indicators to include:**
172+
- Severity: 🔴 High Risk | 🟡 Medium Risk | 🟢 Low Risk
173+
- Category: Script Logic | Cross-OS | External Dependency | Test Coverage | etc.
174+
175+
**Only comment when you have substantive findings** - avoid noise on trivial or obviously safe changes.
176+
177+
### Review Philosophy
178+
179+
Think like an experienced reviewer who "eyeballs" PRs for subtle risks. Look beyond pattern matching:
180+
- Understand the architecture and how components interact
181+
- Consider timing of releases and deployment sequences
182+
- Reason about implicit dependencies and assumptions
183+
- Flag changes that "feel risky" even without obvious red flags
184+
- Balance thoroughness with actionable feedback
185+
- Focus on high-impact issues that could break production VM provisioning
