Refactor foundry-agent skills to include AZD #2435
- Removed the direct-code deployment reference file as it is no longer needed.
- Added unit tests for the azd-based hosted-agent create workflow to ensure the new patterns are correctly implemented.
- Deleted outdated toolbox paths unit tests that referenced removed samples.
- Introduced unit tests for the azd-based hosted-agent deploy workflow to validate the new deployment process.
- Removed direct-code unit tests that were redundant after the removal of the direct-code deployment reference.

- SKILL.md: make deploy/create sub-skill cells when-focused (routing, not how)
- create-hosted.md: add code-first scope note; replace Step 1 azd commands with a single verify-environment script
- add cross-platform verify-environment.sh/.ps1 emitting a concise OK/WARN/ACTION summary

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tmeschter thanks for the review. Summary of what was agreed and done (commit 9a4af78):
Resolving the threads.
Replace stale Docker/ACR/container-start mechanics with the azd-based hosted-agent flow (azd ai agent init/run, azd provision, azd deploy) and add 'azd ai agent' to the USE FOR triggers so the skill routes on azd queries. Stays within the 1024-char description cap (1010).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- create-hosted.md: trim Step 1 intro; replace 'drop --no-prompt' guidance with azd env set PARAM_* + ask_user (no interactive prompts) in the tip, YOLO mode, and error table
- deploy.md: drop --output json from Step 1 show commands; add optional 'azd provision --preview' what-if note before provisioning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove keywords duplicated between the intro and USE FOR (continuous eval, SFT/DPO/RFT) plus the niche 'large file upload'. Total formatted budget now 19986/20000.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Regenerated trigger keyword snapshots after the description change (added azd ai agent init/with triggers; removed docker/large/upload/onboard/push/start). Built output/ first since tests load skills from output/skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Core mental model for the `azd ai agent` extension. Use this when you need to understand command surface, file layout, or where a given setting lives.

## CLI surface
Considering that commands may change and users may have different CLI versions, how about hinting (perhaps as a troubleshooting step) that the agent run --help on demand to confirm the exact command?
> ⚠️ **Not done yet: invocation success is the midpoint, not the finish line.** The next action after a passing smoke test is **Step 8**, not a deployment summary. Do not write a summary, version table, or Playground link yet.

### Step 8: Auto-Generate Evaluation Suite (MANDATORY — RUNS AUTOMATICALLY)
(It is acceptable for eval auto-generation not to be a mandatory step.) How about moving it to a dedicated sub-reference so the agent still knows how to run the eval?
### Step 5: Get Agent Definition Schema
- Send messages -> [invoke](../invoke/invoke.md)
- Evaluate / optimize -> [observe](../observe/observe.md)
> ⚠️ **Not done yet: invocation success is the midpoint, not the finish line.** The next action after a passing smoke test is **Step 8**, not a deployment summary. Do not write a summary, version table, or Playground link yet.

### Step 8: Auto-Generate Evaluation Suite (MANDATORY — RUNS AUTOMATICALLY)
This should be mandatory and will be done while waiting for the agent to be deployed. Let's discuss more.
XOEEst left a comment
Some high-level comments:
- How will create-hosted.md work together with deploy.md? What keywords will trigger which?
- We should keep eval suite generation as part of agent deployment step, and prompt for evaluation after deployment.
wbreza left a comment
Code review — multi-model pass (claude-sonnet-4.6 + claude-opus-4.7)
Reviewed at head 2c47cf5. Synthesizing findings from three parallel focus-area passes: skill-compliance, content-correctness, and scripts-and-tests. Items tagged 🔁 were flagged by ≥2 reviewers independently.
Prior reviewers' resolved items (tmeschter / therealjohn round-trips on script consolidation, --no-prompt, --preview, trim-verbosity, when-vs-how) are not re-flagged. Findings below are net-new.
TL;DR
Intent: Refactor foundry-agent/{create,deploy}/ around the azd ai agent extension, replacing manual Docker/ACR + REST-direct-code + framework matrices + hand-resolved toolbox endpoints. Net −318 lines, single consistent CLI lifecycle.
How: Rewrites create-hosted.md (−287/+110) into sample-first + brownfield flows; collapses deploy.md (−419/+111) into verify → provision → deploy → show + invoke; adds azd-ai-cli.md / local-run.md / tools.md references; introduces scripts/verify-environment.{sh,ps1} (replaces 5 inline azd calls with one [OK]/[WARN]/[ACTION]-emitting script); deletes direct-code-deployment.md; swaps two deprecated unit tests for two new azd-ai.unit.test.ts lock-down files; regenerates trigger snapshots across sibling foundry skills (mechanical, from parent SKILL.md description edit).
Mergeable: CONFLICTING with main — rebase needed before merge regardless of review.
🔴 High
1. 🔁 PR description claim doesn't match the diff: agentframework.md and use-toolbox-in-hosted-agent.md were NOT deleted (Description Alignment)
The PR body lists three deletions; only deploy/references/direct-code-deployment.md is in the diff. create/references/agentframework.md and create/references/use-toolbox-in-hosted-agent.md remain on disk and are still linked from sibling references (e.g. agent-tools.md:50 still emits [use-toolbox-in-hosted-agent.md](use-toolbox-in-hosted-agent.md)). Readers navigating the tool catalog can land on pre-azd Docker/toolbox guidance that contradicts the new azd-first story.
Fix: Either delete the two files in this PR and redirect the sibling references to tools.md / azd-ai-cli.md, or update the PR description and file a follow-up issue.
2. 19 pre-existing reference files in create/references/ are now orphaned (Progressive Disclosure)
The new create-hosted.md links only 3 of 22 files in create/references/. Files like agent-tools.md, agentframework.md, foundry-tool-catalog.md, sdk-operations.md, all tool-*.md (12 of them), toolbox-reference.md, and use-toolbox-in-hosted-agent.md are unreachable from the new entry point. This violates "references load ONLY when explicitly linked" and matches the "skills that are just tool description catalogs" anti-pattern.
Fix: Delete the unlinked files OR add explicit recipe links from the new Add tools section. Minimum safe drops in this PR: agentframework.md, use-toolbox-in-hosted-agent.md, toolbox-reference.md (all superseded by tools.md).
3. verify-environment.{sh,ps1} exit 1 when [ACTION] is present — risks the LM never receiving the structured output (Script Exit Codes)
Both scripts print the Summary line and then call exit 1. If the tool runner treats non-zero exits as a command failure and routes to an error path rather than surfacing stdout, the LM gets nothing — exactly defeating the rationale that drove tmeschter to request the script in the first place. The [OK]/[WARN]/[ACTION] prefixes plus the Summary: action required … text already encode every signal the LM needs.
Fix: Always exit 0. Keep the prefixes and summary text as the only blocking signal.
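To make the "prefixes are the only blocking signal" idea concrete, here is a minimal sketch of how a consumer could decide whether Step 1 is blocked from stdout alone. The helper names (`parseVerifyOutput`, `isBlocked`) and all sample messages except the `[OK] azd installed` line are hypothetical and do not come from the PR.

```typescript
// Hypothetical helpers: names and shapes are illustrative, not taken from the PR.
type Level = "OK" | "WARN" | "ACTION";

interface VerifyLine {
  level: Level;
  message: string;
}

// Keep only lines that start with [OK], [WARN], or [ACTION]; ignore everything else.
function parseVerifyOutput(stdout: string): VerifyLine[] {
  const pattern = /^\[(OK|WARN|ACTION)\]\s*(.*)$/;
  const lines: VerifyLine[] = [];
  for (const raw of stdout.split(/\r?\n/)) {
    const match = pattern.exec(raw.trim());
    if (match) {
      lines.push({ level: match[1] as Level, message: match[2] ?? "" });
    }
  }
  return lines;
}

// The run counts as blocked when any [ACTION] line is present,
// independent of the process exit code.
function isBlocked(lines: VerifyLine[]): boolean {
  return lines.some((line) => line.level === "ACTION");
}

// Sample output; the [WARN] and [ACTION] messages are made up for this sketch.
const sample = [
  "[OK] azd installed (version 1.25.0)",
  "[WARN] project endpoint not set",
  "[ACTION] install the azure.ai.agents extension",
].join("\n");

const parsed = parseVerifyOutput(sample);
console.log(parsed.length, isBlocked(parsed));
```

Because the decision is derived from the text, the script can exit 0 in every case and the caller still sees which checks need action.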
🟡 Medium
4. 🔁 Verify scripts never validate the >= 1.25.0 minimum the PR description claims (Script Version Parsing)
PR body says "azd version parsing for >=1.25.0," but both scripts only parse and echo [OK] azd installed (version X.Y.Z) — no comparison against any floor. A user on a pre-1.25.0 azd passes Step 1 and only fails downstream with confusing "extension not installed" / "command not found" errors.
Fix: In .sh: [ "$(printf '%s\n%s\n' 1.25.0 "$AZD_VERSION" | sort -V | head -1)" != "1.25.0" ] → emit [ACTION]. In .ps1: cast to [System.Version] and compare; treat unknown as [WARN].
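The comparison logic behind this fix can be sketched in a few lines. This is an illustration of the intended behavior (numeric compare against a floor, unknown treated as a warning), not the scripts' actual code; `parseVersion` and `checkFloor` are hypothetical names.

```typescript
// Parse the leading "major.minor.patch" of a version string into numbers.
// Returns null when the text is not a recognizable version (e.g. "unknown").
function parseVersion(text: string): number[] | null {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(text.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

type FloorCheck = "OK" | "WARN" | "ACTION";

// Compare component by component as numbers, so 1.9.0 sorts below 1.25.0.
function checkFloor(installed: string, minimum: string): FloorCheck {
  const have = parseVersion(installed);
  const need = parseVersion(minimum);
  if (!have || !need) return "WARN"; // unknown version: warn rather than block
  for (let i = 0; i < 3; i++) {
    const h = have[i] ?? 0;
    const n = need[i] ?? 0;
    if (h > n) return "OK";
    if (h < n) return "ACTION";
  }
  return "OK"; // exactly equal to the floor
}

console.log(checkFloor("1.9.3", "1.25.0"), checkFloor("unknown", "1.25.0"));
```

The numeric comparison matters: a plain string comparison would rank "1.9.3" above "1.25.0", which is the case `sort -V` and `[System.Version]` handle in the suggested shell and PowerShell fixes.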
5. tools.md (~2180 tok) and azd-ai-cli.md (~1913 tok) far exceed the references/**/*.md 1000-token soft limit (Token Budget)
.token-limits.json sets references to 1000 tokens. The PR description reports these as only "5-10% over the 2000-token soft limit," which suggests npm run tokens check is matching against the loose *.md: 2000 rule rather than the specific references/**/*.md: 1000 rule — likely a glob-resolution gap in the checker.
Fix: Split tools.md into tools/README.md + tools/recipes.md; move CLI schemas out of azd-ai-cli.md into a sibling. Separately, verify the token-checker glob actually enforces the 1000-token reference cap.
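As an illustration of the suspected glob-resolution gap, the sketch below resolves a file's limit by preferring the most specific matching rule. It is not the repository's checker: the rule table only mirrors the two limits quoted above, the glob semantics are simplified, and "longest pattern wins" is an assumed specificity rule.

```typescript
// Hypothetical rule table mirroring the two limits quoted in the review.
const limits: Record<string, number> = {
  "*.md": 2000,
  "references/**/*.md": 1000,
};

// Minimal glob support for this sketch only:
// "**/" spans any number of directories, "*" stays within one path segment,
// and a pattern may match at any directory depth.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const body = escaped
    .replace(/\*\*\//g, "\u0000") // placeholder so the next step skips "**/"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, "(?:.*/)?");
  return new RegExp(`(?:^|/)${body}$`);
}

// Among all matching rules, pick the longest pattern as the most specific one.
function limitFor(path: string): number | undefined {
  const matches = Object.keys(limits).filter((glob) => globToRegExp(glob).test(path));
  matches.sort((a, b) => b.length - a.length);
  const best = matches[0];
  return best === undefined ? undefined : limits[best];
}

console.log(
  limitFor("foundry-agent/create/references/tools.md"),
  limitFor("foundry-agent/create/create-hosted.md"),
);
```

If the real checker instead takes the first matching rule, or matches only `*.md`, a reference file would be measured against 2000 rather than 1000, which would explain the numbers reported in the PR description.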
6. create-hosted.md (~2202 tok) exceeds the *.md 2000-token soft limit; undisclosed in the PR description (Token Budget)
PR discloses overages for tools.md and deploy.md but omits create-hosted.md. Not critical, but underrepresents the actual token impact.
Fix: Trim Step 4a/4b parameter descriptions and the YOLO section, or disclose.
7. Extension presence check uses raw substring match — not JSON-field-aware (Script Cross-Platform)
Both scripts grep "azure.ai.agents" anywhere in the raw azd extension list --output json blob. If any future extension's description/url/dependencies mention that string, the check returns a false [OK].
Fix: Parse JSON and check the name field explicitly. .ps1: ConvertFrom-Json then Where-Object { $_.name -eq $ext }. .sh: pipe through python3 -c "import json,sys; sys.exit(0 if '$ext' in {e['name'] for e in json.load(sys.stdin)} else 1)".
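The difference between a substring match and a field-aware match can be shown with a short sketch. It assumes, as the suggested fix does, that the extension listing is a JSON array of objects with a `name` field; `hasExtension` and the sample entry are hypothetical.

```typescript
// Assumption: the listing is a JSON array of objects that each carry a "name".
// Returns null when the input cannot be interpreted, so callers can avoid
// reporting a false [OK] on unparseable output.
function hasExtension(rawJson: string, extensionName: string): boolean | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return null;
  }
  if (!Array.isArray(parsed)) return null;
  return parsed.some(
    (entry) =>
      typeof entry === "object" &&
      entry !== null &&
      (entry as { name?: unknown }).name === extensionName,
  );
}

// Made-up listing: another extension merely mentions the target in its description.
const listing = JSON.stringify([
  { name: "example.other", description: "Works alongside azure.ai.agents" },
]);

console.log(listing.includes("azure.ai.agents"), hasExtension(listing, "azure.ai.agents"));
```

Returning a separate "could not parse" result also addresses the silent-fallback concern raised for the `.sh` script: an empty or failed parse is reported as unknown instead of being read as "not installed" or "installed".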
8. No test asserts the verify-environment.{sh,ps1} scripts exist or that create-hosted.md points at the correct paths (Test Coverage)
The scripts are the central new mechanism of this PR — yet create/azd-ai.unit.test.ts has no test reading foundry-agent/create/scripts/ or asserting ./scripts/verify-environment.sh / .ps1 appear in Step 1. A future rename or path edit regresses silently.
Fix: Add an existence test (the deploy test already imports access/fileExists-style helpers — port the pattern):
test("verify-environment scripts exist and are referenced in Step 1", async () => {
  expect(await fileExists("foundry-agent/create/scripts/verify-environment.sh")).toBe(true);
  expect(await fileExists("foundry-agent/create/scripts/verify-environment.ps1")).toBe(true);
  const skill = await readSkillFile("foundry-agent/create/create-hosted.md");
  expect(skill).toContain("./scripts/verify-environment.sh");
  expect(skill).toContain("./scripts/verify-environment.ps1");
});

9. Deleted direct-code.unit.test.ts covered an invariant in references/agent-metadata-contract.md that has no replacement (Test Coverage Gap)
The deleted test asserted agent-metadata-contract.md still scopes ACR to "Docker/ACR deploy flow" (preventing a regression that promotes ACR as a general mechanism). Neither new test touches that file.
Fix: Port the three-line invariant into deploy/azd-ai.unit.test.ts:
test("agent metadata contract still scopes ACR to Docker/ACR deploy flow only", async () => {
  const contract = await readSkillFile("references/agent-metadata-contract.md");
  expect(contract).toContain("azureContainerRegistry");
  expect(contract).toContain("Docker/ACR deploy flow");
  expect(contract).not.toContain("✅ for hosted agents | ACR used for deployment and image refresh");
});

⚪ Low
10. .sh silent JSON-parse fallback when python3 is missing — produces false [OK] lines (Script Tool Dependencies)
All three JSON parses in .sh pipe through python3. If python3 is absent, version → "unknown", project endpoint → empty (misleading [WARN]), agent status → empty (false [OK] No agent deployed yet. Proceed with create.) — even when an agent IS deployed. .ps1 uses native ConvertFrom-Json so doesn't share this failure mode; the scripts are not equivalent on minimal Linux/macOS shells.
Fix: Use jq (more commonly preinstalled), or pre-flight command -v python3 and emit [ACTION] if missing.
11. azd ai agent endpoint update --dry-run / --force documented in deploy.md but missing from azd-ai-cli.md (Cross-File Consistency)
The CLI surface line shows only azd ai agent endpoint update # patch agentEndpoint / agentCard in place with no flags. A reader cross-referencing the two files won't see --dry-run / --force documented as a recognized command shape.
Fix: Append [--dry-run | --force] to the CLI surface line and note "idempotent; --force required for write."
12. 🔁 Test expect(cli).toContain("_VERSION") is too loose (Test Specificity)
Matches any *_VERSION token (NODE_VERSION, PYTHON_VERSION, TOOLBOX_..._VERSION, etc.) — deletion of the canonical AGENT_<SVC>_VERSION row wouldn't necessarily fail the test.
Fix: Use the literal form already used by the deploy test: expect(cli).toContain("AGENT_<SVC>_VERSION");.
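A small sketch shows why the loose assertion would not catch the regression. The two documents below are made up for illustration; `String.prototype.includes` stands in for the substring check that `toContain` performs on strings.

```typescript
// Illustrative document bodies; the rows are invented for this sketch.
const withCanonicalRow = ["AGENT_<SVC>_VERSION", "PYTHON_VERSION"].join("\n");
const withoutCanonicalRow = ["PYTHON_VERSION"].join("\n");

// Loose check: any token ending in _VERSION satisfies it.
const looseCheck = (doc: string): boolean => doc.includes("_VERSION");

// Strict check: only the canonical row satisfies it.
const strictCheck = (doc: string): boolean => doc.includes("AGENT_<SVC>_VERSION");

// After the canonical row is deleted, the loose check still passes
// while the strict check fails.
console.log(looseCheck(withoutCanonicalRow), strictCheck(withoutCanonicalRow));
```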
13. Parent microsoft-foundry/SKILL.md description silently drops non-Docker routing keywords (Description Alignment)
Beyond removing Docker build, ACR push (expected), the rewrite also dropped continuous eval, availability, onboard, and large file upload. The latter two routed initial-setup and fine-tuning training-data scenarios. Sibling skill snapshots (capacity/deploy-model/observe/trace) confirm propagation.
Fix: Restore the four terms to the USE-FOR list (they're not azd-specific), or call them out explicitly in the PR description so reviewers can confirm intent.
14. verify-environment.sh uses set -uo pipefail (missing -e); inconsistent with repo convention (Script Style)
Every other repo script using set uses set -euo pipefail (e.g., eng/check-quota.sh). All known failure paths have || fallback guards so the practical risk is low, but a future unguarded command would fail silently.
Fix: Change to set -euo pipefail.
15. python3 dependency in .sh is undocumented (Script Tool Dependencies)
Not declared in the script header; no pre-flight check.
Fix: Add # Requires: python3 3.8+ to the header, or a one-line command -v python3 || echo "[WARN] python3 not found — JSON parsing will fall back to 'unknown'".
16. Deleted toolbox-paths.unit.test.ts covered foundry-sample path stability; no equivalent in the new tools.md test (Test Coverage Gap)
The original test locked down samples/python/hosted-agents/... and learn.microsoft.com/azure/foundry/agents/... paths to catch stale-path regressions. The new tools.md test checks command syntax and env var patterns, but no paths.
Fix: If tools.md contains GitHub sample links, add one assertion per canonical path. If it deliberately contains none, leave a comment in the test so a future author knows to add path tests when links are added.
Open questions from prior reviewers (informational — not new findings)
- @swatDong asked about hinting azd ai agent eval as a handoff in deploy.md / observe sub-skill (already verified working). The current "demote to one-liner observe handoff" is defensible, but if eval truly runs cleanly under azd ai agent eval, a single-line hint in the deploy handoff would be a low-cost win.
- @XOEEst asked (a) how create-hosted.md vs deploy.md are triggered differently and (b) whether eval-suite generation should remain a mandatory deploy step. Neither has a resolved answer in the existing threads.
Posted as a comment (not a change request) — author may resolve, defer, or push back per their judgment. The High items are worth resolving in this PR; Medium/Low are addressable at author's discretion.

Rewrites plugin/skills/microsoft-foundry/foundry-agent/create/ and deploy/ around the Azure Developer CLI (azd ai agent extension), replacing the previous mix of manual Docker/ACR build, REST-based direct-code upload, framework-protocol selection matrices, and hand-resolved toolbox endpoints.
Sourced from the topic content in Azure/azure-dev#8375 (the new azure.ai.docs extension that ships azd ai doc agent for samples / initialize / develop / configure / extend / deploy / evaluate / operate / investigate).
Net change: -466 lines, more focused, more human-readable, one consistent CLI for the whole hosted-agent lifecycle.
Why
The old hosted-agent skill content asked the user (and the agent) to:
azd ai agent collapses all of that into:
azd ai agent sample list # pick a starter
azd ai agent init -m # scaffold (or --from-code for brownfield)
azd ai agent run # local inner-loop
azd provision && azd deploy # ship a new immutable agent version
azd ai agent show / invoke # verify
So the skill should teach that — not the workarounds that existed before it.
What changed
Rewritten
Added (foundry-agent/create/references/)
Deleted
Tests
Parent SKILL.md
Validation
cd scripts
npm run references # passes (no broken/escaped links)
npm run frontmatter # pre-existing placeholder-version errors only
cd ..
npm run tokens check # new files: tools.md and deploy.md are ~5-10% over the 2000-token soft limit; smaller than every file they replaced. No file is anywhere near a hard limit.
npm run build # NBGV stamping works; CHANGELOG generated.
cd tests
$env:SKIP_INTEGRATION_TESTS = 'true'
npm test -- --testPathPatterns=foundry-agent # new azd-ai.unit.test.ts files pass; trigger tests still pass
Notes for reviewers