Refactor foundry-agent skills to include AZD by therealjohn · Pull Request #2435 · microsoft/GitHub-Copilot-for-Azure

therealjohn · 2026-05-29T15:51:58Z

Rewrites plugin/skills/microsoft-foundry/foundry-agent/create/ and deploy/ around the Azure Developer CLI (azd ai agent extension), replacing the previous mix of manual Docker/ACR build, REST-based direct-code upload, framework-protocol selection matrices, and hand-resolved toolbox endpoints.

Sourced from the topic content in Azure/azure-dev#8375 (the new azure.ai.docs extension that ships azd ai doc agent for samples / initialize / develop / configure / extend / deploy / evaluate / operate / investigate).

Net change: -466 lines, more focused, more human-readable, one consistent CLI for the whole hosted-agent lifecycle.

Why

The old hosted-agent skill content asked the user (and the agent) to:

Pick a language + protocol + framework + sample combination by hand.
Manually choose adapter packages (agent-framework-foundry-hosting, azure-ai-agentserver-responses, etc.) and wrap their agent.
Build a Dockerfile, run az acr build with the right --source-acr-auth-id, push, then call MCP agent_update with a hand-composed payload.
Optionally use a parallel "direct code" REST workflow with its own packaging, headers (Foundry-Features: CodeAgents=V1Preview,HostedAgents=V1Preview), and metadata.json invariants.
Manage a "Definition of Done" checklist with a mandatory eval-suite generation step bolted onto every deploy.

azd ai agent collapses all of that into:

azd ai agent sample list # pick a starter
azd ai agent init -m # scaffold (or --from-code for brownfield)
azd ai agent run # local inner-loop
azd provision && azd deploy # ship a new immutable agent version
azd ai agent show / invoke # verify
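The ship-and-verify half of that lifecycle is a strict sequence where each step runs only if the previous one succeeded. The TypeScript sketch below shows that ordering. It assumes Node.js, an `azd` binary on PATH, and that `azd ai agent show` can resolve the agent from the current project without extra arguments; if your CLI version needs more arguments, check `--help`. `shipSteps`, `runAzd` and `runSteps` are hypothetical helpers. The executor is injected so you can dry-run the sequence without touching Azure.

```typescript
import { spawnSync } from "node:child_process";

// One step is the argument list passed to the `azd` binary.
type Step = readonly string[];

// Executor returns the exit code; injectable so the sequence can be tested offline.
type Exec = (args: Step) => number;

// Ship-and-verify portion of the lifecycle: provision, deploy a new version, show it.
const shipSteps: Step[] = [
  ["provision"],
  ["deploy"],
  ["ai", "agent", "show"],
];

// Default executor: run `azd <args>` with output streamed to this terminal.
// A missing binary yields status null, which is treated as failure.
const runAzd: Exec = (args) =>
  spawnSync("azd", [...args], { stdio: "inherit" }).status ?? 1;

// Same semantics as `azd provision && azd deploy && ...`: stop at the first non-zero exit.
function runSteps(steps: Step[], exec: Exec = runAzd): { ok: boolean; ran: Step[] } {
  const ran: Step[] = [];
  for (const step of steps) {
    ran.push(step);
    if (exec(step) !== 0) return { ok: false, ran };
  }
  return { ok: true, ran };
}
```

Because `runSteps` stops at the first failure, a failed `azd deploy` never reaches the `show` check. This matches the `&&` chaining in the command list.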

So the skill should teach that — not the workarounds that existed before it.

What changed

Rewritten

foundry-agent/create/create-hosted.md — sample-first greenfield (azd ai agent sample list -> azd ai agent init -m) and brownfield (azd ai agent init --from-code --deploy-mode code --runtime ... --entry-point ...). Includes local inner-loop (azd ai agent run + azd ai agent invoke --local). Explicit "Hosted vs Prompt" decision table at top, and handoff to deploy/invoke/observe/troubleshoot at the end.
foundry-agent/deploy/deploy.md — linear hosted flow: verify state -> azd provision -> azd deploy -> azd ai agent show + smoke invoke. Kept the prompt-agent (agent_definition_schema_get + agent_update) workflow as a compressed second section. Dropped the multi-page eval-suite "Definition of Done" gates; eval handoff is now a one-liner pointer to the observe skill.

Added (foundry-agent/create/references/)

azd-ai-cli.md — CLI surface map, two-file model (agent.yaml + azure.yaml services..config), env-var state, confirmation envelope, sub/location resolution cascade.
local-run.md — azd ai agent run flag reference, --start-command resolution order, --local invoke contract, common local failures.
tools.md — toolbox lifecycle (azd extension install azure.ai.toolboxes, azd ai toolbox create / connection add / show), connection creation (azd ai agent connection create), env-var naming convention (TOOLBOX__MCP_ENDPOINT), endpoint URL shapes, recipes for GitHub MCP / Azure AI Search / A2A / multi-tool toolbox, Python Agent Framework wiring with httpx event-hook for token refresh, MCP client gotchas.
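tools.md covers the Python wiring as an httpx event hook that refreshes the token. Below is a rough TypeScript analogue of the same idea. The token shape is modeled on `AccessToken` from `@azure/core-auth` (`token` plus `expiresOnTimestamp` in milliseconds). `cachedTokenSource`, `toolboxHeaders` and the 5-minute refresh skew are all hypothetical. So is the assumption that the `Foundry-Features: Toolboxes=V1Preview` header (one of the strings the new unit tests assert) goes on every toolbox request.

```typescript
// Shape modeled on AccessToken from @azure/core-auth (expiry in ms since epoch).
interface AccessTokenLike {
  token: string;
  expiresOnTimestamp: number;
}

type FetchToken = () => Promise<AccessTokenLike>;

// Refresh this long before expiry so a long-lived MCP session never sends a stale token.
const REFRESH_SKEW_MS = 5 * 60 * 1000;

// Wrap a token fetcher so it is only called when the cached token is near expiry.
function cachedTokenSource(fetchToken: FetchToken, now: () => number = Date.now): () => Promise<string> {
  let current: AccessTokenLike | undefined;
  return async () => {
    if (!current || current.expiresOnTimestamp - REFRESH_SKEW_MS <= now()) {
      current = await fetchToken();
    }
    return current.token;
  };
}

// Per-request headers for a toolbox MCP endpoint: fresh bearer token plus preview flag.
async function toolboxHeaders(getToken: () => Promise<string>): Promise<Record<string, string>> {
  return {
    Authorization: `Bearer ${await getToken()}`,
    "Foundry-Features": "Toolboxes=V1Preview",
  };
}
```

Rebuilding the headers before each request plays the same role as the httpx event hook: the caller never has to manage token expiry.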

Deleted

create/references/use-toolbox-in-hosted-agent.md — replaced by tools.md (which covers both lifecycle and consume sides).
create/references/agentframework.md — obsolete; samples already wire the adapter correctly under azd ai agent init -m.
deploy/references/direct-code-deployment.md — azd deploy with --deploy-mode code natively does ZIP + remote-build, replacing the manual REST workflow.

Tests

Deleted tests/microsoft-foundry/foundry-agent/create/toolbox-paths.unit.test.ts and tests/microsoft-foundry/foundry-agent/direct-code.unit.test.ts — both locked down deprecated content (removed reference files, the old Docker/ACR-first and direct-code-REST patterns, the "Step 7 / Step 8 / Definition of Done" gates).
Added tests/microsoft-foundry/foundry-agent/create/azd-ai.unit.test.ts and tests/microsoft-foundry/foundry-agent/deploy/azd-ai.unit.test.ts — assert the new azd-based patterns (azd ai agent init, --from-code, azd provision + azd deploy, azd ai agent show, TOOLBOX__MCP_ENDPOINT, Foundry-Features: Toolboxes=V1Preview) and absence of the old ones (az acr build, direct-code-deployment.md, Definition of Done, framework-adapter matrices).
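Those presence/absence assertions reduce to a substring check over the skill text. The minimal sketch below is illustrative only. `checkPatterns` and both pattern lists are hypothetical: the strings are taken from the list above, not copied from the actual test files.

```typescript
interface PatternReport {
  missing: string[];        // required strings that do not appear
  forbiddenFound: string[]; // deprecated strings that still appear
}

// Report which required patterns are absent and which forbidden ones are present.
function checkPatterns(text: string, required: string[], forbidden: string[]): PatternReport {
  return {
    missing: required.filter((p) => !text.includes(p)),
    forbiddenFound: forbidden.filter((p) => text.includes(p)),
  };
}

// New azd-based patterns the deploy skill should teach.
const requiredDeploy = ["azd provision", "azd deploy", "azd ai agent show"];
// Old patterns that should no longer appear.
const forbiddenDeploy = ["az acr build", "direct-code-deployment.md", "Definition of Done"];
```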

Parent SKILL.md

Updated the create and deploy rows in the sub-skills table so the routing description matches the new content (mentions azd ai agent init / sample-based scaffolding for create, and azd provision + azd deploy for deploy).

Validation

cd scripts
npm run references # passes (no broken/escaped links)
npm run frontmatter # pre-existing placeholder-version errors only

cd ..
npm run tokens check # new files: tools.md and deploy.md are ~5-10% over the 2000-token soft limit; smaller than every file they replaced. No file is anywhere near a hard limit.

npm run build # NBGV stamping works; CHANGELOG generated.

cd tests
$env:SKIP_INTEGRATION_TESTS = 'true'
npm test -- --testPathPatterns=foundry-agent # new azd-ai.unit.test.ts files pass; trigger tests still pass

Notes for reviewers

No az commands appear on the happy path. az account list and az account get-access-token are only mentioned as last-resort fallbacks (matching the azd-first guidance in Azure/azure-dev#8375, "feat(azure.ai.docs): new extension -- unified agent-friendly docs front door").
The prompt-agent workflow stays MCP-based (agent_update) — that's still the correct contract for prompt agents; only hosted agents flipped to azd.
Invocations-WS, observe, trace, troubleshoot, invoke, agent-optimizer, eval-datasets, and create-prompt skills are unchanged; create-hosted and deploy hand off to them explicitly.

- Removed the direct-code deployment reference file as it is no longer needed.
- Added unit tests for the azd-based hosted-agent create workflow to ensure the new patterns are correctly implemented.
- Deleted outdated toolbox paths unit tests that referenced removed samples.
- Introduced unit tests for the azd-based hosted-agent deploy workflow to validate the new deployment process.
- Removed direct-code unit tests that were redundant after the removal of the direct-code deployment reference.

- SKILL.md: make deploy/create sub-skill cells when-focused (routing, not how)
- create-hosted.md: add code-first scope note; replace Step 1 azd commands with a single verify-environment script
- add cross-platform verify-environment.sh/.ps1 emitting a concise OK/WARN/ACTION summary

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

therealjohn · 2026-05-29T17:29:24Z

@tmeschter thanks for the review. Summary of what was agreed and done (commit 9a4af78):

SKILL.md deploy and create rows: reverted to when-focused routing descriptions; removed the command-level how, which now lives in the sub-skill docs.
create-hosted.md: added a Scope note -- azd ai is the preferred code-first path (agent code on disk, in a repo, with IaC and a local inner-loop). If the intent is only to create a remote agent resource (no code), other approaches may apply (prompt agents, Foundry MCP tools, portal).
Step 1 env checks: moved the five azd commands into scripts/verify-environment.sh and .ps1. The agent now runs one script that emits a concise [OK]/[WARN]/[ACTION] summary -- fewer turns, fewer tokens, more reliable.
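The script's contract is plain-text prefixes, so any caller can reduce its output to a single go/no-go decision. The minimal sketch below assumes one status per line in the form `[LEVEL] message`. `parseVerifyOutput` is hypothetical, and the sample output in the test is made up.

```typescript
type Level = "OK" | "WARN" | "ACTION";

interface VerifyLine {
  level: Level;
  message: string;
}

// Collect every "[LEVEL] message" line; other lines (e.g. "Summary: ...") are ignored.
function parseVerifyOutput(stdout: string): { lines: VerifyLine[]; blocking: boolean } {
  const lines: VerifyLine[] = [];
  for (const raw of stdout.split(/\r?\n/)) {
    const m = /^\[(OK|WARN|ACTION)\]\s*(.*)$/.exec(raw.trim());
    if (m) lines.push({ level: m[1] as Level, message: m[2] });
  }
  // Any [ACTION] line means the user must act before the create flow continues.
  return { lines, blocking: lines.some((l) => l.level === "ACTION") };
}
```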

Resolving the threads.

Replace stale Docker/ACR/container-start mechanics with the azd-based hosted-agent flow (azd ai agent init/run, azd provision, azd deploy) and add 'azd ai agent' to the USE FOR triggers so the skill routes on azd queries. Stays within the 1024-char description cap (1010).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- create-hosted.md: trim Step 1 intro; replace 'drop --no-prompt' guidance with azd env set PARAM_* + ask_user (no interactive prompts) in the tip, YOLO mode, and error table
- deploy.md: drop --output json from Step 1 show commands; add optional 'azd provision --preview' what-if note before provisioning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove keywords duplicated between the intro and USE FOR (continuous eval, SFT/DPO/RFT) plus the niche 'large file upload'. Total formatted budget now 19986/20000.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Regenerated trigger keyword snapshots after the description change (added azd ai agent init/with triggers; removed docker/large/upload/onboard/push/start). Built output/ first since tests load skills from output/skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

swatDong · 2026-06-02T02:38:41Z

+
+Core mental model for the `azd ai agent` extension. Use this when you need to understand command surface, file layout, or where a given setting lives.
+
+## CLI surface


Considering that commands may change and users may have different CLI versions, how about hinting the agent (or adding a troubleshooting note) to run --help on demand to learn the exact command syntax?
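One way an agent could act on this is to run `azd <subcommand> --help` and confirm that a flag is listed before using it. The sketch below is hypothetical: `helpArgv` and `helpMentionsFlag` are illustrative names, and the help text in the test is a made-up fixture, not real azd output.

```typescript
// Build the argv for `azd <subcommand...> --help`, e.g. ["ai", "agent", "init", "--help"].
function helpArgv(subcommand: string[]): string[] {
  return [...subcommand, "--help"];
}

// Check whether the installed CLI's help text lists a flag before relying on it.
function helpMentionsFlag(helpText: string, flag: string): boolean {
  const escaped = flag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The flag must not run into further flag-name characters, so "--from" != "--from-code".
  return new RegExp(`(^|\\s)${escaped}(?![\\w-])`, "m").test(helpText);
}
```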

swatDong · 2026-06-02T02:59:06Z

-
-> ⚠️ **Not done yet: invocation success is the midpoint, not the finish line.** The next action after a passing smoke test is **Step 8**, not a deployment summary. Do not write a summary, version table, or Playground link yet.
-
-### Step 8: Auto-Generate Evaluation Suite (MANDATORY — RUNS AUTOMATICALLY)


(It's acceptable that automatic eval generation is no longer a mandatory step.) How about moving it into a sub-reference so the agent still knows how to run evals?

swatDong · 2026-06-02T06:40:27Z


-### Step 5: Get Agent Definition Schema
+- Send messages -> [invoke](../invoke/invoke.md)
+- Evaluate / optimize -> [observe](../observe/observe.md)


How about mentioning azd ai agent eval here as a handoff, or rewriting the observe sub-skill to use azd ai agent eval?

I already see azd ai agent eval in the CLI reference, and I tested that it works on an agent built from this skill (tested with Copilot).

XOEEst · 2026-06-03T16:29:35Z

-
-> ⚠️ **Not done yet: invocation success is the midpoint, not the finish line.** The next action after a passing smoke test is **Step 8**, not a deployment summary. Do not write a summary, version table, or Playground link yet.
-
-### Step 8: Auto-Generate Evaluation Suite (MANDATORY — RUNS AUTOMATICALLY)


This should be mandatory, and it should run while waiting for the agent to deploy. Let's discuss further.

XOEEst

Some high-level comments:

How will create-hosted.md work together with deploy.md? What keywords will trigger which?
We should keep eval-suite generation as part of the agent deployment step, and prompt for evaluation after deployment.

wbreza

Code review — multi-model pass (claude-sonnet-4.6 + claude-opus-4.7)

Reviewed at head 2c47cf5. Synthesizing findings from three parallel focus-area passes: skill-compliance, content-correctness, and scripts-and-tests. Items tagged 🔁 were flagged by ≥2 reviewers independently.

Prior reviewers' resolved items (tmeschter / therealjohn round-trips on script consolidation, --no-prompt, --preview, trim-verbosity, when-vs-how) are not re-flagged. Findings below are net-new.

TL;DR

Intent: Refactor foundry-agent/{create,deploy}/ around the azd ai agent extension, replacing manual Docker/ACR + REST-direct-code + framework matrices + hand-resolved toolbox endpoints. Net −318 lines, single consistent CLI lifecycle.

How: Rewrites create-hosted.md (−287/+110) into sample-first + brownfield flows; collapses deploy.md (−419/+111) into verify → provision → deploy → show + invoke; adds azd-ai-cli.md / local-run.md / tools.md references; introduces scripts/verify-environment.{sh,ps1} (replaces 5 inline azd calls with one [OK]/[WARN]/[ACTION]-emitting script); deletes direct-code-deployment.md; swaps two deprecated unit tests for two new azd-ai.unit.test.ts lock-down files; regenerates trigger snapshots across sibling foundry skills (mechanical, from parent SKILL.md description edit).

Mergeable: CONFLICTING with main — rebase needed before merge regardless of review.

🔴 High

1. 🔁 PR description claim doesn't match the diff: agentframework.md and use-toolbox-in-hosted-agent.md were NOT deleted (Description Alignment)
The PR body lists three deletions; only deploy/references/direct-code-deployment.md is in the diff. create/references/agentframework.md and create/references/use-toolbox-in-hosted-agent.md remain on disk and are still linked from sibling references (e.g. agent-tools.md:50 still emits [use-toolbox-in-hosted-agent.md](use-toolbox-in-hosted-agent.md)). Readers navigating the tool catalog can land on pre-azd Docker/toolbox guidance that contradicts the new azd-first story.
Fix: Either delete the two files in this PR and redirect the sibling references to tools.md / azd-ai-cli.md, or update the PR description and file a follow-up issue.

2. 19 pre-existing reference files in create/references/ are now orphaned (Progressive Disclosure)
The new create-hosted.md links only 3 of 22 files in create/references/. Files like agent-tools.md, agentframework.md, foundry-tool-catalog.md, sdk-operations.md, all tool-*.md (12 of them), toolbox-reference.md, and use-toolbox-in-hosted-agent.md are unreachable from the new entry point. This violates "references load ONLY when explicitly linked" and matches the "skills that are just tool description catalogs" anti-pattern.
Fix: Delete the unlinked files OR add explicit recipe links from the new Add tools section. Minimum safe drops in this PR: agentframework.md, use-toolbox-in-hosted-agent.md, toolbox-reference.md (all superseded by tools.md).

3. verify-environment.{sh,ps1} exit 1 when [ACTION] is present — risks the LM never receiving the structured output (Script Exit Codes)
Both scripts print the Summary line and then call exit 1. If the tool runner treats non-zero exits as a command failure and routes to an error path rather than surfacing stdout, the LM gets nothing — exactly defeating the rationale that drove tmeschter to request the script in the first place. The [OK]/[WARN]/[ACTION] prefixes plus the Summary: action required … text already encode every signal the LM needs.
Fix: Always exit 0. Keep the prefixes and summary text as the only blocking signal.
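The point of this fix is that the structured text carries the signal, not the exit status. A complementary caller-side sketch keeps stdout even on a non-zero exit. It assumes Node.js, and `runAndCapture` is hypothetical. In the test, a one-line `node -e` program stands in for the real script.

```typescript
import { spawnSync } from "node:child_process";

interface VerifyRun {
  status: number | null; // exit code; null if the process could not start or was killed
  stdout: string;
  blocking: boolean;
}

// Run a verification script and keep its stdout whatever the exit code,
// so the [OK]/[WARN]/[ACTION] lines still reach the caller.
function runAndCapture(cmd: string, args: string[]): VerifyRun {
  const r = spawnSync(cmd, args, { encoding: "utf8" });
  const stdout = r.stdout || "";
  // Decide on the text signal, not on the exit status.
  const blocking = stdout.split(/\r?\n/).some((line) => line.trim().startsWith("[ACTION]"));
  return { status: r.status, stdout, blocking };
}
```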

🟡 Medium

4. 🔁 Verify scripts never validate the >= 1.25.0 minimum the PR description claims (Script Version Parsing)
PR body says "azd version parsing for >=1.25.0," but both scripts only parse and echo [OK] azd installed (version X.Y.Z) — no comparison against any floor. A user on a pre-1.25.0 azd passes Step 1 and only fails downstream with confusing "extension not installed" / "command not found" errors.
Fix: In .sh: [ "$(printf '%s\n%s\n' 1.25.0 "$AZD_VERSION" | sort -V | head -1)" != "1.25.0" ] → emit [ACTION]. In .ps1: cast to [System.Version] and compare; treat unknown as [WARN].
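In TypeScript, the same floor check needs a numeric, per-component comparison, because a plain string comparison would rank 1.100.0 below 1.25.0. `checkMinVersion` is hypothetical. Mapping an unparseable version to WARN follows the review's suggestion for .ps1.

```typescript
type Check = "OK" | "WARN" | "ACTION";

// Compare the leading MAJOR.MINOR.PATCH of `found` against `min`, component by component.
function checkMinVersion(found: string, min = "1.25.0"): Check {
  const parse = (v: string): number[] | undefined => {
    const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
    return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : undefined;
  };
  const a = parse(found);
  const b = parse(min);
  if (!a || !b) return "WARN"; // unknown version: warn rather than block
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i] ? "OK" : "ACTION";
  }
  return "OK"; // exactly the minimum
}
```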

5. tools.md (~2180 tok) and azd-ai-cli.md (~1913 tok) far exceed the references/**/*.md 1000-token soft limit (Token Budget)
.token-limits.json sets references to 1000 tokens. The PR description reports these as only "5-10% over the 2000-token soft limit," which suggests npm run tokens check is matching against the loose *.md: 2000 rule rather than the specific references/**/*.md: 1000 rule — likely a glob-resolution gap in the checker.
Fix: Split tools.md into tools/README.md + tools/recipes.md; move CLI schemas out of azd-ai-cli.md into a sibling. Separately, verify the token-checker glob actually enforces the 1000-token reference cap.
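The suspected gap is that the checker stops at the first matching rule instead of letting the most specific rule win. The sketch below shows the intended resolution. It assumes a flat pattern-to-limit map, but the real .token-limits.json format and checker logic are not shown here. `globToRegExp`, `limitFor` and the "count literal characters" specificity heuristic are hypothetical. Patterns without a slash are matched against the file name only.

```typescript
// Minimal glob support: "**/" spans zero or more directories, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?";
      i += 2;
    } else if (c === "*") {
      re += "[^/]*";
    } else {
      re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Patterns with a slash may start at any directory boundary; others match a basename.
  return new RegExp(glob.includes("/") ? `(?:^|/)${re}$` : `^${re}$`);
}

// Pick the limit of the most specific matching pattern (most literal characters).
function limitFor(filePath: string, limits: Record<string, number>): number | undefined {
  const base = filePath.split("/").pop() ?? filePath;
  let best: { specificity: number; limit: number } | undefined;
  for (const [glob, limit] of Object.entries(limits)) {
    const target = glob.includes("/") ? filePath : base;
    if (!globToRegExp(glob).test(target)) continue;
    const specificity = glob.replace(/\*/g, "").length;
    if (!best || specificity > best.specificity) best = { specificity, limit };
  }
  return best?.limit;
}
```

With this rule, the order in which patterns appear in the config no longer matters.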

6. create-hosted.md (~2202 tok) exceeds the *.md 2000-token soft limit; undisclosed in the PR description (Token Budget)
PR discloses overages for tools.md and deploy.md but omits create-hosted.md. Not critical, but underrepresents the actual token impact.
Fix: Trim Step 4a/4b parameter descriptions and the YOLO section, or disclose.

7. Extension presence check uses raw substring match — not JSON-field-aware (Script Cross-Platform)
Both scripts grep "azure.ai.agents" anywhere in the raw azd extension list --output json blob. If any future extension's description/url/dependencies mention that string, the check returns a false [OK].
Fix: Parse JSON and check the name field explicitly. .ps1: ConvertFrom-Json then Where-Object { $_.name -eq $ext }. .sh: pipe through python3 -c "import json,sys; sys.exit(0 if '$ext' in {e['name'] for e in json.load(sys.stdin)} else 1)".
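Here is the same name-field check sketched in TypeScript. Like the review's .ps1 and python3 fixes, it assumes `azd extension list --output json` returns an array of objects with a `name` field. `hasExtension` is hypothetical, and the description text in the test fixture is made up to show the false positive.

```typescript
// Assumed entry shape: only `name` is relied on; other fields are ignored.
interface ExtensionEntry {
  name?: unknown;
  [key: string]: unknown;
}

// True only if some entry's `name` equals `ext` exactly.
function hasExtension(json: string, ext: string): boolean {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) return false;
  return parsed.some(
    (e) => typeof e === "object" && e !== null && (e as ExtensionEntry).name === ext,
  );
}
```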

8. No test asserts the verify-environment.{sh,ps1} scripts exist or that create-hosted.md points at the correct paths (Test Coverage)
The scripts are the central new mechanism of this PR — yet create/azd-ai.unit.test.ts has no test reading foundry-agent/create/scripts/ or asserting ./scripts/verify-environment.sh / .ps1 appear in Step 1. A future rename or path edit regresses silently.
Fix: Add an existence test (the deploy test already imports access/fileExists-style helpers — port the pattern):

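test("verify-environment scripts exist and are referenced in Step 1", async () => {
  // Assumes the same file helpers (fileExists, readSkillFile) that the deploy test imports.
  // Both platform variants of the script must ship with the create sub-skill...
  expect(await fileExists("foundry-agent/create/scripts/verify-environment.sh")).toBe(true);
  expect(await fileExists("foundry-agent/create/scripts/verify-environment.ps1")).toBe(true);
  // ...and Step 1 of create-hosted.md must reference each one by its relative path.
  const skill = await readSkillFile("foundry-agent/create/create-hosted.md");
  expect(skill).toContain("./scripts/verify-environment.sh");
  expect(skill).toContain("./scripts/verify-environment.ps1");
});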

9. Deleted direct-code.unit.test.ts covered an invariant in references/agent-metadata-contract.md that has no replacement (Test Coverage Gap)
The deleted test asserted agent-metadata-contract.md still scopes ACR to "Docker/ACR deploy flow" (preventing a regression that promotes ACR as a general mechanism). Neither new test touches that file.
Fix: Port the three-line invariant into deploy/azd-ai.unit.test.ts:

test("agent metadata contract still scopes ACR to Docker/ACR deploy flow only", async () => {
  const contract = await readSkillFile("references/agent-metadata-contract.md");
  // The ACR field is still documented...
  expect(contract).toContain("azureContainerRegistry");
  // ...but only in the context of the Docker/ACR deploy flow...
  expect(contract).toContain("Docker/ACR deploy flow");
  // ...and never as the general hosted-agent deployment mechanism.
  expect(contract).not.toContain("✅ for hosted agents | ACR used for deployment and image refresh");
});

⚪ Low

10. .sh silent JSON-parse fallback when python3 is missing — produces false [OK] lines (Script Tool Dependencies)
All three JSON parses in .sh pipe through python3. If python3 is absent, version → "unknown", project endpoint → empty (misleading [WARN]), agent status → empty (false [OK] No agent deployed yet. Proceed with create.) — even when an agent IS deployed. .ps1 uses native ConvertFrom-Json so doesn't share this failure mode; the scripts are not equivalent on minimal Linux/macOS shells.
Fix: Use jq (more commonly preinstalled), or pre-flight command -v python3 and emit [ACTION] if missing.

11. azd ai agent endpoint update --dry-run / --force documented in deploy.md but missing from azd-ai-cli.md (Cross-File Consistency)
The CLI surface line shows only azd ai agent endpoint update # patch agentEndpoint / agentCard in place with no flags. A reader cross-referencing the two files won't see --dry-run / --force documented as a recognized command shape.
Fix: Append [--dry-run | --force] to the CLI surface line and note "idempotent; --force required for write."

12. 🔁 Test expect(cli).toContain("_VERSION") is too loose (Test Specificity)
Matches any *_VERSION token (NODE_VERSION, PYTHON_VERSION, TOOLBOX_..._VERSION, etc.) — deletion of the canonical AGENT_<SVC>_VERSION row wouldn't necessarily fail the test.
Fix: Use the literal form already used by the deploy test: expect(cli).toContain("AGENT_<SVC>_VERSION");.
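A two-line illustration of why the loose assertion can still pass after the row it guards is gone. The document excerpt is made up.

```typescript
// Hypothetical excerpt: the canonical AGENT_<SVC>_VERSION row has been deleted,
// but an unrelated *_VERSION token is still present.
const cliDoc = "Set PYTHON_VERSION before running azd ai agent run.";

const looseMatch = cliDoc.includes("_VERSION");              // true: the regression goes unnoticed
const literalMatch = cliDoc.includes("AGENT_<SVC>_VERSION"); // false: the test fails as intended
```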

13. Parent microsoft-foundry/SKILL.md description silently drops non-Docker routing keywords (Description Alignment)
Beyond removing Docker build, ACR push (expected), the rewrite also dropped continuous eval, availability, onboard, and large file upload. The latter two routed initial-setup and fine-tuning training-data scenarios. Sibling skill snapshots (capacity/deploy-model/observe/trace) confirm propagation.
Fix: Restore the four terms to the USE-FOR list (they're not azd-specific), or call them out explicitly in the PR description so reviewers can confirm intent.

14. verify-environment.sh uses set -uo pipefail (missing -e); inconsistent with repo convention (Script Style)
Every other repo script using set uses set -euo pipefail (e.g., eng/check-quota.sh). All known failure paths have || fallback guards so the practical risk is low, but a future unguarded command would fail silently.
Fix: Change to set -euo pipefail.

15. python3 dependency in .sh is undocumented (Script Tool Dependencies)
Not declared in the script header; no pre-flight check.
Fix: Add # Requires: python3 3.8+ to the header, or a one-line command -v python3 || echo "[WARN] python3 not found — JSON parsing will fall back to 'unknown'".

16. Deleted toolbox-paths.unit.test.ts covered foundry-sample path stability; no equivalent in the new tools.md test (Test Coverage Gap)
The original test locked down samples/python/hosted-agents/... and learn.microsoft.com/azure/foundry/agents/... paths to catch stale-path regressions. The new tools.md test checks command syntax and env var patterns, but no paths.
Fix: If tools.md contains GitHub sample links, add one assertion per canonical path. If it deliberately contains none, leave a comment in the test so a future author knows to add path tests when links are added.

Open questions from prior reviewers (informational — not new findings)

@swatDong asked about hinting azd ai agent eval as a handoff in deploy.md / observe sub-skill (already verified working). The current "demote to one-liner observe handoff" is defensible, but if eval truly runs cleanly under azd ai agent eval, a single-line hint in the deploy handoff would be a low-cost win.
@XOEEst asked (a) how create-hosted.md vs deploy.md are triggered differently and (b) whether eval-suite generation should remain a mandatory deploy step. Neither has a resolved answer in the existing threads.

Posted as a comment (not a change request) — author may resolve, defer, or push back per their judgment. The High items are worth resolving in this PR; Medium/Low are addressable at author's discretion.

therealjohn marked this pull request as ready for review May 29, 2026 15:52

Copilot AI review requested due to automatic review settings May 29, 2026 15:52

therealjohn requested review from RickWinter, XOEEst, XiaofuHuang, anchenyi, ankitbko, jugonzales, tendau and vebudumu as code owners May 29, 2026 15:52

therealjohn changed the title ~~Refactor foundry-agent documentation and tests~~ Refactor foundry-agent skills to include AZD May 29, 2026

therealjohn force-pushed the feat/azd-ai branch from 5c89e27 to 82cc8be