feat: review per-call tool usage in the integration-tests dashboard by tmeschter · Pull Request #2659 · microsoft/GitHub-Copilot-for-Azure

tmeschter · 2026-06-16T20:49:49Z

Summary

Adds an end-to-end pipeline so the integration-tests dashboard can show exactly which tools were called in each agent run, for reviewability of nightly runs (not automated pass/fail comparison).

Delivered in three compartmentalized phases:

Phase 1 — capture (`de434420`)

tests/utils/agent-runner.ts: computeToolUsage records the ordered tool-call sequence per run (including the skill pseudo-tool, with success joined by toolCallId, plus per-call durationMs and outputBytes) and writes it to a per-run tool-usage-<token>.json blob named 1:1 with the run's markdown report.
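The capture step can be pictured as a join between "tool started" and "tool completed" events on toolCallId. The following is a minimal sketch under assumptions, not the code in agent-runner.ts: the `ToolEvent` shape, the `atMs` timestamps, the zero-based `order`, and the name `computeToolUsageSketch` are all hypothetical.

```typescript
// Hypothetical event shapes; the real agent-runner event types may differ.
type ToolEvent =
  | { kind: "start"; toolCallId: string; name: string; args: unknown; atMs: number }
  | { kind: "complete"; toolCallId: string; success: boolean; output: string; atMs: number };

interface ToolCallRecord {
  order: number; // position in start order (zero-based here by assumption)
  toolCallId: string;
  name: string;
  args: unknown;
  success?: boolean; // stays undefined when no completion event was seen
  durationMs?: number;
  outputBytes?: number;
}

function computeToolUsageSketch(events: ToolEvent[]): ToolCallRecord[] {
  const calls: ToolCallRecord[] = [];
  const pendingById = new Map<string, { record: ToolCallRecord; startedAtMs: number }>();

  for (const event of events) {
    if (event.kind === "start") {
      const record: ToolCallRecord = {
        order: calls.length,
        toolCallId: event.toolCallId,
        name: event.name,
        args: event.args,
      };
      calls.push(record);
      pendingById.set(event.toolCallId, { record, startedAtMs: event.atMs });
    } else {
      const pending = pendingById.get(event.toolCallId);
      if (!pending) continue; // completion without a matching start is ignored
      pending.record.success = event.success;
      pending.record.durationMs = event.atMs - pending.startedAtMs;
      // UTF-8 byte length of the output, not its character count.
      pending.record.outputBytes = new TextEncoder().encode(event.output).length;
    }
  }
  return calls;
}
```

Keeping `success` optional is what would let a consumer distinguish "failed" from "never completed", which matches the ✓/✗/? indicator described for the dashboard.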

Phase 2a — storage + API (`08f682e8`)

tests/scripts/upload-tool-usage.ts: uploads one Azure Table row per tool call (name, order, success, duration, output size); full arguments stay in the blob and are fetched on demand.
dashboard/api getToolUsage.ts: GET /api/tool-usage read endpoint with skill/test/branch/runId/runToken/runDate filters.
dashboard/infra: provisions the integrationtoolusage table and wires TOOL_USAGE_TABLE_NAME through the bicep modules.
CI: an "Upload tool usage to table" step in the integration and azure-deploy workflows.
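To illustrate the "one row per tool call, arguments stay in the blob" split described above, here is a rough sketch of the row expansion. The `CapturedRun` shape and the partition/row key scheme are assumptions made for illustration; the keys the uploader actually uses are not stated in this PR description.

```typescript
// Hypothetical shapes; only the listed column set comes from the PR text.
interface CapturedCall {
  name: string;
  order: number;
  args: unknown; // full arguments live only in the per-run blob
  success?: boolean;
  durationMs?: number;
  outputBytes?: number;
}

interface CapturedRun {
  skill: string;
  test: string;
  runToken: string;
  calls: CapturedCall[];
}

interface ToolUsageRow {
  partitionKey: string;
  rowKey: string;
  skill: string;
  test: string;
  runToken: string;
  toolName: string;
  order: number;
  success?: boolean;
  durationMs?: number;
  outputBytes?: number;
}

function expandRunToRows(run: CapturedRun): ToolUsageRow[] {
  return run.calls.map((call) => ({
    // Assumed key scheme: partition by skill, row key from token + padded order.
    partitionKey: run.skill,
    rowKey: `${run.runToken}_${("0000" + call.order).slice(-4)}`,
    skill: run.skill,
    test: run.test,
    runToken: run.runToken,
    toolName: call.name,
    order: call.order,
    success: call.success,
    durationMs: call.durationMs,
    outputBytes: call.outputBytes,
    // `args` is deliberately not copied into the row.
  }));
}
```

Leaving arguments out of the table keeps rows small and bounded in size, while the blob remains the place to fetch full arguments for a single call.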

Phase 2b — dashboard UI (`e6cac93f`)

A collapsible "Tools" toggle under each test item (passed and failed) in the details panel. On expand it lazy-loads the API filtered by skill + test + selected date, groups calls by runToken (one block per run, ordered by call order), and lists each call as order · ✓/✗/? · tool name · duration · output size. Clicking args fetches that call's full arguments on demand from the per-run blob.
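The grouping and per-call line described above could look roughly like the sketch below. The `ToolCallRow` shape, the function names, and the exact duration/size formatting ("ms", "B", "n/a") are assumptions; the real code lives in App.tsx and may differ.

```typescript
// Hypothetical projection of one API row as the UI would consume it.
interface ToolCallRow {
  runToken: string;
  order: number;
  toolName: string;
  success?: boolean;
  durationMs?: number;
  outputBytes?: number;
}

// One block per run, with calls inside each block sorted by call order.
function groupByRunToken(rows: ToolCallRow[]): Map<string, ToolCallRow[]> {
  const groups = new Map<string, ToolCallRow[]>();
  for (const row of rows) {
    const group = groups.get(row.runToken);
    if (group) {
      group.push(row);
    } else {
      groups.set(row.runToken, [row]);
    }
  }
  groups.forEach((group) => group.sort((a, b) => a.order - b.order));
  return groups;
}

// Renders "order · ✓/✗/? · tool name · duration · output size".
function formatCall(row: ToolCallRow): string {
  const mark = row.success === undefined ? "?" : row.success ? "✓" : "✗";
  const duration = row.durationMs === undefined ? "n/a" : `${row.durationMs} ms`;
  const size = row.outputBytes === undefined ? "n/a" : `${row.outputBytes} B`;
  return [String(row.order), mark, row.toolName, duration, size].join(" · ");
}
```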

Testing

tests: npm run typecheck, npm run lint, unit tests for capture + uploader transforms all green.
dashboard/api: tsc build clean.
dashboard: vite build + tsc --noEmit clean for changed files.
az bicep build clean (pre-existing tag warnings only).

Each integration-test agent run now writes a per-run tool-usage-<token>.json alongside its agent-metadata-<token>.md report (1:1 correlation by filename), so the ordered list of tools called in a specific run can be reconstructed even when the same stimulus runs multiple times in one directory. The capture records each tool call's name, arguments (secret-redacted, full), toolCallId, success, and order, including the 'skill' pseudo-tool. The dashboard blob enumerators exclude tool-usage-*.json from API enumeration for now.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
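The 1:1 filename correlation amounts to swapping the prefix and extension around a shared token. A minimal sketch, assuming the token is everything between the `agent-metadata-` prefix and the `.md` suffix (the helper name is hypothetical):

```typescript
// Derives the tool-usage blob name from a markdown report name.
// Returns undefined for names that do not follow the report pattern.
function toolUsageNameFor(reportFileName: string): string | undefined {
  const match = /^agent-metadata-(.+)\.md$/.exec(reportFileName);
  return match ? `tool-usage-${match[1]}.json` : undefined;
}
```

Because the token is reused verbatim, several runs of the same stimulus in one directory each keep their own report and tool-usage pair.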

Phase 2a: add the pipeline that makes each integration-test run's tool calls queryable from the dashboard.

- tests/scripts/upload-tool-usage.ts: new uploader writing one Azure Table row per tool call (name, order, success, duration, output size); full arguments stay in the per-run blob and are fetched on demand.
- dashboard/api getToolUsage.ts: GET /api/tool-usage read endpoint with skill/test/branch/runId/runToken filters.
- dashboard/infra: provision the integrationtoolusage table and wire the TOOL_USAGE_TABLE_NAME app setting through main/storage/function-app bicep.
- CI: add an 'Upload tool usage to table' step to the integration and azure-deploy workflows.
- agent-runner.ts: capture per-call wall-clock durationMs and UTF-8 outputBytes alongside the existing tool sequence.
- tests: unit coverage for the uploader transforms and the new capture fields.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
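As a rough illustration of how the skill/test/branch/runId/runToken query parameters might translate into an Azure Table OData filter, consider the sketch below. It assumes the table columns carry the same names as the query parameters, which this PR description does not confirm; `buildToolUsageFilter` is a hypothetical helper.

```typescript
const FILTER_KEYS = ["skill", "test", "branch", "runId", "runToken"] as const;
type FilterKey = (typeof FILTER_KEYS)[number];
type ToolUsageFilters = Partial<Record<FilterKey, string>>;

// Builds an OData filter string; column names are assumed to equal the keys.
function buildToolUsageFilter(filters: ToolUsageFilters): string {
  const clauses: string[] = [];
  for (const key of FILTER_KEYS) {
    const value = filters[key];
    if (value) {
      // OData string literals escape a single quote by doubling it.
      clauses.push(`${key} eq '${value.replace(/'/g, "''")}'`);
    }
  }
  return clauses.join(" and ");
}
```

Escaping quotes matters here because the filter values arrive from the query string of an anonymous endpoint.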

Phase 2b: surface each run's tool calls for review in the dashboard.

- integration-tests App.tsx: add a collapsible 'Tools' toggle under every test item (passed and failed) in the details panel. On expand it lazy-loads GET /api/tool-usage filtered by skill + test + selected date, groups rows by runToken (one block per run), and lists each call as order, success indicator, tool name, duration, and output size. Clicking 'args' fetches the call's full arguments on demand from the per-run tool-usage blob.
- getToolUsage.ts: add a runDate filter and include durationMs/outputBytes in the projected rows.
- integration-tests.css: styles for the tools toggle, run blocks, call rows, metrics, and the args panel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds end-to-end capture, storage, and UI rendering for per-call tool usage in integration-test agent runs so nightly runs can be reviewed at the “which tools ran, in what order, with what outcome/metrics” level.

Changes:

Capture ordered tool invocations per agent run (including skill) with success join, duration, and output size into per-run tool-usage-<token>.json.
Upload one Azure Table row per tool call and expose a new GET /api/tool-usage endpoint for querying tool-call history.
Add a dashboard “Tools” section under each test that lazy-loads tool-call rows and fetches per-call arguments on demand.

Show a summary per file

| File | Description |
| --- | --- |
| tests/utils/agent-runner.ts | Captures tool-call sequences + writes per-run tool-usage JSON alongside the markdown report. |
| tests/utils/tests/tool-usage.test.ts | Unit tests for tool-usage capture ordering, joins, metrics, and filename derivation. |
| tests/scripts/upload-tool-usage.ts | Uploads per-run tool calls into Azure Table storage (one row per tool call). |
| tests/scripts/tests/upload-tool-usage.test.ts | Tests deterministic uploader transforms (row keys, token derivation, row expansion). |
| tests/package.json | Adds `upload:tool-usage` script entry. |
| dashboard/sync/src/msbenchBlobEnumerator.ts | Excludes tool-usage blobs from msbench blob enumeration. |
| dashboard/src/integration-tests/integration-tests.css | Styles for the new per-test “Tools” UI section. |
| dashboard/src/integration-tests/App.tsx | Adds the “Tools” collapsible UI, grouping by runToken and lazy-loading args. |
| dashboard/infra/modules/storage.bicep | Provisions the `integrationtoolusage` table and outputs its name. |
| dashboard/infra/modules/function-app.bicep | Wires `TOOL_USAGE_TABLE_NAME` into Function App settings. |
| dashboard/infra/main.bicep | Adds tool-usage table param and passes it through modules. |
| dashboard/api/src/functions/getToolUsage.ts | New anonymous API endpoint to query tool-call rows with optional filters. |
| dashboard/api/src/functions/getData.ts | Updates blob layout documentation to include tool-usage files. |
| dashboard/api/src/blobEnumerator.ts | Updates blob exclusion rules (currently also excludes tool-usage blobs). |
| .github/workflows/test-azure-deploy.yml | Upload step for tool-usage rows after integration tests. |
| .github/workflows/test-all-integration.yml | Upload step for tool-usage rows after integration tests across skills. |

Copilot's findings

Files reviewed: 16/16 changed files
Comments generated: 3

- Include tool-usage-*.json blobs in the dashboard data tree so the on-demand args fetch can locate them (blobEnumerator).
- Require at least one filter on GET /api/tool-usage, returning 400 otherwise to avoid unfiltered full-table scans.
- Batch Azure Table writes via submitTransaction in chunks of <=100 per partition instead of sequential upserts; add unit tests for the grouping helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
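Two of these follow-ups lend themselves to a short sketch: rejecting unfiltered queries, and grouping rows into per-partition batches of at most 100 (Azure Table transactions are limited to 100 operations within a single partition). The helper names and the `KeyedRow` shape below are hypothetical; the actual grouping helper in upload-tool-usage.ts may be structured differently.

```typescript
interface KeyedRow {
  partitionKey: string;
  rowKey: string;
}

// Returns 400 when no filter value is present, 200 otherwise.
function statusForFilters(filters: Record<string, string | undefined>): 200 | 400 {
  const hasAny = Object.keys(filters).some((key) => Boolean(filters[key]));
  return hasAny ? 200 : 400;
}

// Splits rows into batches that each stay within one partition and
// contain at most `maxBatch` rows, preserving input order per partition.
function chunkByPartition<T extends KeyedRow>(rows: T[], maxBatch = 100): T[][] {
  const byPartition = new Map<string, T[]>();
  for (const row of rows) {
    const group = byPartition.get(row.partitionKey);
    if (group) {
      group.push(row);
    } else {
      byPartition.set(row.partitionKey, [row]);
    }
  }

  const batches: T[][] = [];
  byPartition.forEach((group) => {
    for (let i = 0; i < group.length; i += maxBatch) {
      batches.push(group.slice(i, i + maxBatch));
    }
  });
  return batches;
}
```

Each returned batch could then be turned into upsert actions and handed to one submitTransaction call, replacing one network round trip per row with one per batch.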

JasonYeMSFT · 2026-06-17T22:13:20Z

Do you have a screenshot to show what this new UI looks like?

tmeschter and others added 3 commits June 15, 2026 13:34

Copilot AI review requested due to automatic review settings June 16, 2026 20:49

Copilot started reviewing on behalf of tmeschter June 16, 2026 20:50

github-advanced-security AI found potential problems Jun 16, 2026

Comment thread tests/scripts/upload-tool-usage.ts Dismissed

Copilot AI reviewed Jun 16, 2026

Comment thread dashboard/api/src/blobEnumerator.ts Outdated

Comment thread dashboard/api/src/functions/getToolUsage.ts

Comment thread tests/scripts/upload-tool-usage.ts

This was referenced Jun 18, 2026

[repo-status] Weekly Repo Status — June 13–18, 2026 #2668

Closed

[repo-status] Weekly Repo Status — June 13–18, 2026 #2670

Closed

JasonYeMSFT approved these changes Jun 19, 2026

tmeschter merged commit f0db848 into microsoft:main Jun 22, 2026
11 checks passed

tmeschter deleted the 260615-RecordTools branch June 22, 2026 16:49

github-actions Bot mentioned this pull request Jun 25, 2026

[repo-status] Weekly Repo Status — June 20–25, 2026 #2701

Open

tmeschter merged 4 commits into microsoft:main from tmeschter:260615-RecordTools

4 participants
