Workflow hardening:
- Drop pull_request trigger (keep workflow_dispatch only) to eliminate
  the token exfiltration vector from untrusted PR code
- Add top-level permissions block (contents/packages: read) for
  defense-in-depth
Package hygiene:
- Remove @microsoft/vally-cli from devDependencies (CI installs it
  explicitly via GitHub Packages); lockfile regenerated in sync
- Remove unused root yaml dependency
Eval spec cleanup:
- Remove 13 broad output-not-contains "error"/"failed" graders from
  azure-hosted-copilot-sdk/eval.yaml (kept specific fatal-error regex)
- Add azure-prepare, azure-validate, azure-deploy to environment.skills
- Remove cost:free tag from all LLM-backed stimuli across 4 eval files
  (now reserved for non-LLM static evals)
- Align .vally.yaml suite descriptions with accurate tag semantics
Cleanup:
- Delete stale Waza task files in azure-hosted-copilot-sdk/tasks/
- Add evals/README.md with local vally-cli run instructions
- Gitignore local results/ output directory
Follow-up issue #1920 tracks wiring CI to a curated medium suite.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skill evaluation suites run by [Vally](https://github.com/microsoft/ai-bench) (`@microsoft/vally-cli`). Each subdirectory corresponds to a skill and contains an `eval.yaml` defining stimuli, graders, and configuration.
## Prerequisites
`@microsoft/vally-cli` is published to GitHub Packages. You need a GitHub **Personal Access Token** with the `read:packages` scope.
1. Create a PAT: <https://github.com/settings/tokens> (classic) → enable `read:packages`.
2. Configure npm to use GitHub Packages for the `@microsoft` scope. Create or update `~/.npmrc`:
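
   A minimal sketch of such a configuration, assuming the standard GitHub Packages npm registry URL. `YOUR_PAT` is a placeholder for the token created in step 1, not a real value:

   ```ini
   ; Route every package in the @microsoft scope to GitHub Packages
   @microsoft:registry=https://npm.pkg.github.com
   ; Authenticate to that registry with the PAT (placeholder value)
   //npm.pkg.github.com/:_authToken=YOUR_PAT
   ```

   Scoping the registry to `@microsoft` leaves all other packages resolving from the default npm registry.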