Commit 0cffd4a

Rename eval suites to align with agentcontrol skill renames (#68)
1 parent add614f commit 0cffd4a

10 files changed

Lines changed: 46 additions & 46 deletions

evals/README.md

Lines changed: 13 additions & 13 deletions
@@ -28,17 +28,17 @@ Run these after any changes to the provider, mock, or shared utilities to catch
 # From evals/

 # Run a single suite (all test cases)
-npm run eval:aiconfig-create # ai-configs/aiconfig-create
-npm run eval:aiconfig-update # ai-configs/aiconfig-update
-npm run eval:aiconfig-tools # ai-configs/aiconfig-tools
-npm run eval:aiconfig-variations # ai-configs/aiconfig-variations
+npm run eval:configs-create # agentcontrol/configs-create
+npm run eval:configs-update # agentcontrol/configs-update
+npm run eval:agentcontrol-tools # agentcontrol/tools
+npm run eval:configs-variations # agentcontrol/configs-variations
 npm run eval:flag-create # feature-flags/launchdarkly-flag-create

 # Quick smoke check — first test case only (~15-20s, ~$0.05)
-npm run eval:aiconfig-create:single
-npm run eval:aiconfig-update:single
-npm run eval:aiconfig-tools:single
-npm run eval:aiconfig-variations:single
+npm run eval:configs-create:single
+npm run eval:configs-update:single
+npm run eval:agentcontrol-tools:single
+npm run eval:configs-variations:single
 npm run eval:flag-create:single

 # Aggregate and CI operations
@@ -147,7 +147,7 @@ This handles agents that call `get-foo` before AND after mutation; using `indexO

 ### Cross-model evaluation (`run-models.js`)

-The cross-model runner evaluates all suites against one or more model aliases without touching the canonical `eval-scores.json`. Results are written to `<suite>/results.<alias>.json` (e.g., `aiconfig-create/results.haiku.json`).
+The cross-model runner evaluates all suites against one or more model aliases without touching the canonical `eval-scores.json`. Results are written to `<suite>/results.<alias>.json` (e.g., `configs-create/results.haiku.json`).

 ```bash
 npm run eval:haiku # claude-haiku-4-5-20251001
@@ -222,7 +222,7 @@ Read the SKILL.md and note every MCP tool it references. Verify each tool exists
 mkdir <skill-name>
 ```

-Use the same name as the skill directory (e.g., `aiconfig-create`). Create `promptfooconfig.yaml`:
+Use the same name as the skill directory (e.g., `configs-create`). Create `promptfooconfig.yaml`:

 ```yaml
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
@@ -264,7 +264,7 @@ Add an entry to `scripts/_manifest.js`:
 ```js
 {
   suite: "<skill-name>",
-  skillKey: "<domain>/<skill-name>", // e.g. "ai-configs/aiconfig-create"
+  skillKey: "<domain>/<skill-name>", // e.g. "agentcontrol/configs-create"
   skillDir: "skills/<domain>/<skill-name>",
   readme: "skills/<domain>/<skill-name>/README.md",
 },
@@ -364,7 +364,7 @@ Running `npm run eval:all` writes a summary at the repo root:
   "updatedAt": "2026-05-19T00:00:00Z",
   "lastCommit": "fc69376",
   "skills": {
-    "ai-configs/aiconfig-create": {
+    "agentcontrol/configs-create": {
       "score": 100,
       "passed": 4,
       "total": 4,
@@ -377,6 +377,6 @@ Running `npm run eval:all` writes a summary at the repo root:
 ```

 - `lastCommit` — the short git SHA at the time of the last `eval:all` run. Used by `eval:diff` to determine which suites have changed since scores were recorded.
-- `skillKey` — the canonical key is `<domain>/<skill-name>` (e.g., `ai-configs/aiconfig-create`).
+- `skillKey` — the canonical key is `<domain>/<skill-name>` (e.g., `agentcontrol/configs-create`).

 Run `node scripts/aggregate.js` (without `--run`) to rebuild this file from existing `<suite>/results.json` files without making any API calls.
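
As a rough illustration of the two naming conventions touched by the README changes above, the sketch below builds a per-model results path of the form `<suite>/results.<alias>.json` and a `skillKey` of the form `<domain>/<skill-name>`. The helper names `resultsPathFor` and `skillKeyFor` are hypothetical and do not come from the repository's scripts, which may assemble these strings differently.

```javascript
// Hypothetical helpers for illustration only; not part of the repository.

// Per-model output file, following the documented `<suite>/results.<alias>.json` pattern.
function resultsPathFor(suite, alias) {
  return `${suite}/results.${alias}.json`;
}

// Canonical skill key, following the documented `<domain>/<skill-name>` pattern.
function skillKeyFor(domain, skillName) {
  return `${domain}/${skillName}`;
}

console.log(resultsPathFor("configs-create", "haiku"));
console.log(skillKeyFor("agentcontrol", "configs-create"));
```

With the renamed suite, the first call yields the same example path the README now uses, `configs-create/results.haiku.json`.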

evals/aiconfig-tools/promptfooconfig.yaml renamed to evals/agentcontrol-tools/promptfooconfig.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,13 +1,13 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 #
 # Run with shared defaults:
-# promptfoo eval -c shared/defaults.yaml -c aiconfig-tools/promptfooconfig.yaml
+# promptfoo eval -c shared/defaults.yaml -c agentcontrol-tools/promptfooconfig.yaml
 #
-# The aiconfig-tools skill covers creating agent tool definitions and attaching
+# The agentcontrol-tools skill covers creating agent tool definitions and attaching
 # them to config variations. Key invariant: tools must be created with
 # raw JSON Schema format (not OpenAI function-calling wrapper), and must be
 # created before being attached.
-description: "End-to-end evaluation of the aiconfig-tools skill"
+description: "End-to-end evaluation of the agentcontrol-tools skill"

 prompts:
 - file://../../skills/agentcontrol/tools/SKILL.md
@@ -67,7 +67,7 @@ tests:
 - type: llm-rubric
   threshold: 0.75
   value: |
-    Evaluate the aiconfig-tools workflow:
+    Evaluate the agentcontrol-tools workflow:
     1. Did it create the tool first with create-ai-tool?
     2. Did the tool schema use raw JSON Schema format (type: object, properties)?
     3. Did the schema include both requested parameters (query, limit)?
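
The rubric above distinguishes raw JSON Schema from the OpenAI function-calling wrapper. The sketch below contrasts the two shapes for a tool with `query` and `limit` parameters. The tool name `search`, the parameter types, and the `looksLikeRawJsonSchema` helper are assumptions made for illustration; the actual check in the eval suite is performed by the LLM rubric, not by code like this.

```javascript
// Raw JSON Schema: the top-level object is the schema itself.
// Parameter types here are assumed for illustration.
const rawSchema = {
  type: "object",
  properties: {
    query: { type: "string" },
    limit: { type: "integer" },
  },
};

// OpenAI function-calling wrapper: the schema is nested under function.parameters.
// The tool name "search" is hypothetical.
const wrappedSchema = {
  type: "function",
  function: { name: "search", parameters: rawSchema },
};

// Hypothetical heuristic: a raw schema declares type "object" with a
// properties map and carries no `function` wrapper key.
function looksLikeRawJsonSchema(schema) {
  return (
    schema !== null &&
    typeof schema === "object" &&
    schema.type === "object" &&
    schema.properties !== null &&
    typeof schema.properties === "object" &&
    !("function" in schema)
  );
}

console.log(looksLikeRawJsonSchema(rawSchema));
console.log(looksLikeRawJsonSchema(wrappedSchema));
```

Only the first shape satisfies the heuristic; the wrapped form fails because its top-level `type` is `"function"`.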

evals/aiconfig-create/promptfooconfig.yaml renamed to evals/configs-create/promptfooconfig.yaml

Lines changed: 6 additions & 6 deletions
@@ -1,13 +1,13 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 #
 # Run with shared defaults:
-# promptfoo eval -c shared/defaults.yaml -c aiconfig-create/promptfooconfig.yaml
+# promptfoo eval -c shared/defaults.yaml -c configs-create/promptfooconfig.yaml
 #
-# The aiconfig-create skill guides the agent through choosing agent vs
+# The configs-create skill guides the agent through choosing agent vs
 # completion mode, creating a config + variation, and verifying the setup.
 # The recommended path is setup-ai-config (one-step); the agent may also
 # use create-ai-config + create-ai-config-variation for more control.
-description: "End-to-end evaluation of the aiconfig-create skill"
+description: "End-to-end evaluation of the configs-create skill"

 prompts:
 - file://../../skills/agentcontrol/configs-create/SKILL.md
@@ -179,9 +179,9 @@ tests:
 # ------------------------------------------------------------------
 # Test 4: Targeting reminder
 # After creation, the skill should remind the user they need to run
-# targeting (/aiconfig-targeting) before the SDK returns enabled=True.
+# targeting (/configs-targeting) before the SDK returns enabled=True.
 # ------------------------------------------------------------------
-- description: "Post-creation: response mentions targeting step or aiconfig-targeting"
+- description: "Post-creation: response mentions targeting step or configs-targeting"
   vars:
     user_request: >
       Create a config called "email-assistant" for composing emails.
@@ -202,7 +202,7 @@ tests:
 after creation. Score 1.0 if the response:
 1. Confirms the config was created successfully.
 2. Mentions that targeting must be configured before the SDK returns enabled=True,
-   OR references the /aiconfig-targeting skill, OR notes the fallthrough points at
+   OR references the /configs-targeting skill, OR notes the fallthrough points at
    a disabled variation.
 Score 0.5 if it mentions the config was created but omits the targeting warning.
 Score 0.0 if it neither confirms creation nor mentions targeting.

evals/aiconfig-update/promptfooconfig.yaml renamed to evals/configs-update/promptfooconfig.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 #
 # Run with shared defaults:
-# promptfoo eval -c shared/defaults.yaml -c aiconfig-update/promptfooconfig.yaml
+# promptfoo eval -c shared/defaults.yaml -c configs-update/promptfooconfig.yaml
 #
-# The aiconfig-update skill covers updating variation model/prompts/parameters,
+# The configs-update skill covers updating variation model/prompts/parameters,
 # updating config metadata, archiving instead of deleting, and verification.
-description: "End-to-end evaluation of the aiconfig-update skill"
+description: "End-to-end evaluation of the configs-update skill"

 prompts:
 - file://../../skills/agentcontrol/configs-update/SKILL.md
@@ -62,7 +62,7 @@ tests:
 - type: llm-rubric
   threshold: 0.7
   value: |
-    Evaluate the aiconfig-update workflow:
+    Evaluate the configs-update workflow:
     1. Did it explore current state (health or get-ai-config) before mutating?
     2. Did it use update-ai-config-variation to change the model?
     3. Did it use correct Provider.model-id format for modelConfigKey (e.g. OpenAI.gpt-4o-mini)?

evals/aiconfig-variations/promptfooconfig.yaml renamed to evals/configs-variations/promptfooconfig.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
 # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
 #
 # Run with shared defaults:
-# promptfoo eval -c shared/defaults.yaml -c aiconfig-variations/promptfooconfig.yaml
+# promptfoo eval -c shared/defaults.yaml -c configs-variations/promptfooconfig.yaml
 #
-# The aiconfig-variations skill covers cloning a variation to test one change
+# The configs-variations skill covers cloning a variation to test one change
 # at a time (the primary path), creating from scratch (when explicitly asked),
 # and safety rules around not deleting the baseline variation.
-description: "End-to-end evaluation of the aiconfig-variations skill"
+description: "End-to-end evaluation of the configs-variations skill"

 prompts:
 - file://../../skills/agentcontrol/configs-variations/SKILL.md

evals/package.json

Lines changed: 8 additions & 8 deletions
@@ -3,14 +3,14 @@
   "private": true,
   "type": "commonjs",
   "scripts": {
-    "eval:aiconfig-create": "promptfoo eval -c shared/defaults.yaml -c aiconfig-create/promptfooconfig.yaml --env-file .env --no-cache -o aiconfig-create/results.json",
-    "eval:aiconfig-create:single": "promptfoo eval -c shared/defaults.yaml -c aiconfig-create/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
-    "eval:aiconfig-update": "promptfoo eval -c shared/defaults.yaml -c aiconfig-update/promptfooconfig.yaml --env-file .env --no-cache -o aiconfig-update/results.json",
-    "eval:aiconfig-update:single": "promptfoo eval -c shared/defaults.yaml -c aiconfig-update/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
-    "eval:aiconfig-tools": "promptfoo eval -c shared/defaults.yaml -c aiconfig-tools/promptfooconfig.yaml --env-file .env --no-cache -o aiconfig-tools/results.json",
-    "eval:aiconfig-tools:single": "promptfoo eval -c shared/defaults.yaml -c aiconfig-tools/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
-    "eval:aiconfig-variations": "promptfoo eval -c shared/defaults.yaml -c aiconfig-variations/promptfooconfig.yaml --env-file .env --no-cache -o aiconfig-variations/results.json",
-    "eval:aiconfig-variations:single": "promptfoo eval -c shared/defaults.yaml -c aiconfig-variations/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
+    "eval:configs-create": "promptfoo eval -c shared/defaults.yaml -c configs-create/promptfooconfig.yaml --env-file .env --no-cache -o configs-create/results.json",
+    "eval:configs-create:single": "promptfoo eval -c shared/defaults.yaml -c configs-create/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
+    "eval:configs-update": "promptfoo eval -c shared/defaults.yaml -c configs-update/promptfooconfig.yaml --env-file .env --no-cache -o configs-update/results.json",
+    "eval:configs-update:single": "promptfoo eval -c shared/defaults.yaml -c configs-update/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
+    "eval:agentcontrol-tools": "promptfoo eval -c shared/defaults.yaml -c agentcontrol-tools/promptfooconfig.yaml --env-file .env --no-cache -o agentcontrol-tools/results.json",
+    "eval:agentcontrol-tools:single": "promptfoo eval -c shared/defaults.yaml -c agentcontrol-tools/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
+    "eval:configs-variations": "promptfoo eval -c shared/defaults.yaml -c configs-variations/promptfooconfig.yaml --env-file .env --no-cache -o configs-variations/results.json",
+    "eval:configs-variations:single": "promptfoo eval -c shared/defaults.yaml -c configs-variations/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
     "eval:flag-create": "promptfoo eval -c shared/defaults.yaml -c launchdarkly-flag-create/promptfooconfig.yaml --env-file .env --no-cache -o launchdarkly-flag-create/results.json",
     "eval:flag-create:single": "promptfoo eval -c shared/defaults.yaml -c launchdarkly-flag-create/promptfooconfig.yaml --env-file .env --no-cache --filter-first-n 1",
     "eval:all": "node scripts/aggregate.js --run",
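
Each suite in the hunk above contributes two script entries that differ only in their trailing flags, which is why a rename touches every line in pairs. The sketch below reproduces that pattern from a script name and a suite directory. The `scriptsFor` helper is hypothetical; in the repository these entries are written out by hand in `package.json`.

```javascript
// Hypothetical generator for the paired npm script entries; illustration only.
function scriptsFor(name, dir) {
  // Shared prefix copied from the package.json entries in this diff.
  const base =
    `promptfoo eval -c shared/defaults.yaml -c ${dir}/promptfooconfig.yaml` +
    ` --env-file .env --no-cache`;
  return {
    // Full run: writes results into the suite directory.
    [`eval:${name}`]: `${base} -o ${dir}/results.json`,
    // Smoke check: first test case only, no output file.
    [`eval:${name}:single`]: `${base} --filter-first-n 1`,
  };
}

// The script name and directory usually match, but not always:
// `eval:flag-create` points at the `launchdarkly-flag-create` directory.
console.log(scriptsFor("configs-create", "configs-create"));
console.log(scriptsFor("flag-create", "launchdarkly-flag-create"));
```

Keeping the name and the directory as separate arguments reflects the `flag-create` entry, where the two differ.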

evals/scripts/_manifest.js

Lines changed: 4 additions & 4 deletions
@@ -14,25 +14,25 @@
  */
 const SUITES = [
   {
-    suite: "aiconfig-create",
+    suite: "configs-create",
     skillKey: "agentcontrol/configs-create",
     skillDir: "skills/agentcontrol/configs-create",
     readme: "skills/agentcontrol/configs-create/README.md",
   },
   {
-    suite: "aiconfig-update",
+    suite: "configs-update",
     skillKey: "agentcontrol/configs-update",
     skillDir: "skills/agentcontrol/configs-update",
     readme: "skills/agentcontrol/configs-update/README.md",
   },
   {
-    suite: "aiconfig-tools",
+    suite: "agentcontrol-tools",
     skillKey: "agentcontrol/tools",
     skillDir: "skills/agentcontrol/tools",
     readme: "skills/agentcontrol/tools/README.md",
   },
   {
-    suite: "aiconfig-variations",
+    suite: "configs-variations",
     skillKey: "agentcontrol/configs-variations",
     skillDir: "skills/agentcontrol/configs-variations",
     readme: "skills/agentcontrol/configs-variations/README.md",
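
In the entries shown in this hunk, `skillDir` is always `skills/` followed by `skillKey`, and `readme` is always `skillDir` followed by `/README.md`. A small check along the lines below could flag a manifest entry that drifts from that pattern after a rename. The `manifestProblems` function is a hypothetical sketch, not part of the repository, and it deliberately does not compare `suite` with the skill name, since `agentcontrol-tools` maps to `agentcontrol/tools`.

```javascript
// The four entries visible in this hunk, after the rename.
const SUITES = [
  {
    suite: "configs-create",
    skillKey: "agentcontrol/configs-create",
    skillDir: "skills/agentcontrol/configs-create",
    readme: "skills/agentcontrol/configs-create/README.md",
  },
  {
    suite: "configs-update",
    skillKey: "agentcontrol/configs-update",
    skillDir: "skills/agentcontrol/configs-update",
    readme: "skills/agentcontrol/configs-update/README.md",
  },
  {
    suite: "agentcontrol-tools",
    skillKey: "agentcontrol/tools",
    skillDir: "skills/agentcontrol/tools",
    readme: "skills/agentcontrol/tools/README.md",
  },
  {
    suite: "configs-variations",
    skillKey: "agentcontrol/configs-variations",
    skillDir: "skills/agentcontrol/configs-variations",
    readme: "skills/agentcontrol/configs-variations/README.md",
  },
];

// Hypothetical consistency check: returns one message per mismatched field.
function manifestProblems(suites) {
  const problems = [];
  for (const entry of suites) {
    if (entry.skillDir !== `skills/${entry.skillKey}`) {
      problems.push(`${entry.suite}: skillDir does not match skillKey`);
    }
    if (entry.readme !== `${entry.skillDir}/README.md`) {
      problems.push(`${entry.suite}: readme is not under skillDir`);
    }
  }
  return problems;
}

console.log(manifestProblems(SUITES));
```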

evals/scripts/_models.js

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ function resolveModel(input) {
 /**
  * Reverse-lookup a friendly alias for a model id, falling back to the model
  * id itself. Used to label per-model output files like
- * `aiconfig-create/results.haiku.json`.
+ * `configs-create/results.haiku.json`.
  */
 function aliasFor(modelId) {
   for (const [alias, id] of Object.entries(MODEL_ALIASES)) {

evals/scripts/aggregate.js

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
  * Modes:
  * node scripts/aggregate.js # rebuild from existing results.json
  * node scripts/aggregate.js --run # run every suite then aggregate
- * node scripts/aggregate.js --run --only=aiconfig-create,aiconfig-update
+ * node scripts/aggregate.js --run --only=configs-create,configs-update
  *
  * Exits 0 on success, 1 on failure.
  */

evals/scripts/run-models.js

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
  *
  * Usage:
  * node scripts/run-models.js --model=haiku
- * node scripts/run-models.js --model=sonnet --only=aiconfig-create
+ * node scripts/run-models.js --model=sonnet --only=configs-create
  * node scripts/run-models.js --models=haiku,sonnet,opus
  *
  * Output:
@@ -78,7 +78,7 @@ function usage() {
     "Examples:",
     " npm run eval:haiku",
     " npm run eval:matrix",
-    " node scripts/run-models.js --model=haiku --only=aiconfig-create",
+    " node scripts/run-models.js --model=haiku --only=configs-create",
   ].join("\n");
 }
