feat: Support inline eval definitions#43
Addressed in |
@copilot resolve the merge conflicts in this pull request
…inline-in-experiment # Conflicts: # packages/experiment/src/config.ts
Resolved the merge conflicts in |
    models: ['gpt-5.5'],
    evals: [
      {
        name: 'local-button-eval',
@copilot name should be optional. By default, let's use the directory name (e.g. ./evals/local-button-eval should have the default name local-button-eval)
Addressed in 5cdc3b5 by making inline eval name optional and defaulting it from the eval directory name.
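The defaulting rule described above can be sketched as follows. This is a minimal illustration, not the code from 5cdc3b5: the `InlineEvalRef` shape, its `directory` field, and the `resolveEvalName` helper are hypothetical names chosen for the example.

```typescript
import * as path from 'node:path'

// Hypothetical shape of an inline eval reference; `directory` is an assumed field name.
interface InlineEvalRef {
  name?: string
  directory: string
}

// Use the explicit name when given, otherwise fall back to the last
// segment of the eval directory (e.g. ./evals/local-button-eval).
function resolveEvalName(ref: InlineEvalRef): string {
  return ref.name ?? path.basename(path.resolve(ref.directory))
}
```

With this sketch, an entry pointing at `./evals/local-button-eval` without a `name` resolves to `local-button-eval`, while an explicit `name` always wins.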
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds support for referencing “inline” evals (local directories) in experiment configs, while standardizing type exports and import specifiers across packages.
Changes:
- Extend ExperimentConfig.evals to accept either built-in eval IDs or inline eval directory references.
- Add eval resolution logic in agent-eval to load inline eval config/tests from disk, and wire it into the CLI.
- Normalize internal imports by removing explicit .ts extensions.
Show a summary per file
| File | Description |
|---|---|
| packages/sandbox/src/index.ts | Normalizes relative import specifiers (drops .ts). |
| packages/experiment/src/index.ts | Re-exports new eval-related config types and normalizes import specifier. |
| packages/experiment/src/config.ts | Introduces ExperimentEvalConfig and updates ExperimentConfig.evals type accordingly. |
| packages/evals/src/index.ts | Normalizes generated module import specifiers (drops .ts). |
| packages/agent-eval/src/treatment.ts | Switches treatments to reference ResolvedEval instead of Eval. |
| packages/agent-eval/src/eval.ts | Adds resolver for built-in vs inline evals (filesystem validation + config import). |
| packages/agent-eval/src/eval.test.ts | Adds Vitest coverage for built-in and inline eval resolution behavior. |
| packages/agent-eval/src/config.ts | Reuses shared EvalConfig type from @primer/agent-experiment. |
| packages/agent-eval/src/cli.ts | Resolves evals upfront (built-in or inline) and uses resolved eval objects when constructing treatments. |
| README.md | Documents inline eval usage and expected file structure. |
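To make the config change concrete, here is a rough sketch of what a union-typed `evals` field could look like. Only `ExperimentConfig`, `ExperimentEvalConfig`, `name`, `configPath`, and `testPath` appear in this PR; the `InlineEvalConfig` name, the `directory` field, the `describeEval` helper, and the built-in ID `some-built-in-eval` are assumptions made for illustration.

```typescript
// Assumed shape of an inline eval entry; field names other than
// `name`, `configPath`, and `testPath` are hypothetical.
interface InlineEvalConfig {
  name?: string
  directory: string
  configPath?: string
  testPath?: string
}

// An evals entry is either a built-in eval ID or an inline eval object.
type ExperimentEvalConfig = string | InlineEvalConfig

interface ExperimentConfig {
  models: string[]
  evals: ExperimentEvalConfig[]
}

// Narrow on `typeof` to tell the two kinds of entries apart.
function describeEval(entry: ExperimentEvalConfig): string {
  return typeof entry === 'string' ? `built-in:${entry}` : `inline:${entry.directory}`
}

const config: ExperimentConfig = {
  models: ['gpt-5.5'],
  evals: ['some-built-in-eval', {directory: './evals/local-button-eval'}],
}
```

Discriminating on `typeof entry === 'string'` keeps existing configs that list only eval IDs valid without any migration.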
Review details
- Files reviewed: 10/10 changed files
- Comments generated: 3
- Review effort level: Low
    async function loadEvalConfig(configPath: string, name: string): Promise<EvalConfig> {
      const configModule = (await import(configPath)) as {default?: unknown}
      if (!isEvalConfig(configModule.default)) {
        throw new Error(`Eval "${name}" config must export a default config with a prompt`)
      }
      return configModule.default
    }
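One hardening that is sometimes applied to a dynamic `import()` of a filesystem path is converting the path to a `file://` URL first, because Node's ESM loader rejects absolute Windows paths passed as bare specifiers. The sketch below shows that variant; it is not part of this PR, and `toImportSpecifier` and `importDefault` are hypothetical helper names.

```typescript
import * as path from 'node:path'
import {pathToFileURL} from 'node:url'

// Resolve the config path against a working directory and convert it
// to a file:// URL string that import() accepts on every platform.
function toImportSpecifier(configPath: string, cwd: string = process.cwd()): string {
  return pathToFileURL(path.resolve(cwd, configPath)).href
}

// Load a module by path and return its default export, still untyped,
// so the caller can validate it (for example with a type guard).
async function importDefault(configPath: string): Promise<unknown> {
  const mod = (await import(toImportSpecifier(configPath))) as {default?: unknown}
  return mod.default
}
```

Returning `unknown` keeps the validation step in one place: the caller decides whether the default export is a usable eval config.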
    async function assertDirectory(directory: string, name: string) {
      const stats = await fs.stat(directory).catch(() => undefined)
      if (!stats?.isDirectory()) {
        throw new Error(`Eval "${name}" directory was not found: ${directory}`)
      }
    }

    async function assertFile(filepath: string, name: string) {
      const stats = await fs.stat(filepath).catch(() => undefined)
      if (!stats?.isFile()) {
        throw new Error(`Eval "${name}" test file was not found: ${filepath}`)
      }
    }
    function isEvalConfig(value: unknown): value is EvalConfig {
      return (
        value !== null &&
        typeof value === 'object' &&
        'prompt' in value &&
        typeof (value as Record<string, unknown>).prompt === 'string'
      )
    }
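To illustrate which values the guard above accepts, here is a small self-contained check. The `EvalConfig` interface is reduced to the single `prompt` field the guard inspects, which is an assumption about the real type; the sample prompt text is made up.

```typescript
// Minimal stand-in for the shared EvalConfig type (assumed: only `prompt` matters here).
interface EvalConfig {
  prompt: string
}

function isEvalConfig(value: unknown): value is EvalConfig {
  return (
    value !== null &&
    typeof value === 'object' &&
    'prompt' in value &&
    typeof (value as Record<string, unknown>).prompt === 'string'
  )
}

// Only the first candidate has a string `prompt` on an object.
const candidates: unknown[] = [{prompt: 'Add a button'}, {prompt: 42}, {}, null, 'prompt']
const results = candidates.map((candidate) => isEvalConfig(candidate))
// results: [true, false, false, false, false]
```

The explicit `value !== null` check matters because `typeof null` is `'object'` in JavaScript.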
Experiments can now reference evals defined outside the repository’s generated eval registry. Inline evals include a name, project-local directory, and optional config/test path overrides that resolve from the CLI working directory.
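The path handling described here might look roughly like the sketch below. It assumes that the eval directory and any overrides resolve against the working directory, and that missing overrides fall back to files inside the eval directory. The default file names `eval.config.ts` and `eval.test.ts`, the `directory` field, and the `resolveInlineEvalPaths` helper are hypothetical and not taken from the PR.

```typescript
import * as path from 'node:path'

// Assumed input shape; `directory` is a hypothetical field name.
interface InlineEvalPaths {
  directory: string
  configPath?: string
  testPath?: string
}

// Hypothetical default file names, used only for this illustration.
const DEFAULT_CONFIG_FILE = 'eval.config.ts'
const DEFAULT_TEST_FILE = 'eval.test.ts'

// Resolve everything to absolute paths. Overrides are resolved from the
// working directory; defaults live inside the eval directory.
function resolveInlineEvalPaths(ref: InlineEvalPaths, cwd: string = process.cwd()) {
  const directory = path.resolve(cwd, ref.directory)
  return {
    directory,
    configPath: ref.configPath
      ? path.resolve(cwd, ref.configPath)
      : path.join(directory, DEFAULT_CONFIG_FILE),
    testPath: ref.testPath
      ? path.resolve(cwd, ref.testPath)
      : path.join(directory, DEFAULT_TEST_FILE),
  }
}
```

Taking `cwd` as a parameter with a `process.cwd()` default keeps the function easy to test without changing the process working directory.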
Experiment config
- evals entries to be either built-in eval IDs or inline eval objects.

Eval resolution
- process.cwd().
- config, configPath, and testPath.

Validation + docs