feat: Support inline eval definitions by Copilot · Pull Request #43 · primer/agent-eval

Copilot · 2026-06-11T16:22:42Z

Experiments can now reference evals defined outside the repository’s generated eval registry. Inline evals include a name, project-local directory, and optional config/test path overrides that resolve from the CLI working directory.

Experiment config
- Allows evals entries to be either built-in eval IDs or inline eval objects.
- Adds shared eval config types for prompt metadata.

Eval resolution
- Resolves built-in eval IDs through the existing registry.
- Resolves inline eval paths relative to process.cwd().
- Supports inline config, configPath, and testPath.
- Preserves sandbox spoofing behavior by normalizing inline evals into the same runtime shape as generated evals.

Validation + docs
- Adds focused coverage for built-in lookup, cwd-relative inline paths, and custom config/test paths.
- Documents inline eval usage.
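
As a rough illustration of the "built-in eval ID or inline eval object" shape described above, the entry type could be modeled as a union with a type guard. The names `EvalConfig`, `InlineEval`, `EvalEntry`, and `isInlineEval`, and the ID `some-built-in-eval`, are hypothetical and chosen for this sketch; the types actually added in this PR may be named and structured differently.

```typescript
// Hypothetical shapes for illustration; not the repository's actual types.
interface EvalConfig {
  // Prompt metadata, per the "shared eval config types" bullet.
  prompt?: string
}

interface InlineEval {
  name: string
  // Project-local directory containing the eval.
  path: string
  // Optional inline config and path overrides.
  config?: EvalConfig
  configPath?: string
  testPath?: string
}

// An entry is either a built-in eval ID or an inline eval object.
type EvalEntry = string | InlineEval

// Narrows an entry to the inline form.
function isInlineEval(entry: EvalEntry): entry is InlineEval {
  return typeof entry !== 'string'
}

const entries: EvalEntry[] = [
  'some-built-in-eval', // placeholder ID, not a real registry entry
  {name: 'local-button-eval', path: './evals/button'},
]

// Collect the names of the inline entries only.
const inlineNames = entries.filter(isInlineEval).map(entry => entry.name)
```

Modeling the entry as a union keeps existing configs that list plain eval IDs valid while letting new configs mix in inline objects.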

export const experiment: ExperimentConfig = {
  name: 'Local project experiment',
  description: 'Run an eval from the current project',
  models: ['gpt-5.5'],
  evals: [
    {
      name: 'local-button-eval',
      path: './evals/button',
      config: {
        prompt: 'Update the local project to use a Primer button',
      },
      testPath: 'button.eval.test.ts',
    },
  ],
  treatments: [],
}
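
The resolution behavior described above might look roughly like the following sketch. It assumes, following the PR description, that `path`, `configPath`, and `testPath` are all resolved against the CLI working directory. The `registry` map, the `ResolvedEval` shape, the `resolveEval` function, and the `example-built-in` ID are hypothetical stand-ins, not the code in this PR.

```typescript
import * as path from 'node:path'

// Hypothetical shapes for illustration; not the repository's actual types.
interface InlineEval {
  name: string
  path: string
  config?: {prompt?: string}
  configPath?: string
  testPath?: string
}

type EvalEntry = string | InlineEval

// Assumed common runtime shape shared by built-in and inline evals.
interface ResolvedEval {
  name: string
  directory: string
  config?: {prompt?: string}
  configPath?: string
  testPath?: string
}

// Stand-in for the generated eval registry, keyed by built-in eval ID.
const registry = new Map<string, ResolvedEval>([
  ['example-built-in', {name: 'example-built-in', directory: '/registry/example-built-in'}],
])

function resolveEval(entry: EvalEntry, cwd: string = process.cwd()): ResolvedEval {
  if (typeof entry === 'string') {
    // Built-in IDs go through the registry lookup.
    const builtIn = registry.get(entry)
    if (builtIn === undefined) {
      throw new Error(`Unknown eval ID: ${entry}`)
    }
    return builtIn
  }
  // Inline evals: resolve every path against the working directory
  // (assumption taken from the PR description).
  return {
    name: entry.name,
    directory: path.resolve(cwd, entry.path),
    config: entry.config,
    configPath:
      entry.configPath === undefined ? undefined : path.resolve(cwd, entry.configPath),
    testPath:
      entry.testPath === undefined ? undefined : path.resolve(cwd, entry.testPath),
  }
}
```

Returning the same shape for both branches is one way to let downstream code, such as the sandbox handling mentioned in the description, treat inline and generated evals uniformly.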

Copilot AI added 2 commits June 11, 2026 16:17

feat: Support inline eval configs

d21f903

fix: Repair CI validation issues

1143b51

Copilot AI assigned Copilot and joshblack Jun 11, 2026

Copilot created this pull request from a session on behalf of joshblack June 11, 2026 16:23

Copilot finished work on behalf of joshblack June 11, 2026 16:23

Copilot AI requested a review from joshblack June 11, 2026 16:23

Copilot wants to merge 2 commits into main from copilot/define-evals-inline-in-experiment
